Crowl can extract all links from your website, giving you data on both your internal and your outbound links.
To enable link extraction, add --links to your command line.
For instance, here is a simple crawl of this website, with default values, and links extraction:
python crowl.py -u https://www.crowl.tech/ -b crowltech --links
For each link it finds, Crowl will grab:
It will also add flags for nofollow links (
nofollow) and internal links to pages that are blocked by the robots.txt file (
Finally, Crowl will also add a weight to each link, stored in the
The weight of each link is calculated from the order of links in the source code: the higher a link appears on the page, the more weight it gets.
The actual formula is:
weight = 1 - c / n
where c is the index of the link in the page's list of links (starting at 0), and n is the total number of links on the page.
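To make the formula concrete, here is a minimal sketch in Python. It is not taken from Crowl's source code: the function name link_weights and the sample list of links are assumptions made for illustration, and only the formula itself comes from the description above.

```python
def link_weights(links):
    """Pair each link with weight = 1 - c / n.

    c is the 0-based position of the link in source order and
    n is the total number of links on the page.
    Hypothetical helper, not part of Crowl's API.
    """
    n = len(links)
    # An empty page yields an empty list, so no division by zero occurs.
    return [(link, 1 - c / n) for c, link in enumerate(links)]


# Hypothetical page with four links, listed in source order.
page_links = ["/", "/docs", "/blog", "https://example.com/"]
for link, weight in link_weights(page_links):
    print(f"{weight:.2f}  {link}")
```

With this formula the first link on a page always gets a weight of 1 and the last one gets 1 / n, so no link ever ends up with a weight of 0.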
In the future, we will try to add other calculation methods.