Crowl can extract all links from your website, giving you data on both your internal and outbound links.
Simply set LINKS to True in your project file.
Crowl will save links either in a separate CSV file (for CSV mode) or a separate MySQL table (for MySQL mode).
Historically, Crowl has removed duplicate links from exports.
This means that, by default, only the first link from page A to page B is saved.
You can deactivate this behavior if you wish to keep all links by using the LINKS_UNIQUE option in your configuration file.
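To illustrate what "only the first link from page A to page B" means, here is a minimal Python sketch of that kind of deduplication. It is not Crowl's actual code: the function name dedupe_links and the sample data are hypothetical, and links are assumed to be dictionaries with source, target and text keys.

```python
def dedupe_links(links):
    """Keep only the first link for each (source, target) pair, preserving order."""
    seen = set()
    unique = []
    for link in links:
        key = (link["source"], link["target"])
        if key not in seen:
            seen.add(key)
            unique.append(link)
    return unique


# Hypothetical sample: two links from /a to /b, one from /a to /c.
links = [
    {"source": "/a", "target": "/b", "text": "first"},
    {"source": "/a", "target": "/b", "text": "second"},
    {"source": "/a", "target": "/c", "text": "other"},
]

unique_links = dedupe_links(links)
for link in unique_links:
    print(link["source"], link["target"], link["text"])
```

In this sketch, the second link from /a to /b is dropped, which mirrors the default behavior described above. Keeping every link would simply mean skipping this step.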
For each link, Crowl records:
- the source page (source)
- the target page (target)
- the link text (text)
It will also add flags for nofollow links (nofollow) and internal links to pages that are blocked by the robots.txt file (disallow).
Finally, Crowl will also add a weight to each link, stored in the weight column.
The weight associated with each link is calculated from the order of links in the source code: the earlier a link appears on the page, the more weight it gets.
The actual formula is:
weight = 1 - c / n
Where c is the zero-based position of the link in the page's list of links, and n is the total number of links on the page. The first link therefore gets a weight of 1, and the last link a weight of 1/n.
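As a quick worked example of the formula, the following Python sketch computes the weights for a page with a given number of links. The function name link_weights is hypothetical and only illustrates the arithmetic. It is not taken from Crowl's code.

```python
def link_weights(n):
    """Return the weight of each link on a page with n links.

    Applies weight = 1 - c / n, where c is the zero-based position
    of the link in the page's list of links.
    """
    if n <= 0:
        return []
    return [1 - c / n for c in range(n)]


# A page with 4 links: the first link gets 1.0, the last gets 1/4.
weights = link_weights(4)
print(weights)  # [1.0, 0.75, 0.5, 0.25]
```

Weights decrease linearly with position and never reach 0, since the last link on a page still gets 1/n.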
In the future, we will try and add other methods of calculation.