Crowl can extract every link found on your website, giving you data on both internal and outbound links.
Simply set LINKS to True in your project file.
Crowl will save links either in a separate CSV file (for CSV mode) or a separate MySQL table (for MySQL mode).
Historically, Crowl has removed duplicate links from exports.
By default, this means that only the first link from page A to page B is saved.
If you wish to keep all links, you can deactivate this behavior with the LINKS_UNIQUE option in your configuration file.
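The deduplication described above can be sketched in a few lines of Python. This is an illustration of the behavior only, not Crowl's actual code: the function name filter_links and its links_unique parameter are hypothetical, chosen to mirror the LINKS_UNIQUE option, and the URLs are made up.

```python
from typing import Iterable, List, Tuple

# A link as (source, target, text); this tuple layout is an assumption for the sketch.
Link = Tuple[str, str, str]


def filter_links(links: Iterable[Link], links_unique: bool = True) -> List[Link]:
    """Return links in page order, keeping only the first link per
    (source, target) pair when links_unique is True."""
    if not links_unique:
        return list(links)
    seen = set()
    kept = []
    for source, target, text in links:
        key = (source, target)
        if key in seen:
            continue  # a link from this source to this target was already kept
        seen.add(key)
        kept.append((source, target, text))
    return kept


links = [
    ("https://example.com/a", "https://example.com/b", "Read more"),
    ("https://example.com/a", "https://example.com/c", "Contact"),
    ("https://example.com/a", "https://example.com/b", "Page B"),
]

unique_links = filter_links(links)                      # second A -> B link is dropped
all_links = filter_links(links, links_unique=False)     # every link is kept
print(len(unique_links), len(all_links))
```

With deduplication on, only the "Read more" link from page A to page B survives, since it comes first in the page.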
For each link, Crowl records the source URL (source), the target URL (target), and the anchor text (text).
It also adds flags for nofollow links (nofollow) and for internal links to pages that are blocked by the robots.txt file (disallow).
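As a rough sketch of how such an export might be used, the snippet below filters a small sample by the two flags. The column names come from the description above, but the sample rows, the comma delimiter, and the 0/1 encoding of the flags are assumptions; check your own export before relying on them.

```python
import csv
import io

# Hypothetical sample of a links export. Only the column names are taken
# from the documentation; the rows and the 0/1 flag encoding are assumed.
sample = """source,target,text,nofollow,disallow
https://example.com/,https://example.com/about,About,0,0
https://example.com/,https://example.com/admin,Admin,0,1
https://example.com/,https://other.example/,Partner,1,0
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Targets of links flagged as nofollow
nofollow_links = [row["target"] for row in rows if row["nofollow"] == "1"]

# Targets of internal links pointing to pages blocked by robots.txt
blocked_links = [row["target"] for row in rows if row["disallow"] == "1"]

print(nofollow_links)
print(blocked_links)
```

To read a real export, replace the io.StringIO(sample) object with an open file handle on your links CSV.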
Finally, Crowl adds a weight to each link, stored in the weight column.
The weight is calculated from the order of the links in the source code: the earlier a link appears, the more weight it gets.
The actual formula is:
weight = 1 - c / n
Where c is the zero-based index of the link in the page's list of links, and n is the total number of links on the page.
The first link on a page therefore gets a weight of 1, and the last one a weight of 1/n.
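To make the formula concrete, here is a minimal Python sketch that applies it to a page with four links. The function name link_weights is hypothetical and this is not Crowl's own implementation.

```python
from typing import List


def link_weights(n: int) -> List[float]:
    """Weights for n links in source order, using weight = 1 - c / n
    where c is the zero-based position of the link."""
    return [1 - c / n for c in range(n)]


weights = link_weights(4)
print(weights)  # [1.0, 0.75, 0.5, 0.25]
```

Note that the weights depend only on a link's position and the number of links on the page, so two pages with the same number of links produce the same series of weights.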
In the future, we will try to add other calculation methods.