Here is the list of data that Crowl grabs from pages by default:
We’ll probably add more items to this list in the near future, as well as other content extraction methods. Feel free to suggest your ideas!
However, if you need information that is not in this list, Crowl can also extract the whole page content.
You can grab the page content with Crowl by adding the --content option to your command.
For example, here is how to crawl this website and scrape all its content:
python crowl.py -u https://www.crowl.tech/ -b crowltech --content
This stores the entire source code of each crawled page, so you can retrieve any information you need after the crawl.
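Once the source code is stored, you can parse it with any HTML tool. The sketch below is one possible approach, not part of Crowl itself: it uses only Python's standard-library html.parser to pull the text of every <h2> heading out of a page. It assumes you already have one page's HTML as a string; how you read it back from your crawl storage depends on your setup and is not shown here. The names HeadingExtractor, extract_headings and stored_html are hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collect the text of every element of one tag type (e.g. <h2>)."""

    def __init__(self, target_tag="h2"):
        super().__init__()
        self.target_tag = target_tag
        self.inside = False  # True while we are between <target_tag> and </target_tag>
        self.current = []    # text fragments of the heading being read
        self.headings = []   # finished headings, in document order

    def handle_starttag(self, tag, attrs):
        if tag == self.target_tag:
            self.inside = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == self.target_tag and self.inside:
            self.inside = False
            self.headings.append("".join(self.current).strip())

    def handle_data(self, data):
        # Text inside nested tags (e.g. <em>) is kept as part of the heading.
        if self.inside:
            self.current.append(data)


def extract_headings(html, target_tag="h2"):
    """Return the text of every <target_tag> element found in html."""
    parser = HeadingExtractor(target_tag)
    parser.feed(html)
    parser.close()
    return parser.headings


# Hypothetical stand-in for the source code of one crawled page.
stored_html = """
<html><body>
  <h1>Crowl</h1>
  <h2>Features</h2>
  <p>...</p>
  <h2>Install <em>guide</em></h2>
</body></html>
"""

print(extract_headings(stored_html))  # ['Features', 'Install guide']
```

Because the whole source code is kept, you can swap the target tag or write a different parser later without recrawling the site.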