Release Notes

Here are the detailed release notes for each version of Crowl.

v0.3.1 (2021-07-27)

Small update to:

  • resolve a bug with blank pages that return an HTTP 200 response code
  • update some outdated dependencies

To update to this version:

git pull origin master   
pip install -r requirements.txt  


v0.3 (2021-04-30)

This new version comes with a few bug fixes and some new features. I tried my best to list them below, but I’m quite sure I forgot some ;)

New features

Crawl configuration

  • Exclusion pattern: enables you to exclude some URL patterns from the crawl. Learn more.
  • Proxies: you can now provide a list of proxies to be used by the crawler. Learn more.
  • Rotating User-agents: you can provide a list of user-agents to be used by the crawler. Learn more.
  • Disable referer: remove referer from HTTP request headers. Learn more.
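
To illustrate the idea behind exclusion patterns, here is a minimal sketch that filters a list of URLs with regular expressions. It is not Crowl's actual implementation: the `EXCLUSION_PATTERNS` name, the patterns themselves, and the `is_excluded` helper are assumptions made for this example, and the real setting name and syntax are described in the documentation.

```python
import re

# Hypothetical patterns for illustration; Crowl's real setting name and syntax may differ.
EXCLUSION_PATTERNS = [r"/tag/", r"\?page=\d+", r"\.pdf$"]


def is_excluded(url, patterns=EXCLUSION_PATTERNS):
    """Return True if the URL matches at least one exclusion pattern."""
    return any(re.search(pattern, url) for pattern in patterns)


urls = [
    "https://example.com/blog/post-1",
    "https://example.com/tag/seo",
    "https://example.com/blog?page=2",
    "https://example.com/files/report.pdf",
]

# Keep only the URLs that no pattern excludes.
to_crawl = [url for url in urls if not is_excluded(url)]
print(to_crawl)
```

Using `re.search` rather than `re.match` lets a pattern match anywhere in the URL, which is usually what you want when excluding a path segment or a query parameter.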

Auth

  • New section added to the config files
  • HTTP authentication: simple user/password HTTP authentication. Learn more.

Extraction

Coming soon (hopefully)

  • custom settings
  • include only links
  • list
  • regex & css extractors

Also new

As suggested by some, you can now support Crowl on Buy me a coffee!
Either with a one-time payment or by becoming a member, you’ll help raise funds for this project. And more funds mean more time to add awesome features!


v0.2 (2020-02-25)

New features
Configuration file: the old global configuration file has been replaced with project-scoped files, and settings are now changed in these project files. See https://www.crowl.tech/documentation/configuration/.
CSV export: a new pipeline to export data into CSV files. See https://www.crowl.tech/documentation/csv/.

Additional data
More data is scraped during the crawl:

  • number of title tags
  • number of meta robots tags
  • number of h1 tags
  • number of h2 tags
  • http date (from headers)
  • file size
  • link rel prev & next
  • html lang
  • hreflang tags (exported as JSON)
  • microdata & JSON-LD markup (using the extruct library)
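
As a rough illustration of what some of these fields capture, the sketch below counts title, h1, h2 and meta robots tags, reads the html lang attribute, and serializes hreflang links as JSON, using only Python's standard library. It is not how Crowl extracts this data: the `TagCounter` class, the output field names, and the sample page are assumptions made for this example.

```python
import json
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Collect a few on-page signals (hypothetical field names)."""

    def __init__(self):
        super().__init__()
        self.counts = {"title": 0, "h1": 0, "h2": 0, "meta_robots": 0}
        self.lang = None
        self.hreflang = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "h1", "h2"):
            self.counts[tag] += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.counts["meta_robots"] += 1
        elif tag == "html":
            self.lang = attrs.get("lang")
        elif tag == "link" and "hreflang" in attrs:
            self.hreflang.append(
                {"lang": attrs["hreflang"], "href": attrs.get("href")}
            )


# Sample page, made up for this example.
page = """<html lang="en"><head><title>Home</title>
<meta name="robots" content="index,follow">
<link rel="alternate" hreflang="fr" href="https://example.com/fr/">
<link rel="alternate" hreflang="en" href="https://example.com/">
</head><body><h1>Welcome</h1><h2>A</h2><h2>B</h2></body></html>"""

parser = TagCounter()
parser.feed(page)

# hreflang entries are serialized as a JSON string, as a single export field would be.
hreflang_json = json.dumps(parser.hreflang)
print(parser.counts, parser.lang, hreflang_json)
```

A count other than 1 for title or h1 tags is the kind of anomaly these fields make easy to spot in the exported data.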


v0.1 (2018-08-05)

First release

