As it’s based on Scrapy, Crowl offers the option to stop and resume a crawl.
To stop a running crawl, press Ctrl+C (on most UNIX systems).
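Be sure to let the crawl shut down cleanly, otherwise you won’t be able to resume it. In Scrapy, a second Ctrl+C forces an immediate, unclean shutdown, so press it only once and wait for the process to exit.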
To resume a crawl, pass the output basename (project name + timestamp), which is logged at the end of the crawl, to the --resume command line argument.
Here’s an example:
# Launch a crawl
python crowl.py --conf project.ini
# Stop it using ctrl+c
# Resume crawl
python crowl.py --conf project.ini --resume project_20200118-010101
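
If you script your crawls, a small helper can check a basename before you resume. The sketch below is not part of Crowl: `parse_basename` and `resume_command` are hypothetical names, and the `<project>_<YYYYMMDD-HHMMSS>` pattern is an assumption inferred only from the example basename above. It builds the same command line as the example, without running it.

```python
import re
from datetime import datetime

# Assumed basename layout, inferred from "project_20200118-010101":
# project name, underscore, then a YYYYMMDD-HHMMSS timestamp.
BASENAME_RE = re.compile(r"^(?P<project>.+)_(?P<stamp>\d{8}-\d{6})$")


def parse_basename(basename):
    """Split an output basename into (project name, crawl start time)."""
    match = BASENAME_RE.match(basename)
    if match is None:
        raise ValueError(f"unexpected basename format: {basename!r}")
    started = datetime.strptime(match.group("stamp"), "%Y%m%d-%H%M%S")
    return match.group("project"), started


def resume_command(conf_path, basename):
    """Build the argument list for resuming a crawl (hypothetical helper)."""
    parse_basename(basename)  # raises ValueError if the basename looks wrong
    return ["python", "crowl.py", "--conf", conf_path, "--resume", basename]


if __name__ == "__main__":
    print(" ".join(resume_command("project.ini", "project_20200118-010101")))
```

The returned list can be passed to `subprocess.run` as-is. Validating first means a mistyped basename is caught before the crawler starts.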