Crowl

The open-source SEO crawler

What is Crowl?

A crawler made by SEOs, for SEOs. And that makes all the difference.

Free and open-source

Crowl is distributed under the GNU GPL v3. This means you can use, distribute and modify the source code for private or commercial use, as long as you share your code under the same licence. This also means we do not offer any warranty.

Designed by SEOs

Most of the people developing this crawler are professional SEOs, experts in the technical aspects of the job. We've been using crawlers on a daily basis for years, and we know what to expect from such a tool.

Generic AND customizable

Crowl is still a fairly basic crawler, but it keeps evolving!
We aim to provide state-of-the-art functionality and customization, for a perfect fit whatever kind of website you're working on.

Community-based

Our goal is to provide a smart and efficient tool for all SEOs. Feel free to join our gang of unpaid volunteers!
How to contribute

Python & Scrapy

Crowl is developed using Python and Scrapy. We chose this language both because we like it and because it is widely used. If you can code, please give us a hand!
How to contribute

Open roadmap

We have a loooong list of features in mind for this project. However, your ideas and opinions are welcome to help us prioritize the next feature to develop.
View the backlog and contribute

Get Started

Set things up

Crowl works best with Python 3.6+ and MySQL 5.5+.
We also recommend you use pyenv.
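
For instance, with pyenv you can pin a compatible interpreter for the project directory. A minimal sketch (the exact version number below is just an example):

    pyenv install 3.6.9    # install a Python 3.6+ interpreter
    pyenv local 3.6.9      # pin it for the current directory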

Create a MySQL user with the ability to create databases, then clone the git repository.
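
A minimal sketch of both steps, assuming a local MySQL server. The user name and password are placeholders, and the privileges are deliberately broad for simplicity; substitute the actual repository URL when cloning:

    # as a MySQL admin; 'crowl' and 'changeme' are placeholders
    mysql -u root -p -e "CREATE USER 'crowl'@'localhost' IDENTIFIED BY 'changeme';"
    mysql -u root -p -e "GRANT ALL PRIVILEGES ON *.* TO 'crowl'@'localhost';"

    # clone the source code (substitute the actual repository URL)
    git clone <repository-url>
    cd crowl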

Once you have downloaded Crowl's source code, create a copy of config.sample.ini, fill in the fields, and save it as config.ini.
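
For illustration only, here is the general shape such a file typically has. The section and key names below are assumptions; copy the real ones from config.sample.ini:

    [MYSQL]
    ; illustrative keys -- take the actual ones from config.sample.ini
    host = localhost
    port = 3306
    user = crowl
    password = changeme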

You can then install Python dependencies by executing this command in your terminal:

    pip install -r requirements.txt

And that should be all!

Launch your first crawl

To launch your first crawl with basic settings, simply indicate the start URL and the project name:

    python crowl.py -u https://www.crowl.tech/ -b crowltech

Crowl will create a MySQL database to store the crawled data.
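
For a quick sanity check, you can list the databases from the MySQL client. The pattern below assumes the database name starts with the project name you passed via -b, which may not match Crowl's exact naming scheme:

    mysql -u crowl -p -e "SHOW DATABASES LIKE 'crowltech%';"
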
Check out the docs for more configuration options.
Enjoy!