Installation Guide

Requirements

Python

Crowl requires Python 3. It works best on UNIX-like systems (Linux and macOS), but it also runs on Windows.

To check which version of Python your system is running, open a terminal and execute the following:

python --version  

You should get something like Python 3.*. If that’s the case, you can now install Crowl.

If the output isn’t something like Python 3.*, try this:

python3 --version  

If that doesn’t work either, please download and install the latest version of Python 3.
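
The same check can also be done from inside Python, which helps when it is unclear which interpreter a command actually resolves to. This is only a small sketch: the function name is_python3 is ours and is not part of Crowl.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3 or later."""
    return version_info[0] >= 3


if __name__ == "__main__":
    # sys.executable is the path of the interpreter that is actually running.
    print(sys.executable)
    print("Python 3 or later:", is_python3())
```

Save it to a file and run it with both python and python3 to compare what each command points to.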

If you do have python3 installed but not as the default python interpreter, here are your options:

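We recommend using virtual environments to keep each project’s dependencies separate and avoid conflicts.

For instance, you can use pyenv together with its pyenv-virtualenv plugin, which provides the pyenv virtualenv command.
Once both are installed, and the Python version you want has been installed through pyenv, you’ll be able to quickly create environments: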

mkdir crowltech  
cd crowltech    
pyenv virtualenv 3.6.4 crowltech  
pyenv local crowltech    
python --version  
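
If you want to confirm that the interpreter you are running really belongs to a virtual environment, the sketch below may help. It assumes the environment was created with the standard venv mechanism (or a recent virtualenv), where sys.prefix and sys.base_prefix differ. The function name in_virtual_environment is ours.

```python
import sys


def in_virtual_environment(prefix=None, base_prefix=None):
    """Return True when the interpreter appears to run inside a virtual environment.

    Inside a venv-style environment, sys.prefix points at the environment,
    while sys.base_prefix points at the Python installation it was built from.
    """
    if prefix is None:
        prefix = sys.prefix
    if base_prefix is None:
        # Fall back to sys.prefix if base_prefix is missing for any reason.
        base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return prefix != base_prefix


if __name__ == "__main__":
    print("Inside a virtual environment:", in_virtual_environment())
```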

Using an alias to set python3 as the default interpreter

You can replace Python 2 as the default Python interpreter on your system by using aliases.
On UNIX-like systems (Linux and macOS), edit your ~/.bash_profile file (or the equivalent startup file if you use another shell, such as ~/.zshrc for zsh) and add the following:

alias python=python3  
alias pip=pip3  

Save the changes, then run:

source ~/.bash_profile  

Using python3

We don’t recommend this, but if you don’t want to (or can’t) change your default Python interpreter, you can simply replace the python and pip commands with python3 and pip3 respectively.

A few more tips

You might find that Python can be very useful on a daily basis. Learn a few tips in this post.

Install Crowl

Download the source code

We recommend using git, as it makes upgrading much easier.
Simply clone the repository:

git clone https://gitlab.com/crowltech/crowl.git  
cd crowl  

Not using git

If you’re not comfortable using git, you can download a zip archive or a tar.gz archive directly.

From a terminal:

wget https://gitlab.com/crowltech/crowl/-/archive/master/crowl-master.tar.gz  
tar -xzvf crowl-master.tar.gz  
mv crowl-master crowl
cd crowl  
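
If wget or tar aren’t available (on Windows, for instance), the same steps can be approximated with Python’s standard library. This is a rough sketch, not an official installer: the function names download and extract_and_rename are ours, and the archive is assumed to contain a single top-level crowl-master folder, as the commands above suggest.

```python
import shutil
import tarfile
import urllib.request
from pathlib import Path

ARCHIVE_URL = "https://gitlab.com/crowltech/crowl/-/archive/master/crowl-master.tar.gz"


def download(url, archive_path):
    """Download url to archive_path (stands in for the wget step)."""
    with urllib.request.urlopen(url) as response, open(archive_path, "wb") as out:
        shutil.copyfileobj(response, out)


def extract_and_rename(archive_path, extracted_name="crowl-master", target_name="crowl"):
    """Extract the archive next to itself, then rename the extracted folder."""
    archive_path = Path(archive_path).resolve()
    parent = archive_path.parent
    with tarfile.open(str(archive_path), "r:gz") as archive:
        archive.extractall(str(parent))  # stands in for the tar step
    target = parent / target_name
    (parent / extracted_name).rename(target)  # stands in for the mv step
    return target
```

Calling download(ARCHIVE_URL, "crowl-master.tar.gz") and then extract_and_rename("crowl-master.tar.gz") should leave a crowl folder next to the archive, provided no folder with that name exists already.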

Install dependencies

Once inside the crowl directory, install the dependencies using pip:

pip install -r requirements.txt  

This will download and install all Python dependencies.
You are now ready to start crawling.

Optional: download the fastText language detection model

Crowl can try to determine the language of the content on each crawled page using the fastText language identification model.
To use this feature, enable it in your config file and download the model into the data folder:

mkdir -p data
cd data
wget https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin
cd ..

Create your first project configuration file

Copy the config.sample.ini file to yourproject.ini and adjust the settings to your needs.

The required settings are PROJECT_NAME and START_URL.
The list of all configuration options is available here.
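
Before launching a crawl, you may want to check that both required settings are filled in. The sketch below is one possible way to do it with Python’s configparser, assuming the file is a standard INI file with section headers. Since we don’t assume which section holds each setting, every section is scanned. The function name missing_settings is ours and is not part of Crowl.

```python
import configparser

REQUIRED_SETTINGS = ("PROJECT_NAME", "START_URL")


def missing_settings(config_path, required=REQUIRED_SETTINGS):
    """Return the required setting names that are absent or empty in the file."""
    # interpolation=None keeps '%' characters (common in URLs) from raising errors.
    parser = configparser.ConfigParser(interpolation=None)
    with open(config_path, encoding="utf-8") as handle:
        parser.read_file(handle)

    found = set()
    for section in parser.sections():
        for name, value in parser.items(section):
            # configparser lowercases option names by default, so compare in upper case.
            if name.upper() in required and value.strip():
                found.add(name.upper())
    return [name for name in required if name not in found]


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print("Missing settings:", missing_settings(sys.argv[1]))
```

An empty list means both settings are present. Anything else names the settings that still need a value.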

Start crawling

Simply launch Crowl from the command line:

python crowl.py --conf yourproject.ini  

If you kept the default settings, data will be saved to CSV files.

Upgrade Crowl

If you installed Crowl using git, simply download the latest version:

git pull origin master  

If you didn’t use git, back up your configuration files, delete all the other files, and replace them with those from the new version.

You might also have to update the Python dependencies by running:

pip install -r requirements.txt  

Remember to check out the release notes for the list of new features.
