Deprecated: AIL v5.0 crawler has been upgraded to Lacus
AIL crawlers use a Splash crawler to fetch and render a domain.
The purpose of this Flask server is to simplify the installation of the Splash dockers and to manage them:
- Create, launch and relaunch Splash dockers
- Handle proxies
- Check crawler status
git clone https://github.com/ail-project/ail-splash-manager.git
cd ail-splash-manager
./install.sh
./LAUNCH.sh -l
./LAUNCH.sh -k
./LAUNCH.sh -t
The tor proxy from the Ubuntu package is installed by default.
This package is outdated: some v3 onion addresses are not resolved.
/!\ Install the tor proxy provided by the Tor Project to solve this issue. /!\
Note: on Ubuntu, add the Tor Project repository to the apt sources:
sudo sh -c 'echo "deb https://deb.torproject.org/torproject.org $(lsb_release -sc) main" >> /etc/apt/sources.list.d/tor-project.list'
Once installed, we need to allow all Splash dockers to reach this proxy. You can use the configure_tor.sh script or configure it yourself.
- Install Script
cd ail-splash-manager
./configure_tor.sh
- Manual configuration:
  - Allow Tor to bind to any interface or to the docker interface (by default it binds to 127.0.0.1 only) in /etc/tor/torrc:
SocksPort 0.0.0.0:9050
or
SocksPort 172.17.0.1:9050
  - Add the following line in /etc/tor/torrc:
SocksPolicy accept 172.17.0.0/16
(for a Linux docker, the localhost IP is 172.17.0.1; it should be adapted for other platforms)
  - Restart the tor proxy:
sudo service tor restart
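The manual steps above can be sanity-checked with a few lines of Python. This is a minimal sketch and not part of ail-splash-manager: the helper name `missing_tor_directives` is hypothetical, and it only looks for the two directives named above in the text of a torrc file.

```python
def missing_tor_directives(torrc_text):
    """Return the directives from the manual steps that are absent from torrc_text."""
    # Collect active lines: drop comments and blank lines, normalize whitespace.
    active = set()
    for line in torrc_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            active.add(" ".join(line.split()))

    missing = []
    # Either bind address from the manual steps is accepted.
    accepted_ports = {"SocksPort 0.0.0.0:9050", "SocksPort 172.17.0.1:9050"}
    if not (accepted_ports & active):
        missing.append("SocksPort")
    if "SocksPolicy accept 172.17.0.0/16" not in active:
        missing.append("SocksPolicy")
    return missing
```

It can be fed the content of /etc/tor/torrc, for example `missing_tor_directives(open("/etc/tor/torrc").read())`; an empty list means both directives are present.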
- Splash-Manager API key
- Splash-Manager URL
- Number of crawlers to launch
These settings are configured in AIL, see https://github.com/ail-project/ail-framework/blob/master/HOWTO.md#configuration
Edit config/proxies_profiles.cfg:
- [section_name]: proxy name, each section describes a proxy.
- host: proxy host (for a Linux docker, the localhost IP is 172.17.0.1; it should be adapted for other platforms)
- port: proxy port
- type: proxy type, SOCKS5 or HTTP
- description: proxy description
- crawler_type: crawler type (tor or i2p or web)
[default_tor] # section name: proxy name
host=172.17.0.1
port=9050
type=SOCKS5
description=tor default proxy
crawler_type=tor
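To illustrate the fields above, here is a minimal sketch that reads a proxy profile with Python's standard `configparser`. It is not taken from the ail-splash-manager code: the function name `load_proxies` is hypothetical, and treating all five keys as required is an assumption based on the field list above.

```python
import configparser

# Assumption: every key from the field list above must be present.
REQUIRED_KEYS = ("host", "port", "type", "description", "crawler_type")


def load_proxies(cfg_text):
    """Parse proxies_profiles.cfg content into {proxy_name: settings}."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    proxies = {}
    for name in parser.sections():
        section = parser[name]
        missing = [key for key in REQUIRED_KEYS if key not in section]
        if missing:
            raise ValueError(f"[{name}] is missing: {', '.join(missing)}")
        if section["type"] not in ("SOCKS5", "HTTP"):
            raise ValueError(f"[{name}] type must be SOCKS5 or HTTP")
        if section["crawler_type"] not in ("tor", "i2p", "web"):
            raise ValueError(f"[{name}] crawler_type must be tor, i2p or web")
        proxies[name] = {
            "host": section["host"],
            "port": section.getint("port"),  # stored as an integer
            "type": section["type"],
            "description": section["description"],
            "crawler_type": section["crawler_type"],
        }
    return proxies


EXAMPLE = """
[default_tor]
host=172.17.0.1
port=9050
type=SOCKS5
description=tor default proxy
crawler_type=tor
"""
```

Calling `load_proxies(EXAMPLE)` returns one entry keyed by `default_tor`; a section with a missing key or an unknown type raises `ValueError`.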
Edit config/containers.cfg:
- [section_name]: splash name, each section describes a splash container.
- proxy_name: proxy name (defined in proxies_profiles.cfg)
- port: single port or port range (ex: 8050 or 8050-8052). A port range is used to launch multiple Splash dockers.
- cpu: max number of CPUs allocated
- memory: max RAM (GB) allocated
- description: Splash description
- net: network type (bridge, host...)
[default_splash_tor] # section name: splash name
proxy_name=default_tor
port=8050-8052
cpu=1
memory=1
maxrss=2000
description= default splash tor
net=bridge
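The `port` value accepts either a single port or a range. The sketch below shows one way to expand it into individual ports. The function name `expand_ports` is hypothetical, and it assumes the range is inclusive, so that `8050-8052` yields three ports; how the manager maps ports to dockers is not shown here.

```python
def expand_ports(port_value):
    """Turn '8050' or '8050-8052' into a list of port numbers."""
    first, sep, last = port_value.strip().partition("-")
    start = int(first)
    # Without a '-' separator the value is a single port.
    end = int(last) if sep else start
    if not (0 < start <= end <= 65535):
        raise ValueError(f"invalid port or port range: {port_value!r}")
    return list(range(start, end + 1))
```

For example, `expand_ports("8050-8052")` gives `[8050, 8051, 8052]` and `expand_ports("8050")` gives `[8050]`.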
Go to the I2P website and follow the installation instructions.
- Edit config/containers.cfg (net: needs to be host to work):
[default_splash_i2p] # section name: splash name
proxy_name=default_i2p
port=8053-8055
cpu=1
memory=1
maxrss=2000
description=default splash i2p
net=host
- Add a new proxy in config/proxies_profiles.cfg (host: needs to be 127.0.0.1 to work):
[default_i2p]
host=127.0.0.1
port=4444
type=HTTP
description=i2p default proxy
crawler_type=i2p
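The two i2p constraints above (net must be host, proxy host must be 127.0.0.1) are easy to get wrong when editing both files by hand. The following sketch cross-checks them. It is an illustration only: `i2p_config_problems` is a hypothetical helper, not part of ail-splash-manager.

```python
import configparser


def i2p_config_problems(containers_text, proxies_text):
    """Return a list of problems found in the i2p-related settings."""
    containers = configparser.ConfigParser()
    containers.read_string(containers_text)
    proxies = configparser.ConfigParser()
    proxies.read_string(proxies_text)

    problems = []
    for name in containers.sections():
        proxy_name = containers[name].get("proxy_name", "")
        # Every container must point at a proxy defined in proxies_profiles.cfg.
        if not proxies.has_section(proxy_name):
            problems.append(f"[{name}] references unknown proxy {proxy_name!r}")
            continue
        proxy = proxies[proxy_name]
        if proxy.get("crawler_type") != "i2p":
            continue  # the constraints below only apply to i2p
        if containers[name].get("net") != "host":
            problems.append(f"[{name}] net must be host for i2p")
        if proxy.get("host") != "127.0.0.1":
            problems.append(f"[{proxy_name}] host must be 127.0.0.1 for i2p")
    return problems
```

Passing the content of config/containers.cfg and config/proxies_profiles.cfg returns an empty list when the i2p sections follow the two rules.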
- Edit /etc/squid/squid.conf:
acl localnet src 172.17.0.0/16 # Docker IP range
http_access allow localnet
- Add a new proxy in config/proxies_profiles.cfg:
[squid_proxy]
host=172.17.0.1
port=3128
type=HTTP
description=squid web proxy
crawler_type=web
- Bind this proxy to a Splash docker in config/containers.cfg
api/v1/ping
api/v1/version
api/v1/get/session_uuid
api/v1/get/proxies/all
api/v1/get/splash/all
api/v1/splash/restart
api/v1/splash/kill
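As a rough illustration of how a client might address the endpoints listed above, the sketch below builds a request with the standard library. Several details are assumptions that this document does not confirm: the base URL, the use of an `Authorization` header to carry the Splash-Manager API key, and the HTTP method for each endpoint. The helper name `build_request` is hypothetical.

```python
import urllib.request

# Endpoints as listed above.
ENDPOINTS = (
    "api/v1/ping",
    "api/v1/version",
    "api/v1/get/session_uuid",
    "api/v1/get/proxies/all",
    "api/v1/get/splash/all",
    "api/v1/splash/restart",
    "api/v1/splash/kill",
)


def build_request(base_url, endpoint, api_key, method="GET"):
    """Build (but do not send) a request for one of the listed endpoints."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    url = base_url.rstrip("/") + "/" + endpoint
    # Assumption: the API key is sent in the Authorization header.
    return urllib.request.Request(
        url, headers={"Authorization": api_key}, method=method
    )
```

A request built this way can be sent with `urllib.request.urlopen(req)` once the real URL, key and method are known for your deployment.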