First, clone the repository to your local machine:

```
git clone https://github.com/vitorfs/woid.git
```

Install the requirements:

```
pip install -r requirements/dev.txt
```

Apply the migrations:

```
python manage.py migrate
```

Load the initial data:

```
python manage.py loaddata services.json
```

Finally, run the development server:

```
python manage.py runserver
```

The site will be available at `127.0.0.1:8000`.
Currently, Woid crawls the following services to collect top stories:
- Hacker News (`hn`)
- Reddit (`reddit`)
- GitHub (`github`)
- The New York Times (`nytimes`)
- Product Hunt (`producthunt`)
You can run the crawlers manually to collect the top stories using the following command:

```
python manage.py crawl reddit
```

You can pass multiple services at once:

```
python manage.py crawl reddit hn nytimes
```
Valid values: `hn`, `reddit`, `github`, `nytimes`, `producthunt`.
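Conceptually, the `crawl` command maps each service slug to a crawler and runs them in turn. The sketch below illustrates that dispatch pattern only; the function and dictionary names are assumptions for illustration, not Woid's actual implementation:

```python
# Illustrative sketch of dispatching crawlers by service slug.
# The crawler functions here are placeholders, not Woid's real code.

def crawl_hn():
    return "crawled hn"

def crawl_reddit():
    return "crawled reddit"

# Hypothetical registry: slug -> crawler function.
CRAWLERS = {
    "hn": crawl_hn,
    "reddit": crawl_reddit,
}

def crawl(*services):
    """Run each requested crawler, rejecting unknown slugs."""
    results = []
    for slug in services:
        if slug not in CRAWLERS:
            raise ValueError(f"Unknown service: {slug}")
        results.append(CRAWLERS[slug]())
    return results
```

With a registry like this, `python manage.py crawl reddit hn` simply looks up and runs each slug in order, and an unknown slug fails fast with a clear error.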
To crawl The New York Times you will need an API key. You can register an application at developer.nytimes.com.
Product Hunt also requires an API key to consume its API. You can register an application at api.producthunt.com/v1/docs.
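A common way to supply such keys is through environment variables read by the Django settings module. This is a minimal sketch of that approach; the variable names `NYTIMES_API_KEY` and `PRODUCT_HUNT_API_KEY` are assumptions for illustration, not Woid's actual configuration:

```python
# Sketch: reading third-party API keys from the environment.
# NYTIMES_API_KEY / PRODUCT_HUNT_API_KEY are hypothetical names.
import os

NYTIMES_API_KEY = os.environ.get("NYTIMES_API_KEY", "")
PRODUCT_HUNT_API_KEY = os.environ.get("PRODUCT_HUNT_API_KEY", "")

if not NYTIMES_API_KEY:
    # Fail loudly at startup rather than mid-crawl.
    print("Warning: NYTIMES_API_KEY is not set; the nytimes crawler will fail.")
```

Keeping keys out of the repository this way also lets the cron jobs below pick them up from the crawler user's environment.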
You can set up cron jobs to execute the crawlers periodically. Here is what my crontab looks like:

```
*/5 * * * * /home/woid/venv/bin/python /home/woid/woid/manage.py crawl reddit hn producthunt >> /home/woid/logs/cron.log 2>&1
*/30 * * * * /home/woid/venv/bin/python /home/woid/woid/manage.py crawl nytimes github >> /home/woid/logs/cron.log 2>&1
```

This runs the `reddit`, `hn`, and `producthunt` crawlers every five minutes, and the `nytimes` and `github` crawlers every thirty minutes, appending all output to a shared log file.
The source code is released under the Apache 2.0 license.