A generic classifier for texts from the web for official statistics
This repo is part of the WEB-FOSS-NL project on statistical scraping. More info on statistical scraping here
- Install all required packages using
pip install -r requirements.txt
- Create a config.yaml file using config_template.yaml
- Place your input file with URLs according to your input configuration
- Configure your variables in the configured input file
- Start the script with
python src/main.py
- Find your output in the configured output directory
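
The configuration step above can be sketched in Python as follows. This is only a minimal illustration, not part of the repository: the helper name ensure_config is hypothetical, and it assumes that config_template.yaml sits in the directory you run it from and that a plain copy is a suitable starting point for your own config.yaml. An existing config.yaml is never overwritten.

```python
import shutil
from pathlib import Path


def ensure_config(template: Path, target: Path) -> bool:
    """Copy the template to the target unless the target already exists.

    Hypothetical helper for illustration only.
    Returns True if a new file was created, False if the target was kept.
    """
    if target.exists():
        # Keep any configuration the user has already edited.
        return False
    if not template.is_file():
        raise FileNotFoundError(f"Template not found: {template}")
    shutil.copyfile(template, target)
    return True


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    created = ensure_config(Path("config_template.yaml"), Path("config.yaml"))
    if created:
        print("Created config.yaml from the template; edit it before running.")
    else:
        print("config.yaml already exists; left unchanged.")
```

After the copy, edit the values in config.yaml by hand before starting the script.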
This repository builds on the prototype created during a meeting in Lisbon for AIML4OS WP12,