News Recommendation Coursework by Puneet Singh and Karanjot Singh.
Note: Steps marked [optional] need to be run only if you wish to scrape new data, use a different GloVe model, or generate new clickstream data. To skip all the optional steps, download the data from here and extract the data directory so that it sits alongside main.py.
If you wish to run the project end-to-end, i.e. generate clickstream data, split the data, train a Hybrid Collaborative Filtering Recommender System, and interact with it, replace news_env in the ./run.bat file with the name of your conda environment that has all the dependencies installed. Then run ./run.bat from the command line to generate the data, split it, and train the Hybrid Collaborative Filtering Recommender System. To interact with the trained model, change the configuration in config.yaml under the NeuMF config comment accordingly, and run the ./run_app.bat file.

If you wish to run the project with a more granular strategy, follow these steps:
- Install all the dependencies using pip install -r requirements.txt or conda env create -f news_env.yml
- Change the parameters in the config.yaml file to the intended values.
- [optional] Run the src/news_scraping/*.py scripts to scrape news articles from the following websites:
  - BBC News: src/news_scraping/BBC_scraper.py
  - Times Of India News: src/news_scraping/TOI_scraper.py
  - Yahoo! News: src/news_scraping/YHNW_scraper.py
- [optional] Merge all the data scraped in the last step into a CSV file.
- [optional] Download a GloVe model into data/GloVe; alternatively, use the given custom-trained GloVe vectors.
- [optional] Run the script src/text_preprocessing.py to:
  - preprocess the scraped news article text
  - create vector representations of the articles
  - create clusters of news articles from these vectors
- [optional] Run the script src/data_generator/generator.py to generate clickstream data.
- [optional] Run the script src/data_manager.py to split the clickstream data into train and test sets.
- Run the main.py file to:
  - train a hybrid Recommender System
  - or generate recommendations from (and fine-tune) a hybrid Recommender System
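
The merge step above has no script of its own in this list, so the following is a minimal sketch of one way to do it with the Python standard library. It assumes that every scraper writes a CSV file with the same header row; the function name merge_csv_files and any file names passed to it are hypothetical and not part of this repository.

```python
import csv


def merge_csv_files(input_paths, output_path):
    """Concatenate CSV files that share one header into a single CSV file.

    Returns the number of data rows written. Hypothetical helper, not part
    of the project; it assumes all inputs have identical column names.
    """
    fieldnames = None
    rows = []
    for path in input_paths:
        with open(path, newline="", encoding="utf-8") as handle:
            reader = csv.DictReader(handle)
            if fieldnames is None:
                # The first file defines the expected header.
                fieldnames = reader.fieldnames
            elif reader.fieldnames != fieldnames:
                raise ValueError(f"{path} has a different header: {reader.fieldnames}")
            rows.extend(reader)
    if fieldnames is None:
        raise ValueError("no input files with a header were given")
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling merge_csv_files with the paths of the three scraper outputs and a target path would produce one combined file. Raising an error on a header mismatch is a deliberate choice here, since silently mixing columns would corrupt the later preprocessing step.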
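
To illustrate what "create vector representations of the articles" can look like, here is a small sketch that loads GloVe vectors from the usual text format (one token per line, followed by its space-separated float components) and represents an article as the average of its known word vectors. Averaging is an assumption chosen for illustration; the actual method in src/text_preprocessing.py may differ, and the names load_glove and article_vector are hypothetical.

```python
def load_glove(path):
    """Read GloVe vectors from a text file: a token followed by its floats."""
    vectors = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def article_vector(text, vectors, dim):
    """Average the vectors of known tokens (assumed approach, not the project's).

    Tokens are obtained by lowercasing and splitting on whitespace. An article
    with no known token maps to the zero vector of length dim.
    """
    total = [0.0] * dim
    count = 0
    for token in text.lower().split():
        vec = vectors.get(token)
        if vec is None:
            continue
        for i, value in enumerate(vec):
            total[i] += value
        count += 1
    if count == 0:
        return total
    return [value / count for value in total]
```

The resulting fixed-length vectors can then be passed to any clustering routine, which is the kind of input the clustering sub-step in the list above needs.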