EasyRec is a Python library for configuring and building Sequential Recommender System models on top of PyTorch and PyTorch Lightning.
User-Friendly Configuration: EasyRec provides a configuration-based interface for working with Sequential Recommender Systems. All settings are defined in YAML files, which makes it easy to customize and fine-tune your experiments.
Three Essential Utilities:
- 📊 Data: EasyRec simplifies data handling, allowing you to load and preprocess data effortlessly.
- 📝 Experiments: Define, track, and save experiments with unique IDs to prevent duplication. Keep your work organized and accessible.
- ⚙️ Torch Integration: Seamlessly integrate PyTorch models, train, test, and save your models with minimal effort. EasyRec handles the heavy lifting, so you can focus on innovation.
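As an illustration of how unique experiment IDs can prevent duplicate runs, here is a minimal sketch that derives a deterministic ID from a configuration dictionary. This is not the easy_exp API: the names `experiment_id` and `is_duplicate`, the use of SHA-256, and the example configuration keys are all assumptions made for this example.

```python
import hashlib
import json


def experiment_id(cfg: dict, length: int = 16) -> str:
    """Return a deterministic ID for a configuration (hypothetical helper)."""
    # Sorting keys makes the serialization independent of key order,
    # so two configs with the same content map to the same ID.
    canonical = json.dumps(cfg, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


def is_duplicate(cfg: dict, known_ids: set) -> bool:
    """Check whether an identical configuration was already registered."""
    return experiment_id(cfg) in known_ids


# Example configurations; the keys are illustrative, not EasyRec's schema.
cfg_a = {"model": {"name": "SASRec", "emb_size": 64}, "data": {"name": "ml-1m"}}
cfg_b = {"data": {"name": "ml-1m"}, "model": {"emb_size": 64, "name": "SASRec"}}
cfg_c = {"model": {"name": "SASRec", "emb_size": 128}, "data": {"name": "ml-1m"}}

known = {experiment_id(cfg_a)}
print(is_duplicate(cfg_b, known))  # True: same content, different key order
print(is_duplicate(cfg_c, known))  # False: emb_size differs
```

Hashing a canonical serialization means that any change to a setting yields a new ID, while re-running an unchanged configuration can be detected and skipped.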
Below is an outline of key files and folders you'll find in this project, along with their purposes:
- easy_exp
  - Includes functions for experiment handling, such as creating experiment IDs, saving and loading experiments, and managing experiment logs.
- easy_rec
  - This is the code for the paper Sequential Recommender Systems Reproducibility Analysis.
  - Includes functions for data preprocessing and dataloader creation.
  - Defines Sequential Recommendation models.
  - Contains IR metrics, such as NDCG, Recall, and MRR.
  - Provides losses for model training and evaluation.
- easy_torch
  - Includes functions for metrics, loading models, and creating trainers in PyTorch Lightning.
  - Defines steps, loss, optimizer, and other parameters to use.
  - Sets callbacks and dataloaders.
  - Also includes utilities for training and testing the model, as well as saving and reading logs.
- cfg
  - Contains the configurations used in the testing phase of this repo. These are nested YAML files; the main one is config_rec.yaml, where you specify the name of the experiment.
    - data_cfg: loading and preprocessing parameters of the dataset.
    - model: optimizer, metrics, and name of the model used.
    - trainer_params_cfg: accelerator, number of epochs, logger to save files.
    - loader_params_cfg: batch size, number of workers, number of negatives.
    - emission_tracker: configuration for CodeCarbon.
    - flops_profiler: configuration for DeepSpeed.
    - rec_models: specific configurations for each model.
- ntb
  - Houses a notebook for training and testing a Sequential Recommender System on a dataset.
- out
  - Includes metrics and energy consumption for each experiment, as well as the saved parameters of the best model for each run.
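For readers unfamiliar with the IR metrics named above (NDCG, Recall, and MRR), the following is a minimal pure-Python sketch of their top-k variants with binary relevance. It is not the easy_rec implementation, which may differ in normalization and in how batches are handled; the function names and the example data are assumptions made for illustration.

```python
import math
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[int], relevant: Set[int], k: int) -> float:
    """Fraction of relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def mrr_at_k(ranked: Sequence[int], relevant: Set[int], k: int) -> float:
    """Reciprocal rank of the first relevant item in the top-k, else 0."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: Sequence[int], relevant: Set[int], k: int) -> float:
    """DCG of the top-k ranking divided by the DCG of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked[:k], start=1)
        if item in relevant
    )
    # The ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical example: the single relevant item (30) is ranked third.
ranked_items = [10, 20, 30, 40]
relevant_items = {30}

print(recall_at_k(ranked_items, relevant_items, k=3))  # 1.0
print(mrr_at_k(ranked_items, relevant_items, k=3))     # 0.333...
print(ndcg_at_k(ranked_items, relevant_items, k=3))    # 0.5
```

With a single relevant item per user, as is common in next-item evaluation, Recall@k reduces to a hit indicator, while MRR and NDCG additionally reward placing that item closer to the top.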
To run our code, follow these steps:
- Download and install easy_lightning.
pip3 install --upgrade --force-reinstall git+https://github.com/PokeResearchLab/easy_lightning.git
- Download this repo.
git clone https://github.com/antoniopurificato/recsys_repro_conf && cd recsys_repro_conf
- Install the necessary requirements.
pip3 install -r requirements.txt
- Download the data.
cd ntb && bash download_data.bash
- Run a simple experiment. By default, the configuration files are set to run SASRec on the ML-1M dataset.
python3 main.py
@article{betello2024reproducible,
title={A Reproducible Analysis of Sequential Recommender Systems},
author={Betello, Filippo and Purificato, Antonio and Siciliano, Federico and Trappolini, Giovanni and Bacciu, Andrea and Tonellotto, Nicola and Silvestri, Fabrizio},
journal={IEEE Access},
year={2024},
publisher={IEEE}
}
Federico Siciliano, Filippo Betello, Antonio Purificato, Giulia Di Teodoro, Maria Diana Calagaru, Erica Luciani, Federico Carmignani.