This repository contains code and notebooks for analyzing emotions in text data using various machine learning models.
This project aims to detect and analyze emotions in text data from different sources (Twitter and GoEmotions dataset) using multiple machine learning approaches. The models range from traditional machine learning techniques like SVM and Logistic Regression to more advanced deep learning models such as BERT, DistilBERT, RoBERTa, and CNN.
.
├── data_scripts/
│ ├── data_processing_test.py
│ ├── data_processing.py
│ ├── Emotion_DAIR_Analysis.ipynb
│ └── GoEmotions_Analysis.ipynb
├── models/
│ ├── bert/
│ │ ├── BERT_Twitter.ipynb
│ │ └── CS6120_BERT_GoEmotions.ipynb
│ ├── cnn/
│ │ ├── CNN_goemotions.ipynb
│ │ └── CNN_twitter.ipynb
│ ├── distilbert/
│ │ ├── DistilBERT_FINAL_GoEmotions.ipynb
│ │ └── DistilBERT_Twitter.ipynb
│ ├── logistic_regression/
│ │ ├── LR_goemotions.py
│ │ └── LR_twitter.py
│ ├── roberta/
│ │ ├── RoBERTA_GoEmotions.ipynb
│ │ └── roBERTa_Twitter-2.ipynb
│ └── svm/
│   ├── svm_goemotion.py
│   ├── svm_twitter.py
│   └── model_test.py
├── .gitignore
├── demo.py
├── README.md
└── requirements.txt
The project implements and compares the following models:
- Traditional Machine Learning
  - Support Vector Machines (SVM)
  - Logistic Regression (LR)
- Transformer-based Models
  - BERT
  - DistilBERT (a lighter version of BERT)
  - RoBERTa
- Convolutional Neural Networks (CNN)
Each model is implemented for both Twitter data and the GoEmotions dataset to compare performance across different data sources.
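One way such a cross-dataset comparison could be scored is with macro-averaged F1, which weights every emotion class equally regardless of how often it occurs. The sketch below is an illustration only: the README does not state which metric the project reports, and the model names, dataset names, and predictions in the example are hypothetical placeholders rather than project results.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in y_true or y_pred."""
    tp = defaultdict(int)  # true positives per class
    fp = defaultdict(int)  # false positives per class
    fn = defaultdict(int)  # false negatives per class
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1
    classes = set(y_true) | set(y_pred)
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions, made up for illustration.
gold = ["joy", "joy", "anger", "sadness"]
predictions = {
    ("svm", "twitter"): ["joy", "anger", "anger", "sadness"],
    ("baseline_all_joy", "twitter"): ["joy", "joy", "joy", "joy"],
}

# One score per (model, dataset) pair, ready to tabulate side by side.
scores = {key: macro_f1(gold, pred) for key, pred in predictions.items()}
for (model, dataset), score in sorted(scores.items()):
    print(f"{model:>18} on {dataset}: macro-F1 = {score:.3f}")
```

Macro averaging is a reasonable default when emotion classes are imbalanced, because a model that only predicts the majority class scores poorly on it.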
The project works with two main datasets:
- Twitter data: Tweets labeled with emotions
- GoEmotions: A dataset of comments from Reddit, labeled with emotions
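The two datasets may call for different label encodings. In the published GoEmotions dataset a comment can carry more than one emotion label, so a multi-hot vector is a natural target format; a single-label tweet is then just the special case with one active position. The sketch below assumes that setup. The `multi_hot` helper and the four-label subset are hypothetical and are not taken from the project's scripts, which may instead reduce each example to a single label.

```python
def multi_hot(example_labels, label_names):
    """Encode a list of emotion names as a 0/1 vector aligned with label_names."""
    index = {name: i for i, name in enumerate(label_names)}
    vector = [0] * len(label_names)
    for name in example_labels:
        vector[index[name]] = 1
    return vector


# Illustrative subset of emotion names, not the full label set of either dataset.
LABELS = ["anger", "joy", "sadness", "surprise"]

single_label = multi_hot(["joy"], LABELS)             # one active position
multi_label = multi_hot(["joy", "surprise"], LABELS)  # two active positions
print(single_label, multi_label)
```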
The data_scripts directory contains scripts for:
- Loading and preprocessing text data
- Feature extraction
- Data transformation for different model architectures
- Analysis of emotion distributions in datasets
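The preprocessing and distribution-analysis steps listed above can be sketched with the standard library alone. This is a minimal illustration under assumptions: the function names `clean_text` and `emotion_distribution`, the cleaning rules, and the sample rows are hypothetical and do not reproduce what `data_processing.py` actually does.

```python
import re
from collections import Counter

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9'\s]")


def clean_text(text: str) -> str:
    """Lowercase, drop URLs and @mentions, replace punctuation, collapse whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_WORD_RE.sub(" ", text)
    return " ".join(text.split())


def emotion_distribution(labels):
    """Return each label's share of the dataset, most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.most_common()}


# Made-up rows standing in for (text, emotion) pairs from a labeled dataset.
samples = [
    ("I LOVE this!! https://t.co/abc @friend", "joy"),
    ("so tired of waiting...", "anger"),
    ("Best day ever :)", "joy"),
    ("why does this always happen to me", "sadness"),
]

cleaned = [clean_text(text) for text, _ in samples]
distribution = emotion_distribution(label for _, label in samples)
print(cleaned)
print(distribution)
```

Checking the label distribution before training helps reveal class imbalance, which affects both model choice and which evaluation metric is informative.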
To install the required dependencies:

    pip install -r requirements.txt

To run the models:
- Jupyter Notebooks: Open and run the respective .ipynb files in the model directories
- Python Scripts: Run the .py files for the corresponding models

Example:

    python models/svm/svm_twitter.py

A demonstration script is available:

    python demo.py

A demo UI for testing all trained models against custom text inputs is deployed and available at: Emotion Detection Ui
This allows for quick testing of emotion detection on sample text inputs.
Sanshrit Bakshi
Shashwat Tiwari
Sanidhya Maharia