kupakwash/Financial-NLP-Analysis-Case-Study

💹 Financial NLP Sentiment Analysis

A Comparative Study of Classical ML, Deep Learning & Transformer Models


📌 Project Overview

This project implements and compares five NLP models for 3-class financial sentiment classification (Positive / Neutral / Negative) on the Financial PhraseBank dataset. It also includes a full MLOps pipeline covering model serving, containerisation, CI/CD, and cloud deployment.

Model                  Accuracy   Weighted F1
Naive Bayes            68.9%      62.9%
Logistic Regression    69.3%      69.9%
SVM (Linear)           69.5%      70.4%
LSTM                   55.4%      55.4%
FinBERT                79.4%      79.0%
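
Accuracy and weighted F1 can diverge, as in the Naive Bayes row (68.9% vs 62.9%). The sketch below shows how the two metrics are computed and how a classifier that leans on the majority class can score higher on accuracy than on weighted F1. It is a dependency-free illustration on made-up labels, not the evaluation code from the notebooks; the label names and toy data are assumptions.

```python
from collections import Counter

# Assumed label names for the 3-class task.
LABELS = ["negative", "neutral", "positive"]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true, y_pred, labels=LABELS):
    """Per-class F1, averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
        total += f1 * support[label]
    return total / len(y_true)


# Toy, made-up data: the model predicts "neutral" for almost everything.
y_true = ["neutral"] * 6 + ["positive"] * 3 + ["negative"] * 1
y_pred = ["neutral"] * 6 + ["neutral", "neutral", "positive"] + ["neutral"]

print(f"accuracy    = {accuracy(y_true, y_pred):.3f}")     # 0.700
print(f"weighted F1 = {weighted_f1(y_true, y_pred):.3f}")  # 0.630
```

On this toy data the minority classes are mostly missed, so their low F1 scores pull the weighted average below the accuracy.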

📁 Project Structure

Financial-NLP-Analysis/
│
├── 📓 notebooks/                      # Colab notebooks (pipeline stages)
│   ├── 01_data_exploration.ipynb
│   ├── 02_text_preprocessing_pipeline.ipynb
│   ├── 03_feature_engineering_and_split.ipynb
│   ├── 04_classical_ml_models.ipynb
│   ├── 05_deep_learning_models.ipynb
│   └── 06_finbert_sentiment_model.ipynb
│
├── 📊 data/
│   ├── raw/                           # Original datasets (from Kaggle/HuggingFace)
│   │   ├── financial_phrasebank.csv
│   │   └── Financial_Sentiment_Categorized.csv
│   └── processed/                     # Cleaned & preprocessed data
│       └── clean_phrasebank.csv
│
├── 🤖 models/                         # Saved model artefacts
│   ├── classical/                     # Sklearn .pkl model files
│   │   ├── naive_bayes_model.pkl
│   │   ├── logistic_regression_model.pkl
│   │   ├── svm_model.pkl
│   │   └── tfidf_vectorizer.pkl
│   ├── lstm/                          # Keras .h5 / SavedModel
│   │   └── lstm_sentiment_model.h5
│   └── finbert/                       # HuggingFace fine-tuned FinBERT
│       ├── config.json
│       ├── model.safetensors
│       ├── tokenizer_config.json
│       └── vocab.txt
│
├── 📈 results/
│   ├── metrics/                       # CSV files with model scores
│   │   ├── classical_model_results.csv
│   │   ├── lstm_results.csv
│   │   └── finbert_results.csv
│   ├── plots/                         # All visualisation PNGs
│   │   ├── sentiment_distribution.png
│   │   ├── sentence_length_distribution.png
│   │   ├── model_comparison.png
│   │   ├── lstm_training_curve.png
│   │   └── confusion_matrix_*.png
│   ├── predictions/                   # Model prediction CSVs
│   │   └── lstm_predictions.csv
│   └── TF-IDF_vectors/                # Serialised feature matrices
│       ├── X_train_tfidf.pkl
│       ├── X_test_tfidf.pkl
│       ├── y_train.pkl
│       └── y_test.pkl
│
├── 🚀 mlops/                          # MLOps deployment artefacts
│   ├── api/                           # FastAPI model serving
│   │   ├── main.py
│   │   └── predict.py
│   ├── docker/                        # Containerisation
│   │   ├── Dockerfile
│   │   └── docker-compose.yml
│   └── monitoring/                    # Model monitoring
│       └── monitor.py
│
├── 📄 reports/                        # Final deliverables
│   ├── nlp_case_study_report.docx
│   └── ieee_paper/
│       ├── main.tex
│       └── references.bib
│
├── .github/
│   └── workflows/
│       └── ci.yml                     # GitHub Actions CI/CD
│
├── requirements.txt
├── .gitignore
└── README.md

🚀 Quick Start

1. Clone the Repository

git clone https://github.com/kupakwash/Financial-NLP-Analysis-Case-Study.git
cd Financial-NLP-Analysis-Case-Study

2. Install Dependencies

pip install -r requirements.txt

3. Run Notebooks in Order

Open in Google Colab or Jupyter and run sequentially:

01 → 02 → 03 → 04 → 05 → 06

4. Serve the Model (FastAPI)

cd mlops/api
uvicorn main:app --reload
# API available at http://localhost:8000

5. Run via Docker

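# Run from the repository root; the compose file lives in mlops/docker/
docker-compose -f mlops/docker/docker-compose.yml up --build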
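
Once the API is running, a client can send it a sentence to classify. The sketch below builds such a request with the standard library only. The `/predict` route and the `{"text": ...}` payload are assumptions, since the README does not document the endpoints; check `mlops/api/main.py` for the actual route and schema. The port 8000 is the uvicorn default from step 4 and may differ under Docker.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # address given in step 4
PREDICT_PATH = "/predict"          # hypothetical route, not confirmed by the README


def build_request(text, base_url=API_URL, path=PREDICT_PATH):
    """Build a JSON POST request carrying one sentence to classify."""
    payload = json.dumps({"text": text}).encode("utf-8")  # assumed schema
    return urllib.request.Request(
        base_url + path,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not need a running server.
req = build_request("Operating profit rose 12% year on year.")
print(req.get_method(), req.full_url)
```

With the server up, `classify("Operating profit rose 12% year on year.")` would return whatever JSON the API responds with.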

📦 Datasets

Dataset                            Source   Size              Use
Financial PhraseBank               Kaggle   5,842 sentences   Training
Financial Sentiment Categorized    Kaggle   1,169 sentences   Testing

🧪 Model Artefacts

All trained models are saved in the models/ directory:

  • Classical models (.pkl) — serialised with joblib
  • LSTM (.h5) — saved with model.save()
  • FinBERT — saved with HuggingFace trainer.save_model()
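
The saved artefacts can be loaded back with the counterpart of each save call. The sketch below is one way to do it, assuming it runs from the repository root and that `joblib`, `tensorflow`, and `transformers` are installed. The helper names are hypothetical, and `AutoModelForSequenceClassification` is an assumption about how the fine-tuned FinBERT was defined. Third-party imports sit inside the functions so that only the frameworks actually used need to be present.

```python
from pathlib import Path

MODELS_DIR = Path("models")  # relative to the repository root

# File names taken from the project structure above.
ARTEFACTS = {
    "naive_bayes": MODELS_DIR / "classical" / "naive_bayes_model.pkl",
    "logistic_regression": MODELS_DIR / "classical" / "logistic_regression_model.pkl",
    "svm": MODELS_DIR / "classical" / "svm_model.pkl",
    "tfidf_vectorizer": MODELS_DIR / "classical" / "tfidf_vectorizer.pkl",
    "lstm": MODELS_DIR / "lstm" / "lstm_sentiment_model.h5",
    "finbert": MODELS_DIR / "finbert",
}


def load_classical(name):
    """Return (model, vectorizer) for one of the scikit-learn models."""
    import joblib  # third-party

    model = joblib.load(ARTEFACTS[name])
    vectorizer = joblib.load(ARTEFACTS["tfidf_vectorizer"])
    return model, vectorizer


def load_lstm():
    """Return the Keras model saved with model.save()."""
    from tensorflow import keras  # third-party

    return keras.models.load_model(ARTEFACTS["lstm"])


def load_finbert():
    """Return (model, tokenizer) from the directory written by save_model()."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    directory = str(ARTEFACTS["finbert"])
    tokenizer = AutoTokenizer.from_pretrained(directory)
    model = AutoModelForSequenceClassification.from_pretrained(directory)
    return model, tokenizer


for name, path in ARTEFACTS.items():
    print(f"{name:20s} {path}")
```

A classical prediction would then look like `model.predict(vectorizer.transform(["some sentence"]))`, provided the text is preprocessed the same way as during training.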

🏗️ MLOps Pipeline

This project implements a production-grade MLOps pipeline:

Data Ingestion → Preprocessing → Training → Evaluation → Serving → Monitoring
      ↑                                                                   |
      └──────────────────── Feedback Loop ───────────────────────────────┘

Stage                  Tool
Experiment Tracking    MLflow
Model Serving          FastAPI + Uvicorn
Containerisation       Docker + Docker Compose
CI/CD                  GitHub Actions
Cloud Deployment       Google Cloud Run / HuggingFace Spaces
Monitoring             Custom drift detection
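
The README does not describe how the custom drift detection works. As one possible approach, the sketch below compares the distribution of predicted labels in a reference window against a live window using total variation distance. The function names, the 0.15 threshold, and the sample data are all assumptions for illustration; they are not taken from `mlops/monitoring/monitor.py`.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")  # assumed label names


def label_distribution(predictions, labels=LABELS):
    """Share of each label among a list of predicted labels."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    counts = Counter(predictions)
    return {label: counts[label] / len(predictions) for label in labels}


def total_variation(p, q):
    """Total variation distance between two distributions over the same labels."""
    return 0.5 * sum(abs(p[label] - q[label]) for label in p)


def has_drifted(reference, live, threshold=0.15):
    """Return (drift flag, distance); the threshold is an arbitrary example."""
    distance = total_variation(
        label_distribution(reference), label_distribution(live)
    )
    return distance > threshold, distance


# Made-up prediction logs: the live window contains far more negatives.
reference = ["neutral"] * 60 + ["positive"] * 30 + ["negative"] * 10
live = ["neutral"] * 40 + ["positive"] * 25 + ["negative"] * 35

drifted, distance = has_drifted(reference, live)
print(f"distance = {distance:.2f}, drifted = {drifted}")  # distance = 0.25, drifted = True
```

A shift in predicted labels can reflect a real change in the incoming text rather than a model fault, so a flag like this is best treated as a prompt to inspect recent inputs.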

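📝 Reports & Paper

The final deliverables are in the reports/ directory: the case study report (nlp_case_study_report.docx) and the IEEE paper source (ieee_paper/main.tex with ieee_paper/references.bib).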


👤 Author

Kupakwashe T. Mapuranga
Department of Computer Science & AI
📧 kupakwashemapuranga@gmail.com


📜 License

This project is licensed under the MIT License — see LICENSE for details.

About

A systematic comparison of classical ML, deep learning, and transformer-based models for financial sentiment analysis. Trained on Financial PhraseBank and evaluated cross-dataset, the study highlights the impact of domain-specific pre-training, with FinBERT significantly outperforming NB, LR, SVM, and LSTM models.
