This project implements and compares five NLP models for 3-class financial sentiment classification (Positive / Neutral / Negative) on the Financial PhraseBank dataset. It also includes a full MLOps pipeline covering model serving, containerisation, CI/CD, and cloud deployment.
| Model | Accuracy | Weighted F1 |
|---|---|---|
| Naive Bayes | 68.9% | 62.9% |
| Logistic Regression | 69.3% | 69.9% |
| SVM (Linear) | 69.5% | 70.4% |
| LSTM | 55.4% | 55.4% |
| FinBERT ✅ | 79.4% | 79.0% |
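
Accuracy and weighted F1 can diverge, as in the Naive Bayes row, because weighted F1 averages the per-class F1 scores in proportion to each class's support. The sketch below shows that calculation in plain Python, assuming the standard support-weighted definition (the one scikit-learn uses for `average="weighted"`). The function names and the toy labels are illustrative assumptions and are not taken from the project's notebooks or data.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_f1(y_true, y_pred, labels=LABELS):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0  # F1 is 0 for a class never seen
        score += support[label] / total * f1
    return score


# Made-up labels: a model that always predicts the majority class.
y_true = ["neutral"] * 6 + ["positive"] * 3 + ["negative"] * 1
y_pred = ["neutral"] * 10

print(f"accuracy={accuracy(y_true, y_pred):.3f}")
print(f"weighted_f1={weighted_f1(y_true, y_pred):.3f}")
```

In this toy case the always-neutral model scores higher on accuracy than on weighted F1, since the two minority classes contribute an F1 of zero.
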
Financial-NLP-Analysis/
│
├── 📓 notebooks/                  # Colab notebooks (pipeline stages)
│   ├── 01_data_exploration.ipynb
│   ├── 02_text_preprocessing_pipeline.ipynb
│   ├── 03_feature_engineering_and_split.ipynb
│   ├── 04_classical_ml_models.ipynb
│   ├── 05_deep_learning_models.ipynb
│   └── 06_finbert_sentiment_model.ipynb
│
├── 📊 data/
│   ├── raw/                       # Original datasets (from Kaggle/HuggingFace)
│   │   ├── financial_phrasebank.csv
│   │   └── Financial_Sentiment_Categorized.csv
│   └── processed/                 # Cleaned & preprocessed data
│       └── clean_phrasebank.csv
│
├── 🤖 models/                     # Saved model artefacts
│   ├── classical/                 # Sklearn .pkl model files
│   │   ├── naive_bayes_model.pkl
│   │   ├── logistic_regression_model.pkl
│   │   ├── svm_model.pkl
│   │   └── tfidf_vectorizer.pkl
│   ├── lstm/                      # Keras .h5 / SavedModel
│   │   └── lstm_sentiment_model.h5
│   └── finbert/                   # HuggingFace fine-tuned FinBERT
│       ├── config.json
│       ├── model.safetensors
│       ├── tokenizer_config.json
│       └── vocab.txt
│
├── 📈 results/
│   ├── metrics/                   # CSV files with model scores
│   │   ├── classical_model_results.csv
│   │   ├── lstm_results.csv
│   │   └── finbert_results.csv
│   ├── plots/                     # All visualisation PNGs
│   │   ├── sentiment_distribution.png
│   │   ├── sentence_length_distribution.png
│   │   ├── model_comparison.png
│   │   ├── lstm_training_curve.png
│   │   └── confusion_matrix_*.png
│   ├── predictions/               # Model prediction CSVs
│   │   └── lstm_predictions.csv
│   └── TF-IDF_vectors/            # Serialised feature matrices
│       ├── X_train_tfidf.pkl
│       ├── X_test_tfidf.pkl
│       ├── y_train.pkl
│       └── y_test.pkl
│
├── 🚀 mlops/                      # MLOps deployment artefacts
│   ├── api/                       # FastAPI model serving
│   │   ├── main.py
│   │   └── predict.py
│   ├── docker/                    # Containerisation
│   │   ├── Dockerfile
│   │   └── docker-compose.yml
│   └── monitoring/                # Model monitoring
│       └── monitor.py
│
├── 📄 reports/                    # Final deliverables
│   ├── nlp_case_study_report.docx
│   └── ieee_paper/
│       ├── main.tex
│       └── references.bib
│
├── .github/
│   └── workflows/
│       └── ci.yml                 # GitHub Actions CI/CD
│
├── requirements.txt
├── .gitignore
└── README.md
git clone https://github.com/YOUR_USERNAME/Financial-NLP-Analysis.git
cd Financial-NLP-Analysis

pip install -r requirements.txt

Open in Google Colab or Jupyter and run sequentially:
01 → 02 → 03 → 04 → 05 → 06

cd mlops/api
uvicorn main:app --reload
# API available at http://localhost:8000

docker-compose up --build

| Dataset | Source | Size | Use |
|---|---|---|---|
| Financial PhraseBank | Kaggle | 5,842 sentences | Training |
| Financial Sentiment Categorized | Kaggle | 1,169 sentences | Testing |
All trained models are saved in the models/ directory:
- Classical models (`.pkl`): serialised with `joblib`
- LSTM (`.h5`): saved with `model.save()`
- FinBERT: saved with HuggingFace `trainer.save_model()`
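
The saved artefacts can be loaded back with the same libraries that wrote them. The following is a minimal sketch, assuming it is run from the repository root and that the file names match the project tree. The `ARTEFACTS` mapping and the `load_*` helper names are hypothetical and are not part of the project's code. Imports are deferred into each helper so that only the library for the model you load needs to be installed.

```python
from pathlib import Path

MODELS_DIR = Path("models")  # assumption: current directory is the repo root

# File names taken from the project tree; the mapping itself is illustrative.
ARTEFACTS = {
    "vectorizer": MODELS_DIR / "classical" / "tfidf_vectorizer.pkl",
    "naive_bayes": MODELS_DIR / "classical" / "naive_bayes_model.pkl",
    "logistic_regression": MODELS_DIR / "classical" / "logistic_regression_model.pkl",
    "svm": MODELS_DIR / "classical" / "svm_model.pkl",
    "lstm": MODELS_DIR / "lstm" / "lstm_sentiment_model.h5",
    "finbert": MODELS_DIR / "finbert",
}


def load_classical(name="svm"):
    """Return (vectorizer, model) for one of the joblib-serialised models."""
    import joblib

    vectorizer = joblib.load(ARTEFACTS["vectorizer"])
    model = joblib.load(ARTEFACTS[name])
    return vectorizer, model


def load_lstm():
    """Return the Keras model stored in the .h5 file."""
    from tensorflow.keras.models import load_model

    return load_model(str(ARTEFACTS["lstm"]))


def load_finbert():
    """Return (tokenizer, model) from the fine-tuned FinBERT directory."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    path = str(ARTEFACTS["finbert"])
    tokenizer = AutoTokenizer.from_pretrained(path)
    model = AutoModelForSequenceClassification.from_pretrained(path)
    return tokenizer, model
```

For example, `vectorizer, model = load_classical("svm")` followed by `model.predict(vectorizer.transform(["Operating profit rose sharply."]))` would classify one sentence, assuming the pickled vectorizer is a fitted scikit-learn TF-IDF vectorizer. Raw text may first need the same cleaning steps applied in notebook 02.
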
This project implements a production-grade MLOps pipeline:
Data Ingestion → Preprocessing → Training → Evaluation → Serving → Monitoring
↑ |
└──────────────────── Feedback Loop ───────────────────────────────┘
| Stage | Tool |
|---|---|
| Experiment Tracking | MLflow |
| Model Serving | FastAPI + Uvicorn |
| Containerisation | Docker + Docker Compose |
| CI/CD | GitHub Actions |
| Cloud Deployment | Google Cloud Run / HuggingFace Spaces |
| Monitoring | Custom drift detection |
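
The contents of `monitor.py` are not shown here, so the following is not the project's drift detector. It is a hedged sketch of one common approach: comparing the distribution of predicted labels in a recent window against a reference window using the Population Stability Index (PSI). The function names and the 0.2 alert threshold (a widely used rule of thumb) are assumptions.

```python
import math
from collections import Counter

LABELS = ("positive", "neutral", "negative")


def label_distribution(predictions, labels=LABELS, eps=1e-6):
    """Share of each label, floored at eps so the log ratio stays finite."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    counts = Counter(predictions)
    total = len(predictions)
    return {label: max(counts[label] / total, eps) for label in labels}


def population_stability_index(reference, current, labels=LABELS):
    """PSI = sum over labels of (cur - ref) * ln(cur / ref)."""
    ref = label_distribution(reference, labels)
    cur = label_distribution(current, labels)
    return sum((cur[l] - ref[l]) * math.log(cur[l] / ref[l]) for l in labels)


def has_drifted(reference, current, threshold=0.2):
    """Flag drift when PSI exceeds the (assumed) threshold."""
    return population_stability_index(reference, current) > threshold


# Made-up prediction logs for illustration only.
reference = ["positive"] * 50 + ["neutral"] * 30 + ["negative"] * 20
current = ["positive"] * 10 + ["neutral"] * 30 + ["negative"] * 60

print("PSI:", round(population_stability_index(reference, current), 3))
print("drifted:", has_drifted(reference, current))
```

A check like this only watches the model's outputs. Detecting drift in the input text itself would need a separate comparison, for example over sentence lengths or vocabulary.
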
Kupakwashe T. Mapuranga
Department of Computer Science & AI
📧 kupakwashemapuranga@gmail.com
This project is licensed under the MIT License — see LICENSE for details.