25070149022 - MLOps Mini Project

IMDB Sentiment Analysis - End-to-End MLOps Pipeline

Student: Shoeb Shakil Sutar
Roll No: 25070149022
Course: Essentials of MLOps
Faculty: Mr. Shridhar Shende

📌 Problem Definition

This project implements a complete MLOps pipeline for IMDB Movie Review Sentiment Analysis. The primary focus is on the MLOps infrastructure rather than model performance. Traditional ML models are trained, tracked, versioned, and deployed following industry-standard MLOps practices.

Note: A Custom Transformer Encoder is implemented from scratch for research comparison but is not deployed due to computational constraints.

🏗️ Architecture

Raw Data (IMDB CSV - 50,000 reviews)
→ Data Versioning (DVC → AWS S3)
→ Data Preprocessing (NLTK)
→ Model Training
→ Experiment Tracking (MLflow)
→ Best Model Selected (Logistic Regression + TF-IDF → 88.43%)
→ Flask REST API
→ Docker Container
→ CI/CD (GitHub Actions)
→ Auto Deploy
→ AWS EC2 Deployment
→ Monitoring & Logging

🛠️ Tools & Technologies

Tool	Purpose
DVC	Data versioning
AWS S3	Remote data storage
MLflow	Experiment tracking
GitHub Actions	CI/CD automation
Docker	Containerization
AWS EC2	Cloud deployment
Flask	REST API
Scikit-learn	Traditional ML models
PyTorch	Custom Transformer
NLTK	Text preprocessing

📊 Model Results

Model	Vectorizer	Accuracy
Logistic Regression	TF-IDF	88.43% ✅
Linear SVC	TF-IDF	87.90%
Logistic Regression	CountVec	86.68%
Linear SVC	CountVec	85.77%
Multinomial NB	CountVec	84.62%
Random Forest	TF-IDF	83.95%
Random Forest	CountVec	83.91%
Custom Transformer	BERT Tokenizer	85.79%
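
As a rough illustration of the best-scoring row, the sketch below chains TF-IDF features into Logistic Regression with scikit-learn. The four-review toy corpus and the hyperparameters are placeholders chosen for this example, not the settings used in src/train_traditional.py, so it will not reproduce the 88.43% figure.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for the IMDB data (1 = positive, 0 = negative); illustrative only.
train_reviews = [
    "great wonderful movie",
    "terrible boring movie",
    "wonderful acting great plot",
    "boring plot terrible acting",
]
train_labels = [1, 0, 1, 0]

# TF-IDF turns each review into a weighted bag of words;
# Logistic Regression then learns one weight per word.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_reviews, train_labels)

# Class predictions and class probabilities for unseen text.
predictions = model.predict(["a wonderful film", "a boring film"])
probabilities = model.predict_proba(["a wonderful film"])
```

Wrapping the vectorizer and classifier in one pipeline means the fitted object accepts raw strings, which keeps the serving code from having to apply the vectorizer separately.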

🚀 How to Run

1. Clone the repository

git clone https://github.com/sutarshoeb/25070149022_MLOps_Project.git
cd 25070149022_MLOps_Project

2. Create virtual environment

python3 -m venv venv
source venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

4. Pull data from DVC

dvc pull

5. Run preprocessing

python3 src/data_preprocessing.py
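
For orientation, a cleaning step of this kind might look like the sketch below. It uses only the standard library, with a tiny hand-written stop-word set in place of NLTK so that it runs without downloading corpora. The function name `clean_review` and the exact steps (HTML tag removal, lowercasing, stop-word filtering) are assumptions, not a copy of src/data_preprocessing.py.

```python
import re

# Tiny illustrative stop-word set; the project uses NLTK, whose list is much larger.
STOP_WORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to"}


def clean_review(text: str) -> str:
    """Strip HTML tags, lowercase, keep alphabetic tokens, drop stop words."""
    text = re.sub(r"<[^>]+>", " ", text)          # IMDB reviews contain <br /> tags
    tokens = re.findall(r"[a-z]+", text.lower())  # alphabetic tokens only
    return " ".join(t for t in tokens if t not in STOP_WORDS)


print(clean_review("This movie was <br /> AMAZING!"))  # movie amazing
```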

6. Train traditional models

python3 src/train_traditional.py

7. View MLflow experiments

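mlflow ui

Then open http://127.0.0.1:5000 in a browser. The Flask API in step 8 also listens on port 5000, so stop the MLflow UI before starting the API, or start the UI on a different port, for example: mlflow ui --port 5001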

8. Run Flask API locally

python3 app.py

9. Test API

curl -X POST http://localhost:5000/predict -H "Content-Type: application/json" -d '{"review": "This movie was amazing!"}'
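
The same request can be sent from Python. The sketch below uses only the standard library. The helper names `build_predict_request` and `predict` are made up for this example, and the response is assumed to be JSON, as in the sample response in the Live API section.

```python
import json
import urllib.request


def build_predict_request(base_url: str, review: str) -> urllib.request.Request:
    """Build the POST /predict request with a JSON body."""
    body = json.dumps({"review": review}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url: str, review: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response into a dict."""
    request = build_predict_request(base_url, review)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API from step 8 running locally, `predict("http://localhost:5000", "This movie was amazing!")` returns the decoded response as a dict.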

🐳 Docker

Build image

docker build -t imdb-sentiment-api .

Run container

docker run -p 5000:5000 imdb-sentiment-api

⚙️ CI/CD Pipeline

Every push to the main branch automatically:

Sets up Python 3.10
Installs dependencies
Tests preprocessing module
Tests Flask import
Checks project structure
Deploys to AWS EC2 if all tests pass
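
The "Checks project structure" step could be implemented along the following lines. This is only a sketch: the list of required paths is taken from the Project Structure section of this README, and the actual check in ci_cd.yml may differ.

```python
from pathlib import Path

# Paths taken from the Project Structure section;
# which of them the CI job really checks is an assumption.
REQUIRED_PATHS = [
    "app.py",
    "Dockerfile",
    "requirements.txt",
    "dvc.yaml",
    "params.yaml",
    "src/data_preprocessing.py",
    "src/train_traditional.py",
]


def missing_paths(root: str, required: list[str] = REQUIRED_PATHS) -> list[str]:
    """Return the required paths that do not exist under root."""
    base = Path(root)
    return [p for p in required if not (base / p).exists()]
```

In a CI job, a non-empty result from `missing_paths(".")` would be turned into a non-zero exit code so that the deploy step is skipped.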

☁️ Live API

The API is deployed on AWS EC2 and accessible at: http://3.109.243.234:5000

Endpoints:

Endpoint	Method	Description
/	GET	API information
/predict	POST	Predict sentiment
/health	GET	Health check

Sample Request:

POST http://3.109.243.234:5000/predict
{"review": "This movie was absolutely amazing!"}

Sample Response:

{"review": "This movie was absolutely amazing!", "sentiment": "Positive", "confidence": "High"}

📁 Project Structure

25070149022_MLOps_Project/
├── .github/workflows/ci_cd.yml
├── .dvc/config
├── data/raw/
├── data/processed/
├── models/
├── src/data_preprocessing.py
├── src/train_traditional.py
├── src/train_transformer.py
├── src/evaluate.py
├── src/monitoring.py
├── logs/
├── app.py
├── Dockerfile
├── dvc.yaml
├── dvc.lock
├── params.yaml
└── requirements.txt

📝 Monitoring

Every prediction is logged with a timestamp, the review length, the predicted sentiment, and the confidence level. Logs are stored in logs/app.log.
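
A minimal sketch of such a logger, built on Python's standard logging module, is shown below. The function names, the logger name, and the layout of each log line are assumptions for illustration; src/monitoring.py may format entries differently.

```python
import logging
from pathlib import Path


def get_prediction_logger(log_path: str = "logs/app.log") -> logging.Logger:
    """Create (or reuse) a logger that appends timestamped lines to log_path."""
    Path(log_path).parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(f"predictions:{log_path}")
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path, encoding="utf-8")
        # %(asctime)s supplies the timestamp for every entry.
        handler.setFormatter(logging.Formatter("%(asctime)s | %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
        logger.propagate = False
    return logger


def log_prediction(logger: logging.Logger, review: str,
                   sentiment: str, confidence: str) -> None:
    """Record the review length, predicted sentiment and confidence level."""
    logger.info(
        "review_length=%d | sentiment=%s | confidence=%s",
        len(review), sentiment, confidence,
    )
```

Calling `log_prediction(get_prediction_logger(), review, "Positive", "High")` inside the /predict handler would append one line per request.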

🔗 GitHub Repository

https://github.com/sutarshoeb/25070149022_MLOps_Project
