Customer churn significantly impacts recurring revenue. This project builds a machine learning pipeline to predict high-risk customers and simulate retention targeting strategies
- 440,00+ customer records
- Features include: Tenure, Usage Frequency, Support Calls, Payment Delay, etc.
- Bianry target variable: Churn (1 = churned, 0 = retained)
- api/
- app.py
- notebooks/
- eda.ipynb
- src/
- preprocessing.py
- train.py
- evaluate.py
- models/
- churn-prediction.joblib
- main.py
- requirements.txt
- README.md
The project follows a standard machine learning pipeline:
- Data Loading
- Data Cleaning
- Exploratory Data Analysis (EDA)
- Feature Preprocessing
- Model Training
- Model Evaluation
- Model deployment via FastAPI
Model comparison in progress. Planned models include:
- Logistic Regression (baseline)
- Random Forest
- Gradient Boosting
Evaluation will focus on:
- ROC-AUC
- Precision / Recall
- Confusion Matrix
To be finalised after model benchmarking and evaluation
Insights will be dreived after selecting the optimal recall-precisin balance for churn detection.
-
Create virtual environment python -m venv venv
-
Activate environment source venv/bin/activate or Windows equivalent
-
Install dependencies pip install -r requirements.txt
-
Run the pipeline python main.py
The model achieved an ROC-AUC score of 0.91, indicating strong classification performance.
The model achieved an ROC-AUC score of 0.95, indicating a stronger classification performance.
.png)
.png)
.png)
.png)
.png)
.png)