Customer Churn Prediction

Problem

Customer churn significantly impacts recurring revenue. This project builds a machine learning pipeline to identify customers at high risk of churning and to simulate retention targeting strategies.

Dataset

440,000+ customer records
Features include: Tenure, Usage Frequency, Support Calls, Payment Delay, etc.
Binary target variable: Churn (1 = churned, 0 = retained)

Project Structure

api/
- app.py
notebooks/
- eda.ipynb
src/
- preprocessing.py
- train.py
- evaluate.py
models/
- churn-prediction.joblib
main.py
requirements.txt
README.md

Methodology

The project follows a standard machine learning pipeline:

Data Loading
Data Cleaning
Exploratory Data Analysis (EDA)
Feature Preprocessing
Model Training
Model Evaluation
Model Deployment via FastAPI

Model comparison in progress. Planned models include:

Logistic Regression (baseline)
Random Forest
Gradient Boosting
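
The comparison of the three planned models could be set up along the following lines. This is a minimal sketch, assuming scikit-learn is installed. It runs on synthetic data from make_classification rather than the project's dataset, and the function name compare_models and all hyperparameters are illustrative, not taken from src/train.py.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, seed=42):
    """Fit each candidate model and return its ROC-AUC on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    # Hypothetical candidate set mirroring the planned models listed above.
    candidates = {
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        # Column 1 of predict_proba is the probability of class 1 (churned).
        churn_probability = model.predict_proba(X_test)[:, 1]
        scores[name] = roc_auc_score(y_test, churn_probability)
    return scores


# Synthetic stand-in data (assumption): 1,000 rows, 8 numeric features.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
scores = compare_models(X, y)
for name, auc in scores.items():
    print(f"{name}: ROC-AUC = {auc:.3f}")
```

Scaling is applied only to Logistic Regression because tree-based models are insensitive to feature scale. The stratified split keeps the churn rate the same in the training and test sets.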

Evaluation will focus on:

ROC-AUC
Precision / Recall
Confusion Matrix
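
To make the three metrics concrete, here is a small dependency-free sketch that computes them by hand on toy data. The function names and the example labels and scores are illustrative assumptions. On the full dataset a library routine such as sklearn.metrics.roc_auc_score would be the practical choice, since the pairwise AUC below is quadratic in the number of rows.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 = churned."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def precision_recall(y_true, y_pred):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    _, fp, fn, tp = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_auc(y_true, scores):
    """Chance that a random churner outscores a random non-churner.

    Ties between a churner and a non-churner count as half a win.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels and predicted churn probabilities (made up for illustration).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # default 0.5 cut-off

print(confusion_counts(y_true, y_pred))  # (tn, fp, fn, tp)
print(precision_recall(y_true, y_pred))
print(roc_auc(y_true, scores))
```

ROC-AUC is computed from the raw scores and does not depend on the cut-off, while precision, recall, and the confusion matrix all change when the 0.5 threshold is moved.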

Results

To be finalised after model benchmarking and evaluation.

Business Implications

Insights will be derived after selecting the optimal recall-precision balance for churn detection.
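
One way to choose that balance is to sweep the decision threshold and keep the most precise setting that still catches a required share of churners. The sketch below is an assumption about how this could be done, not the project's method: the function names, the toy data, and the 0.75 recall floor are all hypothetical.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when every score >= threshold is flagged as churn."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for y, f in zip(y_true, flagged) if f and y == 1)
    fp = sum(1 for y, f in zip(y_true, flagged) if f and y == 0)
    fn = sum(1 for y, f in zip(y_true, flagged) if not f and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def pick_threshold(y_true, scores, min_recall):
    """Return (threshold, precision, recall) with the best precision
    among thresholds whose recall is at least min_recall, or None."""
    best = None
    # Only observed scores can change which customers are flagged.
    for threshold in sorted(set(scores)):
        precision, recall = precision_recall_at(y_true, scores, threshold)
        if recall < min_recall:
            continue
        if best is None or precision > best[1]:
            best = (threshold, precision, recall)
    return best


# Toy labels and predicted churn probabilities (made up for illustration).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.95, 0.30, 0.55, 0.80, 0.60, 0.10, 0.40, 0.20]

# Hypothetical business rule: catch at least 75% of churners.
threshold, precision, recall = pick_threshold(y_true, scores, min_recall=0.75)
print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

A higher recall floor means more churners are reached at the cost of contacting more customers who would have stayed, so the floor should reflect the relative cost of a retention offer versus a lost customer.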

How to Run

Create a virtual environment: python -m venv venv
Activate the environment: source venv/bin/activate (on Windows: venv\Scripts\activate)
Install dependencies: pip install -r requirements.txt
Run the pipeline: python main.py

Model Performance (Logistic Regression)

ROC Curve

The model achieved an ROC-AUC score of 0.91, indicating strong classification performance.

Classification Report

Confusion Matrix

Model Performance (Random Forest)

ROC Curve

The model achieved an ROC-AUC score of 0.95, indicating stronger classification performance than the Logistic Regression baseline (0.91).
