Mentigen/CreditScoringML

Bank Credit Scoring Model

Overview

A machine learning pipeline to predict loan default probability, with:

  • A CLI script for full analysis and artifact generation
  • A Streamlit dashboard for interactive EDA, model training, metrics, and live predictions

Project Structure

  • credit_scoring.py — CLI pipeline: EDA, preprocessing, training, metrics, plots
  • credit_scoring_app.py — Streamlit dashboard (upload CSV → explore → train → evaluate → predict)
  • CreditScoring.ipynb — Original notebook (will be removed later)

Dataset

Typical columns:

  • SeriousDlqin2yrs (target: 0/1)
  • age, DebtRatio, MonthlyIncome, NumberOfDependents
  • Payment delinquency counts, utilization metrics, etc.
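Before running either entry point, it can help to confirm that a CSV has the expected shape. The sketch below is an illustration only: `check_columns` and `EXPECTED_FEATURES` are hypothetical names, not part of the repository, and the frame is made-up data standing in for an uploaded file.

```python
import pandas as pd

TARGET = "SeriousDlqin2yrs"
# Assumption: a subset of the "typical columns" listed above.
EXPECTED_FEATURES = ["age", "DebtRatio", "MonthlyIncome", "NumberOfDependents"]


def check_columns(df):
    """Return the expected columns that are missing from df (empty list if none)."""
    required = [TARGET] + EXPECTED_FEATURES
    return [col for col in required if col not in df.columns]


# Tiny made-up frame; in practice this would come from pd.read_csv(...).
df = pd.DataFrame({
    "SeriousDlqin2yrs": [0, 1, 0],
    "age": [45, 32, 58],
    "DebtRatio": [0.35, 0.80, 0.12],
    "MonthlyIncome": [5200.0, None, 7400.0],
})

missing = check_columns(df)
# The target is expected to be binary (0/1).
target_ok = set(df[TARGET].unique()) <= {0, 1}
print("Missing columns:", missing)
print("Target is binary:", target_ok)
```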

Features

  • EDA: distributions, correlation matrix
  • Preprocessing: missing-value handling, basic encoding of categorical columns
  • Modeling: Logistic Regression baseline
  • Evaluation: confusion matrix, ROC AUC, PR curve, classification report
  • Feature importance (coefficients)
  • Streamlit app: upload data, interactively analyze and predict
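The modeling and evaluation features listed above can be sketched with scikit-learn roughly as follows. This is a minimal illustration on synthetic data, not the code in credit_scoring.py: the median imputation, the scaling step, and the 75/25 split are assumptions, and the step names `impute`, `scale`, and `clf` are chosen here for readability.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset (values are made up).
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "age": rng.integers(21, 80, n).astype(float),
    "DebtRatio": rng.uniform(0, 2, n),
    "MonthlyIncome": rng.normal(5000, 1500, n),
})
# Synthetic target loosely tied to DebtRatio so the model has signal to learn.
logit = 2.0 * (X["DebtRatio"] - 1.0)
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)
# Inject missing values to exercise the imputation step.
X.loc[::10, "MonthlyIncome"] = np.nan

# Stratified split keeps the default rate similar in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # assumption: median fill
    ("scale", StandardScaler()),                   # assumption: standardize features
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

proba = model.predict_proba(X_test)[:, 1]  # probability of class 1 (default)
auc = roc_auc_score(y_test, proba)
cm = confusion_matrix(y_test, model.predict(X_test))
# Coefficients of the fitted model serve as a simple feature-importance view.
coefs = pd.Series(model.named_steps["clf"].coef_[0], index=X.columns)

print(f"ROC AUC: {auc:.3f}")
print(cm)
print(coefs.sort_values(key=abs, ascending=False))
```

Because the features are standardized in this sketch, coefficient magnitudes are comparable across columns; without scaling they would also reflect each column's units.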

Requirements

pip install -r requirements.txt

Main libraries: numpy, pandas, seaborn, matplotlib, scikit-learn, streamlit.

Dependencies and requirements.txt

Keep and commit requirements.txt for reproducible installs (local, CI, deployment).

  • Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
python -m pip install --upgrade pip
  • Install dependencies from the repo file:
pip install -r requirements.txt
  • Update requirements.txt after changing packages:
pip freeze > requirements.txt

Quick Start

CLI (writes plots and metrics to files):

python credit_scoring.py

Streamlit app:

streamlit run credit_scoring_app.py

Using the Streamlit App

  1. Upload a CSV containing the SeriousDlqin2yrs target column and the feature columns.
  2. Explore: target distribution, numeric distributions, correlation matrix.
  3. Train: one-click Logistic Regression with stratified split.
  4. Evaluate: confusion matrix, ROC AUC, classification report, top features.
  5. Predict: input feature values and get default probability with a simple recommendation.
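The final step turns a predicted default probability into a recommendation. The sketch below shows one plausible way to do that. The function name `recommend`, the 0.5 threshold, and the label strings are all hypothetical, since the rule the app actually applies is not described here.

```python
def recommend(default_probability, threshold=0.5):
    """Map a default probability to a simple recommendation.

    Assumption: a single cut-off; the threshold and labels are illustrative only.
    """
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must be between 0 and 1")
    if default_probability >= threshold:
        return "Review or decline"
    return "Approve"


for p in (0.08, 0.50, 0.73):
    print(f"p(default) = {p:.2f} -> {recommend(p)}")
```

In practice the threshold would likely be tuned to the cost of a missed default versus a rejected good borrower, rather than fixed at 0.5.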

Results (CLI)

The script saves:

  • confusion_matrix.png
  • roc_curve.png
  • precision_recall_curve.png
  • feature_importance.png
  • age_distribution.png, debt_ratio_distribution.png, numerical_distributions.png, correlation_matrix.png
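As an illustration of how one of these artifacts might be produced, the sketch below draws a ROC curve and saves it as roc_curve.png. The labels and scores are made up, the file is written to a temporary directory rather than the project folder, and the figure styling is an assumption, not what credit_scoring.py does.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, suitable for scripts and CI
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up labels and predicted probabilities, for illustration only.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])

fpr, tpr, _ = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

out_dir = tempfile.mkdtemp()  # assumption: the real script writes elsewhere
out_path = os.path.join(out_dir, "roc_curve.png")

fig, ax = plt.subplots(figsize=(5, 4))
ax.plot(fpr, tpr, label=f"AUC = {auc:.2f}")
ax.plot([0, 1], [0, 1], linestyle="--", label="Chance")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.set_title("ROC curve")
ax.legend(loc="lower right")
fig.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close(fig)  # release the figure's memory

print("Saved", out_path)
```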

License

MIT — see LICENSE.

Contact

Open an issue on GitHub for questions or suggestions.
