A machine learning pipeline to predict loan default probability, with:
- A CLI script for full analysis and artifact generation
- A Streamlit dashboard for interactive EDA, model training, metrics, and live predictions
Files:
- credit_scoring.py: CLI pipeline (EDA, preprocessing, training, metrics, plots)
- credit_scoring_app.py: Streamlit dashboard (upload CSV → explore → train → evaluate → predict)
- CreditScoring.ipynb: original notebook (will be removed later)
Input data is a CSV file. Typical columns:
- SeriousDlqin2yrs (target: 0/1)
- age, DebtRatio, MonthlyIncome, NumberOfDependents
- Payment delinquency counts, utilization metrics, etc.
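Before EDA or training, the target column has to be present and binary. The sketch below shows one way to check that and split features from the target with pandas. It assumes the target is named SeriousDlqin2yrs as listed above. The helpers `split_features_target` and `target_balance` are hypothetical and are not taken from credit_scoring.py.

```python
import pandas as pd

TARGET = "SeriousDlqin2yrs"


def split_features_target(df: pd.DataFrame, target: str = TARGET):
    """Hypothetical helper: validate the 0/1 target, then split into X and y."""
    if target not in df.columns:
        raise ValueError(f"missing target column {target!r}")
    y = df[target]
    if y.isna().any():
        raise ValueError("target column contains missing values")
    if not set(y.unique()) <= {0, 1}:
        raise ValueError("target must be binary (0/1)")
    X = df.drop(columns=[target])
    return X, y.astype(int)


def target_balance(y: pd.Series) -> pd.Series:
    """Share of each class, i.e. the target distribution."""
    return y.value_counts(normalize=True).sort_index()


if __name__ == "__main__":
    # Tiny illustrative frame; real data would come from pd.read_csv(...)
    df = pd.DataFrame({
        "SeriousDlqin2yrs": [0, 0, 0, 1],
        "age": [45, 33, 61, 29],
        "MonthlyIncome": [5200.0, None, 7100.0, 2500.0],
    })
    X, y = split_features_target(df)
    print(list(X.columns))
    print(target_balance(y))
```

Missing feature values (such as the gap in MonthlyIncome here) are deliberately left alone at this stage. Only a missing or non-binary target is rejected.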
Features:
- EDA: distributions, correlation matrix
- Preprocessing: missing values, basic encoding for categoricals
- Modeling: Logistic Regression baseline
- Evaluation: confusion matrix, ROC AUC, PR curve, classification report
- Feature importance (Logistic Regression coefficients)
- Streamlit app: upload data, interactively analyze and predict
Requirements:
```bash
pip install -r requirements.txt
```
Main libraries: numpy, pandas, seaborn, matplotlib, scikit-learn, streamlit.
Keep and commit requirements.txt for reproducible installs (local, CI, deployment).
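`pip freeze` pins every package in the active environment. If you only want to pin the main libraries listed above, a small stdlib-only sketch can produce `name==version` lines from the installed distributions. The helper `pinned_requirements` is hypothetical. Review its output before writing it to requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version

MAIN_LIBRARIES = ["numpy", "pandas", "seaborn", "matplotlib", "scikit-learn", "streamlit"]


def pinned_requirements(packages=MAIN_LIBRARIES):
    """Hypothetical helper: 'name==version' for installed packages, plus missing ones."""
    pins, missing = [], []
    for name in packages:
        try:
            pins.append(f"{name}=={version(name)}")  # distribution name, e.g. scikit-learn
        except PackageNotFoundError:
            missing.append(name)
    return pins, missing


if __name__ == "__main__":
    pins, missing = pinned_requirements()
    print("\n".join(pins))
    if missing:
        print("not installed:", ", ".join(missing))
```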
Setup:
- Create and activate a virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
python -m pip install --upgrade pip
```
- Install dependencies from the repo file:
```bash
pip install -r requirements.txt
```
- Update requirements.txt after changing packages:
```bash
pip freeze > requirements.txt
```

Usage:

CLI (generates plots and metrics to files):
```bash
python credit_scoring.py
```
Streamlit app:
```bash
streamlit run credit_scoring_app.py
```
- Upload a CSV containing SeriousDlqin2yrs and the feature columns.
- Explore: target distribution, numeric distributions, correlation matrix.
- Train: one-click Logistic Regression with stratified split.
- Evaluate: confusion matrix, ROC AUC, classification report, top features.
- Predict: input feature values and get default probability with a simple recommendation.
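The Predict step amounts to calling `predict_proba` on a single row and mapping the probability to a decision. This is a minimal sketch that assumes a fitted scikit-learn classifier. The 0.2 and 0.5 cut-offs, the recommendation labels, and the helpers `recommend` and `predict_applicant` are hypothetical and are not the app's actual rule.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def recommend(probability: float, review_at: float = 0.2, reject_at: float = 0.5) -> str:
    """Map a default probability to a coarse decision (hypothetical cut-offs)."""
    if probability >= reject_at:
        return "high risk: decline"
    if probability >= review_at:
        return "medium risk: manual review"
    return "low risk: approve"


def predict_applicant(model, features: dict, columns: list) -> tuple:
    """Score one applicant given as {column: value}; columns must match training."""
    row = pd.DataFrame([features], columns=columns)  # enforce training column order
    p_default = float(model.predict_proba(row)[0, 1])
    return p_default, recommend(p_default)


if __name__ == "__main__":
    # Synthetic training data, for illustration only
    rng = np.random.default_rng(1)
    cols = ["age", "DebtRatio", "NumberOfDependents"]
    X = pd.DataFrame({
        "age": rng.integers(21, 80, 300),
        "DebtRatio": rng.random(300),
        "NumberOfDependents": rng.integers(0, 5, 300),
    })
    y = (X["DebtRatio"] + rng.normal(0, 0.2, 300) > 0.8).astype(int)
    model = LogisticRegression(max_iter=1000).fit(X, y)
    applicant = {"age": 35, "DebtRatio": 0.9, "NumberOfDependents": 2}
    p, advice = predict_applicant(model, applicant, cols)
    print(f"P(default) = {p:.2f} -> {advice}")
```

Keeping the thresholds as parameters makes it easy to trade approvals against expected defaults without retraining the model.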
The CLI script saves:
- confusion_matrix.png
- roc_curve.png
- precision_recall_curve.png
- feature_importance.png
- age_distribution.png, debt_ratio_distribution.png, numerical_distributions.png, correlation_matrix.png
License: MIT (see LICENSE).
Open an issue on GitHub for questions or suggestions.