Skip to content

sfriam/Loan-Default-Risk-Prediction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

3 Commits
Β 
Β 
Β 
Β 

Repository files navigation

πŸ’Έ Loan Default Risk Prediction - End-to-End ML Project

πŸ“Œ Objective

Build a machine learning model to predict loan default risk using LendingClub data (2007–2018). The goal is to help financial institutions make informed lending decisions by identifying high-risk borrowers before disbursing loans.

🧰 Tools & Technologies Used

  • Python (Pandas, NumPy, Matplotlib, Seaborn)
  • Machine Learning: Scikit-learn, XGBoost, LightGBM, Random Forest
  • Hyperparameter Optimization: Optuna
  • Model Interpretability: SHAP
  • Environment: Google Colab

πŸ” Dataset Overview

  • Dataset Source: LendingClub via Kaggle
  • Raw dataset: accepted_2007_to_2018Q4.csv (~2 million records)
  • Target variable: loan_status β†’ binary (Fully Paid β†’ 0, Charged Off β†’ 1)

πŸ”¬ Data Preprocessing & Feature Engineering

  • Selected relevant features (loan amount, term, interest rate, income, credit history, etc.)
  • Encoded categorical variables using Label Encoding
  • Converted % fields (like int_rate, revol_util) to float
  • Handled missing values using median imputation
  • Added derived features like credit_history_length (from issue date and earliest credit line)

πŸ“Š Exploratory Data Analysis (EDA)

  • Analyzed class imbalance: ~20% of loans are defaults
  • Visualized default rates by:
    • Grade & Sub-grade
    • Employment Length
    • Home Ownership
    • Loan Purpose
  • Observed key default patterns (e.g., higher interest & lower grades β†’ higher default)

Modeling Approaches

Three tree-based models were trained and compared:

Model AUC-ROC Accuracy Recall (Default) F1-Score (Default)
Random Forest 0.738 0.68 0.66 0.45
XGBoost 0.741 0.68 0.67 0.46
LightGBM 0.743 0.68 0.68 0.46

βœ… LightGBM was selected as the best performing model (highest AUC, best recall for default class).

πŸ§ͺ Hyperparameter Tuning (Optuna)

  • Used Optuna to maximize AUC by optimizing:
    • learning_rate, num_leaves, max_depth
    • min_child_samples, subsample, colsample_bytree
  • Applied early stopping during training
  • Best AUC achieved via Optuna: 0.7379

πŸ“ˆ Threshold Tuning (Precision vs Recall)

  • Evaluated model across thresholds (0.0 to 1.0)
  • Identified best threshold: 0.23
  • Final classification metrics at optimal threshold:
Precision (default): 0.37
Recall (default):    0.60
F1-score (default):  0.46
Accuracy:            0.72
AUC-ROC:             0.7419





About

Classifying whether a borrower will default on a loan using advanced machine learning models.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors