This project focuses on building, tuning, and evaluating multiple regression models to predict how much a customer will spend on a catalog purchase based on demographic and behavioral features.
- Task: Predict the continuous `Purchase` amount from customer data.
- Goal: Identify the best regression model that minimizes prediction error.
- Dataset: Customer-level features and their catalog purchase amounts.
- Metric: Root Mean Squared Error (RMSE)
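As a quick illustration of the metric, here is a minimal sketch of how RMSE can be computed. The `rmse` helper and the three purchase amounts are made up for this example and are not taken from the notebook.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical purchase amounts, for illustration only
actual = [120.0, 80.0, 200.0]
predicted = [110.0, 90.0, 190.0]

score = rmse(actual, predicted)
print(f"RMSE: {score:.2f}")
```

Because the errors are squared before averaging, RMSE penalizes a few large misses more heavily than many small ones, and it is expressed in the same units as the purchase amount.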
- Python, Jupyter Notebook
- scikit-learn
- XGBoost
- LightGBM
- `MLPRegressor` from `sklearn.neural_network`
- Applied Nested Cross-Validation to ensure unbiased model evaluation.
- Compared models: Linear Regression, KNN, SVR, Decision Tree, Random Forest, Gradient Boosting, XGBoost, LightGBM, and Neural Network.
- Final model was tuned using `GridSearchCV` and tested on a holdout set.
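The nested cross-validation step can be sketched roughly as follows. This is not the notebook's code: the synthetic data from `make_regression`, the two candidate models, the parameter grids, and the fold counts are all assumptions chosen to keep the example small. The inner loop (`GridSearchCV`) tunes hyperparameters, while the outer loop (`cross_val_score`) scores the tuned model on folds the tuning never saw.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the customer data (assumption, not the real dataset)
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Two of the compared model families, with small illustrative grids
candidates = {
    "knn": (KNeighborsRegressor(), {"n_neighbors": [3, 5, 9]}),
    "tree": (DecisionTreeRegressor(random_state=0), {"max_depth": [3, 5, None]}),
}

inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)  # hyperparameter tuning
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # performance estimate

nested_rmse = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(
        estimator, grid, cv=inner_cv, scoring="neg_root_mean_squared_error"
    )
    # Each outer fold refits the whole grid search on its training portion
    scores = cross_val_score(
        search, X, y, cv=outer_cv, scoring="neg_root_mean_squared_error"
    )
    nested_rmse[name] = -scores.mean()  # flip sign: scorer returns negative RMSE

for name, value in sorted(nested_rmse.items(), key=lambda kv: kv[1]):
    print(f"{name}: nested-CV RMSE = {value:.2f}")
```

Wrapping the grid search inside the outer loop keeps the outer test folds out of hyperparameter selection, which is what makes the resulting RMSE estimate less optimistic than a single tuned cross-validation score.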
- Best model: XGBoost (or Neural Net, depending on part B)
- Robust performance and generalization verified on the holdout set.
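For the final tuning and holdout check, a minimal sketch might look like the following. scikit-learn's `GradientBoostingRegressor` is used here as a stand-in so the example needs no extra dependencies; it is not the XGBoost model named above. The synthetic data, the 80/20 split, and the parameter grid are likewise assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the customer data (assumption, not the real dataset)
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Set the holdout portion aside before any tuning happens
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Small illustrative grid; a real search would likely cover more values
param_grid = {"n_estimators": [50, 100], "max_depth": [2, 3]}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid,
    cv=3,
    scoring="neg_root_mean_squared_error",
)
search.fit(X_train, y_train)  # refits the best configuration on all training data

cv_rmse = -search.best_score_
holdout_rmse = float(
    np.sqrt(mean_squared_error(y_holdout, search.predict(X_holdout)))
)

print("Best params:", search.best_params_)
print(f"CV RMSE: {cv_rmse:.2f} | holdout RMSE: {holdout_rmse:.2f}")
```

A holdout RMSE close to the cross-validated RMSE is one sign that the tuned model generalizes rather than overfitting the training folds.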
Predicting customer spend helps optimize:
- Targeted marketing
- Inventory forecasting
- Personalized promotions
Explore the full notebook: regression_modeling.ipynb