Travel Insurance Claim Prediction

Description

A machine learning project that predicts the likelihood of travel insurance claims using several models. It covers data preprocessing, exploratory data analysis, and model evaluation to support insurance companies in risk assessment and decision-making. The best model is saved to a .sav file (serialized with pickle) that can be loaded when deploying to an application.

Project Overview

This project focuses on predicting travel insurance claims using machine learning techniques. The goal is to build a predictive model that helps insurance companies identify the likelihood of a customer filing a claim based on historical data. This can assist companies in risk assessment and creating more tailored insurance products.

Objectives

Analyze historical travel insurance data to understand customer behavior.
Build a machine learning model to predict the likelihood of insurance claims.
Provide actionable insights for optimizing insurance policies and reducing risk.

Dataset

The dataset used in this project was sourced from Kaggle and contains information about customer demographics, travel details, and whether an insurance claim was filed. The key features include customer age, duration of travel, travel type, and insurance claim status.

Methodology

Data Preprocessing: Cleaned and prepared the data by handling missing values, encoding categorical variables, and normalizing numerical features.
Exploratory Data Analysis (EDA): Analyzed key features to identify trends and relationships that impact claim likelihood.
Model Building: Experimented with several machine learning algorithms for classification problems to predict insurance claims.
Model Evaluation: Evaluated models using recall as the selection metric to choose the best-performing model.
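The steps above can be sketched end to end with the scikit-learn tools this project lists. This is a minimal illustration, not the notebook's actual code: the data is synthetic, the column names (age, duration, travel_type) are hypothetical stand-ins for the dataset's features, and the Random Forest settings are arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, RobustScaler

# Synthetic stand-in data; column names are hypothetical, not the dataset's.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(18, 80, size=n),
    "duration": rng.integers(1, 120, size=n),
    "travel_type": rng.choice(["domestic", "international"], size=n),
})
# Imbalanced target: claims are made more likely for long trips (illustration only).
y = ((df["duration"] > 90) & (rng.random(n) < 0.8)).astype(int)

# Scale numeric columns and one-hot encode the categorical column.
preprocess = ColumnTransformer([
    ("num", RobustScaler(), ["age", "duration"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["travel_type"]),
])

# Chain preprocessing and the classifier so both are refit inside each fold.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# Stratified folds keep the claim/no-claim ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
recall_scores = cross_val_score(model, df, y, cv=cv, scoring="recall")
print(recall_scores.mean())
```

Keeping the scaler and encoder inside the pipeline means they are fitted only on the training part of each fold, which avoids leaking information from the validation rows into the recall estimate.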

Key Insights

Older customers tend to have a higher likelihood of filing travel insurance claims.
Customers with longer travel durations are more likely to make claims.
The Random Forest model showed the best performance in terms of accuracy and generalizability.

Tools Used

Basic Libraries: NumPy, Pandas
Visualization: Matplotlib, Seaborn
Statistical Hypothesis Testing: SciPy
Applying Algorithm Chains: Scikit-learn (ColumnTransformer, Pipeline)
Data Encoding: Scikit-learn (OneHotEncoder), Category Encoders
Data Scaling: Scikit-learn (RobustScaler)
Data Splitting: Scikit-learn (train_test_split)
Modeling: Scikit-learn (Logistic Regression, KNeighborsClassifier, DecisionTreeClassifier, RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier), XGBoost, LightGBM
Model Benchmarking: Scikit-learn (cross_val_score, StratifiedKFold)
Metrics Evaluation: Scikit-learn (confusion_matrix, classification_report, recall_score, precision_recall_curve, auc, learning_curve)
Handling Imbalanced Data: Scikit-learn (compute_sample_weight), Imbalanced-learn (Pipeline, SMOTE, ADASYN, SMOTEENN)
Hyperparameter Tuning: Scikit-learn (RandomizedSearchCV)
Saving Model: Pickle
Calculate Training Time: Time
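As one example of how the imbalance-handling tools in this list can be used, the sketch below applies compute_sample_weight with the "balanced" setting. It covers only this one option, not the resampling methods (SMOTE, ADASYN, SMOTEENN); the toy data and the choice of LogisticRegression are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_sample_weight

# Toy labels: 8 "no claim" rows (0) and 2 "claim" rows (1).
y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
X = np.array([[1.0], [2.0], [1.5], [2.5], [3.0], [1.2], [2.2], [2.8], [8.0], [9.0]])

# "balanced" weights each row by n_samples / (n_classes * class_count).
weights = compute_sample_weight(class_weight="balanced", y=y)
print(weights[0], weights[-1])  # 0.625 for the majority class, 2.5 for the minority

# Pass the weights at fit time so errors on claim rows count for more.
clf = LogisticRegression()
clf.fit(X, y, sample_weight=weights)
print(clf.predict([[8.5]]))
```

With these weights each class contributes the same total weight to the loss, which tends to raise recall on the minority class at some cost in precision.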

How to Run

Clone the repository:

git clone https://github.com/ffarishelmi/Travel-Insurance-Claim-Prediction.git

Install the required libraries:

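pip install numpy pandas matplotlib seaborn
pip install scipy
pip install scikit-learn
pip install category_encoders
pip install xgboost
pip install lightgbm
pip install imbalanced-learn
pip install pickle5  # only needed on Python 3.7 or older; pickle is part of the standard library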

Open and run the travel_insurance_claim_prediction.ipynb file in the notebook environment you are using.
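Once a model has been saved, it can be loaded back with pickle for use in an application. The sketch below shows a save and load round trip under stated assumptions: the file name travel_insurance_model.sav is hypothetical, and the small model is a stand-in for whatever the notebook actually trains.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Tiny stand-in model; the notebook's real model and features will differ.
X = np.array([[25, 5], [40, 30], [63, 90], [70, 120]])
y = np.array([0, 0, 1, 1])
model = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Hypothetical file name; use whatever path the notebook writes to.
model_path = Path(tempfile.gettempdir()) / "travel_insurance_model.sav"

# Serialize the fitted model to disk.
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# Load it back, as a deployed application would.
with open(model_path, "rb") as f:
    loaded_model = pickle.load(f)

print(loaded_model.predict(X))
```

Only unpickle files from a source you trust, and load the model with the same scikit-learn version that was used to save it where possible.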

Conclusion

This project provides a machine learning-based solution to predict travel insurance claims, which can help insurance companies in risk assessment and decision-making processes. By understanding key factors that influence claims, insurers can better tailor their products to customer needs.

Next Steps

Experiment with other advanced machine learning algorithms to further improve prediction accuracy.
Create a dashboard that visualizes key statistics for monitoring insurance sales performance.

Feel free to contribute by suggesting improvements or adding new features!
