Description
A machine learning project to predict the likelihood of travel insurance claims using various models. It involves data preprocessing, exploratory data analysis, and model evaluation to assist insurance companies in risk assessment and decision-making. Best model results are saved into a sav format file that can be used to deploy to the application.
This project focuses on predicting travel insurance claims using machine learning techniques. The goal is to build a predictive model that helps insurance companies identify the likelihood of a customer filing a claim based on historical data. This can assist companies in risk assessment and creating more tailored insurance products.
- Analyze historical travel insurance data to understand customer behavior.
- Build a machine learning model to predict the likelihood of insurance claims.
- Provide actionable insights for optimizing insurance policies and reducing risk.
The dataset used in this project was sourced from Kaggle and contains information about customer demographics, travel details, and whether an insurance claim was filed. The key features include customer age, duration of travel, travel type, and insurance claim status.
- Data Preprocessing: Cleaned and prepared the data by handling missing values, encoding categorical variables, and normalizing numerical features.
- Exploratory Data Analysis (EDA): Analyzed key features to identify trends and relationships that impact claim likelihood.
- Model Building: Experimented with several machine learning algorithms for classification problems to predict insurance claims.
- Model Evaluation: Evaluated models using recall to choose the best performing model.
- Older customers tend to have a higher likelihood of filing travel insurance claims.
- Customers with longer travel durations are more likely to make claims.
- The Random Forest model showed the best performance in terms of accuracy and generalizability.
- Basic Libraries: NumPy, Pandas
- Visualization: Matplotlib, Seaborn
- Statistical Hypothesis Testing: SciPy
- Applying Algorithm Chains: Scikit-learn (ColumnTransformer, Pipeline)
- Data Encoding: Scikit-learn (OneHotEncoder), Category Encoders
- Data Scaling: Scikit-learn (RobustScaler)
- Data Splitting: Scikit-learn (train_test_split)
- Modeling: Scikit-learn (Logistic Regression, KNeighborsClassifier, DecisionTreeClassifier, RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier), XGBoost, LightGBM
- Model Benchmarking: Scikit-learn (cross_val_score, StratifiedKFold)
- Metrics Evaluation: Scikit-learn (confusion_matrix, classification_report, recall_score, precision_recall_curve, auc, learning_curve)
- Handling Imbalanced Data: Scikit-learn (compute_sample_weight), Imbalanced-learn (Pipeline, SMOTE, ADASYN, SMOTEENN)
- Hyperparameter Tuning: Scikit-learn (RandomizedSearchCV)
- Saving Model: Pickle
- Calculate Training Time: Time
- Clone the repository:
git clone https://github.com/ffarishelmi/Travel-Insurance-Claim-Prediction.git
- Install the required library:
pip install scipy pip install scikit-learn pip install category_encoders pip install xgboost pip install lightgbm pip install imbalanced-learn pip install pickle5
- Run travel_insurance_claim_prediction.ipynb file the notebook environment you are using.
This project provides a machine learning-based solution to predict travel insurance claims, which can help insurance companies in risk assessment and decision-making processes. By understanding key factors that influence claims, insurers can better tailor their products to customer needs.
- Experiment with other advanced machine learning algorithms to further improve prediction accuracy.
- Create a dashboard to visualize key statistics data to monitor insurance sales performance.
Feel free to contribute by suggesting improvements or adding new features!