Project Overview

Context

The agency serves a broad spectrum of clients, each with unique financial needs and health profiles. Traditionally, calculating life insurance premiums involved a complex evaluation of multiple factors, often leading to discrepancies and inefficiencies. While financial advisors have access to software for evaluating different plans and determining premiums, the agency wants to empower clients with a platform that allows them to get a rough estimate of their premiums based on their specific budget and status, as well as the potential savings.

Actions

To tackle this challenge, a comprehensive data-driven approach was adopted. The journey began with the collection of extensive client data, including demographic information, health metrics, and lifestyle factors. This data was then meticulously preprocessed to ensure its accuracy and completeness.

I built a predictive model to learn the relationships between client metrics and life insurance premiums for previous clients, then used it to predict premiums for potential new clients.

As I was predicting a numeric output, I tested three regression modeling approaches, namely:

Linear Regression
Decision Tree
Random Forest

Results

The Random Forest had the highest predictive accuracy on all three metrics.

**Metric 1: R-Squared (Test Set)**

Random Forest = 0.918
Decision Tree = 0.908
Linear Regression = 0.795

**Metric 2: Adjusted R-Squared (Test Set)**

Random Forest = 0.915
Decision Tree = 0.904
Linear Regression = 0.786

**Metric 3: Cross Validated R-Squared (K-Fold Cross Validation, k = 4)**

Random Forest = 0.881
Decision Tree = 0.865
Linear Regression = 0.743

As the most important outcome for this project was predictive accuracy, rather than explicitly understanding weighted drivers of prediction, I chose the Random Forest as the model to use for making predictions on the life insurance premiums for future clients.
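For reference, adjusted R-squared (Metric 2) discounts R-squared according to the number of predictors in the model. The sketch below uses the standard formula; the function name, the sample size, and the feature count in the example are illustrative assumptions, not values taken from this project.

```python
def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical example: R^2 of 0.90 on 101 test rows with 10 features.
example = adjusted_r2(0.90, n_samples=101, n_features=10)
print(round(example, 4))  # 0.8889
```

The penalty grows with the number of features relative to the number of rows, which is why the adjusted values are always slightly below the plain R-squared values.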

Key Definition

age: client's age
sex: client's gender
bmi: body mass index, a value derived from the mass and height of a person (body mass in kg divided by the square of body height in m^2)
children: number of client's children
smoker: client's smoking status
region: client's place of residence
CI: includes critical illness insurance
rated: increased premium due to health problems
UL permanent: combined investment and life insurance
disability: includes disability insurance
premium: client's monthly payment

Data Overview

The initial dataset included various attributes such as age, gender, BMI, number of children, smoking status, region, and several insurance-related features. To prepare the data for modeling, several key steps were undertaken:

Handling Missing Values: Any missing values in the dataset were identified and appropriately addressed.

Dealing with Outliers: The dataset was examined for outliers to ensure the integrity of the data.

Encoding Categorical Variables: Categorical variables like gender, smoker status, and region were encoded using one-hot encoding to make them suitable for machine learning models.

Feature Scaling: Numerical features were standardized to ensure they were on a comparable scale, enhancing the model's performance.
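The four preparation steps above can be sketched with pandas and scikit-learn as follows. This is a minimal illustration, not the project's actual code: the toy rows are made up, the region labels are hypothetical, and the specific choices (median imputation, the 1.5 x IQR rule on the premium column, dropping the first dummy level) are assumptions, since the write-up does not state which techniques were used.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy data, made up for illustration; column names follow the Key Definition section.
df = pd.DataFrame({
    "age": [25, 41, 33, 58, 47, 29],
    "sex": ["female", "male", "male", "female", "male", "female"],
    "bmi": [22.4, 30.1, np.nan, 27.8, 35.2, 24.9],
    "children": [0, 2, 1, 3, 2, 0],
    "smoker": ["no", "yes", "no", "no", "yes", "no"],
    "region": ["north", "south", "east", "west", "south", "north"],
    "premium": [40.0, 155.0, 62.0, 120.0, 900.0, 48.0],
})
numeric_cols = ["age", "bmi", "children"]

# 1. Missing values: fill numeric gaps with the column median (assumed strategy).
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# 2. Outliers: keep rows whose premium lies within 1.5 * IQR of the quartiles (assumed rule).
q1, q3 = df["premium"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["premium"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)].reset_index(drop=True)

# 3. Categorical variables: one-hot encode, dropping one level per variable.
df = pd.get_dummies(df, columns=["sex", "smoker", "region"], drop_first=True)

# 4. Feature scaling: standardize numeric features to zero mean and unit variance.
#    In a real pipeline the scaler would be fit on the training split only.
scaler = StandardScaler()
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])

print(df.head())
```

Tree-based models such as the Decision Tree and Random Forest do not require scaling, so step 4 mainly matters for the Linear Regression.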

Model Training and Evaluation

With the data prepared, the next step was to train a machine learning model capable of accurately predicting life insurance premiums.

I tested three regression modeling approaches, namely:

Linear Regression
Decision Tree
Random Forest

For each model, I imported the data in the same way but pre-processed it according to the requirements of that particular algorithm. I trained and tested each model, tuned it for optimal performance, and then measured predictive performance on several metrics to give a well-rounded view of which model performs best.

The dataset was split into training and testing sets, ensuring that the model could be evaluated on unseen data. The model was trained on the training set, and its performance was evaluated using the testing set. Key metrics of R-squared and adjusted R-squared were calculated to assess the model's accuracy.
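A minimal sketch of this split-train-evaluate loop, using scikit-learn on synthetic stand-in data, might look like the following. The data-generating formula, the 80/20 split, and the random seeds are assumptions made for illustration; the scores it prints are unrelated to the project's reported results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (made up): age, bmi, smoker flag -> premium.
rng = np.random.default_rng(42)
n = 400
age = rng.integers(18, 65, size=n)
bmi = rng.normal(27, 4, size=n)
smoker = rng.integers(0, 2, size=n)
X = np.column_stack([age, bmi, smoker])
y = 2.0 * age + 1.5 * bmi + 150.0 * smoker + rng.normal(0, 10, size=n)

# Hold out 20% of the rows as unseen test data (assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(random_state=42),
}

# Fit each model on the training set and score it on the test set.
test_r2 = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_r2[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: test R-squared = {test_r2[name]:.3f}")
```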

To obtain a more robust evaluation, cross-validation was performed using KFold with k = 4. The data is split into four folds, and each fold serves once as the validation set while the model is trained on the remaining folds, which shows how consistently the model performs across different subsets of the data.
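As a sketch of how a four-fold cross-validated R-squared can be computed with scikit-learn's `KFold` and `cross_val_score`: the synthetic data, the shuffling, and the random seeds below are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (made up): three numeric features and a noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(0, 0.5, size=200)

# Four folds; each fold is used once for validation.
cv = KFold(n_splits=4, shuffle=True, random_state=42)
scores = cross_val_score(
    RandomForestRegressor(random_state=42), X, y, cv=cv, scoring="r2"
)

print("R-squared per fold:", scores)
print("Mean cross-validated R-squared:", scores.mean())
```

The mean of the four fold scores corresponds to the kind of figure reported as Metric 3.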

Optimal Model Selection

Determining the optimal complexity of the decision tree was crucial. By experimenting with different maximum depths for the tree, the optimal depth was identified based on the highest accuracy score. This step ensured that the model was neither too simple to capture essential patterns nor so complex that it overfit the training data.
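A depth search of this kind could be sketched as below. The synthetic data, the depth range of 1 to 9, and the use of `DecisionTreeRegressor.score` (which returns R-squared for regressors) as the "accuracy score" are assumptions; the write-up does not give these details.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (made up) with a non-linear relationship.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 2))
y = np.sin(X[:, 0]) * 10 + X[:, 1] + rng.normal(0, 1.0, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Try a range of maximum depths and record the held-out R-squared for each.
depths = list(range(1, 10))
scores = []
for depth in depths:
    tree = DecisionTreeRegressor(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    scores.append(tree.score(X_test, y_test))

best_depth = depths[int(np.argmax(scores))]
print("Best max_depth:", best_depth)
```

Choosing the depth on the same test set that is later used for reporting can make the final score optimistic; a separate validation split or cross-validation is a common safeguard.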

Results and Insights

The final decision tree model provided valuable insights into the factors influencing life insurance premiums. Feature importance analysis highlighted the key variables impacting premium calculations, offering transparency and interpretability to WFG's underwriting process.
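Feature importances can be read from a fitted scikit-learn tree through its `feature_importances_` attribute, as in the sketch below. The synthetic data, the column names, and the tree depth are assumptions for illustration; the ranking it prints says nothing about the project's real drivers.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (made up); smoking status is given the largest effect.
rng = np.random.default_rng(7)
n = 300
X = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "bmi": rng.normal(27, 4, size=n),
    "smoker_yes": rng.integers(0, 2, size=n),
})
y = 2.0 * X["age"] + 1.0 * X["bmi"] + 200.0 * X["smoker_yes"] + rng.normal(0, 5, size=n)

tree = DecisionTreeRegressor(max_depth=4, random_state=42).fit(X, y)

# Importances are normalized to sum to 1; sort them from most to least important.
importances = pd.Series(
    tree.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

`RandomForestRegressor` exposes the same attribute, averaged over its trees.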

Visualization and Predictions

Visualizations such as histograms, pair plots, and tree plots were created to understand the data distribution and model structure better. Additionally, the model was used to predict premiums for new clients, showcasing its practical applicability in real-world scenarios.
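Predicting premiums for new clients amounts to passing rows with the same feature columns to the fitted model. A minimal sketch follows, assuming a Random Forest trained on made-up data; the two "new clients" and all column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic training data (made up) standing in for previous clients.
rng = np.random.default_rng(3)
n = 300
train = pd.DataFrame({
    "age": rng.integers(18, 65, size=n),
    "bmi": rng.normal(27, 4, size=n),
    "smoker_yes": rng.integers(0, 2, size=n),
})
premium = (
    2.0 * train["age"] + 1.5 * train["bmi"]
    + 150.0 * train["smoker_yes"] + rng.normal(0, 10, size=n)
)

model = RandomForestRegressor(n_estimators=100, random_state=42).fit(train, premium)

# Hypothetical new clients; columns must match the training features in name and order.
new_clients = pd.DataFrame({
    "age": [30, 55],
    "bmi": [23.5, 31.0],
    "smoker_yes": [0, 1],
})[train.columns]

predicted = model.predict(new_clients)
print(predicted)
```

Any preprocessing applied during training, such as encoding or scaling, would need to be applied to the new rows in the same way before calling `predict`.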

Impact and Future Directions

The implementation of this machine learning solution marked a significant milestone for WFG. By accurately predicting life insurance premiums, WFG was able to offer fairer and more personalized insurance plans to their clients, enhancing customer satisfaction and trust.

Looking ahead, WFG plans to continuously refine and expand this model by incorporating additional data sources and exploring more advanced machine learning techniques. This initiative represents a commitment to innovation and excellence, ensuring that WFG remains at the forefront of the financial services industry.

Through this project, WFG has demonstrated the transformative power of data and machine learning in revolutionizing traditional financial processes, paving the way for a more efficient and customer-centric future.

Growth: The platform would also let clients experiment with different premium levels and see how each choice affects the money they could allocate to retirement.

Arezookhalili/Life-Insurance-Premium-Prediction