Predicting Life Expectancy

📊 Predicting Life Expectancy Using Regression

This project develops a predictive model for life expectancy from a Kaggle dataset of health, economic, and demographic variables across multiple countries. Using R, we cleaned and transformed the data, explored it visually, and built multiple regression models to predict life expectancy, reducing MSE by 18% after transformations.

📁 Dataset

Source: Kaggle
Size: ~3,000 observations, 22 variables
Target Variable: Life expectancy
Selected Predictors:
- Status (Developed / Developing)
- Alcohol (liters per capita)
- Percentage Expenditure
- Hepatitis B, Polio (immunization rates)
- BMI
- Schooling (average years)

🔧 Workflow Summary

Data Cleaning
- Removed irrelevant variables
- Imputed missing values using mean and k-NN methods
- Scaled and transformed skewed variables
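
The cleaning steps above can be sketched in base R. This is a minimal illustration on made-up toy values, not the project's actual code: the `toy` data frame and the `impute_mean` helper are hypothetical, and only mean imputation and a log transform are shown. For the k-NN step, one option (not necessarily the one used here) is caret's `preProcess(method = "knnImpute")`, which also centers and scales the data.

```r
# Toy data standing in for the real dataset (values are made up for illustration)
toy <- data.frame(
  Alcohol = c(0.5, 3.2, NA, 8.1, 1.0),
  percentage.expenditure = c(10, 250, 40, NA, 3000)
)

# Mean imputation: replace each NA with the mean of the observed values
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}
toy_clean <- as.data.frame(lapply(toy, impute_mean))

# log1p(x) = log(1 + x) compresses the long right tail and is defined at zero
toy_clean$log_expenditure <- log1p(toy_clean$percentage.expenditure)

toy_clean
```

Mean imputation is simple but shrinks the variance of the imputed column, which is one reason to compare it against a neighbor-based method.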
Exploratory Data Analysis (EDA)
- Histograms of predictors and target
- Scatterplot matrix using pairs()
- Identified skewed distributions (e.g., Alcohol, Expenditure)
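
A numeric check can complement the histograms and the pairs() plot. The sketch below uses simulated data rather than the project's dataset; the `skewness` helper and the `sim` data frame are hypothetical names introduced for illustration.

```r
# Sample skewness: third standardized moment (positive = long right tail)
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

set.seed(1)
sim <- data.frame(
  symmetric    = rnorm(500),
  right_skewed = rexp(500)   # exponential draws have a long right tail
)

# Skewness per column; values far from 0 suggest a transformation may help
sapply(sim, skewness)

# Numeric correlation matrix, handling missing values pair by pair
round(cor(sim, use = "pairwise.complete.obs"), 2)
```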
Modeling
- Built multiple linear regression models
- Applied log and polynomial transformations
- Selected features based on statistical significance and diagnostics
- Achieved an 18% reduction in MSE after applying the transformations
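
One way to measure the effect of a transformation is to compare the MSE of a baseline fit against transformed fits. The sketch below does this on simulated data, so its numbers say nothing about the project's 18% figure; the variables `x` and `y` and the helpers `mse` and `reduction` are hypothetical.

```r
set.seed(42)
n <- 300
x <- rexp(n, rate = 0.5)                     # right-skewed predictor
y <- 60 + 5 * log1p(x) + rnorm(n, sd = 1)    # response is linear in log(1 + x)

# In-sample mean squared error of a fitted model
mse <- function(fit) mean(residuals(fit)^2)

fit_raw  <- lm(y ~ x)            # baseline: untransformed predictor
fit_log  <- lm(y ~ log1p(x))     # log transformation
fit_poly <- lm(y ~ poly(x, 2))   # quadratic polynomial term

# Relative MSE reduction versus the baseline, in percent
reduction <- function(fit) 100 * (1 - mse(fit) / mse(fit_raw))
c(log = reduction(fit_log), poly = reduction(fit_poly))
```

These are in-sample errors; computing the MSE on a held-out test set gives a fairer comparison when models differ in flexibility.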
Validation
- Checked linear model assumptions using:
  - Residual plots
  - Q-Q plots
- Assessed model fit with R² and adjusted R²

📈 Sample Visualizations

# Histogram of Life Expectancy
hist(data_clean$Life.expectancy, main = "Life Expectancy", col = "skyblue", xlab = "Years")

# Scatterplot matrix of all variable pairs
pairs(data_clean, cex = 0.1)

🧠 Sample Model Code

# Fit linear model
model <- lm(Life.expectancy ~ Status + Alcohol + percentage.expenditure + Hepatitis.B +
            Polio + BMI + Schooling, data = data_clean)

# Summary of model
summary(model)

# Residual diagnostics
par(mfrow = c(2, 2))
plot(model)
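
The fit statistics printed by summary(model) can also be pulled out programmatically. The sketch below uses the built-in `mtcars` data as a stand-in so that it runs without `data_clean`; the model formula is illustrative only.

```r
# mtcars ships with base R; used here only as a stand-in dataset
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)

r2     <- s$r.squared             # proportion of variance explained
adj_r2 <- s$adj.r.squared         # R² penalized for the number of predictors
mse    <- mean(residuals(fit)^2)  # in-sample mean squared error

c(R2 = r2, adj_R2 = adj_r2, MSE = mse)
```

Adjusted R² is the more useful of the two when comparing models with different numbers of predictors, since plain R² never decreases when a term is added.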

✅ Results

Final model includes 7 key predictors
Achieved:
- R² ≈ 0.78
- MSE ↓ by 18% after transformations
Schooling, Alcohol, and Polio were strong positive predictors of life expectancy

🔧 Tools Used

Language: R
Packages: dplyr, ggplot2, caret, MASS

rxqnx00/predicting-life-expectancy
