Exploratory Data Analysis (EDA): Titanic Dataset
Project Overview
This project is part of my internship task on Exploratory Data Analysis (EDA). The goal is to explore the Titanic dataset using summary statistics and visualizations to identify patterns, relationships, and insights that could be useful for further machine learning tasks.
Dataset
We use the Titanic dataset, which contains passenger information such as age, sex, class, fare, and survival status.
Target column: Survived (0 = Did not survive, 1 = Survived)
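Below is a minimal loading sketch. It assumes `titanic.csv` uses the common Kaggle column names (`Survived`, `Pclass`, `Sex`, `Age`, `Fare`). `load_titanic` is a hypothetical helper and not necessarily what the notebook does. The sketch also checks that the target column is coded 0/1 as described above.

```python
import pandas as pd


def load_titanic(path: str = "titanic.csv") -> pd.DataFrame:
    """Load the Titanic CSV and sanity-check the target column.

    Hypothetical helper; assumes Kaggle-style column names.
    """
    df = pd.read_csv(path)
    # The target must be binary: 0 = did not survive, 1 = survived.
    unexpected = set(df["Survived"].dropna().unique()) - {0, 1}
    if unexpected:
        raise ValueError(f"Unexpected Survived values: {sorted(unexpected)}")
    return df
```

For example, `df = load_titanic("titanic.csv")` followed by `df.isna().sum()` shows which columns have missing values before any statistics are computed.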
Tools & Libraries Used
Python
Pandas: data handling
Matplotlib: visualization
Seaborn: advanced plots
🧾 Steps Performed
1️⃣ Summary Statistics
Described the distribution of Age, Fare, and the other numerical features.
Example: average Age ≈ 29.7 years, average Fare ≈ 32.2.
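The statistics above can be reproduced with pandas roughly as follows. This sketch assumes Kaggle-style `Age` and `Fare` columns, and both helper names are hypothetical.

```python
import pandas as pd


def summarize_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, std, min, quartiles and max for each numeric column."""
    # With default arguments, describe() skips non-numeric columns such as Sex.
    return df.describe()


def mean_age_and_fare(df: pd.DataFrame) -> tuple[float, float]:
    """Mean Age and Fare, rounded to one decimal (hypothetical helper)."""
    # pandas skips missing values (NaN) by default, which matters for Age.
    return round(float(df["Age"].mean()), 1), round(float(df["Fare"].mean()), 1)
```

In a notebook cell, `summarize_numeric(df)` displays the full table.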
2️⃣ Histograms & Boxplots
Age histogram: most passengers were 20–40 years old.
Fare boxplot: showed extreme outliers (expensive tickets up to 512).
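Below is a matplotlib sketch of both plots. It assumes `Age` and `Fare` columns. `plot_age_and_fare` and the 20-bin setting are illustrative choices, not the notebook's exact settings.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_age_and_fare(df: pd.DataFrame):
    """Draw an Age histogram and a Fare boxplot side by side (hypothetical helper)."""
    fig, (ax_age, ax_fare) = plt.subplots(1, 2, figsize=(10, 4))
    # Histogram: missing ages are dropped so they do not break the binning.
    ax_age.hist(df["Age"].dropna(), bins=20, edgecolor="black")
    ax_age.set(title="Age distribution", xlabel="Age (years)", ylabel="Passengers")
    # Boxplot: by default, points beyond 1.5 * IQR from the box are drawn as outliers.
    ax_fare.boxplot(df["Fare"].dropna().to_numpy())
    ax_fare.set(title="Fare boxplot", ylabel="Fare")
    fig.tight_layout()
    return fig
```

With Seaborn, `sns.histplot(df["Age"])` and `sns.boxplot(x=df["Fare"])` draw equivalent plots.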
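3️⃣ Correlation Matrix
Showed pairwise correlations between the numeric features.
Strong negative correlation between Fare and Passenger Class: because 1st class is coded as 1 and 3rd class as 3, higher fares went with higher (lower-numbered) classes.
Weak correlation between Age and Survival.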
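Below is a sketch of the correlation step. It assumes Kaggle-style `Pclass`, `Fare`, `Age`, and `Survived` columns, and the helper names are hypothetical. Non-numeric columns such as `Sex` are dropped first, because correlation is only defined for numeric values.

```python
import pandas as pd


def numeric_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation matrix over the numeric columns only."""
    # Missing values (e.g. in Age) are excluded pair by pair.
    return df.select_dtypes(include="number").corr()


def key_pairs(corr: pd.DataFrame) -> dict[str, float]:
    """Extract the two pairs discussed above (assumed Kaggle column names)."""
    return {
        "Fare vs Pclass": float(corr.loc["Fare", "Pclass"]),
        "Age vs Survived": float(corr.loc["Age", "Survived"]),
    }
```

To draw the matrix as a heatmap, `sns.heatmap(corr, annot=True, cmap="coolwarm")` is a common Seaborn one-liner.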
4️⃣ Patterns & Trends
Female survival rate: 74%
Male survival rate: 19%
Clear trend: women had much higher survival chances than men.
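Because `Survived` is coded 0/1, the mean within each group equals that group's survival rate. Below is a minimal sketch that assumes a `Sex` column. `survival_rate_by` is a hypothetical helper.

```python
import pandas as pd


def survival_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Survival rate (in percent) for each group of `column`.

    Hypothetical helper: the mean of a 0/1 column is the share of 1s.
    """
    return df.groupby(column)["Survived"].mean().mul(100).round(1)
```

If the file is the same version of the dataset, `survival_rate_by(df, "Sex")` should reproduce the female and male rates listed above.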
5️⃣ Feature-Level Inferences
1st Class survival: 63%
2nd Class survival: 47%
3rd Class survival: 24%
Clear trend: higher-class passengers had better survival rates.
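The class figures can also be read from a row-normalized crosstab, which shows both outcomes for each class side by side. This sketch assumes the class column is named `Pclass` (the Kaggle convention), and `class_survival_table` is a hypothetical name. A `df.groupby("Pclass")["Survived"].mean()` call would give the same survival column.

```python
import pandas as pd


def class_survival_table(df: pd.DataFrame) -> pd.DataFrame:
    """Per-class outcome shares; each row sums to 1 (normalize="index")."""
    table = pd.crosstab(df["Pclass"], df["Survived"], normalize="index")
    # Give the 0/1 target columns readable labels.
    return table.rename(columns={0: "Did not survive", 1: "Survived"})
```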
Key Insights
Women survived at much higher rates than men.
Higher-class passengers had better survival chances.
Fare distribution was highly skewed with significant outliers.
Age distribution showed most passengers were young adults.
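The skew and outlier claims can be quantified. Below is a sketch that assumes a `Fare` column. `fare_skew_and_outliers` is hypothetical. It counts values that lie more than 1.5 × IQR beyond the quartiles, which is the same rule matplotlib uses by default to mark boxplot outliers. A positive skewness indicates a long right tail.

```python
import pandas as pd


def fare_skew_and_outliers(fare: pd.Series) -> tuple[float, int]:
    """Return the sample skewness of Fare and the number of IQR outliers."""
    fare = fare.dropna()
    q1, q3 = fare.quantile([0.25, 0.75])
    iqr = q3 - q1
    # Flag values more than 1.5 * IQR below Q1 or above Q3.
    outside = (fare < q1 - 1.5 * iqr) | (fare > q3 + 1.5 * iqr)
    return float(fare.skew()), int(outside.sum())
```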
Submission Notes
Repository includes:
Jupyter Notebook with analysis and visualizations
Dataset (titanic.csv)
This README.md
Author: RAKSHITH N