Exploratory_data_analysis_and_visualization

This project explores datasets through data cleaning, preprocessing, and visualization. The main tasks include:

Titanic Dataset Analysis

  • Data Loading & Preprocessing

    • Removed unnecessary columns.
    • Extracted deck information from the Cabin column.
    • Label-encoded categorical variables.
    • Imputed missing values with the mean (numerical columns) or the mode (categorical columns).
    • Saved the cleaned dataset to CSV and JSON formats.
  • Exploratory Data Analysis (EDA)

    • Analyzed feature distributions.
    • Calculated medians and modes for survivors and non-survivors.
    • Created “average passenger” profiles and compared them to real passengers.
    • Visualized variable relationships using scatter plots and pairplots.
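The preprocessing and profiling steps listed above can be sketched with pandas. This is a minimal illustration, not the project's actual code: the sample rows are made up, PassengerId is only assumed to be one of the "unnecessary" columns, and the helper names preprocess and save_clean are hypothetical. It imputes before label-encoding so that missing values do not receive a category code of their own; the project may order these steps differently.

```python
import pandas as pd

# Made-up rows with Titanic-style column names; not real passenger data.
raw = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived": [0, 1, 1, 0, 1, 0],
    "Sex": ["male", "female", "female", "male", None, "male"],
    "Age": [22.0, 38.0, None, 35.0, 28.0, 40.0],
    "Fare": [7.25, 71.28, 7.92, 8.05, 53.10, 8.05],
    "Cabin": [None, "C85", None, None, "C123", "E46"],
})


def preprocess(df):
    """Hypothetical helper: drop, extract deck, impute, then label-encode."""
    # Assumption: PassengerId stands in for the "unnecessary" columns.
    out = df.drop(columns=["PassengerId"])
    # The deck is the leading letter of the cabin code; missing cabins stay missing.
    out["Deck"] = out["Cabin"].str[0]
    out = out.drop(columns=["Cabin"])
    # Impute: mean for numerical columns, mode for categorical ones.
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].mean())
        else:
            out[col] = out[col].fillna(out[col].mode().iloc[0])
    # Label-encode the remaining non-numeric columns as integer codes.
    for col in out.select_dtypes(exclude="number").columns:
        out[col] = out[col].astype("category").cat.codes
    return out


def save_clean(df, stem):
    """Hypothetical helper: write the cleaned frame to <stem>.csv and <stem>.json."""
    df.to_csv(f"{stem}.csv", index=False)
    df.to_json(f"{stem}.json", orient="records")


clean = preprocess(raw)

# Medians and modes for non-survivors (0) and survivors (1).
medians = clean.groupby("Survived").median()
modes = clean.groupby("Survived").agg(lambda s: s.mode().iloc[0])
print(medians)
print(modes)
```

Each row of medians can be read as one "average passenger" profile for its group, which can then be compared against individual rows of clean.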

[Figure: Example]

Text Data Analysis

  • Identified the most common words in positive and negative reviews.
  • Computed TF-IDF vectors for the texts.
  • Visualized key words for easier interpretation.
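As a rough illustration of the word-frequency and TF-IDF steps above, here is a standard-library-only sketch. The reviews are made up, the function names are hypothetical, and the weighting is the textbook tf × ln(N/df) form. The project itself may well use a library such as scikit-learn, whose TfidfVectorizer applies a smoothed IDF and L2 normalization by default, so its numbers would differ from these.

```python
import math
import re
from collections import Counter

# Made-up example reviews; not taken from the project's dataset.
positive = ["great movie great cast", "loved the great story"]
negative = ["boring plot and bad acting", "bad movie"]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(texts, n=3):
    """Return the n most frequent tokens across all texts."""
    counts = Counter(word for text in texts for word in tokenize(text))
    return counts.most_common(n)


def tf_idf(texts):
    """Return one {word: weight} dict per text, using tf * ln(N / df)."""
    docs = [Counter(tokenize(text)) for text in texts]
    n_docs = len(docs)
    # Document frequency: the number of texts each word appears in.
    doc_freq = Counter(word for doc in docs for word in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in doc.items()
        })
    return vectors


print(top_words(positive, 1))
print(top_words(negative, 1))
vectors = tf_idf(positive + negative)
print(vectors[0])
```

Words that occur in every text get a weight of zero under this formula, which is why stop words tend to drop out of TF-IDF rankings without an explicit stop list.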

[Figure: Word cloud of negative reviews]

[Figure: Word cloud of positive reviews]

Chart Improvements

  • Selected three “junk charts” and reworked them to be more informative and visually clearer.
  • Saved the enhanced visualizations for reporting and presentation.