Task 1: Data Cleaning & Preprocessing
Objective
Learn how to clean and prepare raw data for Machine Learning.
Tools Used
Python
Pandas
NumPy
Matplotlib
Seaborn
Dataset
You can use any dataset relevant to the task, for example the Titanic dataset.
Steps Performed
- Imported the dataset and explored basic information (null values, data types).
- Handled missing values using mean or median imputation.
- Converted categorical features into numerical ones using encoding techniques.
- Normalized/standardized numerical features.
- Visualized outliers using boxplots and handled them.
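The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the toy rows, the column names (`Age`, `Fare`, `Sex`, `Embarked`), the choice of median/mode imputation, and the 1.5 × IQR clipping rule are all assumptions made for the example. A real run would load the downloaded dataset file instead of building a DataFrame by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows that mimic a few Titanic-style columns (assumed names).
df = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, 54.0, np.nan, 27.0, 14.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 8.46, 11.13, 512.33],
    "Sex": ["male", "female", "female", "female",
            "male", "male", "female", "female"],
    "Embarked": ["S", "C", "S", "S", "S", "Q", None, "C"],
})

# 1. Explore basic information: data types and null counts per column.
print(df.dtypes)
print(df.isnull().sum())

# 2. Handle missing values: median for a numeric column,
#    most frequent value (mode) for a categorical column.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# 3. Encode categorical features: map the binary column to 0/1 and
#    one-hot encode the multi-class column.
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
df = pd.get_dummies(df, columns=["Embarked"], dtype=int)

# 4. Standardize numeric features to zero mean and unit variance.
for col in ["Age", "Fare"]:
    df[col] = (df[col] - df[col].mean()) / df[col].std(ddof=0)

# 5. Detect outliers with the same 1.5 * IQR rule a boxplot uses for its
#    whiskers, then clip them to the whisker limits.
q1, q3 = df["Fare"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df["Fare"] < lower) | (df["Fare"] > upper)]
print(f"Rows flagged as Fare outliers: {len(outliers)}")
df["Fare"] = df["Fare"].clip(lower, upper)
```

To see the outliers visually, a boxplot of the column can be drawn with Seaborn, for example `sns.boxplot(x=df["Fare"])`. Clipping is only one way to handle them; dropping the flagged rows or applying a log transform are common alternatives.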
What I Learned
Data cleaning
Handling null values
Encoding categorical variables
Feature scaling (normalization/standardization)
Outlier detection
Interview Questions
- What are the different types of missing data?
- How do you handle categorical variables?
- What is the difference between normalization and standardization?
- How do you detect outliers?
- Why is preprocessing important in ML?
- What is one-hot encoding vs label encoding?
- How do you handle data imbalance?
- Can preprocessing affect model accuracy?
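Two of the questions above ask for a contrast (normalization vs standardization, one-hot vs label encoding), which is easy to see on a tiny example. The sketch below uses made-up values and hypothetical variable names purely for illustration.

```python
import pandas as pd

# Hypothetical numeric feature.
values = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# Normalization (min-max scaling): rescales values into the range [0, 1].
normalized = (values - values.min()) / (values.max() - values.min())

# Standardization (z-score): shifts to mean 0 and scales to unit variance.
standardized = (values - values.mean()) / values.std(ddof=0)

# Hypothetical categorical feature.
colors = pd.Series(["red", "green", "blue", "green"])

# Label encoding: one integer per category. The integers imply an order
# (blue < green < red here) that may not exist in the data.
label_encoded = colors.astype("category").cat.codes

# One-hot encoding: one 0/1 column per category, with no implied order.
one_hot = pd.get_dummies(colors, dtype=int)

print(normalized.tolist())
print(standardized.round(3).tolist())
print(label_encoded.tolist())
print(one_hot)
```

Label encoding keeps a single column, which suits ordinal categories and tree-based models; one-hot encoding avoids the artificial ordering at the cost of one extra column per category.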
Submission Guidelines
Created a GitHub repository for this task.
Added code, dataset (if needed), and this README.md file.
Author - Rakshith N