MLP Assignments

This repository contains machine learning and data analysis assignments, covering dataset analysis, querying, and preprocessing.

Project Structure

Week 1: Dataset Analysis & Querying

Dataset: week1/Week1_GA_dataset.csv (real estate data, 10,000 rows × 12 columns)
Notebooks:
- Week1.1_Dataset_Analysis.ipynb: Data cleaning, missing value analysis, row/column queries
- Week1.2_Dataset_Analysis.ipynb: Indexing, slicing, conditional filtering operations

Key Results Week 1.1:

Unknown values: 5,548 total (1,823 literal "?" + 3,725 original NaN)
Missing value analysis and data cleaning operations
Row filtering based on missing value thresholds
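
As a rough illustration of the Week 1.1 steps, the sketch below counts literal "?" entries and NaN values separately, then filters rows by a missing-value threshold. The toy data, the column names (`price`, `locality`, `rooms`), and the threshold of 2 are assumptions made for this example, not values taken from the notebook.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real-estate data; column names are hypothetical.
df = pd.DataFrame({
    "price": [100.0, np.nan, 250.0, 300.0],
    "locality": ["A", "?", "B", np.nan],
    "rooms": [3, 2, "?", np.nan],
})

# Count the two kinds of unknown values separately.
n_question = int(df.isin(["?"]).sum().sum())  # literal "?" placeholders
n_nan = int(df.isna().sum().sum())            # values that were already NaN
n_unknown = n_question + n_nan

# Treat "?" as missing, then keep only rows with at least 2 known values.
cleaned = df.replace("?", np.nan)
filtered = cleaned.dropna(thresh=2)

print(n_question, n_nan, n_unknown)
print(filtered)
```

`dropna(thresh=k)` keeps rows with at least `k` non-missing values, so the threshold is expressed in terms of known values rather than missing ones.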

Key Results Week 1.2:

Even/odd row and column extraction
Year-based filtering and conditional queries
August property counts and locality-based price analysis
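
The Week 1.2 operations can be sketched with positional slicing and boolean masks. The example below is only a sketch on invented data; the column names (`date`, `locality`, `price`) and the year cutoff are hypothetical and may not match the actual dataset.

```python
import pandas as pd

# Toy stand-in for the real-estate data; column names are hypothetical.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2019-08-03", "2020-01-15", "2020-08-21", "2021-08-09", "2021-11-30",
    ]),
    "locality": ["North", "South", "North", "South", "North"],
    "price": [100.0, 250.0, 300.0, 450.0, 500.0],
})

# Even/odd extraction by position (0-based).
even_rows = df.iloc[::2]     # rows at positions 0, 2, 4
odd_rows = df.iloc[1::2]     # rows at positions 1, 3
even_cols = df.iloc[:, ::2]  # columns at positions 0, 2

# Year-based filtering and a month-based count.
after_2019 = df[df["date"].dt.year > 2019]
august_count = int((df["date"].dt.month == 8).sum())

# Locality-based price analysis.
mean_price_by_locality = df.groupby("locality")["price"].mean()

print(august_count)
print(mean_price_by_locality)
```

Whether "even rows" means even positions (0, 2, 4, ...) or even 1-based row numbers depends on the question wording; the slice start can be adjusted accordingly.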

Week 2: Machine Learning Preprocessing

Dataset: week2/GA_2_dataset.csv (gaming engagement data, 10,000 rows × 13 columns)
Notebook: Week2_Dataset_Analysis.ipynb

Preprocessing Pipeline:

Data Type Analysis: Identified object columns (Gender, Location, GameGenre, GameDifficulty, EngagementLevel)
Missing Value Handling: 3,337 total null values across Age, Location, InGamePurchases, GameDifficulty
Imputation Strategy:
- Age: Mean imputation from training data
- Location: NaN → "Other"
- GameDifficulty: Mode imputation
- InGamePurchases: NaN → 0
Feature Engineering:
- Ordinal encoding: GameDifficulty (Easy=0, Medium=1, Hard=2)
- One-hot encoding: Gender, Location, GameGenre (drop_first=True)
- StandardScaler: All numerical features
Train-Test Split: 80-20 split, random_state=42
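
The pipeline above could be sketched roughly as follows. This is a minimal illustration on invented data, not the notebook's code: the toy values, the category labels other than those listed above, and the exact ordering of steps (constant fills and one-hot encoding before the split, mean/mode imputation and scaling fitted on the training part) are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Toy stand-in for GA_2_dataset.csv; all values are invented for illustration.
df = pd.DataFrame({
    "Age": [20, np.nan, 35, 41, 18, 27, np.nan, 30, 22, 45],
    "Gender": ["Male", "Female"] * 5,
    "Location": ["Europe", np.nan, "Asia", "USA", "Europe",
                 "Asia", "USA", np.nan, "Europe", "Asia"],
    "GameGenre": ["RPG", "Action", "RPG", "Sports", "Action",
                  "RPG", "Sports", "Action", "RPG", "Sports"],
    "GameDifficulty": ["Easy", "Hard", np.nan, "Medium", "Easy",
                       "Easy", "Hard", "Medium", np.nan, "Easy"],
    "InGamePurchases": [1, 0, np.nan, 1, 0, np.nan, 1, 0, 0, 1],
    "EngagementLevel": ["High", "Low", "Medium", "Low", "High",
                        "Medium", "Low", "High", "Medium", "Low"],
})

X = df.drop(columns="EngagementLevel")
y = df["EngagementLevel"]

# Steps that need no fitted statistics: constant fills and one-hot encoding.
X["Location"] = X["Location"].fillna("Other")
X["InGamePurchases"] = X["InGamePurchases"].fillna(0)
X = pd.get_dummies(
    X, columns=["Gender", "Location", "GameGenre"], drop_first=True, dtype=float
)

# 80-20 split with the seed stated above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
X_train, X_test = X_train.copy(), X_test.copy()

# Statistics come from the training part only, then apply to both parts.
age_mean = X_train["Age"].mean()
difficulty_mode = X_train["GameDifficulty"].mode()[0]
difficulty_order = {"Easy": 0, "Medium": 1, "Hard": 2}
for part in (X_train, X_test):
    part["Age"] = part["Age"].fillna(age_mean)
    part["GameDifficulty"] = (
        part["GameDifficulty"].fillna(difficulty_mode).map(difficulty_order)
    )

# Fit the scaler on the training features and reuse it for the test features.
scaler = StandardScaler().fit(X_train)
train_scaled = scaler.transform(X_train)
test_scaled = scaler.transform(X_test)

print(train_scaled.shape, test_scaled.shape)
```

Fitting the imputation statistics and the scaler on the training part only keeps test-set information out of the transformation, which is one common reading of "mean imputation from training data".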

Final Answers:

Q1: Object columns: Gender, Location, GameGenre (from specified list)
Q2: Males from Europe with purchases: 299
Q3: Under-18 players with >10h playtime: 453
Q4: Total null values: 3,337
Q5: Least frequent target class: High (1,996 samples)
Q6: Sum of transformed Age (test set): 16.50 (standardized) | 63,585.24 (imputed, unscaled)
Q7: Sum of first 5 transformed rows: -7.17 (full scaling) | 6.84 (numeric-only scaling)
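
Counting questions such as Q2, Q3, and Q5 typically reduce to boolean masks and `value_counts`. The sketch below uses invented rows; the `PlayTimeHours` column name and the encodings (`"Male"`, `"Europe"`, purchases stored as 0/1) are assumptions and may differ from the actual dataset.

```python
import pandas as pd

# Toy rows for illustration only; not taken from GA_2_dataset.csv.
df = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Male", "Female"],
    "Location": ["Europe", "Europe", "Europe", "Asia", "USA"],
    "InGamePurchases": [1, 0, 1, 1, 0],
    "Age": [17, 25, 16, 15, 30],
    "PlayTimeHours": [12.0, 3.0, 8.0, 11.5, 20.0],  # hypothetical column name
    "EngagementLevel": ["High", "Low", "Medium", "Low", "Medium"],
})

# Q2-style query: combine conditions with & and count the True values.
european_male_buyers = int((
    (df["Gender"] == "Male")
    & (df["Location"] == "Europe")
    & (df["InGamePurchases"] == 1)
).sum())

# Q3-style query: under-18 players with more than 10 hours of playtime.
minors_over_10h = int(((df["Age"] < 18) & (df["PlayTimeHours"] > 10)).sum())

# Q4/Q5-style queries: total nulls and the least frequent target class.
total_nulls = int(df.isna().sum().sum())
rarest_class = df["EngagementLevel"].value_counts().idxmin()

print(european_male_buyers, minors_over_10h, total_nulls, rarest_class)
```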

Technical Implementation

Dependencies: pandas, numpy, scikit-learn
Preprocessing Approach: Complete-case statistics → Imputation → Encoding → Scaling
Validation: Verification cells recompute key results and cross-check alternative interpretations

Methodology Notes

The notebooks include comprehensive verification sections that explore different interpretations of "transformed" data:

Raw vs. standardized feature values
Numeric-only vs. full-feature scaling approaches
Complete-case vs. all-data statistics for imputation

All results are reproducible with the provided random seeds and preprocessing steps.
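
To show why the choice of imputation statistic matters, the sketch below compares an all-data mean (every row where Age is known) with a complete-case mean (only rows with no missing value in any column). The data is invented, and this reading of "complete-case" is an assumption about what the notebooks compare.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "Age": [20.0, 36.0, np.nan, 40.0],
    "Location": ["Europe", np.nan, "Asia", "USA"],
})

# All-data mean: uses every row where Age itself is present.
mean_all = df["Age"].mean()

# Complete-case mean: uses only rows with no missing value in any column.
mean_complete = df.dropna()["Age"].mean()

# The imputed column, and therefore any sum over it, depends on the choice.
sum_all = df["Age"].fillna(mean_all).sum()
sum_complete = df["Age"].fillna(mean_complete).sum()

print(mean_all, mean_complete)
print(sum_all, sum_complete)
```

Differences of this kind carry through to downstream sums, which is one reason a single question can have more than one defensible numeric answer.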

