  • Mumbai, Maharashtra, India

About | Skills | Projects | Certificates | Blogs

About

Hi 👋, I'm Mrityunjay Pathak

I'm a Data Scientist with a knack for uncovering patterns and trends that drive smarter decisions.

🎯 Tools and Technologies

• Programming Language : I'm familiar with Python, a powerful language for data science and machine learning.

• Libraries : I'm also familiar with essential data science libraries like NumPy, Pandas, Matplotlib, Seaborn and Plotly.

• Machine Learning : I have experience with Scikit-learn, a machine learning library used widely across industries.

• Database : I can work with MySQL, a popular database management system, to handle and retrieve data effectively.

• BI Tool : I'm familiar with Power BI to perform data analysis, create dynamic dashboards and extract meaningful insights.

• Web Framework : I have experience with FastAPI, a high-performance web framework for building APIs with Python.

• Containerization : I can work with Docker for packaging applications and their dependencies into containers.

• Version Control : I'm familiar with Git, which helps in keeping track of changes in code and collaborating effectively with a team.

📫 Connect with Me

Kaggle | LinkedIn | GitHub | Medium | Portfolio

Skills



Projects

AutoIQ : Car Price Prediction


➔ Problem

  • In the used car market, buyers and sellers often struggle to determine a fair price for their vehicle.
  • This project aims to provide accurate and transparent pricing for used cars by analyzing real-world data.
  • It helps both buyers and sellers make data-driven decisions and ensure fair transactions.

➔ Solution

To address this problem, I built and deployed a complete end-to-end machine learning pipeline :

  • Data Collection
    • Scraped a dataset of 2,800+ used cars from Cars24 using Selenium and BeautifulSoup.
  • Data Optimization
    • Reduced the dataset's memory footprint by downcasting data types.
    • Stored the dataset in Parquet format, which compresses data without losing information.
    • Parquet also typically reads and writes much faster than CSV.
  • Preprocessing & Modeling
    • Implemented Scikit-learn Pipelines & ColumnTransformer to prevent data leakage.
  • API Deployment
    • Deployed the machine learning model as an API using FastAPI, with :
      • /predict endpoint for real-time predictions.
      • /health endpoint for monitoring API status.
      • Input validation & rate limiting for reliability.
  • Frontend Integration
    • Designed an HTML/CSS/JS website to send API calls and display predictions in a user-friendly way.
  • Containerization
    • Created a multi-stage Dockerfile with .dockerignore for building an optimized and lightweight Docker image.
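
The preprocessing step above can be illustrated with a minimal sketch of a Scikit-learn Pipeline wrapping a ColumnTransformer. The column names, toy values and the choice of LinearRegression are assumptions made for illustration; they are not taken from the AutoIQ repository. The point is that the scaler and encoder are fitted on the training split only, which is how a pipeline helps prevent data leakage.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny invented dataset (hypothetical columns and values, not the scraped data).
cars = pd.DataFrame({
    "km_driven": [42000, 15000, 88000, 60000, 23000, 71000, 35000, 9000],
    "age_years": [5, 2, 9, 6, 3, 8, 4, 1],
    "fuel_type": ["Petrol", "Diesel", "Petrol", "Diesel",
                  "Petrol", "Diesel", "Petrol", "CNG"],
    "price": [4.5, 7.8, 2.1, 3.9, 6.2, 2.8, 5.1, 8.4],
})

X = cars.drop(columns="price")
y = cars["price"]

# Split first, so nothing from the test rows influences preprocessing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale numeric columns and one-hot encode the categorical column.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["km_driven", "age_years"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["fuel_type"]),
])

# The pipeline fits the preprocessing and the model on the training data only.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", LinearRegression()),
])

model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Mean learned by the scaler for km_driven; it comes from X_train alone.
fitted_scaler = model.named_steps["preprocess"].named_transformers_["num"]
train_mean_km = fitted_scaler.mean_[0]
```

Calling `model.predict` on new rows reuses the statistics learned from the training split, so the same object can be serialized and served behind an API without re-fitting any preprocessing.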

➔ Impact

  • Built and deployed a complete machine learning pipeline as a FastAPI application.
  • Reduced dataset memory usage by 90% through data type optimization and Parquet conversion.
  • Delivered 30% lower MAE and 12% higher R2 score compared to the baseline model.
  • Improved model stability by 70%, ensuring more consistent and reliable predictions.
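
As a rough illustration of how a memory reduction from data type optimization can be measured, here is a minimal pandas sketch. The column names and values are invented for the example, and the percentage it prints applies to this toy frame only, not to the project's dataset.

```python
import pandas as pd

# Invented sample data; column names are hypothetical.
df = pd.DataFrame({
    "year": [2015, 2018, 2020, 2012] * 250,
    "km_driven": [42000.0, 15000.5, 88000.0, 60000.25] * 250,
    "fuel_type": ["Petrol", "Diesel", "Petrol", "CNG"] * 250,
})

# Memory before optimization (deep=True also counts string contents).
before = df.memory_usage(deep=True).sum()

optimized = df.copy()
# Downcast numeric columns to the smallest type that holds their values.
optimized["year"] = pd.to_numeric(optimized["year"], downcast="integer")
optimized["km_driven"] = pd.to_numeric(optimized["km_driven"], downcast="float")
# Low-cardinality text columns are stored more compactly as categoricals.
optimized["fuel_type"] = optimized["fuel_type"].astype("category")

after = optimized.memory_usage(deep=True).sum()
reduction_pct = 100 * (before - after) / before
print(f"{before} bytes -> {after} bytes ({reduction_pct:.0f}% smaller)")
```

The optimized frame could then be written with `optimized.to_parquet(...)`, which needs a Parquet engine such as pyarrow installed; that step is left out here so the sketch runs with pandas alone.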

Pickify : Movie Recommender System


➔ Problem

  • With the rise of streaming services, viewers now have access to thousands of movies across platforms.
  • As a result, many viewers spend more time browsing than actually watching.
  • This can lead to frustration, lower satisfaction and less time spent on the platform, which hurts both the user experience and business performance.

➔ Solution

  • A content-based movie recommender system built with clean, modular code and proper version control.
  • It analyzes metadata of 5000+ movies to recommend the top 5 similar titles based on a user-selected movie.
  • The system uses CountVectorizer and cosine similarity to find similar movies.
  • The project focuses not only on functionality but also on building a clean and scalable solution.
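
A minimal sketch of the content-based approach described above, using Scikit-learn's CountVectorizer and cosine_similarity. The titles, tags and the `recommend` helper are hypothetical and exist only to show the idea; the actual project works on metadata for 5000+ movies.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented titles and tags standing in for real movie metadata.
titles = ["Space Quest", "Galaxy Wars", "Love in Paris", "Paris Nights"]
tags = [
    "space adventure alien spaceship",
    "space war alien galaxy",
    "romance paris love story",
    "romance paris night drama",
]

# Turn each tag string into a vector of word counts.
vectors = CountVectorizer().fit_transform(tags)

# Pairwise cosine similarity between every pair of movies.
similarity = cosine_similarity(vectors)


def recommend(title, top_n=5):
    """Return up to top_n titles most similar to the given title."""
    idx = titles.index(title)
    ranked = sorted(
        enumerate(similarity[idx]), key=lambda pair: pair[1], reverse=True
    )
    # Skip the selected movie itself, then keep the best matches.
    return [titles[i] for i, _ in ranked if i != idx][:top_n]


print(recommend("Space Quest", top_n=1))
```

Because the similarity matrix is computed once up front, each recommendation is just a row lookup and a sort, which keeps the lookup cheap for a catalogue of a few thousand titles.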

➔ Impact

If this system were scaled and integrated with a streaming service, it could :

  • Reduce the time users spend choosing what to watch.
  • Increase user engagement, watch time and customer satisfaction.
  • Help streaming platforms retain users by offering better personalized content.

Netflix Data Analysis


➔ Objective

  • To analyze Netflix content data, uncovering valuable insights into how the platform evolves over time.

➔ Some Key Findings

Cleaned and analyzed a dataset of 8000+ Netflix movies and TV shows.

  • More than 60% of content on Netflix is rated for mature audiences.
    • Suggests that Netflix targets adult viewers to boost engagement and retention.
  • More than 25% of movies and TV shows are released on the 1st day of the month.
    • Shows a consistent release schedule, likely to align with subscription cycles.
  • More than 40% of the content on Netflix is exclusive to the United States.
    • Shows a strong focus on the U.S. market and content availability by location.
  • More than 20% of the content on Netflix falls under the "Drama" genre.
    • Confirms that "Drama" is a key part of Netflix's content library.
  • More than 23% of the content on Netflix was released in 2019 alone.
    • Indicates a major content push that year, possibly tied to growth or user acquisition goals.
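
Shares like the ones above are simple proportions over the cleaned table. The sketch below shows one way to compute them with pandas; the column names (`date_added`, `rating`) and the four rows are assumptions made up for illustration, so the resulting numbers say nothing about the real dataset.

```python
import pandas as pd

# Invented rows; column names are hypothetical stand-ins.
catalog = pd.DataFrame({
    "date_added": pd.to_datetime(
        ["2019-01-01", "2019-03-15", "2020-06-01", "2021-09-24"]
    ),
    "rating": ["TV-MA", "PG", "TV-MA", "TV-14"],
})

# Fraction of titles whose date falls on the 1st day of a month.
first_day_share = (catalog["date_added"].dt.day == 1).mean()

# Fraction of titles per rating label.
rating_share = catalog["rating"].value_counts(normalize=True)

# Fraction of titles per year.
year_share = catalog["date_added"].dt.year.value_counts(normalize=True)
```

Taking the mean of a boolean column gives the share of rows where the condition holds, and `value_counts(normalize=True)` does the same for every category at once.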

Supermarket Sales Analysis


➔ Objective

  • To analyze Supermarket Sales data, identifying key factors for improving profitability and operational efficiency.

➔ Some Key Findings

Analyzed the purchasing patterns of 9000+ supermarket customers.

  • More than 15% of the products sold were Snacks.
    • Shows that Snacks are a convenient choice and a big source of revenue.
  • More than 32% of sales occurred in the West region.
    • Suggests that the West region performs strongly compared to the others.
  • Health drinks and soft drinks are the most profitable categories in Beverages.
    • Shows that both types of drinks sell well.
  • November was the most profitable month, contributing about 15% of total annual profit.
    • Makes it an ideal time for running promotions and special offers.

Certificates


Blogs


Pinned

  1. AutoIQ : AutoIQ by Motor.co (Jupyter Notebook)
  2. Pickify : Smart movie picks, based on what you love! (Jupyter Notebook)
  3. Netflix-Data-Analysis : Netflix Data Analysis (Jupyter Notebook)
  4. Supermarket-Sales-Analysis : Supermarket Sales Analysis (Jupyter Notebook)