Hi 👋, I'm Mrityunjay Pathak
I'm a Data Scientist with a knack for uncovering patterns and trends that drive smarter decisions.
🎯 Tools and Technologies
• Programming Language : I'm familiar with Python, a powerful language for data science and machine learning.
• Libraries : I'm also familiar with essential data science libraries like NumPy, Pandas, Matplotlib, Seaborn and Plotly.
• Machine Learning : I have experience with Scikit-learn, a machine learning library used widely across industries.
• Database : I can work with MySQL, a popular database management system, to store and retrieve data effectively.
• BI Tool : I'm familiar with Power BI to perform data analysis, create dynamic dashboards and extract meaningful insights.
• Web Framework : I have experience with FastAPI, a high-performance web framework for building APIs with Python.
• Containerization : I can work with Docker for packaging applications and their dependencies into containers.
• Version Control : I'm familiar with Git, which helps in keeping track of changes in code and collaborating effectively with a team.
📫 Connect with Me
Kaggle | LinkedIn | GitHub | Medium | Portfolio
Problem
- In the used car market, buyers and sellers often struggle to determine a fair price for a vehicle.
- This project aims to provide accurate and transparent pricing for used cars by analyzing real-world data.
- It helps both buyers and sellers make data-driven decisions and ensures fair transactions.
Solution
To address this problem, I built and deployed a complete end-to-end machine learning pipeline :
- Data Collection
  - Scraped a dataset of 2,800+ used cars from Cars24 using Selenium and BeautifulSoup.
- Data Optimization
  - Optimized the dataset's memory consumption by downcasting data types.
  - Stored the dataset in Parquet format, which compresses data without losing information.
  - It also provides much faster read/write speeds compared to CSV.
- Preprocessing & Modeling
  - Implemented Scikit-learn Pipelines & ColumnTransformer to prevent data leakage.
- API Deployment
  - Deployed the machine learning model as an API using FastAPI, with :
    - /predict endpoint for real-time predictions.
    - /health endpoint for monitoring API status.
    - Input validation & rate limiting for reliability.
- Frontend Integration
  - Designed an HTML/CSS/JS website to send API calls and display predictions in a user-friendly way.
- Containerization
  - Created a multi-stage Dockerfile with .dockerignore for building an optimized and lightweight Docker image.
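
The Preprocessing & Modeling step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names, the toy rows and the `LinearRegression` estimator are all assumptions. It shows the scaler and encoder sitting inside the Pipeline, so they are fitted on the training split only and the test split never influences the preprocessing statistics.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data with hypothetical columns (not the scraped Cars24 schema).
cars = pd.DataFrame({
    "km_driven": [42000, 15000, 88000, 61000, 23000, 97000, 35000, 54000],
    "year": [2018, 2021, 2014, 2016, 2020, 2013, 2019, 2017],
    "fuel_type": ["Petrol", "Diesel", "Petrol", "CNG",
                  "Petrol", "Diesel", "Petrol", "Diesel"],
    "price": [5.2, 8.9, 2.7, 3.4, 7.5, 2.9, 6.1, 4.8],  # arbitrary units
})
X = cars.drop(columns="price")
y = cars["price"]

# Split first, so nothing below ever sees the test rows during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale numeric columns and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["km_driven", "year"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["fuel_type"]),
])

# Preprocessing and estimator are fitted together, on training data only.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", LinearRegression()),
])

model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

Calling `model.predict` on new rows reuses the statistics learned from the training split, which is what keeps test information out of the fit.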
Impact
- Built and deployed a complete machine learning pipeline as a FastAPI application.
- Reduced dataset memory usage by 90% through data type optimization and Parquet conversion.
- Delivered 30% lower MAE and 12% higher R2-Score compared to the baseline model.
- Improved model stability by 70%, ensuring more consistent and reliable predictions.
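
The memory saving from downcasting can be measured along these lines. This is a sketch on synthetic data with hypothetical column names, so it will not reproduce the 90% figure above. The Parquet step is left out because `DataFrame.to_parquet` needs an extra engine such as pyarrow.

```python
import numpy as np
import pandas as pd
from pandas.api import types as ptypes


def downcast(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with smaller numeric dtypes and categorical text columns."""
    out = df.copy()
    for col in out.columns:
        if ptypes.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif ptypes.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="float")
        elif ptypes.is_object_dtype(out[col]) or ptypes.is_string_dtype(out[col]):
            # Low-cardinality text is stored once and referenced by small codes.
            out[col] = out[col].astype("category")
    return out


# Synthetic stand-in for the real dataset (hypothetical columns).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "year": rng.integers(2010, 2024, size=n),
    "km_driven": rng.integers(1000, 150000, size=n),
    "price": rng.uniform(1.0, 20.0, size=n),
    "fuel_type": rng.choice(["Petrol", "Diesel", "CNG"], size=n),
})

small = downcast(df)
before = df.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
reduction = 100 * (1 - after / before)
print(f"{before} -> {after} bytes ({reduction:.0f}% smaller)")
```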
Problem
- With the rise of streaming services, viewers now have access to thousands of movies across platforms.
- As a result, many viewers spend more time browsing than actually watching.
- This can lead to frustration, lower satisfaction and less time spent on the platform, which can hurt both the user experience and business performance.
Solution
- A content-based movie recommender system built with clean and modular code with proper version control.
- It analyzes metadata of 5000+ movies to recommend the top 5 similar titles based on a user-selected input.
- The system uses CountVectorizer and cosine similarity to recommend similar movies.
- The project focuses not only on functionality but also on building a clean and scalable solution.
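
The core of such a recommender can be sketched in a few lines. The titles and tags below are made up, and the helper name `recommend` is hypothetical. Using Scikit-learn's `cosine_similarity` on `CountVectorizer` output is an assumption about how the project combines the two techniques.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented sample; the real project works on metadata of 5000+ movies.
movies = pd.DataFrame({
    "title": ["Space Quest", "Galaxy Wars", "Love in Paris", "Paris Nights",
              "Robot Uprising", "Star Voyage", "Romance Road"],
    "tags": [
        "space adventure alien astronaut",
        "space alien war battle",
        "romance paris love couple",
        "paris romance drama night",
        "robot war future battle",
        "space astronaut adventure voyage",
        "romance love road trip",
    ],
})

# Turn each tag string into a word-count vector, then compare all pairs.
vectors = CountVectorizer().fit_transform(movies["tags"])
similarity = cosine_similarity(vectors)


def recommend(title: str, top_n: int = 5) -> list[str]:
    """Return the top_n titles most similar to the selected one."""
    idx = movies.index[movies["title"] == title][0]
    # Sort by similarity (highest first) and skip the selected movie itself.
    ranked = [i for i in similarity[idx].argsort()[::-1] if i != idx]
    return movies["title"].iloc[ranked[:top_n]].tolist()


print(recommend("Space Quest", top_n=2))
```

Precomputing the full similarity matrix keeps each lookup cheap, at the cost of memory that grows with the square of the number of movies.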
Impact
If this system were scaled and integrated with a streaming service, it could :
- Reduce the time users spend choosing what to watch.
- Increase user engagement, watch time and customer satisfaction.
- Help streaming platforms retain users by offering better personalized content.
Objective
- To analyze Netflix content data, uncovering valuable insights into how the platform evolves over time.
Some Key Findings
Cleaned and analyzed a dataset of 8000+ Netflix Movies and TV Shows.
- More than 60% of content on Netflix is rated for mature audiences.
  - Suggests that Netflix targets adult viewers to boost engagement and retention.
- More than 25% of Movies and TV Shows are released on the 1st day of the month.
  - Shows a consistent release schedule, likely to align with subscription cycles.
- More than 40% of the content on Netflix is exclusive to the United States.
  - Shows a strong focus on the U.S. market and content availability by location.
- More than 20% of the content on Netflix falls under the "Drama" genre.
  - Confirms that "Drama" is a key part of Netflix's content library.
- More than 23% of the content on Netflix was released in 2019 alone.
  - Indicates a major content push that year, possibly tied to growth or user acquisition goals.
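
Percentages like the ones above are typically computed as normalized counts. A minimal sketch with pandas, using invented rows and hypothetical column names (`rating`, `release_date`) rather than the real dataset:

```python
import pandas as pd

# Invented rows for illustration only.
titles = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "rating": ["TV-MA", "PG", "TV-MA", "R", "TV-14"],
    "release_date": pd.to_datetime(
        ["2019-01-01", "2019-03-15", "2020-06-01", "2018-11-20", "2019-07-01"]
    ),
})

# Share of each rating as a percentage of all titles.
rating_share = titles["rating"].value_counts(normalize=True).mul(100)

# Percentage of titles released on the 1st day of a month.
first_day_share = (titles["release_date"].dt.day == 1).mean() * 100

print(rating_share)
print(first_day_share)
```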
Objective
- To analyze Supermarket Sales data, identifying key factors for improving profitability and operational efficiency.
Some Key Findings
Analyzed the purchasing patterns of 9000+ supermarket customers.
- More than 15% of the products sold were Snacks.
  - Shows that Snacks are a convenient choice and a big source of revenue.
- More than 32% of the sales occurred in the West region.
  - Suggests that the West region performs strongly compared to the others.
- Health drinks and Soft drinks are the most profitable sub-categories in Beverages.
  - Shows that both types of drinks sell well.
- November was the most profitable month, contributing about 15% of the total annual profits.
  - Makes it an ideal time for running promotions and special offers.
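
A monthly profit share like the November figure can be derived with a group-by. The rows and the column names `order_date` and `profit` below are invented for illustration:

```python
import pandas as pd

# Invented orders for illustration only.
orders = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2018-11-03", "2018-11-21", "2018-03-09", "2018-07-14", "2018-03-30"]
    ),
    "profit": [300.0, 200.0, 150.0, 250.0, 100.0],
})

# Total profit per calendar month, then each month's share of the total.
monthly_profit = orders.groupby(orders["order_date"].dt.month)["profit"].sum()
profit_share = monthly_profit / monthly_profit.sum() * 100

# Month number with the largest share of profit.
best_month = profit_share.idxmax()
print(best_month, profit_share.loc[best_month])
```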