Grocery stores are a vital part of everyday life, providing the food and essentials we need. Many people use grocery delivery applications to order products, which makes it easy to shop from home.
Each transaction made through these applications is recorded in detail, creating a valuable dataset. This project analyzes that transaction data to understand how well these stores are performing.
The dataset is sourced from Kaggle and simulates grocery sales activity in the Indian state of Tamil Nadu.
The dataset includes various columns that provide detailed information about each transaction at the Supermarket.
Link to the Dataset : Supermarket Sales Dataset
- To gain insights into supermarket sales performance by understanding the patterns and trends in customer behavior, product categories and regional sales.
- This Exploratory Data Analysis (EDA) aims to address the following key questions :
- Customer Behavior Analysis : What are the purchasing patterns of customers based on different categories and sub-categories? How does customer spending vary across cities and states?
- Sales Trends : Are there observable trends in sales over time? How do sales figures fluctuate across different months or seasons?
- Discount Impact : What is the relationship between discounts and sales? How do discounts influence the profit margins across different categories and regions?
- Profit Analysis : What are the profit margins associated with various product categories and sub-categories? How do these margins vary by city and state?
- Regional Performance : How do sales and profit performance differ across different regions and states? Are there specific regions that contribute more significantly to overall sales and profits?
- Category Insights : What are the most and least popular product categories and sub-categories? How does the popularity of these categories vary by location and over time?
- This analysis will provide a deeper understanding of supermarket sales dynamics, revealing trends and patterns that can inform inventory management, promotional strategies and regional marketing efforts.
- Setting up the Environment
- Libraries required for the Project
- Getting started with Repository
- Steps involved in the Project
- Conclusion
Jupyter Notebook is required for this project and you can install and set it up in the terminal.
- Install the Notebook
pip install notebook
- Run the Notebook
jupyter notebook
Pandas
- Go to the terminal and run this code
pip install pandas
- Go to Jupyter Notebook and run this code from a cell
!pip install pandas
Matplotlib
- Go to the terminal and run this code
pip install matplotlib
- Go to Jupyter Notebook and run this code from a cell
!pip install matplotlib
Seaborn
- Go to the terminal and run this code
pip install seaborn
- Go to Jupyter Notebook and run this code from a cell
!pip install seaborn
- Clone this repository to your local machine by using the following command :
git clone https://github.com/TheMrityunjayPathak/Supermarket-Sales-Analysis.git
Importing Libraries
- Importing pandas, matplotlib and seaborn libraries
Reading CSV File
- Reading csv file by using pd.read_csv() function
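As a rough sketch of these two steps, the snippet below imports pandas and loads a CSV with pd.read_csv(). The column names and the three rows are made up for illustration and are only assumed to resemble the Kaggle file; with the real download, a file path would be passed instead of the in-memory text. The notebook also imports matplotlib and seaborn, which are left out here because nothing is plotted.

```python
import io

import pandas as pd

# Made-up rows standing in for the downloaded CSV (column names are assumptions).
csv_text = """Order ID,Customer Name,Category,Sub Category,City,Order Date,Region,Sales,Discount,Profit
A1,Asha,Beverages,Soft Drinks,Salem,2017-11-08,West,1000,0.10,250.0
A2,Ravi,Snacks,Cookies,Madurai,2017-06-12,South,900,0.20,180.0
A3,Asha,Beverages,Health Drinks,Salem,2016-10-11,West,500,0.15,75.0
"""

# pd.read_csv accepts a file path or any file-like object.
# With the real dataset, pass the path to the downloaded CSV file instead.
df = pd.read_csv(io.StringIO(csv_text))

# Peek at the first rows to confirm the file was parsed as expected.
print(df.head())
```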
Overview of the Dataset
- Information about shape and size of the dataset
- Columns present in the dataset
- Info about the dataset
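The overview steps above might look roughly like this. The small DataFrame is a made-up stand-in for the real dataset, and its column names are assumptions.

```python
import pandas as pd

# Made-up sample; the real notebook would use the DataFrame loaded from the CSV.
df = pd.DataFrame({
    "Category": ["Beverages", "Snacks", "Beverages"],
    "City": ["Salem", "Madurai", "Salem"],
    "Sales": [1000, 900, 500],
    "Profit": [250.0, 180.0, 75.0],
})

print(df.shape)             # (number of rows, number of columns)
print(df.size)              # total number of cells: rows * columns
print(df.columns.tolist())  # column names as a plain list
df.info()                   # dtypes, non-null counts and memory usage
```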
Handling Null values in the Dataset
- This dataset does not contain any null values
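One common way to check for missing values is to count nulls per column, as sketched below on a made-up sample (the column names are assumptions).

```python
import pandas as pd

# Made-up sample with no missing values, mirroring what the README reports.
df = pd.DataFrame({
    "Category": ["Beverages", "Snacks", "Beverages"],
    "Sales": [1000, 900, 500],
    "Discount": [0.10, 0.20, 0.15],
})

# isnull() flags missing cells; sum() counts them column by column.
null_counts = df.isnull().sum()
print(null_counts)

# A total of zero means no column contains a null value.
total_nulls = int(null_counts.sum())
print(total_nulls)
```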
Unique values in Each Categorical Column
- Unique values in customer name column
- Unique values in category column
- Unique values in sub category column
- Unique values in city column
- Unique values in region column
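A minimal sketch of listing the unique values in each categorical column follows. The column names and rows are assumptions made for illustration.

```python
import pandas as pd

# Made-up sample; column names are assumed to match the dataset.
df = pd.DataFrame({
    "Customer Name": ["Asha", "Ravi", "Asha"],
    "Category": ["Beverages", "Snacks", "Beverages"],
    "Sub Category": ["Soft Drinks", "Cookies", "Health Drinks"],
    "City": ["Salem", "Madurai", "Salem"],
    "Region": ["West", "South", "West"],
})

categorical_columns = ["Customer Name", "Category", "Sub Category", "City", "Region"]

# unique() returns the distinct values; nunique() returns how many there are.
unique_counts = {}
for column in categorical_columns:
    unique_counts[column] = df[column].nunique()
    print(column, unique_counts[column], df[column].unique())
```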
Changing DataType of Columns
- Modifying the datatype of order date column to pandas datetime format
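The conversion can be sketched with pd.to_datetime(). The column name "Order Date" and the ISO date format used here are assumptions; the real file may store dates in a different format, in which case the format argument would need to change.

```python
import pandas as pd

# Made-up sample where the dates are stored as plain strings.
df = pd.DataFrame({"Order Date": ["2017-11-08", "2017-06-12", "2016-10-11"]})
print(df["Order Date"].dtype)  # object (strings) before conversion

# Parse the strings into pandas datetime values using an explicit format.
df["Order Date"] = pd.to_datetime(df["Order Date"], format="%Y-%m-%d")
print(df["Order Date"].dtype)  # a datetime64 dtype after conversion
```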
Utilizing existing information to create new Columns
- Extracting year, month and date from order date column
- Calculating discount amount from discount percent
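A possible sketch of deriving these columns is shown below. The new column names are hypothetical, and the discount formula is an assumption: it treats the discount as a fraction of the sales value, which may differ from the formula used in the notebook.

```python
import pandas as pd

# Made-up sample; column names are assumptions.
df = pd.DataFrame({
    "Order Date": pd.to_datetime(["2017-11-08", "2016-10-11"]),
    "Sales": [1000, 500],
    "Discount": [0.10, 0.20],  # assumed to be a fraction, i.e. 0.10 means 10%
})

# The .dt accessor exposes the date parts of a datetime column.
df["Year"] = df["Order Date"].dt.year
df["Month"] = df["Order Date"].dt.month_name()
df["Day"] = df["Order Date"].dt.day

# Assumed formula: discount amount = sales value * discount fraction.
df["Discount Amount"] = df["Sales"] * df["Discount"]

print(df)
```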
Statistical Analysis
- No. of products sold in each category
- No. of products sold in each sub category
- No. of products sold in each city
- No. of products sold in each region
- No. of products sold each year, month and date
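These counts can be sketched with value_counts(), assuming each row of the dataset represents one product sold. The sample rows and column names are made up for illustration.

```python
import pandas as pd

# Made-up sample; each row is assumed to be one product sold.
df = pd.DataFrame({
    "Category": ["Beverages", "Snacks", "Beverages", "Bakery"],
    "Region": ["West", "South", "West", "East"],
    "Order Date": pd.to_datetime(
        ["2016-03-01", "2017-07-15", "2017-11-08", "2017-11-20"]
    ),
})

# value_counts() counts the rows for each distinct value, most frequent first.
per_category = df["Category"].value_counts()
per_region = df["Region"].value_counts()

# For time-based counts, extract the date part first, then count.
per_year = df["Order Date"].dt.year.value_counts().sort_index()
per_month = df["Order Date"].dt.month_name().value_counts()

print(per_category)
print(per_region)
print(per_year)
print(per_month)
```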
Data Visualization
- No. of products sold in each category
- No. of products sold in each sub category
- No. of products sold in each city
- No. of products sold in each region
- No. of products sold each year
- No. of products sold each month
- No. of products sold each date
- Total sales in each category
- Total sales in each sub category
- Total sales in each region
- Total sales in each city
- Total sales in each month
- Total sales in each year
- Total profit in each category
- Total profit in each sub category
- Total profit in each region
- Total profit in each city
- Total profit in each month
- Total profit in each year
- Customers with highest amount of total sales
- Customers with highest profit on their purchase
- Total discount availed by customers
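As one illustration of the charts listed above, the sketch below plots total sales in each category as a bar chart with matplotlib. The data and column names are made up, and the notebook may well use seaborn or a different chart type for the same question.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for scripts; omit this line in Jupyter

import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample; column names are assumptions.
df = pd.DataFrame({
    "Category": ["Beverages", "Snacks", "Beverages"],
    "Sales": [700, 900, 500],
})

# Aggregate sales per category and sort so the largest bar comes first.
sales_by_category = (
    df.groupby("Category")["Sales"].sum().sort_values(ascending=False)
)

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(sales_by_category.index, sales_by_category.values)
ax.set_title("Total sales in each category")
ax.set_xlabel("Category")
ax.set_ylabel("Total sales")
fig.tight_layout()
```

The same groupby pattern extends to the other charts by swapping the grouping column (for example region, city, month or year) and the aggregated column (for example profit).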
- The Exploratory Data Analysis (EDA) of the Supermarket Sales Dataset has provided a comprehensive understanding of the sales dynamics, customer behaviors and regional performance of the supermarket chain.
- This analysis has provided a detailed understanding of various factors influencing supermarket performance.