Customer Segmentation using KMeans Clustering

📌 Project Overview

This project applies KMeans clustering to segment customers based on their purchasing behavior using an online retail dataset. The goal is to identify actionable customer groups for better marketing and retention strategies.

📂 Dataset

The dataset contains transactional data from an online retail store, including:

InvoiceNo: Unique transaction ID
StockCode: Product code
Description: Product description
Quantity: Number of items purchased
InvoiceDate: Date of transaction
UnitPrice: Price per item
CustomerID: Unique identifier for customers
Country: Customer's location

The dataset is available at: Online Retail II Dataset

🚀 Methodology

1️⃣ Data Preprocessing

Removed missing values and invalid transactions
Filtered out refunds (negative quantities)
Computed Monetary Value, Frequency, and Recency features

2️⃣ Feature Engineering

Recency: Days since last purchase
Frequency: Number of transactions
Monetary Value: Total spending
Applied StandardScaler for normalization

3️⃣ Clustering using KMeans

Determined the optimal number of clusters using Elbow Method (Inertia) & Silhouette Score
Applied KMeans clustering to segment customers

4️⃣ Cluster Analysis & Interpretation

Used 3D scatter plots for visualization
Identified three customer segments:
- Retain: High-value, loyal customers
- Re-Engage: Low-value, infrequent buyers
- Nurture: Recent buyers with lower engagement

📊 Visualizations

Elbow Method Plot to determine optimal clusters
3D Scatter Plot for customer segmentation
Violin Plots to compare cluster distributions

🛠️ Technologies Used

Python
Pandas, NumPy
Scikit-learn (KMeans, StandardScaler)
Matplotlib, Seaborn

📈 Results & Insights

High-value customers contribute the most to revenue → Priority for retention.
Infrequent buyers need marketing efforts to increase engagement.
Recent buyers have potential to be nurtured into loyal customers.

📌 How to Run the Project

Prerequisites

Ensure you have Python installed and the required libraries:

pip install pandas numpy scikit-learn matplotlib seaborn

Running the Script

python customer_segmentation.py

🔗 Future Improvements

Explore Hierarchical Clustering and DBSCAN for comparison
Implement Customer Lifetime Value (CLV) predictions
Apply Deep Learning for more advanced segmentation

📜 License

This project is open-source under the MIT License. Feel free to contribute!

🤝 Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you would like to change.

📬 Contact

For any queries, reach out via GitHub Issues.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
data		data
README.md		README.md
customer_segmentation.ipynb		customer_segmentation.ipynb
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Customer Segmentation using KMeans Clustering

📌 Project Overview

📂 Dataset

🚀 Methodology

1️⃣ Data Preprocessing

2️⃣ Feature Engineering

3️⃣ Clustering using KMeans

4️⃣ Cluster Analysis & Interpretation

📊 Visualizations

🛠️ Technologies Used

📈 Results & Insights

📌 How to Run the Project

Prerequisites

Running the Script

🔗 Future Improvements

📜 License

🤝 Contributing

📬 Contact

About

Uh oh!

Releases

Packages

Uh oh!

Languages

UtkarshMidha/Customer_Segmentation_using_KMeans_Clustering

Folders and files

Latest commit

History

Repository files navigation

Customer Segmentation using KMeans Clustering

📌 Project Overview

📂 Dataset

🚀 Methodology

1️⃣ Data Preprocessing

2️⃣ Feature Engineering

3️⃣ Clustering using KMeans

4️⃣ Cluster Analysis & Interpretation

📊 Visualizations

🛠️ Technologies Used

📈 Results & Insights

📌 How to Run the Project

Prerequisites

Running the Script

🔗 Future Improvements

📜 License

🤝 Contributing

📬 Contact

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages