This project applies Recency, Frequency, and Monetary (RFM) analysis to segment e-commerce customers based on their purchasing behavior, using SQL as the primary analytical tool. The analysis is built on a real-world UK-based e-commerce transactions dataset of 500K+ records, imported into MySQL, cleaned and engineered into meaningful customer metrics. Each customer is scored and categorized into segments such as Champions, Loyal Customers, At-Risk Big Spenders and enabling data-driven targeting strategies. The final segmented output is visualized through an interactive Power BI dashboard, providing business stakeholders with actionable insights on customer value and retention. This end-to-end project demonstrates practical skills across data cleaning, SQL feature engineering, RFM modeling and business intelligence reporting.
To segment customers for targeted marketing by identifying:
- Champions
- Loyal Customers
- At Risk / Lost Customers
- New Customers
- Big Spenders at Risk.
- Source: Kaggle E-Commerce UK Dataset
- Link : https://www.kaggle.com/datasets/carrie1/ecommerce-data/data
- Records: 500K+ transactions
- Fields: InvoiceNo, Product Description, Quantity, UnitPrice, CustomerID, Country, InvoiceDate.
- Removed null and blank descriptions & customer IDs
- Removed duplicates using composite key
- Handled returns based on
InvoiceNo LIKE 'C%'
- Created table
edata_clean,edata_cancelled; created viewedata_customer_type - Calculated RFM metrics:
- Recency: Days since last purchase
- Frequency: Count of unique invoices
- Monetary: Total spend per customer
- Scores (1–4) assigned for each metric based on thresholds
- Joined into final view
edata_rfm_segmented
Applied conditional logic to tag each customer into:
| Segment | Criteria (R, F, M) |
|---|---|
| Champions | R=4, F=4, M=4 |
| Loyal Customers | R≥4, F≥3, M≥3 |
| Loyal but Slipping | R≤2, F=4, M=4 |
| Big Spenders at Risk | R≤2, F≤2, M≥3 |
| New Customers | R=4, F=1, M=1 |
| Lost | R=1, F=1, M=1 |
| Others | Everything else |
| CustomerID | Recency | Frequency | Monetary | Segment |
|---|---|---|---|---|
| 12748.0 | 3 | 107 | 11773.76 | Champions |
| 14911.0 | 1 | 80 | 49217.04 | Champions |
| 16013.0 | 16 | 25 | 15321.32 | Loyal Customers |
| 17850.0 | 216 | 34 | 5391.21 | Others |
The cleaned SQL output was imported into Power BI to create an interactive business dashboard.
- KPIs: Total Revenue, Total Customers, New vs Returning %
- Monthly Revenue Trends
- Pie Chart: Customer Type Distribution
- Filter by Customer Type
- KPI Cards: Recency, Frequency, Monetary
- Bar Chart: Segment Count
- Line Chart: Revenue per Segment by Month
- Table: Customers per Segment with drillthrough.
- Customer Details by RFM Segment
- Cross-filtering using drillthrough
- Dynamic visuals updated based on the selected customer.
- MySQL Workbench
- CSV Import & Table Creation
- SQL Views & Case Logic
- Exported final table for Power BI dashboard.
- Data Cleaning (NULLs, Duplicates, Returns)
- SQL Feature Engineering
- Customer Segmentation via RFM
- Business Logic Implementation
- Preparation for Power BI Visualization.
- DAX Measures for KPIs
- Drillthrough & Cross-filtering
- Page navigation & conditional formatting
- Data storytelling with business context.
This project is open-source under the MIT License.
Contributions are welcome!
-
Open an issue for bugs or feature requests
-
Submit a pull request for improvements.