Do you want to increase revenue and marketing ROI using customer segmentation? If yes, this is the project for you.
In this project, you will apply unsupervised learning techniques to identify segments of the population that form the core customer base for a mail-order sales company in Germany. These segments can then be used to direct marketing campaigns towards audiences that will have the highest expected rate of returns.
The purpose of this project is to show sales and marketing professionals and data scientists how to approach customer segmentation to answer questions such as:
- How many consumer segments are in the whole market population?
- Which of these consumer segments represent our customer base?
From a data science perspective, this means:
- How to use principal component analysis to reduce the dimensionality of consumer characteristics?
- How to use non-hierarchical clustering to form consumer segments and decide how many there should be?
- How to profile those consumer segments against the existing customer base to improve targeting?
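The three steps above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the data is synthetic, k-means stands in for "non-hierarchical clustering", and the names `population` and `customers`, the 85% variance threshold, and `k = 4` are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-ins for the real data (hypothetical):
# rows are people, columns are numeric consumer characteristics.
population = rng.normal(size=(1000, 20))
customers = rng.normal(loc=0.5, size=(300, 20))

# 1. Scale the features, then reduce dimensionality with PCA.
#    Both transformers are fitted on the general population only.
scaler = StandardScaler().fit(population)
pca = PCA(n_components=0.85, svd_solver="full")  # keep ~85% of variance (assumed threshold)
pop_pcs = pca.fit_transform(scaler.transform(population))
cust_pcs = pca.transform(scaler.transform(customers))

# 2. Fit k-means for a range of k and record the inertia,
#    which can be plotted to look for an "elbow".
inertia = {}
for n_clusters in range(2, 8):
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(pop_pcs)
    inertia[n_clusters] = model.inertia_

# 3. Fit the chosen k on the population, assign customers to the same
#    clusters, and compare each segment's share in the two groups.
k = 4  # placeholder choice, not a result from the project
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pop_pcs)
pop_share = np.bincount(kmeans.labels_, minlength=k) / len(pop_pcs)
cust_share = np.bincount(kmeans.predict(cust_pcs), minlength=k) / len(cust_pcs)

# A ratio above 1 means the segment is over-represented among customers.
ratio = cust_share / pop_share
for segment in range(k):
    print(f"segment {segment}: population {pop_share[segment]:.2%}, "
          f"customers {cust_share[segment]:.2%}, ratio {ratio[segment]:.2f}")
```

Segments whose customer share is well above their population share are the natural candidates for targeted campaigns.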
The main findings can be found in the notebook available here.
Several third-party libraries beyond the Anaconda distribution of Python need to be installed and imported to run the code. These are:
- BeautifulSoup for parsing consumer characteristics definitions
- StatsModels to enable optimal sample size calculations for hypothesis testing
- rpy2 to use R functionality for imputing missing data
- tqdm for displaying a progress bar in time-consuming sections of code
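Before running the notebook, it may help to check that these libraries are importable. The snippet below is a small sketch using only the standard library; the `REQUIRED` mapping and the `missing_libraries` helper are hypothetical names introduced here, and the check only tests for the Python packages (rpy2 additionally needs a working R installation).

```python
from importlib.util import find_spec

# Display name -> top-level module name used in `import` statements.
REQUIRED = {
    "BeautifulSoup": "bs4",
    "StatsModels": "statsmodels",
    "rpy2": "rpy2",
    "tqdm": "tqdm",
}


def missing_libraries(required=REQUIRED):
    """Return the display names of libraries that cannot be found."""
    return [name for name, module in required.items() if find_spec(module) is None]


missing = missing_libraries()
if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```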
There is one notebook, available here, that showcases the work related to the above questions. Markdown cells walk through the thought process behind the individual steps.
There are additional files:
- Identify_Customer_Segments.html: HTML version of the notebook
- terms.pdf: terms of use for the provided data
Credit must be given to AZ Direct GmbH, Arvato Finance Solution and Udacity for the data. The data files are not available in the repository, as they are subject to the terms of use.