How can the bank segment its customers based on their demographics, financial behaviour, and marketing interactions to improve targeted marketing strategies?
Use clustering techniques (e.g., K-means or hierarchical clustering) to identify distinct customer segments, then tailor marketing campaigns to each segment to maximize engagement and conversion rates.
The variables selected for analysis were those present in both datasets and judged relevant on the basis of prior research:
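As a rough illustration of the two techniques named above, the sketch below runs K-means and Ward-linkage hierarchical clustering on the same data and compares candidate segment counts with the silhouette score. It assumes scikit-learn and NumPy are available. The two features (an age-like and a balance-like column), the synthetic values, and the helper name `segment` are invented for illustration and do not come from the bank data.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two numeric customer features (age, balance).
# These values are invented for illustration, not taken from the bank data.
rng = np.random.default_rng(0)
young_low = rng.normal(loc=[28, 500], scale=[3, 150], size=(50, 2))
older_high = rng.normal(loc=[55, 6000], scale=[4, 800], size=(50, 2))

# Scale first: both methods are distance based, so an unscaled balance
# column would dominate the age column.
X = StandardScaler().fit_transform(np.vstack([young_low, older_high]))


def segment(X, n_clusters):
    """Return (K-means labels, hierarchical labels) for the same data."""
    kmeans_labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=0
    ).fit_predict(X)
    hier_labels = AgglomerativeClustering(
        n_clusters=n_clusters, linkage="ward"
    ).fit_predict(X)
    return kmeans_labels, hier_labels


# Compare candidate segment counts; a higher silhouette score means
# tighter, better separated clusters.
scores = {}
for k in range(2, 6):
    km, hc = segment(X, k)
    scores[k] = (silhouette_score(X, km), silhouette_score(X, hc))
    print(k, round(scores[k][0], 3), round(scores[k][1], 3))
```

The silhouette score is only one possible criterion for choosing the number of segments; whether the resulting segments are useful for marketing still has to be judged against the business question.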
- Age (numeric)
- Job: type of job (categorical: "admin.", "unknown", "unemployed", "management", "housemaid", "entrepreneur", "student", "blue-collar", "self-employed", "retired", "technician", "services")
- Marital: marital status (categorical: "married", "divorced", "single"; note: "divorced" means divorced or widowed)
- Education (categorical: "unknown", "secondary", "primary", "tertiary")
- Default: has credit in default? (binary: "yes", "no")
- Balance: average yearly balance, in euros (numeric)
- Housing: has a housing loan? (binary: "yes", "no")
- Loan: has a personal loan? (binary: "yes", "no")

Related to the last contact of the current campaign:
- Contact: contact communication type (categorical: "unknown", "telephone", "cellular")
- Month: last contact month of the year (categorical: "Jan", "Feb", "Mar", ..., "Nov", "Dec")
- Duration: last contact duration, in seconds (numeric)

Other attributes:
- Campaign: number of contacts performed during this campaign and for this client (numeric, includes the last contact)
- Pdays: number of days since the client was last contacted in a previous campaign (numeric; -1 means the client was not previously contacted)
- Previous: number of contacts performed before this campaign for this client (numeric)
- Poutcome: outcome of the previous marketing campaign (categorical: "unknown", "other", "failure", "success")
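Because the variables above mix numeric and categorical types, they need encoding before a distance-based clustering method can use them. The following is a minimal sketch of one possible approach, assuming pandas and scikit-learn: numeric columns are standardized and categorical columns are one-hot encoded. The lowercase column names, the sample rows, and the derived `previously_contacted` flag are assumptions made for illustration, not part of the original analysis.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Column names follow the variable list (lowercased); the rows are invented.
customers = pd.DataFrame({
    "age": [34, 58, 25, 41],
    "job": ["management", "retired", "student", "blue-collar"],
    "marital": ["married", "divorced", "single", "married"],
    "default": ["no", "no", "no", "yes"],
    "balance": [1200, 5400, 150, -80],
    "pdays": [-1, 92, -1, 180],
})

# pdays uses -1 as a sentinel for "not previously contacted". Treating -1
# as a real number of days would distort distances, so this sketch splits
# it into a 0/1 flag plus a non-negative day count (an assumed design choice).
customers["previously_contacted"] = (customers["pdays"] != -1).astype(int)
customers["pdays"] = customers["pdays"].clip(lower=0)

numeric = ["age", "balance", "pdays", "previously_contacted"]
categorical = ["job", "marital", "default"]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric),          # zero mean, unit variance
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# One row per customer, one column per numeric feature or category level.
features = preprocess.fit_transform(customers)
print(features.shape)
```

One-hot encoding gives every category level its own column, so high-cardinality variables such as Job can outweigh the numeric features; alternatives designed for mixed data may be worth considering.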
Both datasets were sourced from Kaggle. They were chosen because they share a similar structure, use consistent formatting, and are comparable in size, each containing a large number of records.
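Restricting the analysis to variables present in both datasets can be sketched as follows, assuming pandas. The frame names `dataset_a` and `dataset_b`, the sample rows, and the non-shared columns `day` and `deposit` are hypothetical placeholders, not the actual Kaggle files.

```python
import pandas as pd

# Two tiny invented frames standing in for the two Kaggle datasets.
dataset_a = pd.DataFrame({
    "age": [30, 45],
    "job": ["admin.", "services"],
    "balance": [200, 1500],
    "day": [5, 12],          # hypothetical column only in the first dataset
})
dataset_b = pd.DataFrame({
    "balance": [900],
    "age": [52],
    "job": ["retired"],
    "deposit": ["yes"],      # hypothetical column only in the second dataset
})

# Keep only the columns present in both, in the order of the first frame.
shared = [col for col in dataset_a.columns if col in dataset_b.columns]

# Stack the rows and record which dataset each row came from.
combined = pd.concat(
    [
        dataset_a[shared].assign(source="a"),
        dataset_b[shared].assign(source="b"),
    ],
    ignore_index=True,
)
print(shared)
print(combined.shape)
```

Keeping a `source` column makes it possible to check later whether any segment is dominated by rows from one dataset.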