Berkeley Haas - Professional Certificate in Machine Learning and Artificial Intelligence - Module 5 - Practical Application 1
- Background
- Data Cleansing
- Exploratory Findings
- Bar Coupon Analysis
- Independent Investigation
- Next Steps
This practical application assignment focuses on applying data analysis skills. The project analyzes data collected through a survey on Amazon Mechanical Turk.
- Data cleansing and feature engineering
- Visualize the data using different libraries and arrive at recommendations for which coupons to use.
Explore Coupon Acceptance Factors by Customers
In-Vehicle Coupon Recommendation
- Step one
Check for the number of null entries for each of the columns.
The "car" column is almost entirely empty: 12576 of its 12684 entries are null (about 99%). Given that share, the column was dropped entirely.
By comparison, Bar, Coffee House, CarryAway, RestaurantLessThan20 and Restaurant20To50 have only a few null values. Those were filled with the most frequently occurring value of each feature.
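
The null check, column drop and mode fill described in step one could be sketched as follows. This is a minimal illustration on a made-up five-row frame, not the project's actual notebook code; the category labels ("never", "1~3", and so on) are assumed placeholder values, and only the column names "car", "Bar" and "CarryAway" come from the text above.

```python
import pandas as pd

# Tiny stand-in for the survey data (the real file has 12684 rows).
# The category labels below are illustrative assumptions.
df = pd.DataFrame({
    "car": [None, None, None, "Scooter", None],
    "Bar": ["never", "less1", None, "never", "1~3"],
    "CarryAway": ["1~3", None, "1~3", "less1", "4~8"],
})

# Count the null entries in each column.
null_counts = df.isnull().sum()
print(null_counts)

# Drop the column that is almost entirely null.
df = df.drop(columns=["car"])

# Fill the few remaining gaps with each column's most frequent value (its mode).
for col in ["Bar", "CarryAway"]:
    most_frequent = df[col].mode()[0]
    df[col] = df[col].fillna(most_frequent)

print(df)
```

Filling with the mode keeps every row and is a reasonable default for categorical survey answers when only a handful of values are missing.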
- Step two
Check the unique values of each column. The column 'toCoupon_GEQ5min' has only one distinct value, i.e. no variance, so it carries no information about the outcome and there is no way to measure its effect on the output.
Hence the 'toCoupon_GEQ5min' column was dropped as well.
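
One way to find and drop zero-variance columns like this is to count the distinct values per column. The sketch below runs on a made-up four-row frame; the category labels are assumed placeholder values, and the code is an illustration rather than the project's own implementation.

```python
import pandas as pd

# Made-up sample: 'toCoupon_GEQ5min' is constant, the other columns vary.
df = pd.DataFrame({
    "toCoupon_GEQ5min": [1, 1, 1, 1],
    "Bar": ["never", "1~3", "never", "4~8"],
    "CarryAway": ["1~3", "1~3", "less1", "4~8"],
})

# Number of distinct values in each column.
unique_counts = df.nunique()
print(unique_counts)

# Columns with a single distinct value cannot explain any variation in the outcome.
constant_cols = unique_counts[unique_counts <= 1].index.tolist()
df = df.drop(columns=constant_cols)
```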
- Only 56.8% of the customers used the coupons
- More coupons get used during the warmer days than the colder days
- Only 41% of bar coupons were used by customers
- Customers who went to bars more than 3 times a month used the coupons 76.9% of the time, whereas customers who went less often used them only 37.1% of the time.
- Customers who went to bars more than once a month and are over age 25 used the coupons ~68% of the time, whereas those who went less than once a month and are under 25 used them 42% of the time.
- Coupons for Bars have a higher usage rate irrespective of age
- Coupons for Bars have a higher usage rate irrespective of being a widow.
- Coupons for Bars have a higher usage rate than Coupons for Cheap Restaurants among customers with less than 50K salary
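
Usage rates like the ones in the bar coupon findings above are typically computed as the mean of a 0/1 acceptance flag within a filtered group. The sketch below assumes a "coupon" column holding the coupon type and a "Y" column that is 1 when the coupon was used; both names, and the visit-frequency labels, are assumptions for illustration. The toy data does not reproduce the percentages reported above.

```python
import pandas as pd

# Toy data. "coupon", "Y" and the "Bar" frequency labels are assumed names.
df = pd.DataFrame({
    "coupon": ["Bar", "Bar", "Bar", "Bar", "Coffee House", "Bar"],
    "Bar":    ["4~8", "gt8", "never", "1~3", "never", "less1"],
    "Y":      [1, 1, 0, 1, 1, 0],
})

# Keep only the rows where a bar coupon was offered.
bar_coupons = df[df["coupon"] == "Bar"]

# Overall usage rate: the mean of a 0/1 flag is the share of ones.
overall_rate = bar_coupons["Y"].mean()

# Split customers into those who visit bars more than 3 times a month and the rest.
frequent_labels = ["4~8", "gt8"]
visit_group = bar_coupons["Bar"].isin(frequent_labels).map(
    {True: "more than 3", False: "3 or fewer"}
)
rate_by_group = bar_coupons["Y"].groupby(visit_group).mean()

print(f"overall: {overall_rate:.1%}")
print(rate_by_group)
```

The same pattern extends to compound conditions (for example visit frequency combined with age) by combining boolean masks with `&` before grouping.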
- Coupon Types and their Usage
- 1-Day vs 2-Hour Coupons Usage
- Coupon Usage by Gender
- Coupon Usage by Age
- Coupon Usage by Income
- Feature Engineering
- Find the correlations between features and look at ways to reduce the number of features.
- Text transformations: Convert textual data into numeric format
- Building baseline models for classification: build models using Logistic Regression, Decision Trees, K-Nearest Neighbors and SVMs.
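
As a rough sketch of the first two next steps, textual columns can be one-hot encoded and then correlated with the outcome. One-hot encoding via `pd.get_dummies` is only one possible transformation, chosen here for illustration; the column names "gender", "expiration" and "Y" and their values are assumptions, and the data is made up.

```python
import pandas as pd

# Made-up sample with two textual features and a 0/1 outcome column.
df = pd.DataFrame({
    "gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "expiration": ["1d", "2h", "1d", "2h", "1d", "1d"],
    "Y": [1, 0, 1, 0, 1, 1],
})

# Convert each textual column into 0/1 indicator columns.
encoded = pd.get_dummies(df, columns=["gender", "expiration"], dtype=int)

# Correlation of every encoded feature with the outcome, strongest first.
correlations = encoded.corr()["Y"].drop("Y").sort_values(ascending=False)
print(correlations)
```

Pairs of features that are highly correlated with each other (visible in the full `encoded.corr()` matrix) are candidates for removal when reducing the feature count.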