NOTE: the employee data is not real. It was generated and comes from this dataset on Kaggle.
Will they stay or will they go? Predicting whether employees in a fake dataset will leave in the next 6 months.
I haven't done any HR analytics before, and the idea of incorporating ML/DL into this domain interests me. The first problem is acquiring HR data. Thankfully, IBM has produced a fake dataset, which is used throughout this project. The second problem is class imbalance: given the question ("Who will leave in the next 6 months?"), only a small percentage of people have historically left within the last 6 months. How can this potentially be solved? Generate more data.
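The text above does not say how the extra data was generated, so the following is only a minimal sketch of one common baseline for an imbalanced class problem: random oversampling of the minority class. The function name `oversample_minority`, the `left` label key, and the toy rows are all hypothetical and are not taken from the project.

```python
import random
from collections import Counter


def oversample_minority(rows, label_key="left", seed=0):
    """Duplicate randomly chosen rows of the smaller classes until all classes match in size."""
    rng = random.Random(seed)

    # Group rows by their class label.
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)

    # Every class is topped up to the size of the largest class.
    target = max(len(group) for group in by_label.values())

    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap for smaller classes.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))

    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: 8 employees who stayed, 2 who left.
employees = [{"overtime": i % 2 == 0, "left": False} for i in range(8)]
employees += [{"overtime": True, "left": True} for _ in range(2)]

balanced = oversample_minority(employees)
counts = Counter(row["left"] for row in balanced)
print(counts)
```

If something like this is used, it should be applied to the training split only. Otherwise duplicated rows can leak into the evaluation set and inflate the metrics.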
What would happen if key team members left? Progress could stall or, in the worst case, regress. The power of this approach lies in identifying these individuals early and, beyond that, understanding WHY they might leave.
Note: there is a lot of discussion about ML, DL and AI replacing jobs; this project doesn't do that. I see these techniques as a way for users to increase their productivity and, in this particular use case, to potentially prevent employees from leaving and improve their work environment. If you had 3000 employees, you could identify who might leave manually, but not quickly or efficiently enough. Combining the knowledge of users with the model is where this becomes powerful.
| Metric | Without generated data | With generated data |
|---|---|---|
| AUC | 0.85 | 0.95 |
| Precision | 0.73 | 0.92 |
| Recall | 0.42 | 0.84 |
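To make the precision and recall rows concrete, here is a small sketch of how the two metrics are computed from predictions. It assumes the positive class is "left", which the table does not state explicitly, and the labels below are hypothetical rather than the project's results.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for the positive class (1 = employee left)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Precision: of the employees flagged as leaving, how many actually left?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the employees who actually left, how many were flagged?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 4 leavers and 6 stayers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this reading, a recall of 0.42 means the model misses more than half of the employees who actually leave, while a recall of 0.84 means it catches most of them.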
- 🕴️🕴️ HR - the obvious one. If thousands of employees respond to a survey, you cannot efficiently find out who might be at risk of leaving without analysis. You will also want to know each employee's specific reasons for being at risk; it needs to be personal. A model that can tell you who might leave the company and why saves HR time and offers a possible "save" of the employees at risk.
- 💁Employees - employees at risk of leaving the company will likely have a reason to leave, e.g. working overtime often. Some may be on the fence about leaving, and if the model picks this up early from inputs such as frequent overtime, the issue can be addressed. If the predictions can then be interpreted (using SHAP values), HR can use the reasons to approach the employee and discuss pain points that weren't necessarily obvious before. Hopefully the employee stays, gets a better work environment, and their talent is retained.
- 🏭Company - there may be key members who bring substantial value to their teams. If they were to leave, the company would lose the value and skills they bring, and progress may slow or be lost. If the model flags them early and HR follows up with a personal discussion, the company can potentially "save" the employee and retain that value and those skills.
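The bullets above mention interpreting predictions with SHAP values. As a rough illustration of the underlying idea, the sketch below computes exact Shapley values by brute force for a tiny, hypothetical risk score. It does not use the `shap` library and is not the project's model; `toy_risk`, the feature names, and the baseline employee are all assumptions made for the example.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average each feature's marginal effect over all orderings."""
    features = list(instance)
    totals = {name: 0.0 for name in features}
    orderings = list(permutations(features))

    for order in orderings:
        # Start from the baseline and switch features to the instance's values one by one.
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            score = predict(current)
            totals[name] += score - previous
            previous = score

    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk(x):
    """Hypothetical attrition risk score with an overtime/satisfaction interaction."""
    score = 0.1
    score += 0.3 * x["overtime"]
    score += 0.02 * x["years_since_promotion"]
    score += 0.2 * x["overtime"] * (x["job_satisfaction"] <= 2)
    return score


# Hypothetical "typical" employee used as the reference point.
baseline = {"overtime": 0, "years_since_promotion": 1, "job_satisfaction": 3}
# Hypothetical employee whose score we want to explain.
employee = {"overtime": 1, "years_since_promotion": 5, "job_satisfaction": 1}

contributions = shapley_values(toy_risk, employee, baseline)
for name, value in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:+.2f}")
```

The contributions add up to the difference between this employee's score and the baseline score, which is what lets HR read them as "reasons" for an individual prediction. Brute force only works for a handful of features, since the number of orderings grows factorially.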
As an experiment, the final model was used with Streamlit to create a user-facing app. The app is split into two sections...