This project addresses the problem of issuing credit loans for car purchases. The dataset was made openly available by a Belarusian bank, BNB-Bank (Belaruski Narodny Bank), at a competition in which the author had the opportunity to participate. The bank specialized in operations for small and medium-sized businesses and in industrial and personal loans, and was considered one of the most persistent links between the Republics of Belarus and Georgia. Its main personal credit directions were car loans (9 credit lines) and mortgages (4 credit lines).
The competition, Imaguru Datathon 2019, was held in Minsk, Belarus, by Imaguru Startup Hub. After the dataset was revealed, the teams had 3 days to complete a prototype and present it to the jury. It was something of a hobby contest, less about prizes than about networking and enjoying data science. The dataset contains payment history for several months; the history is sometimes incomplete, as described in one of the following sections. The ideas studied in this project draw partly on the work done during the competition and partly on the author’s industry experience.
The goal is to understand client behavior and repayment policies based on the clients' internal features. Although credit scoring is in place and working, clients still behave differently. The natural question is then how the bank can use this variability in client behavior to improve its operations. Note that the financial and economic environment in Belarus was very risky, which adds further challenges to the problem.
The project is a proof of concept (POC) and is therefore implemented as Jupyter notebooks. To reproduce the results, run the notebooks in exactly the following sequence:
Step 1. Notebook 1. initial-study.ipynb contains data exploration.
Step 2. Notebook 2. data-engineering.ipynb implements feature engineering.
Step 3. Notebook 3. fit-clustering.ipynb fits clustering algorithms.
Step 4. Notebook 4. regress-next-months.ipynb fits a model for next-period repayment prediction.
Step 5. Notebook 5. fill-nas-in-series.ipynb fills NAs in time series for further clustering.
Step 6. Notebook 6. predict-clustering.ipynb clusters all time series.
Step 7. Notebook 7. client-reliability.ipynb implements a study of client reliability.
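Because the order matters, the sequence above can also be run from a script rather than by hand. The following is a minimal sketch, not part of the project itself. It assumes that the notebook files are named exactly as listed above (without the "Notebook N." prefix), that they sit in one directory, and that the jupyter nbconvert command-line tool is installed. The names NOTEBOOKS, build_command and run_all are hypothetical helpers introduced only for this illustration.

```python
import subprocess
from pathlib import Path
from typing import List

# Assumption: file names match the list above, minus the "Notebook N." prefix.
NOTEBOOKS = [
    "initial-study.ipynb",
    "data-engineering.ipynb",
    "fit-clustering.ipynb",
    "regress-next-months.ipynb",
    "fill-nas-in-series.ipynb",
    "predict-clustering.ipynb",
    "client-reliability.ipynb",
]


def build_command(notebook: str) -> List[str]:
    """Return the nbconvert command that executes one notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_all(notebook_dir: str = ".") -> None:
    """Execute every notebook in order, stopping at the first failure."""
    for name in NOTEBOOKS:
        path = Path(notebook_dir) / name
        # check=True raises CalledProcessError if a notebook fails, so later
        # steps never run on the incomplete outputs of an earlier step.
        subprocess.run(build_command(str(path)), check=True)
```

Calling run_all() from the directory that holds the notebooks would execute the seven steps one after another. Stopping at the first failure is a deliberate choice here, since each notebook is assumed to depend on the outputs of the previous ones.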
The project report and the corresponding presentation are available in the docs subdirectory of the repository root.