Predict whether or not loans acquired by Fannie Mae will go into foreclosure. Fannie Mae acquires loans from other lenders as a way of inducing them to lend more. Fannie Mae releases data on the loans it has acquired and their performance afterwards here.
- Clone this repo to your computer.
- Get into the folder using `cd loan-prediction`.
- Run `mkdir data`.
- Switch into the `data` directory using `cd data`.
- Download the data files from Fannie Mae into the `data` directory.
    - You can find the data here.
    - You'll need to register with Fannie Mae to download the data.
    - It's recommended to download all the data from 2012 Q1 to present.
- Extract all of the `.zip` files you downloaded.
    - On OSX, you can run `find ./ -name \*.zip -exec unzip {} \;`.
    - At the end, you should have a bunch of text files called `Acquisition_YQX.txt` and `Performance_YQX.txt`, where `Y` is a year and `X` is a number from `1` to `4`.
- Remove all the zip files by running `rm *.zip`.
- Switch back into the `loan-prediction` directory using `cd ..`.
- Install the requirements using `pip install -r requirements.txt`.
    - Make sure you use Python 3.
    - You may want to use a virtual environment for this.
- Run `mkdir processed` to create a directory for our processed datasets.
- Run `python assemble.py` to combine the `Acquisition` and `Performance` datasets.
    - This will create `Acquisition.txt` and `Performance.txt` in the `processed` folder.
- Run `python annotate.py`.
    - This will create training data from `Acquisition.txt` and `Performance.txt`.
    - It will add a file called `train.csv` to the `processed` folder.
- Run `python predict.py`.
    - This will run cross validation across the training set, and print the accuracy score.
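
The extraction and cleanup steps above rely on OSX shell commands. As a rough, portable alternative, the sketch below does the same thing in Python using only the standard library. The function names `extract_and_remove_zips` and `list_quarterly_files` are illustrative and not part of this repo, and the regular expression assumes the year in each file name is written with four digits.

```python
import re
import zipfile
from pathlib import Path

# Naming pattern described above: Acquisition_YQX.txt / Performance_YQX.txt,
# where Y is a year and X is a quarter from 1 to 4.
# Assumption: the year is written with four digits (e.g. 2012).
QUARTERLY_FILE = re.compile(r"^(Acquisition|Performance)_(\d{4})Q([1-4])\.txt$")


def extract_and_remove_zips(data_dir):
    """Extract every .zip found under data_dir into data_dir, then delete it.

    Returns the names of all extracted archive members.
    """
    data_dir = Path(data_dir)
    extracted = []
    # sorted() materializes the listing before any files are added or removed.
    for archive in sorted(data_dir.rglob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
            extracted.extend(zf.namelist())
        archive.unlink()
    return extracted


def list_quarterly_files(data_dir):
    """Return {(kind, year, quarter): path} for files with the expected names."""
    found = {}
    for path in Path(data_dir).iterdir():
        match = QUARTERLY_FILE.match(path.name)
        if match:
            kind, year, quarter = match.groups()
            found[(kind, int(year), int(quarter))] = path
    return found
```

Calling `extract_and_remove_zips("data")` from the `loan-prediction` directory and then inspecting the keys of `list_quarterly_files("data")` is one way to check that each quarter has both an `Acquisition` and a `Performance` file before running `assemble.py`. Unlike `rm *.zip`, this sketch also deletes archives found in subdirectories.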
If you want to extend this work, here are a few places to start:
- Generate more features in `annotate.py`.
- Switch algorithms in `predict.py`.
- Add in a way to make predictions on future data.
- Try seeing if you can predict if a bank should have issued the loan.
    - Remove any columns from `train` that the bank wouldn't have known at the time of issuing the loan.
        - Some columns are known when Fannie Mae bought the loan, but not before.
    - Make predictions.
- Explore seeing if you can predict columns other than `foreclosure_status`.
    - Can you predict how much the property will be worth at sale time?
- Explore the nuances between performance updates.
    - Can you predict how many times the borrower will be late on payments?
    - Can you map out the typical loan lifecycle?
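
For the idea of removing columns the bank wouldn't have known when issuing the loan, a minimal sketch follows. It assumes `train.csv` has a header row, which this README does not confirm. The helper `drop_columns` and the placeholder column names `col_a` and `col_b` are hypothetical; only `foreclosure_status` comes from the text above.

```python
import csv
import io


def drop_columns(src, dst, unknown_at_issue):
    """Copy CSV rows from src to dst, omitting the named columns.

    src and dst are open text streams. Returns the column names that were kept.
    Assumption: the first row of src is a header.
    """
    reader = csv.DictReader(src)
    header = reader.fieldnames or []
    kept = [name for name in header if name not in unknown_at_issue]
    # extrasaction="ignore" silently skips the dropped columns in each row.
    writer = csv.DictWriter(
        dst, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return kept


# Tiny in-memory demonstration with placeholder column names.
sample = io.StringIO("col_a,col_b,foreclosure_status\n1,2,False\n")
out = io.StringIO()
kept = drop_columns(sample, out, {"col_b"})
```

With real files, the same function could be called on handles opened with `open("processed/train.csv", newline="")` and a separate output path, so the original training set stays untouched. Deciding which columns belong in `unknown_at_issue` still requires reading the data documentation.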