Contributing Members:
- Sohyun Lee
- Shin Ehara
- Jou-Ying Lee
Deep learning architectures are now widely recognized, and repeatedly proven, to be powerful in a wide range of high-level prediction tasks. While these models generally perform well with appropriate tuning, a long-standing problem is that it is difficult to explain how they learn and how they arrive at their predictions. This interpretability, understanding “how” machines learn, is oftentimes even more important than ensuring that they output “correct” predictions. In finance especially, users' ability to dissect how and why an algorithm reached a conclusion from a business standpoint is integral to later applications, e.g., incorporating the model into business decision making. This project studies prior work on image recognition in the financial market and goes a step further by explaining the predictions of a Convolutional Neural Network (CNN) with the Grad-CAM algorithm.
Project Website at: Website
Project Report at: Report
This project aims to apply the Grad-CAM technique to a CNN model trained on images that represent closing prices during the first hour of trading.
To engineer the data and build the CNN model, run each notebook in the notebooks folder in the following order:
- 1. Run every cell in Data Processing.ipynb
  - This notebook preprocesses the raw data by extracting closing prices during the first hour after the market opens and labeling each day according to whether prices increase or decrease.
  - Input: raw_NIFTY100.csv
  - Output: first_combined.csv, which contains closing prices during the first hour of trading
- 2. Run every cell in Image Conversion.ipynb
  - This notebook converts the first_combined.csv data into images with the Gramian Angular Field algorithm.
  - Input: first_combined.csv
  - Output: .png images in the imgs folder
- 3. Run every cell in CNN.ipynb
  - This notebook uses fastai, a PyTorch-based deep learning library, to build a neural network that learns the relationships, including hidden ones, between the input features. The input is a labeled image dataset converted from time series with the Gramian Angular Field algorithm, as described in the previous step.
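Steps 2 and 3 both rely on the Gramian Angular Field (GAF) encoding. As an illustration only, here is a minimal NumPy sketch of the summation variant (GASF), assuming each day's first-hour closes are min-max rescaled to [-1, 1] before the polar transform. The notebook may instead use the difference variant (GADF) or a library such as pyts, and gramian_angular_summation_field is a hypothetical name, not a function from this repository.

```python
import numpy as np


def gramian_angular_summation_field(series):
    """Return the Gramian Angular Summation Field (GASF) of a 1-D series.

    Hypothetical helper for illustration; assumes min-max scaling to [-1, 1].
    """
    x = np.asarray(series, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        # A flat series has no range to rescale; map every point to 0 (angle pi/2).
        scaled = np.zeros_like(x)
    else:
        # Min-max rescale into [-1, 1] so that arccos is defined.
        scaled = (2 * x - x_max - x_min) / (x_max - x_min)
    # Guard against floating-point drift just outside [-1, 1].
    scaled = np.clip(scaled, -1.0, 1.0)
    # Polar encoding: each rescaled value becomes an angle phi_i.
    phi = np.arccos(scaled)
    # GASF[i, j] = cos(phi_i + phi_j): an N x N matrix of pairwise relations.
    return np.cos(phi[:, None] + phi[None, :])


closes = [100.0, 101.5, 101.0, 102.0]  # made-up prices, not project data
gasf = gramian_angular_summation_field(closes)
print(gasf.shape)  # (4, 4)
```

Because the matrix is symmetric and its diagonal preserves the rescaled values (cos(2·phi_i) = 2x_i² - 1), the N×N result can be rendered directly as a grayscale image for the CNN.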
To run Grad-CAM:
- Clone the Grad-CAM submodule included on the repository homepage.
- Copy the test_imgs folder from StockMarket_explainableAI/test into the cloned submodule folder.
- Change into the submodule directory and run the following command (modify the image path after -i to run it on specific images):
  - python3 main.py demo1 -a resnet34 -t layer4 -i test_imgs/2017-01-03.png -k 1
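The command above runs the submodule's own implementation. To show what Grad-CAM computes at the target layer named in the command (layer4), here is a minimal NumPy sketch of the heatmap step: the channel weights are alpha_k = (1/(H·W)) Σ_ij ∂y^c/∂A^k_ij, and the map is ReLU(Σ_k alpha_k A^k). It assumes the layer's activations and the class-score gradients have already been captured (for example with PyTorch forward/backward hooks); grad_cam_heatmap is a hypothetical helper, not the submodule's API.

```python
import numpy as np


def grad_cam_heatmap(activations, gradients):
    """Grad-CAM heatmap from one layer's activations and gradients.

    activations: (K, H, W) feature maps A^k of the target layer for one image.
    gradients:   (K, H, W) gradients of the class score y^c w.r.t. A^k.
    Returns an (H, W) map scaled to [0, 1] (hypothetical helper).
    """
    A = np.asarray(activations, dtype=float)
    dA = np.asarray(gradients, dtype=float)
    if A.ndim != 3 or A.shape != dA.shape:
        raise ValueError("expected two arrays of identical shape (K, H, W)")
    # Channel weights alpha_k: global-average-pool the gradients over H and W.
    alpha = dA.mean(axis=(1, 2))
    # Weighted sum of the feature maps; ReLU keeps only positive evidence.
    cam = np.maximum(np.tensordot(alpha, A, axes=1), 0.0)
    # Normalize to [0, 1] for display; an all-zero map is returned unchanged.
    peak = cam.max()
    return cam / peak if peak > 0 else cam


rng = np.random.default_rng(0)
acts = rng.random((512, 7, 7))         # made-up activations
grads = rng.standard_normal((512, 7, 7))  # made-up gradients
print(grad_cam_heatmap(acts, grads).shape)  # (7, 7)
```

The resulting map is much coarser than the input image (a ResNet's layer4 yields 7×7 maps for a 224×224 input), so it is usually upsampled to the image size before being overlaid on the GAF image.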
Repository structure:
- config
  This folder contains JSON files with parameters for main and testing runs.
  - data_params.json: parameters for running main on all data
  - test_params.json: parameters for running main on test data
- data
  This folder contains all stock data, from time series to image representations.
  - imgs: all images converted from time series, e.g., 2017-01-02.png
  - raw data
    - raw_NIFTY100.csv: raw stock market time series data
  - processed data
    - first_combined.csv: closing prices during the first hour of trading
    - gramian_df.csv: data after applying the Gramian Angular Field algorithm
    - label_dir_2.csv: labels indicating whether the price goes up or down that day
- gradcam_submodule @ fd10ff7
  This folder is the Grad-CAM submodule.
- notebooks
  This folder contains the project notebooks.
  - CNN + Grad-CAM.ipynb: development notebook for the CNN and Grad-CAM implementation
  - Data Processing.ipynb: notebook that combines the steps from data cleaning through feature engineering
  - EDA.ipynb: notebook demonstrating the exploratory data analysis (EDA)
  - Image Conversion.ipynb: notebook that performs the image conversion
- references
  This folder contains additional information and references for the project.
  - report_img: images extracted from the notebooks and included in the written report
- src
  This folder contains library code extracted from the notebooks.
  - features
    - build_features.py: scripts to build features from the merged data
    - build_labels.py: scripts to create labels for image classification
    - build_images.py: scripts to convert time series data to images and save them
  - model
    - gradcam.py: scripts to implement Grad-CAM
- test
  This folder contains test results and test images.
- Dockerfile
  The Dockerfile needed to build the development environment for this project.
- run.py
  The main Python file for executing the program.
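The Data Processing step and the scripts under src/features (build_features.py, build_labels.py) are described only briefly, so the exact windowing and labeling rules are not stated here. As a rough illustration, here is a minimal pandas sketch under loudly stated assumptions: intraday bars with hypothetical column names 'date' and 'close', a 09:15 open (the NSE regular-session open), and a hypothetical up/down rule that compares each day's final close with its last first-hour close. The function names are illustrative, not the project's code.

```python
import pandas as pd


def first_hour(df, open_hour=9, open_minute=15, minutes=60):
    """Keep only the bars in the first `minutes` after the open on each day.

    Assumes columns 'date' (datetime-like) and 'close' (hypothetical names).
    """
    df = df.assign(date=pd.to_datetime(df["date"])).sort_values("date")
    start = df["date"].dt.normalize() + pd.Timedelta(hours=open_hour, minutes=open_minute)
    end = start + pd.Timedelta(minutes=minutes)
    # Half-open window [open, open + minutes) on every trading day.
    mask = (df["date"] >= start) & (df["date"] < end)
    return df.loc[mask].reset_index(drop=True)


def label_days(df, first_hour_df):
    """Label a day 1 if it closes above its last first-hour close, else 0.

    This labeling rule is an assumption for illustration only.
    """
    df = df.assign(date=pd.to_datetime(df["date"])).sort_values("date")
    fh = first_hour_df.assign(date=pd.to_datetime(first_hour_df["date"])).sort_values("date")
    hour_close = fh.groupby(fh["date"].dt.normalize())["close"].last()
    day_close = df.groupby(df["date"].dt.normalize())["close"].last()
    # Only label days that have first-hour data (i.e., an image to classify).
    return (day_close.loc[hour_close.index] > hour_close).astype(int)


bars = pd.DataFrame({  # made-up bars, not project data
    "date": ["2017-01-02 09:15", "2017-01-02 09:45", "2017-01-02 15:29"],
    "close": [100.0, 101.0, 103.0],
})
print(label_days(bars, first_hour(bars)).tolist())  # [1]
```

Keeping extraction and labeling as separate functions mirrors the separate build_features.py and build_labels.py scripts, so either rule can be swapped without touching the other.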