added images and updated readme
Ignatiocalvin committed Jun 11, 2024
1 parent 5711a52 commit 3333262
Showing 16 changed files with 68 additions and 24 deletions.
76 changes: 64 additions & 12 deletions README.md
@@ -1,14 +1,39 @@
# LLM_StockPredictor
## Overview
This project studies the impact of Natural Language Processing (NLP) algorithms on predicting
stock market prices, especially during price shocks.

Commodity price shocks are periods when commodity prices
increase or decrease drastically over a short time. Typically, the stock market
and economic performance are aligned, so a well-performing stock market usually reflects a growing economy.

Stock market declines have the potential to diminish wealth across personal
and retirement investment portfolios. Consequently, individuals witnessing a depreciation in their portfolio value are inclined to curtail their expenditures.

With this project, we aim to develop a model that analyzes news headlines
and predicts stock market crashes based on text data. Such a model may
help us anticipate stock market movements to better manage our wealth and
prepare for adverse economic events. We chose to work with NLP algorithms and
fine-tune existing pre-trained LLMs because classical autoregressive models show
poor predictive capacity during price shocks. Using textual data such as news
headlines as input may help the model adjust its predictions to keep up with
drastically changing trends. We also analyzed the effects of sentiment analysis and moving averages as features for prediction. The models are evaluated based on the root mean squared error (RMSE) of the predicted stock prices.

## Data

The data comes from the Kaggle dataset [Daily News for Stock Market Prediction](https://www.kaggle.com/aaron7sun/stocknews). It contains historical news headlines from the Reddit WorldNews channel and Dow Jones Industrial Average (DJIA) stock prices, both covering 2008 to 2016. `Combined_News_DJIA.csv` contains the top 25 news headlines and the corresponding stock prices for each day. `upload_DJIA_table.csv` contains the stock prices for each day.
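
As a minimal sketch of how the two files could be joined by date, the snippet below uses only the standard library. The column names `Date`, `Top1` to `Top25`, and `Close` are assumptions about the Kaggle CSV layout, and `join_news_and_prices` is a hypothetical helper, not code from the notebooks.

```python
import csv


def load_rows_by_date(csv_file):
    """Read a CSV that has a `Date` column into a dict keyed by the date string."""
    return {row["Date"]: row for row in csv.DictReader(csv_file)}


def join_news_and_prices(news_file, price_file, n_headlines=25):
    """Join headlines and closing prices on dates present in both files.

    Assumes (not confirmed by the README) that the news file has columns
    Top1..TopN and the price file has a Close column, and that dates are
    ISO formatted (YYYY-MM-DD) so that string sorting is chronological.
    """
    news = load_rows_by_date(news_file)
    prices = load_rows_by_date(price_file)
    merged = []
    for date in sorted(news.keys() & prices.keys()):
        headlines = [news[date][f"Top{i}"] for i in range(1, n_headlines + 1)]
        merged.append(
            {
                "date": date,
                "headlines": headlines,
                "close": float(prices[date]["Close"]),
            }
        )
    return merged
```

With the real files, both arguments would be handles opened with `open(path, newline="", encoding="utf-8")`.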

The project is divided into the following sections:

## Notebooks
### 1. FinGPT Sentiment Retrieval
Filename: `retrieve_sentiment.ipynb`
Description: This notebook demonstrates the use of a Large Language Model (LLM), specifically FinGPT, to retrieve and analyze the sentiment of financial news articles using the LangChain framework. We asked FinGPT to generate a sentiment score for each news headline in the dataset. The sentiment score ranges from -10 to 10, where negative values indicate negative sentiment, 0 indicates neutral sentiment, and positive values indicate positive sentiment.

### 2. Univariate Model
Filename: `univariate.ipynb`
Description: This notebook uses a Long Short-Term Memory (LSTM) neural network to predict stock prices from a univariate time series of past prices.

### 3. Predicting Stock Prices with News Headlines without Looking Back at the Data
Filename: `news_rnn.ipynb`
@@ -28,24 +53,51 @@ Description: Here, aside from using the look back window of 50 days of news head
Filename: `bert_MA.ipynb`
Description: In this notebook, we predict stock prices using an LSTM neural network and BERT embeddings of financial news headlines. We use a look-back window of 50 days of news headlines and stock prices to predict the next day's stock price in the training data, and a 30-day window in the testing data. We also include a 10-day moving average of the stock prices as an additional feature in the model.
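
The look-back window and moving-average feature described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the notebook's code: `moving_average` and `make_windows` are hypothetical helper names, and the notebook may build its inputs differently (for example with NumPy arrays and embedding vectors per day).

```python
def moving_average(prices, window=10):
    """Trailing mean of the last `window` prices; None until enough history exists."""
    averages = []
    for i in range(len(prices)):
        if i + 1 < window:
            averages.append(None)  # not enough past days yet
        else:
            averages.append(sum(prices[i + 1 - window : i + 1]) / window)
    return averages


def make_windows(features, targets, lookback=50):
    """Pair each run of `lookback` consecutive feature rows with the next day's target."""
    inputs, outputs = [], []
    for end in range(lookback, len(features)):
        inputs.append(features[end - lookback : end])  # days end-lookback .. end-1
        outputs.append(targets[end])                   # the following day
    return inputs, outputs
```

Calling `make_windows(features, prices, lookback=50)` would correspond to the training setup and `lookback=30` to the testing setup described above.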

### 7. Predicting Stock Prices with FastText Embeddings
Filename: `lookback_fasttext.ipynb`
Description: This notebook predicts stock prices using an LSTM neural network and FastText embeddings of financial news headlines. We use a look-back window of 50 days of news headlines and stock prices to predict the next day's stock price in the training data, and a 30-day window in the testing data. The FastText embeddings are used as an alternative to BERT embeddings.

### 8. Predicting Stock Prices with XGBoost
Filename: `xgboost.ipynb`
Description: This notebook predicts stock prices using an XGBoost model, with stock prices and news headlines as features. The model is trained on the training data and evaluated on the testing data.





## Results
For our models, we used the RMSE as the evaluation metric. The RMSE is the square root of the mean squared difference between the predicted and actual values, so lower values indicate predictions closer to the actual stock prices.
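
For reference, a minimal sketch of the metric in plain Python is shown below. The function name `rmse` is illustrative, and the notebooks may compute the metric with a library routine instead.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the errors are squared before averaging, a few large misses (as can happen during price shocks) raise the RMSE more than many small ones.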

The table below shows the RMSE values for the training and testing data for each of the models.

<center>

| Notebooks | Train RMSE | Test RMSE |
|:---: |:---: |:---: |
| univariate | 31.743497936455558 | 164.36006533558697 |
| xgboost | 0.9986960112179987 | 749.0485733114635 |
| lookback_xgboost | 3.5651031645488147 | 200.72583615414294 |
| bert_MA | 10.83293233874024 | 816.5899384610334 |
| news_rnn | 622.6200037110209 | 741.4623210266183 |
| lookback_news_rnn | 15.457514085649745 | 309.5794180019174 |
| sentiment_rnn | 27.86876269801595 | 1143.883111213618 |
| lookback_fasttext | 13.962031174946512 | 169.93569939132175 |

</center>

From the table, we can see that the `univariate` model has the lowest test RMSE (about 164.4), making it the best-performing of the eight models, closely followed by `lookback_fasttext` (about 169.9). The `xgboost` model has the lowest training RMSE (about 1.0) but a test RMSE of about 749.0, and the `sentiment_rnn` model has the highest test RMSE (about 1143.9). Such large gaps between training and test error suggest that these models may be overfitting the training data.
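
As a quick check of this comparison, the table values can be ranked programmatically. The sketch below hard-codes the RMSE values from the table, rounded to two decimals, and uses the test-to-train ratio as a rough indicator of overfitting; this indicator is an assumption made for illustration, not a measure used in the notebooks.

```python
# (train RMSE, test RMSE) per notebook, rounded from the results table
results = {
    "univariate": (31.74, 164.36),
    "xgboost": (1.00, 749.05),
    "lookback_xgboost": (3.57, 200.73),
    "bert_MA": (10.83, 816.59),
    "news_rnn": (622.62, 741.46),
    "lookback_news_rnn": (15.46, 309.58),
    "sentiment_rnn": (27.87, 1143.88),
    "lookback_fasttext": (13.96, 169.94),
}

# Models ordered from best to worst test RMSE
by_test = sorted(results, key=lambda name: results[name][1])

# Model with the lowest training RMSE
lowest_train = min(results, key=lambda name: results[name][0])

# Ratio of test to train error; a large ratio hints at overfitting
gap_ratio = {name: test / train for name, (train, test) in results.items()}
largest_gap = max(gap_ratio, key=gap_ratio.get)

for name in by_test:
    train, test = results[name]
    print(f"{name:18s} train={train:8.2f} test={test:8.2f} ratio={gap_ratio[name]:7.1f}")
```
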

Adding news headlines appears to disrupt the temporal dependency in the price data, so the headlines ultimately act as noise. The `univariate` model, which uses only the stock prices, performs best among the models. This suggests that the stock prices themselves contain enough information to predict future stock prices, and that adding news headlines does not significantly improve the model's performance.

| Notebooks | Loss Graph | Test Predictions |
|:---: |:---: |:---: |
| univariate | ![alt text](images/image-2.png) | ![alt text](images/image-6.png) |
| xgboost | 0.9986960112179987 | ![alt text](images/image-11.png) |
| lookback_xgboost | 3.5651031645488147 | ![alt text](images/image-10.png) |
| bert_MA | ![alt text](images/image.png) | ![alt text](images/image-5.png) |
| news_rnn | ![alt text](images/image-3.png) | ![alt text](images/image-8.png) |
| lookback_news_rnn | ![alt text](images/image-1.png) | ![alt text](images/image-7.png) |
| sentiment_rnn | ![alt text](images/image-4.png) | ![alt text](images/image-9.png) |
| lookback_fasttext | ![alt text](images/image-12.png) | ![alt text](images/image-13.png) |

Binary file added images/image-1.png
Binary file added images/image-10.png
Binary file added images/image-11.png
Binary file added images/image-12.png
Binary file added images/image-13.png
Binary file added images/image-2.png
Binary file added images/image-3.png
Binary file added images/image-4.png
Binary file added images/image-5.png
Binary file added images/image-6.png
Binary file added images/image-7.png
Binary file added images/image-8.png
Binary file added images/image-9.png
Binary file added images/image.png