updated readme and images
Ignatiocalvin committed Jun 11, 2024
1 parent 1f6c1d9 commit 41277e0
Showing 4 changed files with 20 additions and 17 deletions.
37 changes: 20 additions & 17 deletions README.md
@@ -1,4 +1,6 @@
# LLM_StockPredictor
Project of Bagautdin Nukhkadiev and Ignatio Calvin Hidayat for the course "Machine Learning and Causal Inference" at the University of Mannheim.

## Overview
This project studies the impact of Natural Language Processing (NLP) algorithms on predicting stock market prices, especially price shocks.
@@ -61,9 +63,9 @@ Description: This notebook predicts stock prices using an LSTM neural network an
Filename: `xgboost.ipynb`
Description: This notebook predicts stock prices using an XGBoost model. The model uses the stock prices and news headlines as features to predict the stock prices. The model is trained on the training data and evaluated on the testing data.




### 9. Predicting Stock Prices with XGBoost with Look Back
Filename: `lookback_xgboost.ipynb`
Description: This notebook predicts the next day's stock price using an XGBoost model with a look-back window of stock prices and news headlines as features. The window is 50 days for the training data and 30 days for the testing data. The model is trained on the training data and evaluated on the testing data.
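
The look-back idea can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `make_lookback_windows` and the made-up price series are assumptions, and the sketch covers prices only (the notebook also uses news headlines as features).

```python
def make_lookback_windows(series, lookback):
    """Split a sequence into (features, targets) pairs.

    Each feature row holds `lookback` consecutive values, and the matching
    target is the value that immediately follows that window.
    """
    features, targets = [], []
    for start in range(len(series) - lookback):
        features.append(series[start:start + lookback])
        targets.append(series[start + lookback])
    return features, targets


# Hypothetical data: 60 made-up daily prices, windowed with a 50-day look back.
prices = [float(p) for p in range(100, 160)]
X, y = make_lookback_windows(prices, lookback=50)
print(len(X), len(X[0]), y[0])
```

Each row of `X` could then be passed to a regressor as one training example, with the matching entry of `y` as its label.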

## Results
For our models, we used the RMSE as the evaluation metric. The RMSE is a measure of the differences between the predicted values and the actual values. It gives us an idea of how well the model is performing in terms of predicting the stock prices.
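
As a reminder of what the metric computes, RMSE is the square root of the mean of the squared differences between actual and predicted values. The sketch below is a plain-Python illustration with made-up numbers. The helper name `rmse` is an assumption and is not taken from the notebooks.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical prices: the errors are 2, -1, -3 and 0, so the mean squared error is 3.5.
actual = [100.0, 102.0, 101.0, 105.0]
predicted = [98.0, 103.0, 104.0, 105.0]
print(rmse(actual, predicted))
```

Because the errors are squared before averaging, a few large misses (such as a price shock) raise the RMSE more than many small ones.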
@@ -72,29 +74,30 @@ The table below shows the RMSE values for the training and testing data for each

<center>

| Notebooks | Train RMSE | Test RMSE |
|:---: |:---: |:---: |
| univariate | 31.74 | 164.36 |
| xgboost | 1.00 | 749.05 |
| lookback_xgboost | 3.57 | 200.73 |
| bert_MA | 10.83 | 816.59 |
| news_rnn | 622.62 | 741.46 |
| lookback_news_rnn | 15.46 | 309.58 |
| sentiment_rnn | 27.87 | 1143.88 |
| lookback_fasttext | 13.96 | 169.94 |

</center>

From the table, we can see that the `univariate` model has the lowest RMSE value for the testing data (about 164), indicating that it is the best performing model among all of the models. The `xgboost` model has the lowest RMSE value for the training data (about 1), but it has one of the highest RMSE values for the testing data (about 749), indicating that it may be overfitting the training data. The `lookback_xgboost` model has a much lower RMSE value for the testing data than the `xgboost` model (about 201 vs. 749), and the `lookback_news_rnn` model likewise improves on the `news_rnn` model (about 310 vs. 741), indicating that the look back window helps improve the models' performance. The `news_rnn` model has the highest RMSE value for the training data (about 623), indicating that it may be underfitting the training data. The `sentiment_rnn` model has the highest RMSE value for the testing data (about 1144), making it the worst performing model on the testing data. The `bert_MA` model has a low RMSE value for the training data (about 11) but the second-highest RMSE value for the testing data (about 817), which also points to overfitting. The `lookback_fasttext` model comes close to the `univariate` model on the testing data (about 170 vs. 164) and has a far lower test RMSE than the `bert_MA` model, indicating that the FastText embeddings may be a better choice than the BERT embeddings for this task.
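
The comparisons above can be reproduced from the table with a few lines of Python. This is only an illustrative sketch: the values are the table's RMSE figures rounded to two decimals, and the test-to-train ratio is an assumed rough heuristic for the train/test gap, not a metric used in the notebooks.

```python
# (train RMSE, test RMSE) per notebook, rounded to two decimals from the table.
results = {
    "univariate": (31.74, 164.36),
    "xgboost": (1.00, 749.05),
    "lookback_xgboost": (3.57, 200.73),
    "bert_MA": (10.83, 816.59),
    "news_rnn": (622.62, 741.46),
    "lookback_news_rnn": (15.46, 309.58),
    "sentiment_rnn": (27.87, 1143.88),
    "lookback_fasttext": (13.96, 169.94),
}

# Best and worst models on the testing data.
best_on_test = min(results, key=lambda name: results[name][1])
worst_on_test = max(results, key=lambda name: results[name][1])

# Test/train ratio: a large value hints at overfitting (heuristic assumption).
gap_ratio = {name: test / train for name, (train, test) in results.items()}
largest_gap = max(gap_ratio, key=gap_ratio.get)

print(best_on_test, worst_on_test, largest_gap)
```

A ratio close to 1, as for `news_rnn`, means the model does about equally poorly on both splits, which is more consistent with underfitting than overfitting.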

Adding news appears to break the temporal dependency in the price data, so the headlines ultimately act as noise. The `univariate` model, which only uses the stock prices, performs the best among the models. This suggests that the stock prices themselves contain enough information to predict future stock prices, and that adding news headlines does not significantly improve the models' performance.


| Notebooks | Loss Graph | Test Predictions |
|:---: |:---: |:---: |
| univariate | ![alt text](images/image-2.png) | ![alt text](images/image-6.png) |
| xgboost | ![alt text](images/image-14.png) | ![alt text](images/image-11.png) |
| lookback_xgboost | ![alt text](images/image-15.png) | ![alt text](images/image-10.png) |
| bert_MA | ![alt text](images/image.png) | ![alt text](images/image-5.png) |
| news_rnn | ![alt text](images/image-3.png) | ![alt text](images/image-8.png) |
| lookback_news_rnn | ![alt text](images/image-1.png) | ![alt text](images/image-7.png) |
Binary file added images/image-14.png
Binary file added images/image-15.png
Binary file modified images/image-6.png
