updated readme and images

Ignatiocalvin · Jun 11, 2024 · 41277e0
1 parent 1f6c1d9

Showing 4 changed files with 20 additions and 17 deletions.
diff --git a/README.md b/README.md
@@ -1,4 +1,6 @@
 # LLM_StockPredictor
+Project of Bagautdin Nukhkadiev and Ignatio Calvin Hidayat for the course "Machine Learning and Causal Inference" at the University of Mannheim.
+
 ## Overview
 This project examines the impact of Natural Language Processing (NLP) algorithms on the prediction of
 stock market prices, especially price shocks.
@@ -61,9 +63,9 @@ Description: This notebook predicts stock prices using an LSTM neural network an
 Filename: `xgboost.ipynb`
 Description: This notebook predicts stock prices using an XGBoost model. The model uses the stock prices and news headlines as features to predict the stock prices. The model is trained on the training data and evaluated on the testing data.
 
-
-
-
+### 9. Predicting Stock Prices with XGBoost with Look Back
+Filename: `lookback_xgboost.ipynb`
+Description: This notebook predicts stock prices using an XGBoost model with a look back window. For the training data, the model uses the previous 50 days of stock prices and news headlines as features to predict the next day's stock price; for the testing data, it uses a 30-day window. The model is trained on the training data and evaluated on the testing data.
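
The look back idea described above can be sketched as follows. This is a minimal illustration, not the code from `lookback_xgboost.ipynb`: the helper name `make_lookback_windows` is hypothetical, and flattening each window into a single feature row is an assumption about how the data might be shaped for a tabular model such as XGBoost.

```python
import numpy as np


def make_lookback_windows(features, targets, lookback):
    """Pair each run of `lookback` consecutive rows with the target that follows it.

    Hypothetical helper: each window is flattened into one feature row so that
    a tabular model can consume it.
    """
    features = np.asarray(features, dtype=float)
    targets = np.asarray(targets, dtype=float)
    if features.ndim == 1:
        # Treat a plain price series as a single-column feature matrix.
        features = features[:, None]
    n_samples = len(features) - lookback
    if n_samples <= 0:
        raise ValueError("need more rows than the look back length")
    # Row i holds days i .. i + lookback - 1; its label is day i + lookback.
    X = np.stack([features[i:i + lookback].ravel() for i in range(n_samples)])
    y = targets[lookback:]
    return X, y


# Toy series standing in for daily closing prices (illustrative values only).
prices = np.arange(10, dtype=float)
X, y = make_lookback_windows(prices, prices, lookback=3)
```

With a 50-day window, each training row would hold 50 days of features and the label would be the price on day 51. Additional columns, such as headline embeddings, would simply widen each flattened row.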
 
 ## Results
 For our models, we used the RMSE as the evaluation metric. The RMSE is a measure of the differences between the predicted values and the actual values. It gives us an idea of how well the model is performing in terms of predicting the stock prices.
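
As a reference for the metric, RMSE is the square root of the mean of the squared differences between actual and predicted values. The sketch below uses only the standard library; the function name `rmse` and the sample prices are illustrative assumptions and are not taken from the notebooks.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative prices only: the errors are -1, 2 and -3.
actual = [100.0, 102.0, 104.0]
predicted = [101.0, 100.0, 107.0]
error = rmse(actual, predicted)
```

Because the errors are squared before averaging, a few large misses (for example, during a price shock) raise the RMSE more than many small ones. The result is expressed in the same units as the stock price.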
@@ -72,29 +74,30 @@ The table below shows the RMSE values for the training and testing data for each
 
 <center>
 
-| Notebooks 	        | Train RMSE 	        | Test RMSE 	        |
-|:---:	                |:---:	                |:---:	                |
-| univariate 	        | 31.743497936455558 	| 164.36006533558697 	|
-| xgboost 	            | 0.9986960112179987 	| 749.0485733114635 	|
-| lookback_xgboost 	    | 3.5651031645488147 	| 200.72583615414294 	|
-| bert_MA 	            | 10.83293233874024 	| 816.5899384610334 	|
-| news_rnn            	| 622.6200037110209  	| 741.4623210266183 	|
-| lookback_news_rnn 	| 15.457514085649745 	| 309.5794180019174 	|
-| sentiment_rnn 	    | 27.86876269801595 	| 1143.883111213618 	|
-| lookback_fasttext 	| 13.962031174946512 	| 169.93569939132175 	|
+| Notebooks             | Train RMSE | Test RMSE |
+|:---:                  |:---:       |:---:      |
+| univariate            | 31.74      | 164.36    |
+| xgboost               | 1.00       | 749.05    |
+| lookback_xgboost      | 3.57       | 200.73    |
+| bert_MA               | 10.83      | 816.59    |
+| news_rnn              | 622.62     | 741.46    |
+| lookback_news_rnn     | 15.46      | 309.58    |
+| sentiment_rnn         | 27.87      | 1143.88   |
+| lookback_fasttext     | 13.96      | 169.94    |
 
 </center>   
 
-From the table, we can see that the `lookback_xgboost` model has the lowest RMSE value for the testing data, indicating that it is the best performing model among the four. The `bert_MA` model has the lowest RMSE value for the training data, but it has the highest RMSE value for the testing data, indicating that it may be overfitting the training data.
+From the table, we can see that the `univariate` model has the lowest test RMSE, making it the best-performing model overall. The `xgboost` model has the lowest training RMSE but one of the highest test RMSE values, which suggests that it is overfitting the training data. The `lookback_xgboost` model has a much lower test RMSE than `xgboost`, and the `lookback_news_rnn` model has a lower test RMSE than `news_rnn`, indicating that the look back window improves performance in both cases. The `news_rnn` model has the highest training RMSE, which suggests that it is underfitting the training data. The `sentiment_rnn` model has the highest test RMSE, making it the worst-performing model on the testing data. The `lookback_fasttext` model has the second-lowest test RMSE, slightly above that of the `univariate` model and far below that of `bert_MA`, which suggests that the FastText embeddings may be a better choice than the BERT embeddings for this task.
+ The `bert_MA` model has a low training RMSE but one of the highest test RMSE values, which suggests that it is also overfitting the training data.
 
 Adding news appears to disrupt the temporal dependency in the price data, so the headlines ultimately act as noise. The `univariate` model, which uses only the stock prices, performs best among the models. This suggests that the stock prices themselves contain enough information to predict future stock prices, and that adding news headlines does not significantly improve the model's performance. 
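
The comparisons in this section can be checked directly against the RMSE table. The sketch below copies the full-precision values recorded for the table and looks up the extremes. The variable names and the test-to-train ratio used as a rough overfitting indicator are illustrative assumptions, not part of the notebooks.

```python
# (train RMSE, test RMSE) per notebook, copied from the results table.
results = {
    "univariate": (31.743497936455558, 164.36006533558697),
    "xgboost": (0.9986960112179987, 749.0485733114635),
    "lookback_xgboost": (3.5651031645488147, 200.72583615414294),
    "bert_MA": (10.83293233874024, 816.5899384610334),
    "news_rnn": (622.6200037110209, 741.4623210266183),
    "lookback_news_rnn": (15.457514085649745, 309.5794180019174),
    "sentiment_rnn": (27.86876269801595, 1143.883111213618),
    "lookback_fasttext": (13.962031174946512, 169.93569939132175),
}

best_test = min(results, key=lambda name: results[name][1])
worst_test = max(results, key=lambda name: results[name][1])
lowest_train = min(results, key=lambda name: results[name][0])
highest_train = max(results, key=lambda name: results[name][0])

# Rough overfitting indicator (an assumption for illustration): how many
# times larger the test error is than the training error.
gap_ratio = {name: test / train for name, (train, test) in results.items()}
largest_gap = max(gap_ratio, key=gap_ratio.get)

# Notebooks ordered from best to worst test RMSE.
ranking = sorted(results, key=lambda name: results[name][1])
```

A large test-to-train ratio does not prove overfitting on its own, but it flags the notebooks where the training error says little about performance on unseen data.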
 
 
 | Notebooks 	        | Loss Graph 	                        | Test Predictions	                |
 |:---:	                |:---:	                                |:---:	                            |
-| univariate 	        | ![alt text](images/image-2.png) 	    | ![alt text](images/image-6.png) 	|
-| xgboost 	            | 0.9986960112179987 	                | ![alt text](images/image-11.png)  |
-| lookback_xgboost 	    | 3.5651031645488147 	                | ![alt text](images/image-10.png)  |
+| univariate 	        | ![alt text](images/image-2.png) 	    | ![alt text](images/image-6.png)	|
+| xgboost 	            | ![alt text](images/image-14.png)	    | ![alt text](images/image-11.png)  |
+| lookback_xgboost 	    | ![alt text](images/image-15.png)      | ![alt text](images/image-10.png)  |
 | bert_MA 	            | ![alt text](images/image.png) 	    | ![alt text](images/image-5.png) 	|
 | news_rnn            	| ![alt text](images/image-3.png)       | ![alt text](images/image-8.png) 	|
 | lookback_news_rnn 	| ![alt text](images/image-1.png) 	    | ![alt text](images/image-7.png) 	|

diff --git a/images/image-14.png b/images/image-14.png
diff --git a/images/image-15.png b/images/image-15.png
diff --git a/images/image-6.png b/images/image-6.png