This project investigated the feasibility of applying different RL algorithms to produce profitable forex trading strategies on the EUR/USD pair.
The study was performed on per-minute exchange rate data ranging from January 2003 to October 2022, collected from Forextester for 12 major forex pairs. Preprocessing involved forward-filling missing values and aggregating to a 15-minute time step.
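As a rough illustration of this step, the following is a minimal pandas sketch; the file name and column names are assumptions, not the project's actual schema:

```python
import pandas as pd

# Load raw per-minute quotes (hypothetical file and column names).
df = pd.read_csv("eurusd_m1.csv", parse_dates=["datetime"], index_col="datetime")

# Forward-fill gaps left by missing minutes, then aggregate to 15-minute bars.
df = df.resample("1min").ffill()
bars_15m = df["close"].resample("15min").ohlc()
```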
The RL environment was designed to return a description of the current price dynamics at each time step, accept a discrete action from the agent (buy, sell, or do nothing), and return the associated reward along with the description of the next time step.
The initial features chosen to represent the environment states were the 8 most recent log returns of the target pair (EUR/USD) and of the 11 other forex pairs, calendar features, and the current position value, similar to *Financial Trading as a Game: A Deep Reinforcement Learning Approach*.
The reward was implemented as the log percentage change in portfolio balance between consecutive time steps:

`r_t = log(B_t / B_{t-1})`

where `B_t` is the portfolio balance at time step `t`.
Additional emphasis was placed on feature engineering by introducing an extended set of features comprising multiple lagged major technical indicators such as RSI, MACD, and Bollinger Bands, among others.
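Indicator columns of this kind can be derived directly with pandas; the sketch below is illustrative (the window lengths, RSI variant, and column names are assumptions, not the exact feature set used):

```python
import pandas as pd

def add_ta_features(df: pd.DataFrame, price_col: str = "close") -> pd.DataFrame:
    close = df[price_col]

    # RSI (14): ratio of average gains to average losses (simple-MA variant).
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    df["rsi_14"] = 100 - 100 / (1 + gain / loss)

    # MACD: difference of 12- and 26-period EMAs, plus a 9-period signal line.
    macd = close.ewm(span=12).mean() - close.ewm(span=26).mean()
    df["macd"] = macd
    df["macd_signal"] = macd.ewm(span=9).mean()

    # Bollinger Bands: 20-period mean +/- 2 standard deviations.
    mid = close.rolling(20).mean()
    std = close.rolling(20).std()
    df["bb_upper"] = mid + 2 * std
    df["bb_lower"] = mid - 2 * std
    return df
```

Each resulting column can then be lagged (e.g. `df[col].shift(k)`) to form the multi-lag feature set.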
The environment was implemented in Gym.
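In Gym terms, the interface described above reduces to a `reset`/`step` pair. A minimal skeleton following the gym 0.21 API, with the position and balance bookkeeping elided and all names illustrative:

```python
import gym
import numpy as np
from gym import spaces

class ForexEnv(gym.Env):
    """Skeleton of the trading environment described above (illustrative)."""

    def __init__(self, features: np.ndarray, prices: np.ndarray):
        super().__init__()
        self.features, self.prices = features, prices
        self.action_space = spaces.Discrete(3)  # buy / sell / do nothing
        self.observation_space = spaces.Box(
            -np.inf, np.inf, shape=(features.shape[1],), dtype=np.float32
        )

    def reset(self):
        self.t = 0
        self.balance = 1.0
        return self.features[self.t].astype(np.float32)

    def step(self, action):
        prev_balance = self.balance
        # ... update position and balance from `action` and the price move ...
        self.t += 1
        reward = np.log(self.balance / prev_balance)  # log balance change
        done = self.t >= len(self.features) - 1
        return self.features[self.t].astype(np.float32), reward, done, {}
```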
The RL algorithms were chosen as a representative set of model-free approaches, including both classic and more recent SOTA methods: Deep Q-Learning (DQN), Proximal Policy Optimization (PPO), Advantage Actor-Critic (A2C), and others, as implemented in Stable-Baselines3 (SB3).
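Training any of these algorithms on the environment follows the standard SB3 pattern; a sketch using the `ForexEnv` skeleton above, with `features`/`prices` assumed to be prepared arrays and the timestep budget illustrative:

```python
from stable_baselines3 import PPO

env = ForexEnv(features, prices)
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)  # illustrative budget
model.save("ppo_eurusd")
```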
For the purpose of the experiments, the dataset was divided into train, validation, and evaluation sets such that the evaluation set contained the last 3 years of data and the validation set the 3 years prior to that (roughly 100,000 time steps each).
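With the counts from above, the split amounts to slicing the tail of the frame; a sketch:

```python
VAL_STEPS = EVAL_STEPS = 100_000  # ~3 years of 15-minute bars each

train = df.iloc[: -(VAL_STEPS + EVAL_STEPS)]
validation = df.iloc[-(VAL_STEPS + EVAL_STEPS) : -EVAL_STEPS]
evaluation = df.iloc[-EVAL_STEPS:]
```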
Default hyperparameter values, located under `hyperparparameters/default`, were used during training, which consisted of 20 episodes, each a full pass over the train set.
Hyperparameter tuning was conducted in the following settings:
- DQN, 2 actions (Buy/Sell), 0.0001 spread, basic + TA features
- PPO, 2 actions (Buy/Sell), no fees, basic + TA features
using Optuna's TPESampler with 40 and 20 trials respectively. The tuned hyperparameters can be found under `hyperparparameters/tuned`.
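The tuning loop follows the usual Optuna pattern; a sketch in which the search space is illustrative rather than the exact one used, and `evaluate_on_validation` is a hypothetical helper scoring the model on the validation set:

```python
import optuna
from stable_baselines3 import PPO

def objective(trial: optuna.Trial) -> float:
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True),
        "gamma": trial.suggest_float("gamma", 0.9, 0.9999),
        "n_steps": trial.suggest_categorical("n_steps", [256, 512, 1024, 2048]),
    }
    model = PPO("MlpPolicy", ForexEnv(features, prices), **params)
    model.learn(total_timesteps=100_000)
    return evaluate_on_validation(model)  # hypothetical helper

study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=20)
```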
Experiments were performed with different commission levels, as well as with and without TA features. Both fully sequential and randomly sub-sampled runs were considered.
A custom fork of RL Baselines3 Zoo, D3F4LT4ST/rl-baselines3-zoo, was used as the training framework.
In the commission-free environments, the PPO model was found to be capable of consistently generating profit several years after the training period, outperforming the base exchange rate. The applicability of the obtained AI strategies remains limited, primarily due to the commission factor. The conducted experiments illustrated that introducing any kind of commission severely lowered agent profitability, irrespective of the architecture. This effect can be attributed to the high-frequency behavior demonstrated by the agents in zero-fee environments, which incurs significant transaction costs. Experiments aimed at teaching the agent to conduct less frequent but more confident trades did not produce the desired outcomes: the DQN and A2C architectures were observed to converge to simplistic buy-and-hold or sell-and-hold strategies, while the PPO models continued high-frequency trading and thus incurred the largest losses.
Tuned PPO model performance compared to EUR/USD on the validation and evaluation sets:
| Metric | PPO Val | EUR/USD Val | PPO Eval | EUR/USD Eval |
|---|---|---|---|---|
| Start Period | 2017-01-16 | 2017-01-16 | 2019-11-24 | 2019-11-24 |
| End Period | 2019-11-24 | 2019-11-24 | 2022-09-30 | 2022-09-30 |
| Cumulative Return | 112.53% | 3.95% | 46.31% | -11.02% |
| Sharpe | 3.26 | 0.2 | 1.49 | -0.42 |
| Sortino | 5.77 | 0.28 | 2.45 | -0.57 |
| Max Drawdown | -4.06% | -12.86% | -6.94% | -22.45% |
| Longest DD Days | 61 | 660 | 220 | 631 |
| Volatility (ann.) | 5.64% | 5.57% | 6.31% | 6.24% |
| Expected Daily % | 0.07% | 0.0% | 0.04% | -0.01% |
| Expected Monthly % | 2.18% | 0.11% | 1.09% | -0.33% |
| Expected Yearly % | 28.57% | 1.3% | 9.98% | -2.88% |
| Gain/Pain Ratio | 0.84 | 0.04 | 0.34 | -0.08 |
| Outlier Win Ratio | 4.08 | 4.89 | 4.82 | 5.42 |
- Data collection: `notebooks/forex_data_collection.ipynb`
- Data preprocessing: `notebooks/forex_data_preproc_eda.ipynb`
- Feature engineering: `notebooks/forex_data_feature_engineering_basic.ipynb`, `notebooks/forex_data_feature_engineering_ta.ipynb`
- RL experiments: `notebooks/forex_full_eurusd_rl_experiments.ipynb`
- Best models analysis: `notebooks/forex_full_eurusd_best_rl_models_analysis.ipynb`
Clone the repository with submodules:

```
git clone --recurse-submodules https://github.com/D3F4LT4ST/RL-trading.git
```
Enable `gym==0.21.0` installation (see https://stackoverflow.com/questions/77124879/pip-extras-require-must-be-a-dictionary-whose-values-are-strings-or-lists-of):

```
pip install setuptools==65.5.0 pip==21
pip install wheel==0.38.0
```
Install dependencies:

```
pip install -r requirements.txt
pip install -r rl-baselines3-zoo/requirements.txt
```