Reinforcement Learning In Financial Trading

This project investigated the feasibility of applying different RL algorithms to produce profitable forex trading strategies on the EUR/USD pair.

Data

The study was performed on per-minute exchange rate data ranging from January 2003 to October 2022, collected from Forextester for 12 major forex pairs. Preprocessing involved forward-filling missing values and aggregating to a 15-minute time step.
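A minimal sketch of this preprocessing step with pandas (the file name and column layout are assumptions, not the repository's exact schema):

```python
import pandas as pd

# Load per-minute quotes (assumed columns: time, open, high, low, close).
df = pd.read_csv("EURUSD_1min.csv", parse_dates=["time"], index_col="time")

# Forward-fill gaps on a regular 1-minute grid, then aggregate to 15-minute bars.
df = df.resample("1min").ffill()
df_15m = df.resample("15min").agg(
    {"open": "first", "high": "max", "low": "min", "close": "last"}
)
```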

Environment

The RL environment was designed to return a description of the current price dynamics at each time step, accept a discrete action from the agent (buy, sell, or do nothing), and return the associated reward along with the description of the next time step.

The initial features chosen to represent the environment states were the 8 most recent log returns of the target pair (EUR/USD) and of the 11 other forex pairs, calendar features, and the current position value, similar to Financial Trading as a Game: A Deep Reinforcement Learning Approach.

The reward was implemented as the log change in portfolio balance between consecutive time steps: $$r_t = \log\left(\frac{p_t}{p_{t-1}}\right)$$ where $p_t$ denotes the portfolio balance at time step $t$. Two commission types were considered for the task: bid-ask spread and percentage fee.

Additional emphasis was placed on feature engineering by introducing an extended feature set comprising multiple lagged technical indicators such as RSI, MACD, Bollinger Bands, and others.
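For illustration, lagged indicator features of this kind could be built with pandas and the ta package as sketched below; the concrete indicator set, lag depths, and library used in the project are defined in the feature-engineering notebooks.

```python
import pandas as pd
import ta

def add_ta_features(df: pd.DataFrame, lags: int = 4) -> pd.DataFrame:
    """Attach a few lagged technical indicators to a 15-minute OHLC frame."""
    close = df["close"]
    feats = pd.DataFrame(index=df.index)
    feats["rsi"] = ta.momentum.RSIIndicator(close, window=14).rsi()
    feats["macd"] = ta.trend.MACD(close).macd_diff()
    bb = ta.volatility.BollingerBands(close, window=20, window_dev=2)
    feats["bb_pband"] = bb.bollinger_pband()

    # Lag each indicator so the state holds its recent history, not just the latest value.
    lagged = [feats.shift(k).add_suffix(f"_lag{k}") for k in range(lags)]
    return pd.concat([df] + lagged, axis=1).dropna()
```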

The environment was implemented using the Gym interface.
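A stripped-down sketch of such a Gym environment is shown below; the observation layout, position handling, and cost model are deliberate simplifications of the actual implementation.

```python
import gym
import numpy as np
from gym import spaces

class ForexEnv(gym.Env):
    """Minimal trading environment: discrete actions, log-balance-change reward."""

    def __init__(self, features: np.ndarray, prices: np.ndarray, spread: float = 0.0001):
        super().__init__()
        self.features, self.prices, self.spread = features, prices, spread
        self.action_space = spaces.Discrete(3)              # 0: do nothing, 1: buy, 2: sell
        self.observation_space = spaces.Box(
            -np.inf, np.inf, shape=(features.shape[1] + 1,), dtype=np.float32
        )                                                    # market features + current position

    def reset(self):
        self.t, self.position, self.balance = 0, 0, 1.0
        return self._obs()

    def step(self, action):
        new_position = {0: self.position, 1: 1, 2: -1}[action]
        cost = self.spread if new_position != self.position else 0.0
        self.position = new_position

        prev_balance = self.balance
        ret = self.prices[self.t + 1] / self.prices[self.t] - 1
        self.balance *= 1 + self.position * ret - cost
        self.t += 1

        reward = np.log(self.balance / prev_balance)         # r_t = log(p_t / p_{t-1})
        done = self.t >= len(self.prices) - 1
        return self._obs(), reward, done, {}

    def _obs(self):
        return np.append(self.features[self.t], self.position).astype(np.float32)
```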

Agent architectures

The considered RL algorithms were chosen as a representative set of model-free approaches, including both classic and more recent SOTA methods: Deep Q-Learning, Proximal Policy Optimization, Advantage Actor-Critic, and others, as implemented in SB3.
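As an illustration, training one of these agents with Stable-Baselines3 on an environment like the sketch above could look as follows (hyperparameters are placeholders; the real runs go through the RL Baselines3 Zoo fork described below):

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_checker import check_env

env = ForexEnv(features, prices, spread=0.0)   # ForexEnv and arrays from the sketch above
check_env(env)                                  # verify the custom env follows the Gym API

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=200_000)
model.save("ppo_eurusd")
```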

Experiments

For the purpose of the experiments, the dataset was divided into train, validation, and evaluation sets such that the evaluation set contained the last 3 years of data and the validation set the 3 years before that (100,000 time steps each).
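A sketch of this chronological split, assuming the 15-minute frame df_15m from the preprocessing sketch above:

```python
EVAL_STEPS = VAL_STEPS = 100_000

eval_df = df_15m.iloc[-EVAL_STEPS:]                          # most recent ~3 years
val_df = df_15m.iloc[-(EVAL_STEPS + VAL_STEPS):-EVAL_STEPS]  # the 3 years before that
train_df = df_15m.iloc[:-(EVAL_STEPS + VAL_STEPS)]           # everything earlier
```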

Default hyperparameter values, located under hyperparparameters/default, were used during training, which consisted of 20 episodes over the full training set. Hyperparameter tuning was conducted in the following settings:

  • DQN, 2 actions (Buy/Sell), 0.0001 spread, basic + TA features
  • PPO, 2 actions (Buy/Sell), no fees, basic + TA features

using Optuna's TPESampler with 40 and 20 trials, respectively. Tuned hyperparameters can be found under hyperparparameters/tuned.
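A sketch of what such a tuning run looks like with Optuna's TPESampler; the search space, the ForexEnv arguments, and the evaluate_on_validation helper are illustrative assumptions, since the actual tuning goes through the RL Baselines3 Zoo scripts:

```python
import optuna
from stable_baselines3 import PPO

def objective(trial: optuna.Trial) -> float:
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True),
        "gamma": trial.suggest_float("gamma", 0.9, 0.9999),
        "n_steps": trial.suggest_categorical("n_steps", [128, 256, 512, 1024]),
    }
    model = PPO("MlpPolicy", ForexEnv(train_features, train_prices), **params)
    model.learn(total_timesteps=100_000)
    return evaluate_on_validation(model)   # hypothetical helper returning validation return

study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=40)
```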

Experiments were performed with different commission levels, as well as with and without TA features. Both fully sequential and randomly sub-sampled runs were considered.

A custom fork of RL Baselines3 Zoo, D3F4LT4ST/rl-baselines3-zoo, was used as the training framework.

Results


In the commission-free environments, the PPO model was found to be capable of consistently generating profit several years after the training period and of outperforming the base exchange rate. The applicability of the obtained strategies is still limited, primarily due to the commission factor. The conducted experiments illustrated that introducing any kind of commission severely lowered agent profitability, irrespective of the architecture. This effect can be attributed to the high-frequency behavior demonstrated by the agents in zero-fee environments, which results in significant costs being incurred. Experiments aimed at teaching the agent to conduct less frequent but more confident trades did not produce the desired outcomes: the DQN and A2C architectures were observed to converge to simplistic buy-and-hold or sell-and-hold strategies, while the PPO models continued high-frequency trading and thus incurred the most losses.

Tuned PPO model performance compared to EUR/USD on the validation and evaluation sets:


| Metric | PPO (Val) | EUR/USD (Val) | PPO (Eval) | EUR/USD (Eval) |
|---|---|---|---|---|
| Start Period | 2017-01-16 | 2017-01-16 | 2019-11-24 | 2019-11-24 |
| End Period | 2019-11-24 | 2019-11-24 | 2022-09-30 | 2022-09-30 |
| Cumulative Return | 112.53% | 3.95% | 46.31% | -11.02% |
| Sharpe | 3.26 | 0.2 | 1.49 | -0.42 |
| Sortino | 5.77 | 0.28 | 2.45 | -0.57 |
| Max Drawdown | -4.06% | -12.86% | -6.94% | -22.45% |
| Longest DD Days | 61 | 660 | 220 | 631 |
| Volatility (ann.) | 5.64% | 5.57% | 6.31% | 6.24% |
| Expected Daily % | 0.07% | 0.0% | 0.04% | -0.01% |
| Expected Monthly % | 2.18% | 0.11% | 1.09% | -0.33% |
| Expected Yearly % | 28.57% | 1.3% | 9.98% | -2.88% |
| Gain/Pain Ratio | 0.84 | 0.04 | 0.34 | -0.08 |
| Outlier Win Ratio | 4.08 | 4.89 | 4.82 | 5.42 |

Notebooks

  • Data collection: notebooks/forex_data_collection.ipynb
  • Data preprocessing: notebooks/forex_data_preproc_eda.ipynb
  • Feature engineering: notebooks/forex_data_feature_engineering_basic.ipynb, notebooks/forex_data_feature_engineering_ta.ipynb
  • RL Experiments: notebooks/forex_full_eurusd_rl_experiments.ipynb
  • Best models analysis: notebooks/forex_full_eurusd_best_rl_models_analysis.ipynb

Installation

git clone --recurse-submodules https://github.com/D3F4LT4ST/RL-trading.git

Enable gym==0.21.0 installation (https://stackoverflow.com/questions/77124879/pip-extras-require-must-be-a-dictionary-whose-values-are-strings-or-lists-of):

pip install setuptools==65.5.0 pip==21 
pip install wheel==0.38.0

Install dependencies:

pip install -r requirements.txt
pip install -r rl-baselines3-zoo/requirements.txt
