[07/01/2021] Added SVEA, DrQ, Distracting Control Suite, and reduced memory consumption by 5x
Benchmark for generalization in continuous control from pixels, based on DMControl.
Also contains official implementations of
Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation (SVEA)
Nicklas Hansen, Hao Su, Xiaolong Wang
and
Generalization in Reinforcement Learning by Soft Data Augmentation (SODA)
Nicklas Hansen, Xiaolong Wang
See this repository for SVEA implemented using Vision Transformers.
The DMControl Generalization Benchmark provides two distinct benchmarks for visual generalization: random colors and video backgrounds.
Both benchmarks are offered in easy and hard variants. Samples are shown below.
This codebase also integrates a set of challenging test environments from the Distracting Control Suite (DistractingCS). Our implementation of DistractingCS includes environments at 8 gradually increasing randomization intensities. Note that our implementation is not equivalent to the original DistractingCS benchmark. It differs in three important ways: (1) we evaluate at a different set of intensities (and number of videos) that better matches the performance of current algorithms; (2) we reduce the randomization update frequency by a factor of 2 to account for frame skip (action repeat); and (3) all TensorFlow dependencies have been replaced by PyTorch.
By default, algorithms are trained for 500k frames and are continuously evaluated in both training and test environments. Environment randomization is seeded to promote reproducibility.
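Seeded randomization, a schedule that controls how often it changes, and the relation between a frame budget and agent steps under action repeat can be illustrated with a small sketch. It assumes a single randomization parameter that is resampled on a fixed schedule from a private, seeded random number generator. `SeededRandomizer` and `frames_to_steps` are hypothetical names, not part of this codebase, and the action repeat of 4 used below is an illustrative value.

```python
import random


class SeededRandomizer:
    """Hypothetical sketch of a seeded, scheduled randomization parameter.

    The parameter is resampled once every `update_every` calls to `step()`,
    so doubling `update_every` halves how often it changes. This is an
    illustration only, not the DMControl-GB or DistractingCS implementation.
    """

    def __init__(self, seed: int, update_every: int = 1) -> None:
        if update_every < 1:
            raise ValueError("update_every must be >= 1")
        self.rng = random.Random(seed)  # private RNG: same seed -> same sequence
        self.update_every = update_every
        self.t = 0
        self.n_updates = 0
        self.value = self.rng.random()  # initial sample, drawn at construction

    def step(self) -> float:
        """Advance one step and resample the parameter when the schedule says so."""
        self.t += 1
        if self.t % self.update_every == 0:
            self.value = self.rng.random()
            self.n_updates += 1
        return self.value


def frames_to_steps(frames: int, action_repeat: int) -> int:
    """Convert a budget in simulator frames to agent steps (each step repeats its action)."""
    if action_repeat < 1:
        raise ValueError("action_repeat must be >= 1")
    return frames // action_repeat


if __name__ == "__main__":
    a, b = SeededRandomizer(seed=1), SeededRandomizer(seed=1)
    print([a.step() for _ in range(5)] == [b.step() for _ in range(5)])  # True: reproducible
    fast, slow = SeededRandomizer(0, update_every=1), SeededRandomizer(0, update_every=2)
    for _ in range(100):
        fast.step()
        slow.step()
    print(fast.n_updates, slow.n_updates)  # 100 50
    print(frames_to_steps(500_000, action_repeat=4))  # 125000 (action repeat of 4 is assumed)
```

Using a dedicated `random.Random` instance rather than the global module state keeps the randomization sequence independent of any other code that draws random numbers, which is what makes a fixed seed reproducible.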
This repository contains implementations of the following algorithms in a unified framework:
- SVEA (Hansen et al., 2021)
- SODA (Hansen and Wang, 2021)
- PAD (Hansen et al., 2020)
- DrQ (Kostrikov et al., 2020)
- RAD (Laskin et al., 2020)
- CURL (Srinivas et al., 2020)
- SAC (Haarnoja et al., 2018)
using standardized architectures and hyper-parameters, wherever applicable. If you want to add an algorithm, feel free to send a pull request.
If you find our work useful in your research, please consider citing our work as follows:
@article{hansen2021stabilizing,
title={Stabilizing Deep Q-Learning with ConvNets and Vision Transformers under Data Augmentation},
author={Nicklas Hansen and Hao Su and Xiaolong Wang},
year={2021},
eprint={2107.00644},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
for the SVEA method, and
@inproceedings{hansen2021softda,
title={Generalization in Reinforcement Learning by Soft Data Augmentation},
author={Nicklas Hansen and Xiaolong Wang},
booktitle={International Conference on Robotics and Automation},
year={2021},
}
for the SODA method and the DMControl Generalization Benchmark.
We assume that you have access to a GPU with CUDA >=9.2 support. All dependencies can then be installed with the following commands:
conda env create -f setup/conda.yml
conda activate dmcgb
sh setup/install_envs.sh
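After installation, it can help to confirm that the PyTorch build in the environment actually reports a CUDA version that satisfies the >=9.2 requirement. A minimal sketch, assuming the activated environment provides PyTorch (which this codebase uses); `parse_version` and `cuda_meets_minimum` are hypothetical helpers, while `torch.version.cuda` and `torch.cuda.is_available()` are standard PyTorch attributes.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as "10.2" into (10, 2) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def cuda_meets_minimum(found: Optional[str], minimum: str = "9.2") -> bool:
    """Return True if the reported CUDA version is at least `minimum`."""
    if found is None:  # CPU-only PyTorch builds report no CUDA version
        return False
    # Compare integer tuples: as plain strings, "10.2" < "9.2" would be True.
    return parse_version(found) >= parse_version(minimum)


if __name__ == "__main__":
    try:
        import torch  # assumption: installed by the conda environment
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        cuda = torch.version.cuda
        print("CUDA build:", cuda,
              "| meets >=9.2:", cuda_meets_minimum(cuda),
              "| GPU visible:", torch.cuda.is_available())
```

Checking both values is useful because a CUDA-enabled build can still report no visible GPU, for example when drivers are missing.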
Part of this repository relies on external datasets. SODA uses the Places dataset for data augmentation, which can be downloaded by running
wget http://data.csail.mit.edu/places/places365/places365standard_easyformat.tar
Distracting Control Suite uses the DAVIS dataset for video backgrounds, which can be downloaded by running
wget https://data.vision.ee.ethz.ch/csergi/share/davis/DAVIS-2017-trainval-480p.zip
Please familiarize yourself with the terms of use of each dataset before downloading. After downloading and extracting the data, add your dataset directory to the datasets list in setup/config.cfg.
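Once the datasets list is set, code that needs a dataset has to search the listed directories for it. A minimal sketch of that lookup, assuming setup/config.cfg is a JSON object of the form {"datasets": ["/path/a", "/path/b"]}; check the file in your checkout, since this layout is an assumption. `load_dataset_dirs` and `find_dataset` are hypothetical helpers, and the extracted folder names `places365_standard` and `DAVIS` are also assumptions.

```python
import json
import os
from typing import List, Optional


def load_dataset_dirs(config_path: str = os.path.join("setup", "config.cfg")) -> List[str]:
    """Read dataset directories from the config file.

    Assumption: the file is JSON of the form {"datasets": ["/path/a", "/path/b"]}.
    Adapt the parsing if the file in your checkout is laid out differently.
    """
    with open(config_path) as f:
        data = json.load(f)
    return [os.path.expanduser(d) for d in data["datasets"]]


def find_dataset(name: str, dataset_dirs: List[str]) -> Optional[str]:
    """Return the first existing `<dir>/<name>`, searching directories in list order."""
    for root in dataset_dirs:
        candidate = os.path.join(root, name)
        if os.path.isdir(candidate):
            return candidate
    return None


if __name__ == "__main__":
    cfg = os.path.join("setup", "config.cfg")
    if os.path.exists(cfg):
        dirs = load_dataset_dirs(cfg)
        for name in ("places365_standard", "DAVIS"):  # extracted folder names are assumed
            print(name, "->", find_dataset(name, dirs))
    else:
        print(f"{cfg} not found; run this from the repository root")
```

Searching the directories in order lets you list several storage locations and have the first match win.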
The video_easy environment was proposed in PAD, and the video_hard environment uses a subset of the RealEstate10K dataset for background rendering. All test environments (including video files) are included in this repository, in the src/env/ directory.
The scripts directory contains training and evaluation bash scripts for all the included algorithms. Alternatively, you can call the Python scripts directly; for example, for training, call
python3 src/train.py \
--algorithm svea \
--seed 0
to run SVEA on the default task, walker_walk. This should give you an output of the form:
Working directory: logs/walker_walk/svea/0
Evaluating: logs/walker_walk/svea/0
| eval | S: 0 | ER: 26.2285 | ERTEST: 25.3730
| train | E: 1 | S: 250 | D: 70.1 s | R: 0.0000 | ALOSS: 0.0000 | CLOSS: 0.0000 | AUXLOSS: 0.0000
where ER and ERTEST correspond to the average return in the training and test environments, respectively. You can select the test environment used in evaluation with the --eval_mode argument, which accepts one of (train, color_easy, color_hard, video_easy, video_hard, distracting_cs, none). Use none if you want to disable continual evaluation of generalization. Note that not all combinations of arguments have been tested. Feel free to open an issue or send a pull request if you encounter a problem or would like to add support for new features.
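Because ER and ERTEST are reported on the same eval line, the gap between training and test performance can be tracked by parsing the console output. A minimal sketch, assuming every log line follows the pipe-separated `| mode | KEY: value | ...` layout of the sample output above; `parse_log_line` and `generalization_gaps` are hypothetical helpers, not part of this repository.

```python
from typing import Dict, List, Union

Number = Union[int, float]
INT_KEYS = {"E", "S"}  # episode and step counters; all other fields are parsed as floats


def parse_log_line(line: str) -> Dict[str, Union[str, Number]]:
    """Parse a console line such as '| eval | S: 0 | ER: 26.2285 | ERTEST: 25.3730'."""
    fields = [f.strip() for f in line.strip().strip("|").split("|") if f.strip()]
    if not fields:
        raise ValueError("empty log line")
    record: Dict[str, Union[str, Number]] = {"mode": fields[0]}
    for field in fields[1:]:
        key, sep, value = field.partition(":")
        if not sep or not value.split():
            raise ValueError(f"unexpected field {field!r}")
        number = value.split()[0]  # drop trailing units, e.g. the 's' in 'D: 70.1 s'
        record[key.strip()] = int(number) if key.strip() in INT_KEYS else float(number)
    return record


def generalization_gaps(lines: List[str]) -> List[float]:
    """Return ER - ERTEST for every eval line that reports both values."""
    gaps = []
    for line in lines:
        if not line.lstrip().startswith("|"):  # skip lines such as 'Working directory: ...'
            continue
        rec = parse_log_line(line)
        if rec["mode"] == "eval" and "ER" in rec and "ERTEST" in rec:
            gaps.append(rec["ER"] - rec["ERTEST"])
    return gaps


if __name__ == "__main__":
    line = "| eval | S: 0 | ER: 26.2285 | ERTEST: 25.3730"
    print(generalization_gaps([line]))  # approximately [0.8555]
```

Eval lines without ERTEST (for example, if a run omits test evaluation) are skipped rather than treated as errors.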
We provide test results for each of the SVEA, SODA, PAD, DrQ, RAD, and CURL methods. Results for color_hard and video_easy are shown below:
See our paper for additional results.
We would like to thank the numerous researchers and engineers whose work this repository builds on. This repository is a product of our work on SVEA, SODA, and PAD. Our SAC implementation is based on this repository, the original DMControl is available here, and the gym wrapper for it is available here. The Distracting Control Suite environments were adapted from this implementation. The PAD, RAD, CURL, and DrQ baselines are based on their official implementations, provided here, here, here, and here, respectively.