Official repository for "Direct Advantage Estimation"
We recommend using Python 3.8 with venv. Please make sure pip is up to date by running:

```
pip install -U pip
```
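If you are creating the environment from scratch, a typical venv setup looks like this (the directory name `.venv` is just an example; on Windows, activate with `.venv\Scripts\activate` instead):

```
python3.8 -m venv .venv        # create a virtual environment with Python 3.8
source .venv/bin/activate      # activate it (Linux/macOS)
```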
Install the requirements:

```
pip install -r requirements.txt
```
To reproduce the results, run the following command (an example invocation is given after the flag descriptions below):

```
python train.py --algo {algo} --hparam_file {hyperparameter_file} --envs {env} --threads {threads}
```
- `--algo`: `PPO` (GAE) or `CustomPPO` (DAE)
- `--hparam_file`: see `./params/` for the hyperparameters used in the paper; the files are named `{algo}_{network}.yml`
- `--envs`: the environment to train on, e.g., `Pong`, `Breakout`, etc. For MinAtar environments, please add the suffix `-MinAtar-v0` (e.g., `Breakout-MinAtar-v0`)
- `--threads`: number of parallel threads for asynchronous environment steps
- `--logging`: save logs in `./logs/{env}/`
- `--save_model`: save the trained model to `./logs/{env}/`
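For instance, a hypothetical invocation for training DAE on MinAtar Breakout could look like the following (the hyperparameter file name here only illustrates the `{algo}_{network}.yml` pattern; use one of the files that actually exists in `./params/`):

```
python train.py --algo CustomPPO --hparam_file ./params/CustomPPO_cnn.yml --envs Breakout-MinAtar-v0 --threads 8 --logging --save_model
```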
To view the TensorBoard logs, run

```
tensorboard --logdir ./logs/
```

and open the displayed URL in a browser.
To cite this work, please use the following BibTeX entry:
```
@article{pan2022direct,
  title={Direct advantage estimation},
  author={Pan, Hsiao-Ru and G{\"u}rtler, Nico and Neitz, Alexander and Sch{\"o}lkopf, Bernhard},
  journal={Advances in Neural Information Processing Systems},
  volume={35},
  pages={11869--11880},
  year={2022}
}
```