
Manette: Deep Reinforcement Learning with Fine Grained Action Repetition

This repository contains an open source implementation of the PAAC algorithm presented in Efficient Parallel Methods for Deep Reinforcement Learning, forked from Alfredvc's implementation. We added the possibility to use the FiGAR algorithm presented in Fine Grained Action Repetition for Deep Reinforcement Learning, as well as LSTM networks, Bayesian networks, an e-greedy policy, and training on color images.

PAAC is a conceptually simple advantage actor-critic algorithm designed to run efficiently on a GPU, offering A3C-like performance in under 12 hours of training. With FiGAR, the agent also chooses how many times to repeat each action, which lets it explore more possibilities and achieve higher scores with better motion control.

Functionalities

  • Input games:
    • ALE Atari environment
    • OpenAI Gym
    • Tetris
  • Models:
    • Black-and-white convolutional networks
    • Color convolutional networks
    • Action repetitions
  • Algorithms:
    • PAAC
    • FiGAR + PAAC
    • LSTM + PAAC
    • LSTM + FiGAR + PAAC
  • Visualization:
    • TensorBoard
    • Learning graphs
    • Repetition histograms
    • TensorFlow graph

Recorded results with FiGAR10

(GIFs of trained agents playing Breakout, Ms Pacman, Space Invaders, Seaquest and Pong)

Results

Average scores for 50 experiments on 12 Atari games:

| Game | A3C (*) | PAAC | FiGAR10 |
|---|---|---|---|
| Asterix | 22140 | 17525 | 38315 |
| Asteroids | 4474 | 1427 | 66708 |
| Breakout | 681 | 407 | 779 |
| Enduro | -82.5 | 0 | 677 |
| Gopher | 10022 | 14034 | 26625 |
| Gravitar | 303 | 168 | 502 |
| Montezuma | 67 | 0 | 0 |
| Ms Pacman | 653 | 2408 | 5488 |
| Pong | 5.6 | 19.9 | 19.5 |
| Seaquest | 2355 | 1679 | 10032 |
| Space Invaders | 15730 | 747 | 4262 |
| Yar's revenge | 7270 | 12808 | 11329 |

(*) A3C scores taken from Asynchronous Methods for Deep Reinforcement Learning.

Graphs of training for Seaquest with FiGAR for 0, 5 and 10 repetitions:

(Figure: Seaquest training curves)

Histograms of the repetitions during training for Seaquest with FiGAR for 5 and 10 repetitions:

(Figure: Seaquest repetition histograms)

The results above show that PAAC's performance on Seaquest was stuck around 2000 points: the agent would not resurface to grab oxygen and died early.

Available games

| Platform | Games |
|---|---|
| Atari 2600 | All games! |
| OpenAI Gym | FlappyBird, MountainCar, Catcher, MonsterKong, RaycastMaze, Snake |
| Tetris | Tetris!! |

Implemented publications

  • Efficient Parallel Methods for Deep Reinforcement Learning (PAAC)
  • Fine Grained Action Repetition for Deep Reinforcement Learning (FiGAR)

Requirements

At minimum you will need:

  • Python 3
  • TensorFlow (TensorBoard is used for the training graphs)
  • The Arcade Learning Environment (the Atari ROMs are provided in atari_roms)
  • For the non-Atari games: OpenAI Gym and gym-ple

Training the agent

To train an agent to play Pong, for example, run: python3 train.py -g pong -df logs/test_pong/.

Training can be stopped (with Ctrl+C) and then resumed by running the same command: python3 train.py -g pong -df logs/test_pong/.

Visualizing training

  1. Open a new terminal
  2. Run tensorboard --logdir=<absolute-path>/manette/logs/.
  3. In your browser navigate to localhost:6006/

Many graphs are already available (rewards per episode, episode length, steps per second, loss, ...) and you can easily add your own; the sketch below shows one way to do it.
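
For example, an extra scalar can be written to the same log folder with the standard TensorFlow 1.x summary API. This is a minimal sketch, not code from this codebase: the name episode_entropy, the value and where you call it are all up to you.

```python
# Hypothetical sketch of logging an extra scalar to TensorBoard, assuming the
# TensorFlow 1.x API used at the time of this project. The name
# `episode_entropy` and the log path are illustrative, not from this repository.
import tensorflow as tf

episode_entropy = tf.placeholder(tf.float32, name='episode_entropy')
entropy_summary = tf.summary.scalar('episode_entropy', episode_entropy)

with tf.Session() as sess:
    writer = tf.summary.FileWriter('logs/test_pong', sess.graph)
    # Write one value for global step 1000; call this at whatever interval you like.
    summary_str = sess.run(entropy_summary, feed_dict={episode_entropy: 0.42})
    writer.add_summary(summary_str, global_step=1000)
    writer.flush()
```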

Testing the agent

To test the performance of a trained agent run python3 test.py -f logs/test_pong -tc 5.

Generating gifs

Gifs can be generated from stored network weights. For example, a gif of the agent playing Breakout can be generated with:

python3 test.py -f pretrained/breakout/ -gn breakout

This may take a few minutes.

Training options

The most useful training options are:

  • -g : Name of the Atari 2600 game you want to play. All the games in atari_roms are available.
  • -df : Destination folder, where the information is saved for each game (checkpoints, tensorboard graphs, ...)
  • -lr : Initial value for the learning rate. Default = 0.0224.
  • -lra : Number of global steps during which the learning rate will be linearly annealed towards zero.
  • --entropy : Strength of the entropy regularization term. Default = 0.02. Should be increased when using FiGAR.
  • --max_global_steps : Maximum number of training steps. 80 million steps are enough for most games.
  • --max_local_steps : Number of steps to gain experience from before every update. 5 is good.
  • --arch : Which network architecture to use : NIPS, NATURE, PWYX, LSTM, BAYESIAN. See below for descriptions.
  • -ec : Emulator counts. Number of emulators playing simultaneously. Default = 32.
  • -ew : Emulator workers. Number of threads that compute the emulators' steps. Default = 8: each thread handles 4 emulators.
  • --egreedy : Whether to use an e-greedy policy to choose the actions (see the sketch after this list).
  • --epsilon : Epsilon coefficient for the e-greedy policy. Default = 0.05.
  • --softmax_temp : Softmax temperature for the Boltzmann action choice policy. Default = 1.
  • --annealed : Whether to anneal epsilon towards zero when using the e-greedy policy.
  • --annealed_steps : Number of global steps before epsilon is annealed.
  • --keep_percentage : Keep probability when the Bayesian/dropout network is used. Default = 0.9.
  • --rgb : Whether to use RGB images for the training or not.
  • --checkpoint_interval : Interval of steps between checkpoints.
  • --activation : Activation function for the network : relu or leaky_relu.
  • --alpha_leaky_relu : Coefficient when using leaky relu.
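
For reference, the sketch below (an illustration only, not code from this repository) shows what the Boltzmann and e-greedy action choices referred to above amount to, given the policy logits of one emulator:

```python
# Illustration of the two action-selection modes controlled by --egreedy,
# --epsilon and --softmax_temp. This is NOT code from this repository.
import numpy as np

def boltzmann_action(logits, softmax_temp=1.0):
    """Sample an action from a softmax over the logits, scaled by the temperature."""
    scaled = np.asarray(logits, dtype=np.float64) / softmax_temp
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

def egreedy_action(logits, epsilon=0.05):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if np.random.rand() < epsilon:
        return int(np.random.randint(len(logits)))
    return int(np.argmax(logits))
```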

To use FiGAR, the options are --max_repetition and --nb_choices. max_repetition is the maximum number of times that an action can be repeated. nb_choices is the number of repetition values the agent can choose from, equally distributed from 0 to max_repetition. If set to (0, 1) there is no repetition, i.e. you are not using FiGAR (the only possible repetition is [0]). If set to (10, 11), the possible repetitions are [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]. If set to (10, 6), the possible repetitions are [0, 2, 4, 6, 8, 10], as illustrated in the sketch below.
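
The following sketch (an illustration only; the repository may compute this differently) shows how a (max_repetition, nb_choices) pair maps to the set of repetition values the agent can pick from:

```python
# How (--max_repetition, --nb_choices) maps to the possible repetition values.
# Illustration only, not code from this repository.
def repetition_values(max_repetition, nb_choices):
    if nb_choices <= 1:
        return [0]  # no repetition: plain PAAC
    step = max_repetition / (nb_choices - 1)
    return [round(i * step) for i in range(nb_choices)]

print(repetition_values(0, 1))    # [0]
print(repetition_values(10, 11))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(repetition_values(10, 6))   # [0, 2, 4, 6, 8, 10]
```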

BatchTrain script

If you are tired of typing multiple options in the command line to use the train.py file, you can use the batchTrain.py script in the script folder. Simply write as many JSON files (like the one below) as you want, change all the options you wish and put them all in the same folder, say toTrain/experiment1/.

Run: python3 script/batchTrain.py -f toTrain/ -d logs/.

Each JSON file will then be loaded and trained, one after the other, with its options, and the results saved in logs/DATE-experiment1/. (A rough sketch of this workflow follows the example below.)

Example of a JSON file for Pong, with the PWYX network and FiGAR with 10 repetitions:

{
  "game": "pong",
  "initial_lr": 0.0224,
  "lr_annealing_steps": 80000000,
  "max_global_steps": 80000000,
  "max_local_steps": 5,
  "gamma": 0.99,
  "alpha": 0.99,
  "entropy_regularisation_strength": 0.02,
  "arch": "PWYX",
  "emulator_workers": 8,
  "emulator_counts": 32,
  "clip_norm_type": "global",
  "clip_norm": 3.0,
  "single_life_episodes": false,
  "e": 0.1,
  "random_start": true,
  "egreedy": false,
  "epsilon": 0.05,
  "softmax_temp": 1.0,
  "annealed": false,
  "annealed_steps": 80000000,
  "keep_percentage": 0.9,
  "rgb": false,
  "max_repetition": 10,
  "nb_choices": 11,
  "checkpoint_interval": 1000000,
  "activation": "relu",
  "alpha_leaky_relu": 0.1
}
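
For intuition, here is a rough sketch of the batch-training workflow described above. It is not the actual script/batchTrain.py: only the -g and -df options are forwarded explicitly, whereas the real script applies every option in the JSON file.

```python
# Simplified sketch of the batchTrain workflow (NOT the real script/batchTrain.py):
# load every JSON configuration in a folder and launch train.py once per file.
import json
import os
import subprocess
import sys

config_dir, log_root = sys.argv[1], sys.argv[2]  # e.g. toTrain/ and logs/

for name in sorted(os.listdir(config_dir)):
    if not name.endswith('.json'):
        continue
    with open(os.path.join(config_dir, name)) as f:
        conf = json.load(f)
    log_dir = os.path.join(log_root, name[:-len('.json')])
    # The real script also forwards the remaining options from `conf`.
    subprocess.run(['python3', 'train.py', '-g', conf['game'], '-df', log_dir],
                   check=True)
```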

Other scripts

Some other scripts can also simplify your life (e.g. testing all the agents, creating gifs for all of them, ...). You can find them in the script folder. The script/README.md explains how to use them.

Adapting to new neural network architectures

The codebase currently contains five neural network architectures: NIPS, NATURE, PWYX, LSTM and BAYESIAN (see the --arch option above).

To create a new architecture, follow the pattern demonstrated in the other networks. Then create a new class that inherits from both PolicyVNetwork and YourNetwork, for example: NewArchitecturePolicyVNetwork(PolicyVNetwork, YourNetwork). Then use this class in train.py.
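
A minimal sketch of that pattern follows; the import path networks and the class YourNetwork are placeholders, only PolicyVNetwork and the multiple-inheritance idea come from this repository.

```python
# Sketch of the inheritance pattern described above. The module path `networks`
# and `YourNetwork` are placeholders; adapt them to the actual file layout.
from networks import PolicyVNetwork  # assumed import path

class YourNetwork(object):
    """Build the layers of your new architecture here (conv, LSTM, ...)."""
    pass

class NewArchitecturePolicyVNetwork(PolicyVNetwork, YourNetwork):
    """Combines the policy/value heads with the new architecture."""
    pass

# train.py would then instantiate this class for your new --arch value.
```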

Other games

Some other games are also available. Feel free to add yours and have fun!

(GIFs: Catcher, Flappy Bird, Monster Kong, Snake, Tetris)

Currently these games are available:

  • All the Atari games
  • Some OpenAI Gym games: FlappyBird-v0, CartPole-v0, MountainCar-v0, Catcher-v0, MonsterKong-v0, RaycastMaze-v0, Snake-v0. Requirements: OpenAI Gym and gym-ple.
  • Tetris! You can even play the game yourself by running python3 tetris.py.

Just change the name of the game you want to play with the -g option.

Example: python3 train.py -g tetris -df logs/test_tetris/.

Tips

  • When using FiGAR, it is better to choose a bigger network like PWYX.
  • The entropy regularization strength (ERS) is an important parameter. It should stay between 0.01 and 0.1. If your agent's score is stuck and will not improve, try increasing the ERS. Conversely, if the score seems unstable (often falling to zero without reason) or its standard deviation is high, try decreasing the ERS. As an example, I use ERS = 0.02 for default PAAC and ERS = 0.05 for FiGAR 10.
  • When training some other (non-Atari) games, you might need to set the random_start option to false, or the agent might die before even starting to play...

About

This work was realized by Léa Berthomier during a 5-month internship at Jolibrain.