This project explores the power of teaching an agent through Reinforcement Learning (RL) to navigate a Banana World.
The agent uses a DQN network with the Deep Q-Learning algorithm to learn how to navigate the virtual world efficiently while collecting bananas.
The implementation offers options for: Vanilla DQN, Double DQN, Dueling DQN, and Prioritized Experience Replay (PER) DQN.
Please check the Instructions section on how to activate each of these options.
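As background on these options, the sketch below illustrates how the TD target typically differs between Vanilla DQN and Double DQN. It is illustrative only; the function and tensor names are assumptions, not taken from agent.py.

```python
import torch

# Illustrative sketch (not the repository's exact code): how the TD target
# differs between Vanilla DQN and Double DQN for a sampled replay batch.
# `q_local` and `q_target` are assumed to be two networks with identical
# architecture; `rewards`, `next_states`, `dones` are batch tensors.
def td_targets(q_local, q_target, rewards, next_states, dones, gamma=0.99, double=False):
    with torch.no_grad():
        if double:
            # Double DQN: the local network picks the greedy action,
            # the target network evaluates it.
            best_actions = q_local(next_states).argmax(dim=1, keepdim=True)
            next_q = q_target(next_states).gather(1, best_actions)
        else:
            # Vanilla DQN: the target network both picks and evaluates.
            next_q = q_target(next_states).max(dim=1, keepdim=True)[0]
    return rewards + gamma * next_q * (1 - dones)
```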
- You need to have the requirements installed (especially mlagents==0.4.0). Due to deprecated libraries, I've included a python folder which helps with the installation of the system.
- Clone the repository:
git clone https://github.com/joao-d-oliveira/RL-SmartAgent-BananaGame.git
- Go to the python folder:
cd RL-SmartAgent-BananaGame/python
- Compile and install the needed libraries:
pip install .
- Download the environment from one of the links below. You need only download the environment that matches your operating system:
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here
(For Windows users) Check out this link if you need help with determining if your computer is running a 32-bit version or 64-bit version of the Windows operating system.
(For AWS or Colab) If you'd like to train the agent on AWS (and have not enabled a virtual screen), then please use this link to obtain the environment.
2.1 In case you prefer to test the Visual Environment (where the states are given by the game's video frames instead of a state vector), please download one of these instead:
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here
- Place the downloaded file in the RL-SmartAgent-BananaGame folder of the repository, and unzip (or decompress) it.
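Once the file is unzipped, the environment can be loaded from Python. In this version of ML-Agents the package is typically imported as unityagents; the file name below assumes the Linux build, so adjust the path for your operating system:

```python
from unityagents import UnityEnvironment

# Path assumes the Linux build; use the .app / .exe file on macOS / Windows.
env = UnityEnvironment(file_name="Banana_Linux/Banana.x86_64")

# The Banana environment exposes a single "brain" that controls the agent.
brain_name = env.brain_names[0]
brain = env.brains[brain_name]
```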
A reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana. Thus, the goal of your agent is to collect as many yellow bananas as possible while avoiding blue bananas.
The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. Given this information, the agent has to learn how to best select actions.
The state space of the "Visual" environment consists of a snapshot of the game's video, i.e. an array of shape (84, 84, 3): 84 pixels of width, 84 pixels of height, and 3 color channels (R, G, B).
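Continuing the loading sketch above, resetting the environment returns either the 37-dimensional vector observation or the visual frame, depending on which build you downloaded (attribute names follow the ML-Agents 0.4 API):

```python
# Reset the environment in training mode and read the first observation.
env_info = env.reset(train_mode=True)[brain_name]

# Standard environment: a 37-dimensional vector (velocity + ray perception).
state = env_info.vector_observations[0]       # shape (37,)

# Visual environment: raw frames per agent, each 84x84 with 3 channels.
# frame = env_info.visual_observations[0]     # uncomment when using the Visual build
```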
Four discrete actions are available, corresponding to:
- 0 - move forward
- 1 - move backward
- 2 - turn left
- 3 - turn right
The task is episodic, and in order to solve the environment, your agent must get an average score of +13 over 100 consecutive episodes.
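Putting the pieces together, a training loop with the +13-over-100-episodes solve check looks roughly like the sketch below. The `agent.act` and `agent.step` calls are assumptions about the agent interface; check agent.py for the actual method names.

```python
from collections import deque
import numpy as np

# Rough training-loop sketch; `env` and `brain_name` are loaded as above,
# and `agent` is assumed to expose act(state) and step(...) methods.
scores_window = deque(maxlen=100)            # scores of the last 100 episodes

for episode in range(1, 1001):
    env_info = env.reset(train_mode=True)[brain_name]
    state, score, done = env_info.vector_observations[0], 0.0, False
    while not done:
        action = int(agent.act(state))                       # epsilon-greedy action (assumed API)
        env_info = env.step(action)[brain_name]              # send the action to Unity
        next_state = env_info.vector_observations[0]
        reward, done = env_info.rewards[0], env_info.local_done[0]
        agent.step(state, action, reward, next_state, done)  # learn from the transition (assumed API)
        state, score = next_state, score + reward
    scores_window.append(score)
    if len(scores_window) == 100 and np.mean(scores_window) >= 13.0:
        print(f"Environment solved in {episode} episodes")
        break
```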
- agent.py - Agent class containing the Q-Learning algorithm and all support for Vanilla DQN, Double DQN, Dueling DQN and Prioritized Experience Replay DQN.
- model.py - DQN model class setup (containing the configuration for Dueling DQN; see the sketch below)
- Navigation.ipynb - Jupyter Notebook for running the experiment with simple navigation (getting the state space through a vector)
- agent_vision.py - Agent class containing the Q-Learning algorithm for the Visual environment
- model_vision.py - DQN model class setup for the Visual environment
- Navigation_Pixels.ipynb - Jupyter Notebook for running the experiment with pixel navigation (getting the state space through pixels)
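For orientation, the Dueling DQN option splits the network into a state-value stream and an advantage stream. The PyTorch sketch below only illustrates that idea; the layer sizes and class name are assumptions, not the actual contents of model.py.

```python
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Minimal dueling-architecture sketch:
    Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""

    def __init__(self, state_size=37, action_size=4, hidden=64):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_size, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                 # state-value stream V(s)
        self.advantage = nn.Linear(hidden, action_size)   # advantage stream A(s, a)

    def forward(self, state):
        x = self.feature(state)
        v, a = self.value(x), self.advantage(x)
        return v + a - a.mean(dim=1, keepdim=True)
```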
All models are saved in the models subfolder. For example, checkpoint.pt is saved upon success in achieving the goal, and model.pt is the final model after running all episodes.
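Assuming these checkpoints store the network's state dict (the usual PyTorch convention), a saved model can be reloaded for evaluation roughly like this; the `qnetwork_local` attribute name is an assumption:

```python
import torch

# Load a saved checkpoint into the agent's online network for evaluation.
# `agent.qnetwork_local` is an assumed attribute name; check agent.py.
state_dict = torch.load("models/checkpoint.pt", map_location="cpu")
agent.qnetwork_local.load_state_dict(state_dict)
agent.qnetwork_local.eval()
```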
The notebook is structured as follows:
- Initial Setup: (setup of the experiment parameters; check the report for more details)
- Navigation
2.1 Start the Environment: (load the environment for the game)
2.2 Helper Functions: (functions that support the experiment, such as Optuna, DQNsearch, ...)
2.3 Baseline DQN: (section to train an agent with the standard parameters, without searching for hyper-parameters)
2.4 Vanilla DQN: (section to train an agent with a Vanilla DQN)
2.5 Double DQN: (section to train an agent with a Double DQN)
2.6 Dueling DQN: (section to train an agent with a Dueling DQN)
2.7 Prioritized Experience Replay (PER) DQN: (section to train an agent with a PER DQN)
2.8 Double DQN with PER: (section to train an agent with PER and Double DQN at the same time)
2.9 Double with Dueling and PER DQN: (section to train an agent with PER, Double and Dueling DQN at the same time)
3.0 Plot all results: (section where the results from all the sections above are plotted to compare performance)
Each of the sections [2.3 Baseline DQN, 2.4 Vanilla DQN, 2.5 Double DQN, 2.6 Dueling DQN, 2.7 Prioritized Replay DQN, 2.8 Double DQN with PER, 2.9 Double with Dueling and PER DQN] has the subsections:
2.x.1 Find HyperParameters (Optuna)
2.x.1.1 Plotting Optuna Results
2.x.2 Run (network) DQN
2.x.3 Plot Scores
Each subsection is relevant to the respective DQN.
You can choose whether to use the regular parameters, or try to find them through Optuna.
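If you go the Optuna route, the 2.x.1 subsections follow the usual Optuna pattern of an objective function plus a study. The sketch below is generic: the search space and the `train_dqn` helper are illustrative assumptions, not the notebook's exact code.

```python
import optuna

def objective(trial):
    # Illustrative search space; the notebook's actual ranges may differ.
    lr = trial.suggest_float("lr", 1e-5, 1e-3, log=True)
    gamma = trial.suggest_float("gamma", 0.90, 0.999)
    eps_decay = trial.suggest_float("eps_decay", 0.90, 0.999)
    # `train_dqn` is a hypothetical helper that trains an agent with these
    # parameters and returns the average score over the last 100 episodes.
    return train_dqn(lr=lr, gamma=gamma, eps_decay=eps_decay)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```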
After fulfilling the requirements in the Getting Started section and in requirements.txt:
0. Load the Jupyter notebook Navigation.ipynb
- Adapt the dictionary SETUP = { with the desired parameters (a purely illustrative example is sketched after this list)
- Load the environment by running the sections:
1 Initial Setup
2.1 Start the Environment
2.2 Helper Functions
- Then go to the section of the network you want to run [2.3 Baseline DQN, 2.4 Vanilla DQN, 2.5 Double DQN, 2.6 Dueling DQN, 2.7 Prioritized Replay DQN, 2.8 Double DQN with PER, 2.9 Double with Dueling and PER DQN]. There you will be able to either run Optuna to find the theoretically best parameters, or run the model with the base parameters.
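For orientation only, the SETUP dictionary is where the experiment is configured, including which DQN variant is active. The keys and values below are placeholders to show the kind of entries you would edit; check the notebook for the actual names and defaults.

```python
# Hypothetical example only: key names and values are placeholders,
# not the notebook's actual SETUP contents.
SETUP = {
    "n_episodes": 1000,    # number of training episodes
    "eps_start": 1.0,      # initial epsilon for the epsilon-greedy policy
    "eps_decay": 0.995,    # multiplicative epsilon decay per episode
    "double_dqn": True,    # toggle the Double DQN target
    "dueling_dqn": False,  # toggle the dueling network architecture
    "per": False,          # toggle Prioritized Experience Replay
}
```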