
AI Agent 🤖 to navigate in a Banana 🍌 world 🌐

[GIF: trained agent navigating the Banana world]

Introduction

This project explores the power of teaching an agent, through Reinforcement Learning (RL), to navigate a Banana world.

The agent uses a Deep Q-Network (DQN) with the Deep Q-Learning algorithm to learn how to navigate the virtual world efficiently while collecting bananas.
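Under the hood, Deep Q-Learning regresses the network's action-value estimates toward bootstrapped targets computed with a separate target network. The following is a minimal, generic sketch of that update step in PyTorch, not the exact code in agent.py; the batch layout (column tensors for actions, rewards, and done flags) and the hyper-parameters are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.99):
    """One vanilla DQN update on a sampled replay batch."""
    states, actions, rewards, next_states, dones = batch

    # Q(s, a) for the actions that were actually taken
    q_values = q_net(states).gather(1, actions)

    # Bootstrapped target: r + gamma * max_a' Q_target(s', a'), zeroed at episode end
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1, keepdim=True)[0]
        targets = rewards + gamma * next_q * (1 - dones)

    loss = F.mse_loss(q_values, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```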

Options for different Networks

The implementation offers options for: Vanilla DQN, Double DQN, Dueling DQN, and Prioritized Experience Replay (PER) DQN.
Please check the Instructions section for how to activate each of these options.


Getting Started

  1. You need to have the requirements installed (especially mlagents==0.4.0). Due to deprecated libraries, I've included a python folder that helps with installing everything:

    • Clone the repository: git clone https://github.com/joao-d-oliveira/RL-SmartAgent-BananaGame.git
    • Go to python folder: cd RL-SmartAgent-BananaGame/python
    • Compile and install the needed libraries: pip install .
  2. Download the environment from one of the links below. Download only the environment that matches your operating system:

    (For Windows users) Check out this link if you need help determining whether your computer is running a 32-bit or 64-bit version of the Windows operating system.

    (For AWS or Colab) If you'd like to train the agent on AWS (and have not enabled a virtual screen), then please use this link to obtain the environment.

2.1 In case you prefer to test the Visual Environment (where states are given by raw video frames instead of a state vector), please download from the links below instead:

  3. Place the downloaded file for your environment in the DRLND GitHub repository, in the RL-SmartAgent-BananaGame folder, and unzip (or decompress) the file.
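Once the environment file is in place, it can be loaded from Python. A minimal sketch, assuming the unityagents package bundled with mlagents==0.4.0 and a Linux build at Banana_Linux/Banana.x86_64 (adjust file_name for your platform and download):

```python
from unityagents import UnityEnvironment

# Path depends on your OS and on which build you downloaded
env = UnityEnvironment(file_name="Banana_Linux/Banana.x86_64")

# The Banana environment exposes a single "brain" controlled by the agent
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

# Reset in training mode and inspect the observation and action spaces
env_info = env.reset(train_mode=True)[brain_name]
print("State size:", len(env_info.vector_observations[0]))  # 37 for the vector environment
print("Action size:", brain.vector_action_space_size)       # 4 discrete actions
env.close()
```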

Project Details

Rules of The Game

A reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana. Thus, the goal of your agent is to collect as many yellow bananas as possible while avoiding blue bananas.

State Space

The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. Given this information, the agent has to learn how to best select actions.
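A DQN agent typically turns its Q-value estimates over such a state into actions with an epsilon-greedy rule. A generic sketch (not necessarily the exact logic in agent.py):

```python
import random
import numpy as np
import torch

def act(q_net, state, eps=0.05, action_size=4):
    """Epsilon-greedy action selection over the network's Q-values."""
    if random.random() < eps:
        return random.randrange(action_size)  # explore: random action
    state_t = torch.from_numpy(np.asarray(state, dtype=np.float32)).unsqueeze(0)
    with torch.no_grad():
        q_values = q_net(state_t)
    return int(q_values.argmax(dim=1).item())  # exploit: greedy action
```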

The state space of the "Visual" environment is composed of snapshots of the game's video feed: an array of shape (84, 84, 3), i.e. 84 pixels of width and height and 3 color channels (RGB).
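PyTorch convolutional layers expect channels-first input, so these frames usually need a transpose and a batch dimension before reaching the network. A minimal sketch for the (84, 84, 3) frames described above (the notebook's actual preprocessing may differ):

```python
import numpy as np
import torch

def frame_to_tensor(frame: np.ndarray) -> torch.Tensor:
    """Convert an (84, 84, 3) HWC frame into a (1, 3, 84, 84) CHW batch tensor."""
    chw = np.transpose(frame, (2, 0, 1)).astype(np.float32)  # HWC -> CHW
    return torch.from_numpy(chw).unsqueeze(0)                # add batch dimension
```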

Action Space

Four discrete actions are available, corresponding to:

  • 0 - move forward.
  • 1 - move backward.
  • 2 - turn left.
  • 3 - turn right.
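For reference, here is how an agent (a random one, in this sketch) would issue these actions, continuing the environment-loading sketch from Getting Started; the env and brain_name variables are assumed from there:

```python
import random

env_info = env.reset(train_mode=False)[brain_name]
score = 0
while True:
    action = random.randrange(4)             # pick one of the four actions
    env_info = env.step(action)[brain_name]  # send it to the environment
    score += env_info.rewards[0]             # +1 yellow banana, -1 blue banana
    if env_info.local_done[0]:               # episode finished
        break
print("Random-agent score:", score)
```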

Conditions to consider solved

The task is episodic, and in order to solve the environment, your agent must get an average score of +13 over 100 consecutive episodes.
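In code, this condition is usually checked with a sliding window over the last 100 episode scores; a minimal sketch:

```python
from collections import deque

scores_window = deque(maxlen=100)  # keeps only the 100 most recent scores

def record(score: float) -> bool:
    """Append an episode score and report whether the environment is solved."""
    scores_window.append(score)
    return len(scores_window) == 100 and sum(scores_window) / 100 >= 13.0
```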

Instructions

Files

Code

  1. agent.py - Agent class containing the Q-Learning algorithm and all support for Vanilla DQN, Double DQN, Dueling DQN, and Prioritized Experience Replay DQN.
  2. model.py - DQN model class setup (containing the configuration for Dueling DQN; see the sketch after this list)
  3. Navigation.ipynb - Jupyter Notebook for running experiment, with simple navigation (getting state space through vector)

  1. agent_vision.py - Agent class containing the Q-Learning algorithm for the Visual environment
  2. model_vision.py - DQN model class setup for Visual environment
  3. Navigation_Pixels.ipynb - Jupyter Notebook for running the experiment with pixel navigation (getting the state space through pixels)
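For context on the Dueling DQN option configured in model.py: a dueling architecture splits the network into a state-value stream and an advantage stream and recombines them into Q-values. A generic PyTorch sketch (layer sizes are illustrative, not the exact layers in model.py):

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    def __init__(self, state_size: int = 37, action_size: int = 4, hidden: int = 64):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_size, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)                # V(s): state value
        self.advantage = nn.Linear(hidden, action_size)  # A(s, a): per-action advantage

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        x = self.feature(state)
        v = self.value(x)
        a = self.advantage(x)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return v + a - a.mean(dim=1, keepdim=True)
```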

Documentation

  1. README.md - This file
  2. Report.md - Detailed Report on the project

Models

All models are saved in the models subfolder. For example, checkpoint.pt is a file saved upon successfully achieving the goal, and model.pt is the final model after running all episodes.
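A saved model can be reloaded for evaluation. A minimal sketch, assuming the files store a PyTorch state_dict and that a QNetwork class matching the saved architecture is available (if the files store the whole module instead, torch.load alone is enough):

```python
import torch

q_net = QNetwork(state_size=37, action_size=4)  # must match the saved architecture
q_net.load_state_dict(torch.load("models/checkpoint.pt", map_location="cpu"))
q_net.eval()  # inference mode for evaluation runs
```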

Running Normal navigation with state space of 37 dimensions

Structure of Notebook

The notebook is structured as follows:

  1. Initial Setup: (setup for the parameters of the experiment; check the report for more details)
  2. Navigation
    2.1 Start the Environment: (load the environment for the game)
    2.2 Helper Functions: (functions that support the experiment, such as Optuna, DQNsearch, ...)
    2.3 Baseline DQN: (section to train an agent with the standard parameters, without searching for hyper-parameters)
    2.4 Vanilla DQN: (section to train an agent with a Vanilla DQN)
    2.5 Double DQN: (section to train an agent with a Double DQN)
    2.6 Dueling DQN: (section to train an agent with a Dueling DQN)
    2.7 Prioritized Experience Replay (PER) DQN: (section to train an agent with a PER DQN)
    2.8 Double DQN with PER: (section to train an agent with PER and Double DQN at the same time)
    2.9 Double with Dueling and PER DQN: (section to train an agent with PER, Double, and Dueling DQN combined)
    3.0 Plot all results: (section where the results from all of the above sections are plotted to compare performance)

Each of the sections [2.3 Baseline DQN, 2.4 Vanilla DQN, 2.5 Double DQN, 2.6 Dueling DQN, 2.7 Prioritized Replay DQN, 2.8 Double DQN with PER, 2.9 Double with Dueling and PER DQN] has the following subsections:

2.x.1 Find HyperParameters (Optuna)
2.x.1.1 Plotting Optuna Results
2.x.2 Run (network) DQN
2.x.3 Plot Scores

Each subsection applies to its respective DQN variant.
You can choose whether to use the regular parameters or try to find them through Optuna, as sketched below.
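For reference, an Optuna search revolves around an objective function that trains an agent and returns the score to maximize. A minimal, generic sketch; the notebook's actual search space differs, and train_agent is a hypothetical stand-in for its training routine:

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    # Hypothetical search space; the notebook defines its own parameters
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    gamma = trial.suggest_float("gamma", 0.90, 0.999)
    # train_agent is a stand-in for the notebook's training routine;
    # it should return the average score achieved with these parameters
    return train_agent(lr=lr, gamma=gamma)

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print("Best parameters:", study.best_params)
```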

Running

After fulfilling the requirements in the Getting Started section and in requirements.txt:

  0. Load the Jupyter notebook Navigation.ipynb

  1. Adapt the SETUP = { ... } dictionary with the desired parameters (see the hypothetical sketch after this list)
  2. Load the environment. Running sections:

    1 Initial Setup
    2.1 Start the Environment
    2.2 Helper Functions

  3. Then go to the section of the network you want to run [2.3 Baseline DQN, 2.4 Vanilla DQN, 2.5 Double DQN, 2.6 Dueling DQN, 2.7 Prioritized Replay DQN, 2.8 Double DQN with PER, 2.9 Double with Dueling and PER DQN]. There you will be able to either run Optuna to find the theoretically best parameters, or run the model with the base parameters.
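The exact keys of SETUP live in Navigation.ipynb; purely as a hypothetical illustration of the kind of switches involved (these key names are invented, not the notebook's):

```python
# Hypothetical illustration only; the real keys are defined in Navigation.ipynb
SETUP = {
    "double_dqn": True,           # use the Double DQN update
    "dueling": False,             # use a dueling network head
    "prioritized_replay": False,  # use Prioritized Experience Replay
    "n_episodes": 2000,           # training episode budget
    "eps_start": 1.0,             # initial epsilon for exploration
    "eps_end": 0.01,              # final epsilon
}
```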
