simoninithomas/Policy_gradients_CartPole

Policy_gradients_CartPole

Policy Gradient Learning with CartPole-v0
Cart Pole game

Getting started

The challenge of the week was to solve a simple game (other than Pong) using policy gradients. I chose CartPole-v0 because it is a basic game with plenty of documentation and tutorials.

Goal

CartPole-v0 defines "solving" as getting an average reward of at least 195.0 over 100 consecutive trials.
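As a rough illustration of that criterion, here is a minimal sketch of a rolling check over the last 100 episode rewards. The `SolvedTracker` class and its method names are hypothetical and are not taken from the notebook.

```python
from collections import deque

SOLVED_THRESHOLD = 195.0  # average reward required by CartPole-v0
WINDOW = 100              # number of consecutive trials to average over


class SolvedTracker:
    """Hypothetical helper: keeps the most recent episode rewards."""

    def __init__(self, threshold=SOLVED_THRESHOLD, window=WINDOW):
        self.threshold = threshold
        # A bounded deque silently drops the oldest reward once it is full.
        self.rewards = deque(maxlen=window)

    def add(self, episode_reward):
        self.rewards.append(episode_reward)

    def is_solved(self):
        # Not solved until a full window of episodes has been played.
        if len(self.rewards) < self.rewards.maxlen:
            return False
        return sum(self.rewards) / len(self.rewards) >= self.threshold
```

In a training loop you would call `add()` with the total reward of each finished episode and stop once `is_solved()` returns `True`.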

Dependencies

Usage

To keep things readable and easier to explain, I use a Jupyter Notebook.

Open your terminal, go to the Policy_gradients_CartPole folder, and launch the notebook:

jupyter notebook

Walkthrough

The CartPole

Cart Pole game

The state gives four kinds of information:

  • Position of the cart
  • Velocity of the cart
  • Position of the pole
  • Velocity of the pole

An agent can push the cart:
  • 0: left
  • 1: right
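To make the state and action encoding concrete, here is a small sketch that labels the four state values and the two actions. The `CartPoleState` type and the `describe` helper are hypothetical names used only for illustration, and the field order is assumed to follow the list above.

```python
from typing import NamedTuple


class CartPoleState(NamedTuple):
    """Hypothetical wrapper around the 4-value state, in the order listed above."""
    cart_position: float
    cart_velocity: float
    pole_position: float
    pole_velocity: float


# The two actions available to the agent.
LEFT, RIGHT = 0, 1
ACTION_NAMES = {LEFT: "left", RIGHT: "right"}


def describe(observation, action):
    """Return a readable summary of one state/action pair."""
    state = CartPoleState(*observation)
    return (
        f"cart at {state.cart_position:+.2f}, "
        f"pole at {state.pole_position:+.2f}, "
        f"push {ACTION_NAMES[action]}"
    )
```

For example, `describe([0.0, 0.1, -0.02, 0.3], RIGHT)` produces a one-line summary of the cart position, the pole position, and the chosen push direction.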

The NN

Originally taken from Siraj's "Solving the basic game of Pong" video, modified with my exceptional skills in Paint 😂

The advantage function

What we must understand here is that immediate rewards are more important than delayed rewards.

That's why we use gamma, a value between 0 and 1, as a discount factor.

Discount reward

Why? Because delayed rewards have less impact. Imagine you make a mistake at step 5 (the pole is leaning too far): the rewards after that point matter little because you are going to lose anyway. That's why each later reward is discounted more and more.

Originally taken from DQN Bootcamp Lecture: Core Lecture 4b Pong from Pixels, by Andrej Karpathy
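The discounting described above can be sketched in a few lines of plain Python. This is a minimal sketch and not the notebook's code: the function names, the default `gamma=0.99`, and the normalization step (subtract the mean, divide by the standard deviation, a common way to turn returns into an advantage signal) are all assumptions.

```python
import statistics


def discount_rewards(rewards, gamma=0.99):
    """Return the discounted sum of future rewards for every step of one episode."""
    discounted = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of the steps after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        discounted[t] = running
    return discounted


def normalize(values):
    """Center and scale the values (assumed normalization, see lead-in)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # All values identical: no step is better or worse than average.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]
```

With a reward of 1 at every step, `discount_rewards([1, 1, 1], gamma=0.5)` gives `[1.75, 1.5, 1.0]`. After `normalize`, the early steps of an episode come out positive and the steps just before the failure come out negative.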

Remember that:

  • A positive advantage --> make the action more likely to happen in the future, at that state
  • A negative advantage --> make the action less likely to happen in the future, at that state
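The two rules above can be seen in a toy update. The sketch below is not the network from the notebook: it assumes a bare softmax policy over two logits (one per action) and applies a single gradient-ascent step on `advantage * log pi(action)`. The function names and the learning rate are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(logits, action, advantage, learning_rate=0.1):
    """One ascent step on advantage * log pi(action) for a softmax policy."""
    probs = softmax(logits)
    # For a softmax, d log pi(action) / d logit_i = 1[i == action] - probs[i].
    return [
        logit + learning_rate * advantage * ((1.0 if i == action else 0.0) - probs[i])
        for i, logit in enumerate(logits)
    ]
```

Starting from equal logits `[0.0, 0.0]`, a step with a positive advantage for action 1 pushes its probability above 0.5, while a negative advantage pushes it below 0.5.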

Acknowledgments

This was made possible thanks to these two fantastic resources:

About

A Policy Gradient Learning with CartPole-v0 for Siraj Raval's challenge
