Policy Gradient Learning with CartPole-v0
The challenge of the week: solve a simple game (other than Pong) using policy gradients. I chose CartPole-v0 because it's a basic game and there is a ton of documentation and tutorials about this kind of game.
CartPole-v0 defines "solving" as getting an average reward of 195.0 over 100 consecutive trials.
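As a quick illustration (the variable name `episode_rewards` is mine, not from the notebook), the criterion can be checked like this:

```python
import numpy as np

# `episode_rewards` is assumed to hold the total reward of each finished episode.
episode_rewards = [200.0] * 120  # dummy data for the example
solved = len(episode_rewards) >= 100 and np.mean(episode_rewards[-100:]) >= 195.0
print(solved)  # True
```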
Requirements (installable as shown below):
- numpy
- gym https://github.com/openai/gym
- tensorflow
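If needed, the dependencies can be installed with pip (assuming a standard Python setup):

```
pip install numpy gym tensorflow
```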
To keep the code readable and easier to explain, I use a Jupyter Notebook.
Open your terminal, go to the Policy_gradients_CartPole folder, and launch the notebook:

```
jupyter notebook
```
4 kinds of information given by the state:
- Position of the cart
- Velocity of the cart
- Angle of the pole
- Angular velocity of the pole
An agent can push the cart in one of two directions (see the sketch after this list):
- 0: left
- 1: right
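A minimal sketch of how this looks through the classic gym API (this snippet is illustrative, not taken from the notebook):

```python
import gym

env = gym.make("CartPole-v0")
state = env.reset()

# state is a 4-element array:
# [cart position, cart velocity, pole angle, pole angular velocity]
print(state)

# 0 pushes the cart to the left, 1 pushes it to the right.
next_state, reward, done, info = env.step(1)
print(next_state, reward, done)
```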
What we must understand here is that immediate rewards matter more than delayed rewards.
That's why we use gamma as a discount factor.
Why? Because delayed rewards have less impact: imagine you screw up at step 5 (the pole leans too far). We don't care about the rewards that come after, because you are going to lose anyway. That's why rewards further in the future are discounted more and more.
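A common way to implement this is to walk backwards over the episode's rewards (a sketch; gamma = 0.95 is an assumed value, the notebook may use another):

```python
import numpy as np

def discount_rewards(episode_rewards, gamma=0.95):
    discounted = np.zeros_like(episode_rewards, dtype=np.float64)
    running_add = 0.0
    # Walk backwards: each step accumulates its own reward plus the
    # discounted sum of everything that comes after it.
    for t in reversed(range(len(episode_rewards))):
        running_add = running_add * gamma + episode_rewards[t]
        discounted[t] = running_add
    # Normalizing the returns is a standard trick to stabilize training.
    return (discounted - discounted.mean()) / (discounted.std() + 1e-8)

print(discount_rewards([1.0, 1.0, 1.0, 1.0]))
```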
Originally taken from the Deep RL Bootcamp, Core Lecture 4b: Pong from Pixels, by Andrej Karpathy.
Remember that (the loss sketch below makes this concrete):
- A positive advantage --> make the action more likely to happen in the future, at that state
- A negative advantage --> make the action less likely to happen in the future, at that state
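A minimal TF1-style sketch of how the advantage enters the loss (the layer sizes and variable names are my assumptions, not necessarily the notebook's):

```python
import tensorflow as tf

states_ = tf.placeholder(tf.float32, [None, 4], name="states")
actions_ = tf.placeholder(tf.int32, [None], name="actions")
advantages_ = tf.placeholder(tf.float32, [None], name="advantages")

# A small policy network: 4 state inputs -> 2 action logits.
hidden = tf.layers.dense(states_, 16, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, 2)

# Cross-entropy gives -log(pi(action | state)) for the action actually taken.
neg_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
    logits=logits, labels=actions_)

# Scaling by the advantage makes actions with a positive advantage more
# likely at that state, and actions with a negative advantage less likely.
loss = tf.reduce_mean(neg_log_prob * advantages_)
train_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)
```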
This was made possible thanks to these two fantastic resources:
- Simple Reinforcement Learning with Tensorflow: Part 2 - Policy-based Agents: this article helped me define part of the architecture and helped me a lot with the training part.
- Policy gradients for reinforcement learning in TensorFlow