# L2 RL Basics

These are my personal lecture notes for Georgia Tech's Reinforcement Learning course (CS 7642, Spring 2024) by Charles Isbell and Michael Littman. All images are taken from the course's lectures unless stated otherwise.

## References and further readings

- Littman, M. L. (1996). *Algorithms for sequential decision-making* (Chapter 2). Brown University.
- Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. *Machine Learning*, 3, 9-44.
- Sutton, R. S., & Barto, A. G. (2018). *Reinforcement Learning: An Introduction*. MIT Press. Chapters 4.1-4.8, 5.1-5.4, 5.10, 6.1-6.7, 6.9, 8.1-8.5, 8.12-8.13.

## Introduction

- RL is about the interaction between an agent and an environment
- The agent takes actions, and the environment responds to those actions with rewards and new states
- RL is about learning a behavior that interacts with the environment:
  - Behavior structures:
    - Plan: a fixed sequence of actions
    - Conditional plan: a plan that includes "if" statements
    - Stationary policy (aka universal plan):
      - a mapping from states to actions (like a conditional plan, but with an "if" at every state, so it reduces to a mapping)
      - very large (it specifies what to do in every state)
      - there always exists an optimal stationary policy (a sketch of a stationary policy follows this list)
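
As a minimal illustration (not from the lecture), a stationary policy over a small, discrete state space is just a mapping from states to actions; the state and action names below are hypothetical:

```python
# A minimal sketch of a stationary policy for a toy environment.
# The states and actions below are made up, purely for illustration.

policy = {
    "start":    "right",
    "hallway":  "right",
    "junction": "up",
    "goal":     "stay",
}

def act(state: str) -> str:
    """A stationary policy: the action depends only on the current state."""
    return policy[state]

print(act("junction"))  # -> "up"
```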

## Evaluating a policy

Calculate the values of state-action-reward sequences generated by a policy (a code sketch of these steps follows the list):

1. Map state transitions to immediate rewards (e.g., using the reward function $R(s, a)$)
2. Truncate sequences according to the horizon (e.g., $T = 10$ steps)
3. Summarize each sequence (i.e., compute its return, e.g., the discounted sum of rewards $\sum_{t=0}^T \gamma^t r_t$)
4. Summarize over sequences (average: the expected return)
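
A minimal sketch of these four steps, assuming step 1 has already produced sampled reward sequences (the sequences, horizon, and discount factor below are made-up values):

```python
# Monte Carlo estimate of a policy's value: truncate each sampled reward
# sequence at the horizon, compute its discounted return, then average.
# The reward sequences, horizon T, and discount gamma are hypothetical.

def discounted_return(rewards, gamma, T):
    """Steps 2-3: truncate at horizon T, then compute sum of gamma^t * r_t."""
    return sum(gamma**t * r for t, r in enumerate(rewards[: T + 1]))

# Step 1 is assumed done: each inner list holds the immediate rewards r_t
# produced by R(s, a) along one sampled trajectory.
sequences = [
    [0, 0, 1, 0, 5],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 2, 1],
]

gamma, T = 0.9, 10

# Step 4: average the per-sequence returns to estimate the expected return.
returns = [discounted_return(seq, gamma, T) for seq in sequences]
expected_return = sum(returns) / len(returns)
print(f"estimated value: {expected_return:.3f}")
```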

## Evaluating a learner

- value of the returned policy
- time to learn:
  - computational complexity
  - sample complexity:
    - How much data does it need?
    - How much time does it take to interact with the environment and gather that data?