updated doc to clarify off-policy RL vs. offline RL #596

Open
wants to merge 1 commit into base: main
4 changes: 4 additions & 0 deletions units/en/unitbonus3/offline-online.mdx
@@ -25,6 +25,10 @@ This method has one drawback: the *counterfactual queries problem*. What do we d

Some solutions exist for this problem, but if you want to know more about offline reinforcement learning, you can [watch this video](https://www.youtube.com/watch?v=k08N5a0gG0A).

## Offline RL is Not the Same as Off-Policy RL

Offline reinforcement learning (offline RL) and off-policy reinforcement learning (off-policy RL) are often confused, but they are distinct methods with different objectives and constraints. Both can use data generated by other policies, but their training scenarios, data interaction capabilities, and challenges differ significantly.

Off-policy reinforcement learning allows an agent to learn a policy using experience collected by another policy (known as the behavior policy). The agent can still gather new experiences by interacting with the environment during training while reusing past experiences.

Offline RL, also known as batch RL, involves training a policy on a fixed dataset collected from previous interactions. The agent cannot interact with the environment during training.
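To make the distinction concrete, here is a minimal, framework-free Python sketch. The `LineWorld` environment, the random behavior policy, and the function names are invented for illustration only, and the actual policy-update steps are left as comments; the point is where the data comes from in each setting.

```python
import random
from collections import deque

# A toy 1-D environment standing in for any real environment:
# the agent moves left/right and is rewarded for reaching position +3.
class LineWorld:
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):  # action: -1 (left) or +1 (right)
        self.pos += action
        done = abs(self.pos) >= 3
        reward = 1.0 if self.pos >= 3 else 0.0
        return self.pos, reward, done

def behavior_policy(state):
    # the data-collecting policy, distinct from the policy being learned
    return random.choice([-1, 1])

# --- Off-policy RL: the agent keeps interacting with the environment. ---
# Transitions produced by the behavior policy fill a replay buffer that
# keeps growing with fresh experience while training proceeds.
def off_policy_training(num_episodes=100):
    env, buffer = LineWorld(), deque(maxlen=10_000)
    for _ in range(num_episodes):
        state, done = env.reset(), False
        while not done:
            action = behavior_policy(state)
            next_state, reward, done = env.step(action)   # NEW interaction
            buffer.append((state, action, reward, next_state))
            state = next_state
        # ...update the learned policy from transitions sampled from `buffer`...
    return buffer

# --- Offline RL: the dataset is fixed before training starts. ---
# No new environment interaction is allowed; the agent can only learn
# from whatever transitions the dataset already contains.
def offline_training(fixed_dataset, num_updates=100):
    for _ in range(num_updates):
        batch = random.sample(fixed_dataset, k=min(32, len(fixed_dataset)))
        # ...update the policy from `batch` only; env.step() is never called...
    return None

# Usage: collect a dataset once, then train purely offline from it.
dataset = list(off_policy_training(num_episodes=50))
offline_training(dataset)
```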

## Further reading

For more information, we recommend you check out the following resources: