Merge pull request #466 from lutzvdb/patch-2
Update mid-way-recap.mdx
simoninithomas authored Jan 24, 2024
2 parents 4966807 + ca29ddf commit 605ce60
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion units/en/unit2/mid-way-recap.mdx
@@ -8,7 +8,7 @@ We have two types of value-based functions:
- Action-value function: outputs the expected return if **the agent starts in a given state, takes a given action at that state** and then acts according to the policy forever after.
- In value-based methods, rather than learning the policy, **we define the policy by hand** and we learn a value function. If we have an optimal value function, we **will have an optimal policy.**
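The hand-defined policy mentioned above is typically the greedy policy: given a learned action-value function, always pick the action with the highest estimated value. A minimal sketch, using a hypothetical toy Q-table (the states, actions, and values are illustrative, not from the course):

```python
def greedy_policy(Q, state):
    # The policy is "defined by hand": always pick the action whose
    # estimated action-value is highest in the current state.
    return max(Q[state], key=Q[state].get)

# Hypothetical toy table: Q[state][action] -> estimated expected return
Q = {
    "s0": {"left": 0.1, "right": 0.9},  # in s0, "right" looks best
    "s1": {"left": 0.5, "right": 0.2},  # in s1, "left" looks best
}

greedy_policy(Q, "s0")  # -> "right"
```

If the value estimates are optimal, this greedy policy is an optimal policy, which is the point the bullet above makes.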

- There are two types of methods to learn a policy for a value function:
+ There are two types of methods to update the value function:

- With *the Monte Carlo method*, we update the value function from a complete episode, and so we **use the actual discounted return of this episode.**
- With *the TD Learning method*, we update the value function from a step, replacing the unknown \\(G_t\\) with **an estimated return called the TD target.**
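The two update rules above can be sketched in tabular form. This is a hypothetical illustration (the learning rate, discount factor, states, and rewards are assumed values, not from the commit):

```python
# Assumed hyperparameters for the sketch
alpha, gamma = 0.1, 0.99

def mc_update(V, state, G):
    # Monte Carlo: after a complete episode, move V(s) toward the
    # actual discounted return G_t observed for that episode.
    V[state] += alpha * (G - V[state])

def td_update(V, state, reward, next_state):
    # TD(0): after a single step, move V(s) toward the TD target
    # R_{t+1} + gamma * V(S_{t+1}), which replaces the unknown G_t.
    td_target = reward + gamma * V[next_state]
    V[state] += alpha * (td_target - V[state])

# Toy value table over two hypothetical states
V = {"s0": 0.0, "s1": 0.5}
mc_update(V, "s0", G=1.0)                        # V["s0"] -> 0.1
td_update(V, "s0", reward=0.0, next_state="s1")  # V["s0"] -> 0.1395
```

Both rules nudge the estimate toward a target; they differ only in whether that target is the actual return (Monte Carlo) or a bootstrapped estimate (TD).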
