[UPDATE] UNIT 1: the two main approaches... #553

romuvt · 2024-07-17T10:24:22Z

When you define stochastic policies, you write:

\pi (a|s) = P [A|s]

The LHS is a specific real number in [0,1], while the RHS as written denotes a whole probability distribution over actions, doesn't it?
So I think it should be something like \pi (a|s) = P [A_t = a | S_t = s]. Alternatively, the RHS could state in words that it is the probability of choosing action a given state s.
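
To make the distinction concrete, here is a minimal sketch (an illustration only, not code from the course). It assumes a made-up tabular softmax policy; the names `logits`, `policy_distribution`, and `policy_prob` are hypothetical. The point is that \pi(.|s) is a vector that sums to 1, while \pi(a|s) is a single number in [0,1].

```python
import numpy as np

def softmax(x):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Hypothetical preferences for 2 states and 3 actions (made-up numbers).
logits = np.array([[1.0, 2.0, 0.5],
                   [0.0, 0.0, 0.0]])

def policy_distribution(s):
    # pi(.|s): the full distribution over actions in state s.
    return softmax(logits[s])

def policy_prob(a, s):
    # pi(a|s) = P[A_t = a | S_t = s]: one real number in [0, 1].
    return float(policy_distribution(s)[a])

dist = policy_distribution(0)   # array of 3 probabilities
p = policy_prob(1, 0)           # a single scalar

print("pi(.|s=0) =", dist)
print("pi(a=1|s=0) =", p)
```

Here `dist` plays the role of what P [A|s] suggests (a distribution), and `p` plays the role of the LHS \pi (a|s).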
