
Commit 7d808de

Update report.md
1 parent 2973baf commit 7d808de

File tree

1 file changed: +2 −2 lines changed


report.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # Homework3-Policy-Gradient report
 ## problem1 construct a neural network to represent policy
-In more complex tasks (Atari games, and even real-world tasks), it is hard to apply policy iteration / value iteration directly: the large state/action space requires large storage and makes it hard to calculate the Q values for every state-action pair. So we "learn" the Q values or the policy with a neural network. Here in problem 1, we want to use a simple neural network <img src="https://latex.codecogs.com/gif.latex? f_{Q^*} (s, a;\Theta)"> to represent $Q^*(s, a)$, where $\Theta$ denotes the parameters of the neural network, as shown in the figure below:
+In more complex tasks (Atari games, and even real-world tasks), it is hard to apply policy iteration / value iteration directly: the large state/action space requires large storage and makes it hard to calculate the Q values for every state-action pair. So we "learn" the Q values or the policy with a neural network. Here in problem 1, we want to use a simple neural network <img src="https://latex.codecogs.com/gif.latex?f_{Q^*}(s,%20a;\Theta)"> to represent $Q^*(s, a)$, where $\Theta$ denotes the parameters of the neural network, as shown in the figure below:

 <img src='pictures/DNNforQ.png' width='300'>

 To implement this, I added two fully connected layers in the policy.py file:
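
For reference, a minimal sketch of what "two fully connected layers" in policy.py could look like, assuming a TensorFlow 1.x setup; the dimensions and variable names here are hypothetical, not the repo's exact code:

```python
import tensorflow as tf

# Hypothetical dimensions for illustration; the actual values depend on
# the environment (e.g. CartPole: 4 observations, 2 discrete actions).
observation_dim = 4
hidden_dim = 8
action_dim = 2

observations = tf.placeholder(tf.float32, shape=[None, observation_dim])

# First fully connected layer: map observations to a hidden representation.
hidden = tf.contrib.layers.fully_connected(
    inputs=observations,
    num_outputs=hidden_dim,
    activation_fn=tf.tanh)

# Second fully connected layer: a softmax over actions gives the
# policy's action distribution pi(a|s).
action_probs = tf.contrib.layers.fully_connected(
    inputs=hidden,
    num_outputs=action_dim,
    activation_fn=tf.nn.softmax)
```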
@@ -54,4 +54,4 @@ Here we use ```util.discount``` to calculate the advantages by discount_rate and
 a = util.discount(a, self.discount_rate * LAMBDA)
 ```
 The GAE agent converged in 73 episodes.
-<img src ='pictures/p6.png'>
+<img src ='pictures/p6.png'>
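
For context, here is a minimal sketch of how a ```util.discount```-style helper and the GAE advantage computation typically fit together; the function names and default values are illustrative assumptions, not the repo's exact code:

```python
import numpy as np
import scipy.signal

def discount(x, rate):
    # Discounted cumulative sum: y[t] = x[t] + rate*x[t+1] + rate^2*x[t+2] + ...
    # Computed with a backwards linear filter, a common vectorized trick.
    return scipy.signal.lfilter([1], [1, -rate], x[::-1])[::-1]

def gae_advantages(rewards, values, discount_rate=0.99, lam=0.98):
    # One-step TD residuals: delta_t = r_t + gamma*V(s_{t+1}) - V(s_t),
    # bootstrapping a terminal value of 0 at the end of the episode.
    values = np.append(values, 0.0)
    deltas = rewards + discount_rate * values[1:] - values[:-1]
    # GAE discounts the residuals with rate gamma*lambda, matching the
    # util.discount(a, self.discount_rate * LAMBDA) call in the diff above.
    return discount(deltas, discount_rate * lam)
```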
