
Commit 2973baf

report test
1 parent 1046c96 commit 2973baf

File tree

1 file changed (+1 line, -1 line)


report.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Homework3-Policy-Gradient report
 ## problem1 construct a neural network to represent policy
-In more complex tasks (Atari games, and even real-world tasks), it is hard to apply policy iteration / value iteration directly: the state/action space is large, so storing and computing the Q values for every state-action pair is impractical. Instead, we "learn" the Q values or the policy with a neural network. Here in problem 1, we want to use a simple neural network \begin f_{Q^*} (s, a;\Theta) $ to represent $Q^*(s, a)$, where $\Theta$ denotes the parameters of the neural network, as shown in the figure below:
+In more complex tasks (Atari games, and even real-world tasks), it is hard to apply policy iteration / value iteration directly: the state/action space is large, so storing and computing the Q values for every state-action pair is impractical. Instead, we "learn" the Q values or the policy with a neural network. Here in problem 1, we want to use a simple neural network <img src="https://latex.codecogs.com/gif.latex? f_{Q^*} (s, a;\Theta)"> to represent $Q^*(s, a)$, where $\Theta$ denotes the parameters of the neural network, as shown in the figure below:
 <img src='pictures/DNNforQ.png' width='300'>

 To implement this, I added two fully connected layers in the policy.py file:
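The commit view ends before the code it refers to. As a minimal sketch of what two fully connected layers for the policy might look like (assuming TensorFlow 1.x; the function name, hidden size, and layer names below are illustrative assumptions, not the repository's actual policy.py):

```python
# Minimal sketch, not the repository's actual policy.py: two fully connected
# layers mapping an observation to a distribution over discrete actions.
# Assumes TensorFlow 1.x; hidden size and names are illustrative.
import tensorflow as tf

def build_policy_network(observations, action_dim, hidden_dim=16):
    # First fully connected layer with a tanh non-linearity.
    hidden = tf.layers.dense(observations, hidden_dim,
                             activation=tf.tanh, name="fc1")
    # Second fully connected layer produces one logit per action; a softmax
    # then turns the logits into the policy pi(a | s; Theta).
    logits = tf.layers.dense(hidden, action_dim, activation=None, name="fc2")
    return tf.nn.softmax(logits, name="action_probs")
```

Sampling an action then amounts to drawing from the returned probabilities for the current observation.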
