
Commit 7d808de

Update report.md
1 parent 2973baf commit 7d808de

File tree

1 file changed: +2 −2 lines changed


report.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # Homework3-Policy-Gradient report
 ## problem1 construct a neural network to represent policy
-In more complex tasks (Atari games, and even real-world tasks), it is hard to apply policy iteration / value iteration directly: the large state/action space requires large storage and makes it hard to calculate the Q values for every state-action pair. So we "learn" the Q values or the policy with a neural network. Here in problem 1, we want to use a simple neural network <img src="https://latex.codecogs.com/gif.latex? f_{Q^*} (s, a;\Theta)"> to represent $Q^*(s, a)$, where $\Theta$ denotes the parameters of the neural network, as shown in the figure below:
+In more complex tasks (Atari games, and even real-world tasks), it is hard to apply policy iteration / value iteration directly: the large state/action space requires large storage and makes it hard to calculate the Q values for every state-action pair. So we "learn" the Q values or the policy with a neural network. Here in problem 1, we want to use a simple neural network <img src="https://latex.codecogs.com/gif.latex?f_{Q^*}(s,%20a;\Theta)"> to represent $Q^*(s, a)$, where $\Theta$ denotes the parameters of the neural network, as shown in the figure below:

 <img src='pictures/DNNforQ.png' width='300'>

 To implement this, I added two fully connected layers in the policy.py file:
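
For reference, a minimal sketch of what "two fully connected layers" in policy.py could look like, assuming a TensorFlow 1.x setup; the dimensions and variable names here are hypothetical, not the repo's exact code:

```python
import tensorflow as tf

# Hypothetical dimensions for illustration; the actual values depend on
# the environment (e.g. CartPole: 4 observations, 2 discrete actions).
observation_dim = 4
hidden_dim = 8
action_dim = 2

observations = tf.placeholder(tf.float32, shape=[None, observation_dim])

# First fully connected layer: map observations to a hidden representation.
hidden = tf.contrib.layers.fully_connected(
    inputs=observations,
    num_outputs=hidden_dim,
    activation_fn=tf.tanh)

# Second fully connected layer: a softmax over actions gives the
# policy's action distribution pi(a|s).
action_probs = tf.contrib.layers.fully_connected(
    inputs=hidden,
    num_outputs=action_dim,
    activation_fn=tf.nn.softmax)
```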
@@ -54,4 +54,4 @@ Here we use ```util.discount``` to calculate the advantages by discount_rate and
 a = util.discount(a, self.discount_rate * LAMBDA)
 ```
 The GAE agent converged in 73 episodes.
-<img src ='pictures/p6.png'>
+<img src ='pictures/p6.png'>
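
For context, here is a minimal sketch of how a ```util.discount```-style helper and the GAE advantage computation typically fit together; the function names and default values are illustrative assumptions, not the repo's exact code:

```python
import numpy as np
import scipy.signal

def discount(x, rate):
    # Discounted cumulative sum: y[t] = x[t] + rate*x[t+1] + rate^2*x[t+2] + ...
    # Computed with a backwards linear filter, a common vectorized trick.
    return scipy.signal.lfilter([1], [1, -rate], x[::-1])[::-1]

def gae_advantages(rewards, values, discount_rate=0.99, lam=0.98):
    # One-step TD residuals: delta_t = r_t + gamma*V(s_{t+1}) - V(s_t),
    # bootstrapping a terminal value of 0 at the end of the episode.
    values = np.append(values, 0.0)
    deltas = rewards + discount_rate * values[1:] - values[:-1]
    # GAE discounts the residuals with rate gamma*lambda, matching the
    # util.discount(a, self.discount_rate * LAMBDA) call in the diff above.
    return discount(deltas, discount_rate * lam)
```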
