Commit ef3312d

fix done flag
1 parent ef90832 commit ef3312d

1 file changed, +1 -5 lines changed

tf2.0/rl_trader.py

Lines changed: 1 addition & 5 deletions
@@ -288,11 +288,7 @@ def replay(self, batch_size=32):
     done = minibatch['d']
 
     # Calculate the tentative target: Q(s',a)
-    target = rewards + self.gamma * np.amax(self.model.predict(next_states), axis=1)
-
-    # The value of terminal states is zero
-    # so set the target to be the reward only
-    target[done] = rewards[done]
+    target = rewards + (1 - done) * self.gamma * np.amax(self.model.predict(next_states), axis=1)
 
     # With the Keras API, the target (usually) must have the same
     # shape as the predictions.
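
The commit message does not say what was wrong with the old `target[done] = rewards[done]` line. One plausible reading, offered here as an assumption, is that `done` is stored as a 0/1 numeric array and not as a boolean mask. NumPy then treats it as a list of integer indices, so the assignment touches rows 0 and 1 instead of the terminal rows. Multiplying by `(1 - done)` zeroes the bootstrap term for terminal transitions regardless of dtype. The sketch below is standalone NumPy: `td_targets`, the sample arrays, and `gamma=0.5` are hypothetical, and `next_q` stands in for the output of `self.model.predict(next_states)`.

```python
import numpy as np


def td_targets(rewards, next_q_values, done, gamma):
    """Hypothetical helper: r + (1 - done) * gamma * max_a Q(s', a)."""
    rewards = np.asarray(rewards, dtype=np.float64)
    done = np.asarray(done, dtype=np.float64)  # accepts bool or 0/1 flags
    max_next_q = np.amax(next_q_values, axis=1)
    return rewards + (1.0 - done) * gamma * max_next_q


# Made-up minibatch of three transitions; only the last one is terminal.
rewards = np.array([1.0, 2.0, 3.0])
next_q = np.array([[0.5, 1.0], [2.0, 0.0], [4.0, 3.0]])  # stand-in for predict()
done_int = np.array([0, 0, 1], dtype=np.int64)  # assumed 0/1 storage
gamma = 0.5

# New approach from the diff: the terminal row keeps only its reward.
targets = td_targets(rewards, next_q, done_int, gamma)

# Old approach with the same 0/1 array: done_int is read as indices [0, 0, 1],
# so rows 0 and 1 are overwritten and the terminal row 2 still bootstraps.
old = rewards + gamma * np.amax(next_q, axis=1)
old[done_int] = rewards[done_int]

# With a true boolean mask the old approach would agree with the new one.
old_bool = rewards + gamma * np.amax(next_q, axis=1)
mask = done_int.astype(bool)
old_bool[mask] = rewards[mask]

print(targets, old, old_bool)
```

Under these assumptions the masked multiplication and the boolean-mask assignment agree, while the integer-indexed assignment leaves the terminal transition with a nonzero bootstrap term.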
