Hi, it's not fully explicit in the algorithm box, but section 3.5.2 (link) explains it in detail. We use the running mean and variance to normalize each dimension of the random prior's output. That way, if the learned component outputs 0 (which it may do for observations it has never seen), the initial bonus is still 1, which is roughly the behavior you want on totally novel observations.
Hope that clears it up, and sorry for the slow response -- I didn't see the comment until just now.
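For anyone else reading, here is a rough sketch of how that normalization could work. It is not the code from the paper: the class `RunningMeanVar`, the function `novelty_bonus`, and the choice of a per-dimension mean squared error as the bonus are all assumptions made for illustration. It only shows why a predictor that outputs 0 gets a bonus of about 1 once the prior's output is scaled to zero mean and unit variance per dimension.

```python
import numpy as np


class RunningMeanVar:
    """Per-dimension running mean and variance (hypothetical helper)."""

    def __init__(self, dim, eps=1e-4):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.count = eps  # tiny pseudo-count avoids division by zero

    def update(self, batch):
        # Merge batch statistics into the running ones (parallel-variance formula).
        batch = np.asarray(batch, dtype=np.float64)
        b_mean, b_var, b_count = batch.mean(axis=0), batch.var(axis=0), batch.shape[0]
        delta = b_mean - self.mean
        total = self.count + b_count
        m2 = (self.var * self.count + b_var * b_count
              + delta ** 2 * self.count * b_count / total)
        self.mean = self.mean + delta * b_count / total
        self.var = m2 / total
        self.count = total

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)


def novelty_bonus(pred, prior_out, stats):
    # Assumed bonus: squared error against the normalized prior output,
    # averaged over output dimensions.
    target = stats.normalize(prior_out)
    return np.mean((pred - target) ** 2, axis=-1)


# Stand-in for the random prior's raw outputs (arbitrary scale and offset).
rng = np.random.default_rng(0)
prior_out = rng.normal(loc=3.0, scale=5.0, size=(10000, 8))

stats = RunningMeanVar(dim=8)
for chunk in np.array_split(prior_out, 10):
    stats.update(chunk)

# A learned component that outputs 0 everywhere, as on unseen observations.
zero_pred = np.zeros_like(prior_out)
mean_bonus = novelty_bonus(zero_pred, prior_out, stats).mean()
print(mean_bonus)  # close to 1 under these assumptions
```

Without the normalization, the same zero prediction would give a bonus that depends on the arbitrary scale of the prior's output, which is the reason for tracking the statistics in the first place.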
In algo 1, the running mean and variance are updated at step 12 but not used anywhere.
Can you elaborate please?