@ertsiger

While experimenting with the maze_plr.py example, I noticed that the metrics weighted_score and mean_score were being reported as -Infinity in Weights and Biases.

After investigating a bit, I saw this was not a problem when sampling new levels: the replace_cond condition in the _insert_new method from level_sampler.py ensures that -jnp.inf scores cannot be pushed into the buffer since the minimum score in the buffer is precisely that amount.

Instead, the problem is introduced when levels are replayed and their scores need to be updated in the buffer. If the rollout for a given level doesn't contain at least one full episode (i.e., the rollout does not end in a terminal state or reach the maximum number of steps), the score is -jnp.inf. The update method of the LevelSampler does not account for this possibility, so it can write -jnp.inf scores into the buffer, which breaks the computation of the weighted_score and mean_score metrics. I also think it would cause the affected levels to be sampled based on staleness only.
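To make the failure mode concrete, here is a minimal sketch of how a single -jnp.inf entry poisons aggregate metrics. It does not use the jaxued code: the arrays, `replayed_idx`, and the way the two metrics are computed (a plain mean and a weight-weighted sum) are assumptions for illustration only.

```python
import jax.numpy as jnp

# Hypothetical buffer with 4 level scores and their sampling weights.
# These values and names are illustrative, not taken from jaxued.
scores = jnp.array([0.3, 0.1, 0.5, 0.2])
weights = jnp.array([0.4, 0.1, 0.3, 0.2])

# A replayed level whose rollout contained no complete episode gets -inf.
replayed_idx = 1
new_score = -jnp.inf

# Unguarded overwrite, mimicking an update that does not check the new score.
bad_scores = scores.at[replayed_idx].set(new_score)

# Assumed metric definitions: one -inf entry drags both to -inf.
mean_score = bad_scores.mean()
weighted_score = (bad_scores * weights).sum()
```

Under these assumptions both `mean_score` and `weighted_score` evaluate to -inf, matching the -Infinity values seen in the logs.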

This phenomenon does not occur with the default PPO rollout length used in jaxued (256) since the maximum episode length used in Maze is 250; therefore, it is guaranteed that each rollout will contain at least one episode, hence avoiding the -jnp.inf. By setting the rollout length to a value lower than 250 (e.g., 128), the phenomenon is easily reproduced.
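The rollout-length argument can be checked with a small worst-case model. This is only a sketch: it assumes an episode never terminates early and ends solely by hitting the 250-step cap, and `rollout_has_done` and `start_step` are hypothetical names introduced here.

```python
import jax.numpy as jnp

MAX_EPISODE_LEN = 250  # maximum Maze episode length quoted above


def rollout_has_done(rollout_len, start_step=0, max_len=MAX_EPISODE_LEN):
    """Return True if a rollout sees at least one episode end in the worst case.

    Worst case: the agent never reaches a terminal state, so the episode only
    ends by timeout. start_step is how many steps of the current episode have
    already elapsed when the rollout begins.
    """
    # Episode step index reached after each transition in the rollout.
    steps_in_episode = start_step + jnp.arange(1, rollout_len + 1)
    # An episode ends whenever the step counter hits the cap.
    dones = steps_in_episode % max_len == 0
    return bool(dones.any())


default_ok = rollout_has_done(256)        # rollout covers the full 250-step cap
short_fresh = rollout_has_done(128)       # fresh episode, rollout ends too early
short_late = rollout_has_done(128, 200)   # rollout starts 200 steps into an episode
```

In this model a 256-step rollout always sees an episode end, while a 128-step rollout that starts at the beginning of an episode does not, which is the case that yields the -jnp.inf score.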

I investigated whether other UED codebases perform a similar check to the one I have introduced. For example, minimax does it here (note that ignore_val is -jnp.inf, like in your approach).

My suggestion is to perform a similar check to the one in minimax, which seems to do the job as shown below. You can see that for rollout-128-bug there are many points at the bottom of the plot, which represent the -inf values. I've followed a similar structure to the one you used for insert_new.

image
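For readers without the diff at hand, the kind of check described can be sketched as follows. This is not the actual change to level_sampler.py nor the minimax code: `guarded_score_update` and its arguments are hypothetical, and only the idea (keep the old score when the new one equals the ignore value of -jnp.inf) comes from the description above.

```python
import jax.numpy as jnp


def guarded_score_update(old_scores, idxs, new_scores, ignore_val=-jnp.inf):
    """Write new_scores at idxs, keeping the old score where new == ignore_val.

    Hypothetical helper: it only illustrates the check, not the jaxued API.
    """
    current = old_scores[idxs]
    # Rollouts without a complete episode report ignore_val; skip those.
    keep_old = new_scores == ignore_val
    merged = jnp.where(keep_old, current, new_scores)
    return old_scores.at[idxs].set(merged)


# Illustrative buffer: levels 1 and 3 were replayed, level 1 got no valid score.
buffer_scores = jnp.array([0.3, 0.1, 0.5, 0.2])
level_idxs = jnp.array([1, 3])
rollout_scores = jnp.array([-jnp.inf, 0.7])

updated = guarded_score_update(buffer_scores, level_idxs, rollout_scores)
```

Here level 1 keeps its previous score of 0.1 and level 3 is updated to 0.7, so the buffer stays finite. Using `jnp.where` rather than a Python branch keeps the check compatible with `jax.jit`.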

Hope this is useful!
