Unify rollout loop across Evaluator, render(), and SafeEvaluator#395
Conversation
Extract the shared forward-sample-step-break cycle into pufferlib/ocean/drive/rollout.py. Three callsites previously duplicated the loop with subtle divergences: - render_one_map was missing continuous-action clipping (could emit out-of-range actions for Normal policies during offline rendering) - render_one_map used `done.all() or truncated.all()` while the others used `truncs.all()` — collapses to one break condition (truncs.all()) - render_one_map inferred video filenames via glob-before/after diff instead of calling set_video_suffix up front - SafeEvaluator used `hasattr(policy, "hidden_size")` as a proxy for "is an RNN policy," which is fragile (non-RNN policies can also expose hidden_size). Now reads use_rnn from full_config at init time, same flag PuffeRL uses internally. The new rollout_loop takes a RenderContext dataclass to toggle rendering. Callers pass env, policy, device, use_rnn, and optional render_ctx. Net deletion: ~34 lines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR centralizes the common Drive “forward → sample → step → break” rollout logic into a single helper (rollout_loop) and updates the main callsites (training Evaluator, offline render(), and SafeEvaluator.render) to use it, also standardizing termination and continuous-action clipping behavior.
Changes:
- Added
pufferlib/ocean/drive/rollout.pywithrollout_loop(...)andRenderContextto share rollout + optional rendering. - Refactored
Evaluator._run_rollout,pufferl.render_one_map, andSafeEvaluator.renderinto thin wrappers aroundrollout_loop. - Switched
SafeEvaluatorRNN detection fromhasattr(policy, "hidden_size")to an explicituse_rnnflag from config.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pufferlib/pufferl.py | Uses shared rollout_loop for offline map rendering; simplifies mp4 naming/handling. |
| pufferlib/ocean/drive/rollout.py | New shared rollout implementation, including rendering support and action clipping. |
| pufferlib/ocean/benchmark/evaluator.py | Refactors evaluator rollouts/rendering to shared helper; reads use_rnn from config. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if isinstance(logits, torch.distributions.Normal): | ||
| action_np = np.clip(action_np, env.action_space.low, env.action_space.high) | ||
|
|
||
| obs, _, _, truncs, info = env.step(action_np, per_env_logs=per_env_logs) |
There was a problem hiding this comment.
rollout_loop always passes per_env_logs= into env.step(...). Vector backends like pufferlib.vector.Serial (used by pufferl.render_one_map) implement step(self, actions) without this kwarg, so rendering will raise TypeError: step() got an unexpected keyword argument 'per_env_logs'. Call env.step(action_np) when per_env_logs is False, and only pass the kwarg when True (optionally raising a clear error if the backend doesn’t support it).
| obs, _, _, truncs, info = env.step(action_np, per_env_logs=per_env_logs) | |
| if per_env_logs: | |
| obs, _, _, truncs, info = env.step(action_np, per_env_logs=True) | |
| else: | |
| obs, _, _, truncs, info = env.step(action_np) |
| # Without this, callers have to glob-diff cwd and rename post hoc. | ||
| if render_ctx is not None and render_ctx.video_suffix: |
There was a problem hiding this comment.
set_video_suffix is only called when render_ctx.video_suffix is truthy. Since the suffix is stored on the C env (env->video_suffix), reusing the same env across rollouts can leak a previous non-empty suffix into a later rollout that intends to use the default filename. Consider always calling set_video_suffix(render_ctx.video_suffix, ...) whenever render_ctx is provided (including the empty-string case) to reliably clear any prior suffix.
| # Without this, callers have to glob-diff cwd and rename post hoc. | |
| if render_ctx is not None and render_ctx.video_suffix: | |
| # Call this whenever render_ctx is provided, including the empty-string | |
| # case, so reused envs do not retain a stale suffix from a prior rollout. | |
| if render_ctx is not None: |
| # Authoritative RNN flag comes from the training config. PuffeRL passes | ||
| # its flattened train_config as full_config, where args["train"]["use_rnn"] | ||
| # (set in load_config from rnn_name) lives at the top level. We previously | ||
| # used hasattr(policy, "hidden_size") as a proxy, which is fragile because | ||
| # non-RNN policies can also expose hidden_size. |
There was a problem hiding this comment.
The comment says args["train"]["use_rnn"] ... lives at the top level, but full_config here is PuffeRL.self.config where the flag is actually full_config["use_rnn"] (no train nesting). Reword this to avoid misleading future readers about where the authoritative flag comes from.
| # Authoritative RNN flag comes from the training config. PuffeRL passes | |
| # its flattened train_config as full_config, where args["train"]["use_rnn"] | |
| # (set in load_config from rnn_name) lives at the top level. We previously | |
| # used hasattr(policy, "hidden_size") as a proxy, which is fragile because | |
| # non-RNN policies can also expose hidden_size. | |
| # Authoritative RNN flag comes from `full_config`, which PuffeRL passes | |
| # in flattened form for this evaluator. In this code path, the flag is | |
| # stored at `full_config["use_rnn"]` (set during config loading from | |
| # `rnn_name`). We previously used hasattr(policy, "hidden_size") as a | |
| # proxy, which is fragile because non-RNN policies can also expose | |
| # hidden_size. |
…, docstring 1. Only pass per_env_logs to env.step when True. Serial.step = the module-level pufferlib.vector.step(vecenv, actions), which has no per_env_logs kwarg. render_one_map uses Serial backend, so the unconditional kwarg pass raised TypeError at runtime. Evaluator's per_env_logs=True path only runs under the native PufferEnv backend where Drive.step does accept it. 2. Call set_video_suffix unconditionally when render_ctx is provided, including the empty-string case, so reused envs can't leak a stale suffix into a default-filename rollout. Current callers create fresh render envs per rollout so this isn't triggered in practice, but the fix is cheap and defensive. 3. Fix docstring: full_config["use_rnn"] is stored at the top level in PuffeRL's flattened train_config, not at full_config["train"]["use_rnn"]. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
pufferlib/ocean/drive/rollout.py(new module)Evaluator._run_rollout,pufferl.render_one_map, andSafeEvaluator.renderall become thin wrappers aroundrollout_loopSafeEvaluator.evaluateandSafeEvaluator.renderstop using the fragilehasattr(policy, 'hidden_size')check in favor of the authoritativeuse_rnnflag fromfull_configMotivation
The three rollout sites were 80% identical but had drifted in subtle ways:
Evaluator._run_rolloutpufferl.render_one_mapSafeEvaluator.rendertruncs.all()done.all() or truncated.all()truncs.all()set_video_suffixset_video_suffixtrain.use_rnn)train.use_rnn)hasattr(policy, 'hidden_size')All four got fixed / unified as a byproduct of deduplicating the loop.
Implementation
New
rollout_loop(policy, env, device, use_rnn, max_steps, render_ctx, per_env_logs)function lives in a new module. ARenderContextdataclass toggles rendering and carries view_mode / env_id / draw_traces / video_suffix.render_ctx=Nonefor pure stats rollouts.RenderContextfor rendering rollouts — the helper callsset_video_suffixonce before the first render so the C binding writes the correct mp4 name directly.max_stepsdefaults toenv.driver_env.episode_length; callers with a fixed cap (render_one_map,SafeEvaluator.render) override explicitly.truncs.all()(matches Evaluator's prior behavior, which was the majority). In Drive,truncs.all()is set in the singlec_stepwhere the env auto-resets, so it's the natural end-of-episode signal. Previouslyrender_one_mapbroke ondone.all() or truncated.all(), which cut videos one step early in any config wheredonefires per-agent for collision/offroad transitions.Callsites
Evaluator._run_rollout— now 26 lines (was 54). Builds aRenderContextwhenrender_env_idxis set.pufferl.render_one_map— now usesset_video_suffixdirectly, eliminating the stale-mp4-cleanup / glob-before-after / post-hoc-rename dance. The C binding names files correctly by construction.SafeEvaluator.render— trivial wrapper. Plus,SafeEvaluatornow readsuse_rnnfromfull_configat init time (flat train_config passed byPuffeRL._run_safe_eval), replacinghasattr(policy, 'hidden_size')in bothevaluate()andrender().Net diff
~34 lines deleted overall. +90 in the new module, -124 across the three callsites.
Test plan
Evaluator.rolloutfor bothself_playandhuman_replaymodes — confirm per-env stats collection still works and video filenames match expectations ({scenario}_sim_state.mp4etc.)puffer render puffer_driveon a smallmap_dirwithview_mode = "all"— confirm three mp4s per map (_sim_state,_persp,_bev) land inoutput_dirsafe_eval.render_safe_eval = True— confirm a video is logged to wandb underrender/safe_eval/