Unify rollout loop across Evaluator, render(), and SafeEvaluator by eugenevinitsky · Pull Request #395 · Emerge-Lab/PufferDrive

eugenevinitsky · 2026-04-11T17:10:46Z

Summary

Extracts the shared forward-sample-step-break loop into pufferlib/ocean/drive/rollout.py (new module)
Evaluator._run_rollout, pufferl.render_one_map, and SafeEvaluator.render all become thin wrappers around rollout_loop
SafeEvaluator.evaluate and SafeEvaluator.render stop using the fragile hasattr(policy, 'hidden_size') check in favor of the authoritative use_rnn flag from full_config

Motivation

The three rollout sites were 80% identical but had drifted in subtle ways:

| Issue | `Evaluator._run_rollout` | `pufferl.render_one_map` | `SafeEvaluator.render` |
| --- | --- | --- | --- |
| Continuous action clipping | ✅ | ❌ (latent bug) | ✅ |
| Break condition | `truncs.all()` | `done.all() or truncated.all()` | `truncs.all()` |
| Video filename handling | `set_video_suffix` | glob-before/after diff | `set_video_suffix` |
| RNN detection | config (`train.use_rnn`) | config (`train.use_rnn`) | `hasattr(policy, 'hidden_size')` |

All four got fixed / unified as a byproduct of deduplicating the loop.
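To make the first two rows of the table concrete, here is a minimal sketch in plain Python. It is not the PR's code: `clip_action` stands in for the `np.clip` call in the real loop, and `TRACE` is a hypothetical two-agent episode in which every agent is flagged `done` one step before the env auto-resets. Under that assumption the old `render_one_map` condition stops one step before the unified one.

```python
from typing import List, Sequence, Tuple

Flags = Tuple[List[bool], List[bool]]  # (dones, truncs) for one env step


def clip_action(action: Sequence[float], low: Sequence[float], high: Sequence[float]) -> List[float]:
    """Element-wise clamp into [low, high]; the PR does this with np.clip."""
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, low, high)]


def steps_until_break(trace: Sequence[Flags], use_old: bool) -> int:
    """Count env steps taken before the loop's break condition fires."""
    for step_idx, (dones, truncs) in enumerate(trace, start=1):
        if use_old:
            over = all(dones) or all(truncs)  # previous render_one_map condition
        else:
            over = all(truncs)  # unified condition
        if over:
            return step_idx
    return len(trace)


# Hypothetical trace (an assumption, not recorded from Drive).
TRACE: List[Flags] = [
    ([False, False], [False, False]),
    ([True, False], [False, False]),  # agent 0 collides
    ([True, True], [False, False]),   # both agents flagged done, no reset yet
    ([True, True], [True, True]),     # auto-reset step: truncs.all() fires
]

print(clip_action([1.7, -0.2], [-1.0, -1.0], [1.0, 1.0]))  # [1.0, -0.2]
print(steps_until_break(TRACE, use_old=True))   # 3
print(steps_until_break(TRACE, use_old=False))  # 4
```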

Implementation

New rollout_loop(policy, env, device, use_rnn, max_steps, render_ctx, per_env_logs) function lives in a new module. A RenderContext dataclass toggles rendering and carries view_mode / env_id / draw_traces / video_suffix.

Pass render_ctx=None for pure stats rollouts.
Pass a RenderContext for rendering rollouts — the helper calls set_video_suffix once before the first render so the C binding writes the correct mp4 name directly.
max_steps defaults to env.driver_env.episode_length; callers with a fixed cap (render_one_map, SafeEvaluator.render) override explicitly.
Break condition is truncs.all() (matches Evaluator's prior behavior, which was the majority). In Drive, truncs.all() is set in the single c_step where the env auto-resets, so it's the natural end-of-episode signal. Previously render_one_map broke on done.all() or truncated.all(), which cut videos one step early in any config where done fires per-agent for collision/offroad transitions.
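The shape described above can be sketched roughly as follows. This is a simplified illustration, not the contents of `rollout.py`: it drops `device`, `use_rnn`, and `per_env_logs`, uses plain lists instead of tensors, and `ToyEnv` is a hypothetical stand-in for the Drive env. The `RenderContext` field names come from the description; their defaults, the `render` signature, and the position of the render call relative to `step` are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class RenderContext:
    """Field names follow the PR description; defaults are assumptions."""
    view_mode: str = "sim_state"
    env_id: int = 0
    draw_traces: bool = False
    video_suffix: str = ""


class ToyEnv:
    """Hypothetical env: all agents truncate on the step that ends the episode."""

    def __init__(self, num_agents: int = 2, episode_length: int = 5):
        self.num_agents = num_agents
        self.episode_length = episode_length
        self.t = 0
        self.video_suffix: Optional[str] = None
        self.frames_rendered = 0

    def reset(self) -> List[float]:
        self.t = 0
        return [0.0] * self.num_agents

    def set_video_suffix(self, suffix: str) -> None:
        self.video_suffix = suffix

    def render(self, ctx: RenderContext) -> None:
        self.frames_rendered += 1

    def step(self, actions: List[int]):
        self.t += 1
        n = self.num_agents
        truncs = [self.t >= self.episode_length] * n
        return [float(self.t)] * n, [0.0] * n, [False] * n, truncs, []


def rollout_loop(
    policy: Callable[[List[float]], List[int]],
    env: ToyEnv,
    max_steps: Optional[int] = None,
    render_ctx: Optional[RenderContext] = None,
) -> int:
    """Forward -> sample -> step -> break. Returns the number of env steps."""
    if max_steps is None:
        max_steps = env.episode_length  # the PR reads env.driver_env.episode_length
    if render_ctx is not None:
        env.set_video_suffix(render_ctx.video_suffix)  # once, before any render
    obs = env.reset()
    steps = 0
    for _ in range(max_steps):
        actions = policy(obs)
        obs, _, _, truncs, _ = env.step(actions)
        steps += 1
        if render_ctx is not None:
            env.render(render_ctx)  # render placement is an assumption
        if all(truncs):  # single unified break condition
            break
    return steps


def zero_policy(obs: List[float]) -> List[int]:
    return [0] * len(obs)


stats_env = ToyEnv(episode_length=5)
print(rollout_loop(zero_policy, stats_env))  # 5, no frames rendered

render_env = ToyEnv(episode_length=5)
print(rollout_loop(zero_policy, render_env, max_steps=3,
                   render_ctx=RenderContext(video_suffix="_bev")))  # 3
```

Passing `render_ctx=None` gives the pure stats path, and an explicit `max_steps` models the fixed cap that `render_one_map` and `SafeEvaluator.render` apply.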

Callsites

Evaluator._run_rollout — now 26 lines (was 54). Builds a RenderContext when render_env_idx is set.

pufferl.render_one_map — now uses set_video_suffix directly, eliminating the stale-mp4-cleanup / glob-before-after / post-hoc-rename dance. The C binding names files correctly by construction.

SafeEvaluator.render — trivial wrapper. Plus, SafeEvaluator now reads use_rnn from full_config at init time (flat train_config passed by PuffeRL._run_safe_eval), replacing hasattr(policy, 'hidden_size') in both evaluate() and render().

Net diff

~34 lines deleted overall. +90 in the new module, -124 across the three callsites.

Test plan

Run a short training job that triggers Evaluator.rollout for both self_play and human_replay modes — confirm per-env stats collection still works and video filenames match expectations ({scenario}_sim_state.mp4 etc.)
Run puffer render puffer_drive on a small map_dir with view_mode = "all" — confirm three mp4s per map (_sim_state, _persp, _bev) land in output_dir
Run a short training job with safe_eval.render_safe_eval = True — confirm a video is logged to wandb under render/safe_eval/
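The filename checks in the second item could be scripted along these lines. This is a hypothetical helper, not part of the PR: it assumes videos are named `{scenario}{suffix}.mp4` with the three suffixes listed above, and the scenario names and temporary directory are placeholders for a real `output_dir`.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List

# Suffixes taken from the test plan; the naming pattern itself is an assumption.
EXPECTED_SUFFIXES = ("_sim_state", "_persp", "_bev")


def missing_videos(output_dir: str, scenarios: Iterable[str]) -> List[str]:
    """Return the expected mp4 filenames that are absent from output_dir."""
    out = Path(output_dir)
    missing = []
    for scenario in scenarios:
        for suffix in EXPECTED_SUFFIXES:
            name = f"{scenario}{suffix}.mp4"
            if not (out / name).is_file():
                missing.append(name)
    return missing


# Demo against a throwaway directory with one video deliberately left out.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("map_000_sim_state.mp4", "map_000_persp.mp4"):
        (Path(tmp) / name).touch()
    print(missing_videos(tmp, ["map_000"]))  # ['map_000_bev.mp4']
```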

Extract the shared forward-sample-step-break cycle into pufferlib/ocean/drive/rollout.py. Three callsites previously duplicated the loop with subtle divergences:

- render_one_map was missing continuous-action clipping (could emit out-of-range actions for Normal policies during offline rendering)
- render_one_map used `done.all() or truncated.all()` while the others used `truncs.all()` — collapses to one break condition (truncs.all())
- render_one_map inferred video filenames via glob-before/after diff instead of calling set_video_suffix up front
- SafeEvaluator used `hasattr(policy, "hidden_size")` as a proxy for "is an RNN policy," which is fragile (non-RNN policies can also expose hidden_size). Now reads use_rnn from full_config at init time, same flag PuffeRL uses internally.

The new rollout_loop takes a RenderContext dataclass to toggle rendering. Callers pass env, policy, device, use_rnn, and optional render_ctx.

Net deletion: ~34 lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR centralizes the common Drive “forward → sample → step → break” rollout logic into a single helper (rollout_loop) and updates the main callsites (training Evaluator, offline render(), and SafeEvaluator.render) to use it, also standardizing termination and continuous-action clipping behavior.

Changes:

Added pufferlib/ocean/drive/rollout.py with rollout_loop(...) and RenderContext to share rollout + optional rendering.
Refactored Evaluator._run_rollout, pufferl.render_one_map, and SafeEvaluator.render into thin wrappers around rollout_loop.
Switched SafeEvaluator RNN detection from hasattr(policy, "hidden_size") to an explicit use_rnn flag from config.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pufferlib/pufferl.py | Uses shared `rollout_loop` for offline map rendering; simplifies mp4 naming/handling. |
| pufferlib/ocean/drive/rollout.py | New shared rollout implementation, including rendering support and action clipping. |
| pufferlib/ocean/benchmark/evaluator.py | Refactors evaluator rollouts/rendering to shared helper; reads `use_rnn` from config. |

Copilot · 2026-04-11T17:16:50Z

+        if isinstance(logits, torch.distributions.Normal):
+            action_np = np.clip(action_np, env.action_space.low, env.action_space.high)
+
+        obs, _, _, truncs, info = env.step(action_np, per_env_logs=per_env_logs)


rollout_loop always passes per_env_logs= into env.step(...). Vector backends like pufferlib.vector.Serial (used by pufferl.render_one_map) implement step(self, actions) without this kwarg, so rendering will raise TypeError: step() got an unexpected keyword argument 'per_env_logs'. Call env.step(action_np) when per_env_logs is False, and only pass the kwarg when True (optionally raising a clear error if the backend doesn’t support it).

Suggested change

-        obs, _, _, truncs, info = env.step(action_np, per_env_logs=per_env_logs)
+        if per_env_logs:
+            obs, _, _, truncs, info = env.step(action_np, per_env_logs=True)
+        else:
+            obs, _, _, truncs, info = env.step(action_np)
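The failure mode and the suggested fix can be reproduced in isolation. In this sketch `SerialLikeEnv` and `DriveLikeEnv` are hypothetical stand-ins that only mimic the two `step` signatures described in the comment; they are not the real `pufferlib.vector.Serial` or `Drive` classes.

```python
class SerialLikeEnv:
    """Stand-in for a backend whose step() takes only actions."""

    def step(self, actions):
        return "obs", 0.0, False, True, []


class DriveLikeEnv:
    """Stand-in for a backend whose step() accepts per_env_logs."""

    def step(self, actions, per_env_logs=False):
        info = [{"env_id": 0}] if per_env_logs else []
        return "obs", 0.0, False, True, info


def step_env(env, actions, per_env_logs=False):
    """Forward the kwarg only when it is requested, as in the suggested change."""
    if per_env_logs:
        return env.step(actions, per_env_logs=True)
    return env.step(actions)


# The unconditional kwarg pass fails on the Serial-like backend...
try:
    SerialLikeEnv().step([0], per_env_logs=False)
except TypeError as exc:
    print("TypeError:", exc)

# ...while the guarded helper works on both backends.
print(step_env(SerialLikeEnv(), [0]))
print(step_env(DriveLikeEnv(), [0], per_env_logs=True))
```

Requesting `per_env_logs=True` on a backend that lacks the kwarg still raises `TypeError`, which matches the comment's optional suggestion to surface a clear error in that case.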

Copilot · 2026-04-11T17:16:51Z

+    # Without this, callers have to glob-diff cwd and rename post hoc.
+    if render_ctx is not None and render_ctx.video_suffix:


set_video_suffix is only called when render_ctx.video_suffix is truthy. Since the suffix is stored on the C env (env->video_suffix), reusing the same env across rollouts can leak a previous non-empty suffix into a later rollout that intends to use the default filename. Consider always calling set_video_suffix(render_ctx.video_suffix, ...) whenever render_ctx is provided (including the empty-string case) to reliably clear any prior suffix.

Suggested change

-    # Without this, callers have to glob-diff cwd and rename post hoc.
-    if render_ctx is not None and render_ctx.video_suffix:
+    # Call this whenever render_ctx is provided, including the empty-string
+    # case, so reused envs do not retain a stale suffix from a prior rollout.
+    if render_ctx is not None:
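A small sketch of the stale-suffix scenario, under stated assumptions: `ToyRenderEnv` is a hypothetical stand-in for the C-backed env that simply remembers the last suffix it was given, and the `{scenario}{suffix}.mp4` naming is assumed for illustration.

```python
class ToyRenderEnv:
    """Hypothetical env that stores the suffix, like env->video_suffix in C."""

    def __init__(self):
        self.video_suffix = ""

    def set_video_suffix(self, suffix: str) -> None:
        self.video_suffix = suffix

    def output_name(self, scenario: str) -> str:
        # Naming pattern is an assumption made for this example.
        return f"{scenario}{self.video_suffix}.mp4"


def prepare_gated(env: ToyRenderEnv, suffix: str) -> None:
    """Original behaviour: only set the suffix when it is truthy."""
    if suffix:
        env.set_video_suffix(suffix)


def prepare_unconditional(env: ToyRenderEnv, suffix: str) -> None:
    """Suggested behaviour: always set it, so an empty string clears it."""
    env.set_video_suffix(suffix)


reused = ToyRenderEnv()
prepare_gated(reused, "_bev")
prepare_gated(reused, "")          # intends the default filename
print(reused.output_name("map"))   # map_bev.mp4 (stale suffix leaked)

reused = ToyRenderEnv()
prepare_unconditional(reused, "_bev")
prepare_unconditional(reused, "")
print(reused.output_name("map"))   # map.mp4
```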

Copilot · 2026-04-11T17:16:51Z

+        # Authoritative RNN flag comes from the training config. PuffeRL passes
+        # its flattened train_config as full_config, where args["train"]["use_rnn"]
+        # (set in load_config from rnn_name) lives at the top level. We previously
+        # used hasattr(policy, "hidden_size") as a proxy, which is fragile because
+        # non-RNN policies can also expose hidden_size.


The comment says args["train"]["use_rnn"] ... lives at the top level, but full_config here is PuffeRL.self.config where the flag is actually full_config["use_rnn"] (no train nesting). Reword this to avoid misleading future readers about where the authoritative flag comes from.

Suggested change

-        # Authoritative RNN flag comes from the training config. PuffeRL passes
-        # its flattened train_config as full_config, where args["train"]["use_rnn"]
-        # (set in load_config from rnn_name) lives at the top level. We previously
-        # used hasattr(policy, "hidden_size") as a proxy, which is fragile because
-        # non-RNN policies can also expose hidden_size.
+        # Authoritative RNN flag comes from `full_config`, which PuffeRL passes
+        # in flattened form for this evaluator. In this code path, the flag is
+        # stored at `full_config["use_rnn"]` (set during config loading from
+        # `rnn_name`). We previously used hasattr(policy, "hidden_size") as a
+        # proxy, which is fragile because non-RNN policies can also expose
+        # hidden_size.
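Both points in this thread, the fragile `hasattr` proxy and the flat versus nested config shape, can be shown with toy objects. Everything here is illustrative: the policy classes, the config values, and the helper names are hypothetical, and only the key `use_rnn` and the attribute `hidden_size` come from the PR.

```python
class FeedForwardPolicy:
    """Non-recurrent policy that still exposes hidden_size (its layer width)."""
    hidden_size = 128


class RecurrentPolicy:
    """Recurrent policy; hidden_size is the RNN state width."""
    hidden_size = 128


def uses_rnn_by_attr(policy) -> bool:
    """Old proxy: true for any policy that happens to have hidden_size."""
    return hasattr(policy, "hidden_size")


def uses_rnn_by_config(full_config: dict) -> bool:
    """New approach: read the flag from the flattened config."""
    return bool(full_config["use_rnn"])


# Hypothetical configs: the flattened shape vs. a nested shape.
flat_config = {"use_rnn": False, "learning_rate": 3e-4}
nested_args = {"train": {"use_rnn": False, "learning_rate": 3e-4}}

policy = FeedForwardPolicy()
print(uses_rnn_by_attr(policy))         # True  (wrong for this policy)
print(uses_rnn_by_config(flat_config))  # False (matches the config)
print("use_rnn" in nested_args)         # False: a nested dict needs ["train"] first
```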

…, docstring

1. Only pass per_env_logs to env.step when True. Serial.step = the module-level pufferlib.vector.step(vecenv, actions), which has no per_env_logs kwarg. render_one_map uses Serial backend, so the unconditional kwarg pass raised TypeError at runtime. Evaluator's per_env_logs=True path only runs under the native PufferEnv backend where Drive.step does accept it.
2. Call set_video_suffix unconditionally when render_ctx is provided, including the empty-string case, so reused envs can't leak a stale suffix into a default-filename rollout. Current callers create fresh render envs per rollout so this isn't triggered in practice, but the fix is cheap and defensive.
3. Fix docstring: full_config["use_rnn"] is stored at the top level in PuffeRL's flattened train_config, not at full_config["train"]["use_rnn"].

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings April 11, 2026 17:10

Copilot started reviewing on behalf of eugenevinitsky April 11, 2026 17:11

Copilot AI reviewed Apr 11, 2026

eugenevinitsky mentioned this pull request Apr 11, 2026

PuffeRL carries two config shapes (flat train_config + nested full_args) that drift #396

Closed

eugenevinitsky merged commit 02d8ecf into aditya/lean_renders Apr 11, 2026
10 checks passed

eugenevinitsky deleted the ev/unify-rollout branch April 11, 2026 18:10
