Skip to content

Unify rollout loop across Evaluator, render(), and SafeEvaluator#395

Merged
eugenevinitsky merged 2 commits into
aditya/lean_rendersfrom
ev/unify-rollout
Apr 11, 2026
Merged

Unify rollout loop across Evaluator, render(), and SafeEvaluator#395
eugenevinitsky merged 2 commits into
aditya/lean_rendersfrom
ev/unify-rollout

Conversation

@eugenevinitsky
Copy link
Copy Markdown

@eugenevinitsky eugenevinitsky commented Apr 11, 2026

Summary

  • Extracts the shared forward-sample-step-break loop into pufferlib/ocean/drive/rollout.py (new module)
  • Evaluator._run_rollout, pufferl.render_one_map, and SafeEvaluator.render all become thin wrappers around rollout_loop
  • SafeEvaluator.evaluate and SafeEvaluator.render stop using the fragile hasattr(policy, 'hidden_size') check in favor of the authoritative use_rnn flag from full_config

Motivation

The three rollout sites were 80% identical but had drifted in subtle ways:

Issue Evaluator._run_rollout pufferl.render_one_map SafeEvaluator.render
Continuous action clipping ❌ (latent bug)
Break condition truncs.all() done.all() or truncated.all() truncs.all()
Video filename handling set_video_suffix glob-before/after diff set_video_suffix
RNN detection config (train.use_rnn) config (train.use_rnn) hasattr(policy, 'hidden_size')

All four got fixed / unified as a byproduct of deduplicating the loop.

Implementation

New rollout_loop(policy, env, device, use_rnn, max_steps, render_ctx, per_env_logs) function lives in a new module. A RenderContext dataclass toggles rendering and carries view_mode / env_id / draw_traces / video_suffix.

  • Pass render_ctx=None for pure stats rollouts.
  • Pass a RenderContext for rendering rollouts — the helper calls set_video_suffix once before the first render so the C binding writes the correct mp4 name directly.
  • max_steps defaults to env.driver_env.episode_length; callers with a fixed cap (render_one_map, SafeEvaluator.render) override explicitly.
  • Break condition is truncs.all() (matches Evaluator's prior behavior, which was the majority). In Drive, truncs.all() is set in the single c_step where the env auto-resets, so it's the natural end-of-episode signal. Previously render_one_map broke on done.all() or truncated.all(), which cut videos one step early in any config where done fires per-agent for collision/offroad transitions.

Callsites

Evaluator._run_rollout — now 26 lines (was 54). Builds a RenderContext when render_env_idx is set.

pufferl.render_one_map — now uses set_video_suffix directly, eliminating the stale-mp4-cleanup / glob-before-after / post-hoc-rename dance. The C binding names files correctly by construction.

SafeEvaluator.render — trivial wrapper. Plus, SafeEvaluator now reads use_rnn from full_config at init time (flat train_config passed by PuffeRL._run_safe_eval), replacing hasattr(policy, 'hidden_size') in both evaluate() and render().

Net diff

~34 lines deleted overall. +90 in the new module, -124 across the three callsites.

Test plan

  • Run a short training job that triggers Evaluator.rollout for both self_play and human_replay modes — confirm per-env stats collection still works and video filenames match expectations ({scenario}_sim_state.mp4 etc.)
  • Run puffer render puffer_drive on a small map_dir with view_mode = "all" — confirm three mp4s per map (_sim_state, _persp, _bev) land in output_dir
  • Run a short training job with safe_eval.render_safe_eval = True — confirm a video is logged to wandb under render/safe_eval/

Extract the shared forward-sample-step-break cycle into
pufferlib/ocean/drive/rollout.py. Three callsites previously
duplicated the loop with subtle divergences:

- render_one_map was missing continuous-action clipping (could emit
  out-of-range actions for Normal policies during offline rendering)
- render_one_map used `done.all() or truncated.all()` while the others
  used `truncs.all()` — collapses to one break condition (truncs.all())
- render_one_map inferred video filenames via glob-before/after diff
  instead of calling set_video_suffix up front
- SafeEvaluator used `hasattr(policy, "hidden_size")` as a proxy for
  "is an RNN policy," which is fragile (non-RNN policies can also
  expose hidden_size). Now reads use_rnn from full_config at init time,
  same flag PuffeRL uses internally.

The new rollout_loop takes a RenderContext dataclass to toggle
rendering. Callers pass env, policy, device, use_rnn, and optional
render_ctx. Net deletion: ~34 lines.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 11, 2026 17:10
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR centralizes the common Drive “forward → sample → step → break” rollout logic into a single helper (rollout_loop) and updates the main callsites (training Evaluator, offline render(), and SafeEvaluator.render) to use it, also standardizing termination and continuous-action clipping behavior.

Changes:

  • Added pufferlib/ocean/drive/rollout.py with rollout_loop(...) and RenderContext to share rollout + optional rendering.
  • Refactored Evaluator._run_rollout, pufferl.render_one_map, and SafeEvaluator.render into thin wrappers around rollout_loop.
  • Switched SafeEvaluator RNN detection from hasattr(policy, "hidden_size") to an explicit use_rnn flag from config.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
pufferlib/pufferl.py Uses shared rollout_loop for offline map rendering; simplifies mp4 naming/handling.
pufferlib/ocean/drive/rollout.py New shared rollout implementation, including rendering support and action clipping.
pufferlib/ocean/benchmark/evaluator.py Refactors evaluator rollouts/rendering to shared helper; reads use_rnn from config.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pufferlib/ocean/drive/rollout.py Outdated
if isinstance(logits, torch.distributions.Normal):
action_np = np.clip(action_np, env.action_space.low, env.action_space.high)

obs, _, _, truncs, info = env.step(action_np, per_env_logs=per_env_logs)
Copy link

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rollout_loop always passes per_env_logs= into env.step(...). Vector backends like pufferlib.vector.Serial (used by pufferl.render_one_map) implement step(self, actions) without this kwarg, so rendering will raise TypeError: step() got an unexpected keyword argument 'per_env_logs'. Call env.step(action_np) when per_env_logs is False, and only pass the kwarg when True (optionally raising a clear error if the backend doesn’t support it).

Suggested change
obs, _, _, truncs, info = env.step(action_np, per_env_logs=per_env_logs)
if per_env_logs:
obs, _, _, truncs, info = env.step(action_np, per_env_logs=True)
else:
obs, _, _, truncs, info = env.step(action_np)

Copilot uses AI. Check for mistakes.
Comment thread pufferlib/ocean/drive/rollout.py Outdated
Comment on lines +82 to +83
# Without this, callers have to glob-diff cwd and rename post hoc.
if render_ctx is not None and render_ctx.video_suffix:
Copy link

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set_video_suffix is only called when render_ctx.video_suffix is truthy. Since the suffix is stored on the C env (env->video_suffix), reusing the same env across rollouts can leak a previous non-empty suffix into a later rollout that intends to use the default filename. Consider always calling set_video_suffix(render_ctx.video_suffix, ...) whenever render_ctx is provided (including the empty-string case) to reliably clear any prior suffix.

Suggested change
# Without this, callers have to glob-diff cwd and rename post hoc.
if render_ctx is not None and render_ctx.video_suffix:
# Call this whenever render_ctx is provided, including the empty-string
# case, so reused envs do not retain a stale suffix from a prior rollout.
if render_ctx is not None:

Copilot uses AI. Check for mistakes.
Comment thread pufferlib/ocean/benchmark/evaluator.py Outdated
Comment on lines +1140 to +1144
# Authoritative RNN flag comes from the training config. PuffeRL passes
# its flattened train_config as full_config, where args["train"]["use_rnn"]
# (set in load_config from rnn_name) lives at the top level. We previously
# used hasattr(policy, "hidden_size") as a proxy, which is fragile because
# non-RNN policies can also expose hidden_size.
Copy link

Copilot AI Apr 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says args["train"]["use_rnn"] ... lives at the top level, but full_config here is PuffeRL.self.config where the flag is actually full_config["use_rnn"] (no train nesting). Reword this to avoid misleading future readers about where the authoritative flag comes from.

Suggested change
# Authoritative RNN flag comes from the training config. PuffeRL passes
# its flattened train_config as full_config, where args["train"]["use_rnn"]
# (set in load_config from rnn_name) lives at the top level. We previously
# used hasattr(policy, "hidden_size") as a proxy, which is fragile because
# non-RNN policies can also expose hidden_size.
# Authoritative RNN flag comes from `full_config`, which PuffeRL passes
# in flattened form for this evaluator. In this code path, the flag is
# stored at `full_config["use_rnn"]` (set during config loading from
# `rnn_name`). We previously used hasattr(policy, "hidden_size") as a
# proxy, which is fragile because non-RNN policies can also expose
# hidden_size.

Copilot uses AI. Check for mistakes.
…, docstring

1. Only pass per_env_logs to env.step when True. Serial.step = the
   module-level pufferlib.vector.step(vecenv, actions), which has no
   per_env_logs kwarg. render_one_map uses Serial backend, so the
   unconditional kwarg pass raised TypeError at runtime. Evaluator's
   per_env_logs=True path only runs under the native PufferEnv backend
   where Drive.step does accept it.

2. Call set_video_suffix unconditionally when render_ctx is provided,
   including the empty-string case, so reused envs can't leak a stale
   suffix into a default-filename rollout. Current callers create fresh
   render envs per rollout so this isn't triggered in practice, but the
   fix is cheap and defensive.

3. Fix docstring: full_config["use_rnn"] is stored at the top level in
   PuffeRL's flattened train_config, not at full_config["train"]["use_rnn"].

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@eugenevinitsky eugenevinitsky merged commit 02d8ecf into aditya/lean_renders Apr 11, 2026
10 checks passed
@eugenevinitsky eugenevinitsky deleted the ev/unify-rollout branch April 11, 2026 18:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants