pufferl: log obs max/min/mean per eval step #458

Merged
eugenevinitsky merged 1 commit into emerge/temp_training from ev/log-obs-stats
May 30, 2026

@eugenevinitsky eugenevinitsky commented May 30, 2026

Summary

Adds three wandb scalars (`obs/max`, `obs/min`, `obs/mean`), computed each env step in `PuffeRL.evaluate()` from the same tensor that is about to enter the policy.

Three new wandb keys (`obs/max`, `obs/min`, `obs/mean`) collected every
env step in evaluate() and rolled up by mean_and_log. Each value is a
scalar reduction across the full batch and obs-dim, so the wandb panel
shows the overall range spanned by all features the policy sees over the
eval window, not a per-feature range. Useful for catching clipping,
mis-normalization, or unbounded features after a training tweak.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
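
The collect-then-roll-up flow described in the commit message can be sketched in plain Python. This is an illustration, not the PufferLib code: `record_obs_stats` and `roll_up` are hypothetical names, nested lists stand in for the observation tensor, and `roll_up` assumes the roll-up simply averages each key's per-step values.

```python
from collections import defaultdict
from statistics import fmean


def record_obs_stats(stats, obs_batch):
    """Append one scalar max/min/mean for a batch of observations.

    obs_batch is a list of observation vectors (lists of floats). The
    reduction runs over every batch entry and every feature at once,
    so each step contributes exactly one number per key.
    """
    flat = [x for row in obs_batch for x in row]
    stats["obs/max"].append(max(flat))
    stats["obs/min"].append(min(flat))
    stats["obs/mean"].append(fmean(flat))


def roll_up(stats):
    """Average each key's per-step scalars (assumed roll-up behavior)."""
    return {key: fmean(values) for key, values in stats.items()}


stats = defaultdict(list)
# Two eval steps, each with a batch of 2 observations of 3 features.
record_obs_stats(stats, [[0.0, 1.0, 2.0], [-1.0, 0.5, 3.5]])
record_obs_stats(stats, [[0.0, 0.0, 4.5], [-3.0, 1.0, 0.5]])

logged = roll_up(stats)
print(logged)  # {'obs/max': 4.0, 'obs/min': -2.0, 'obs/mean': 0.75}
```

Under that averaging assumption, the logged `obs/max` (4.0 here) is the mean of the per-step maxima rather than the largest value seen in the window (4.5), which is worth keeping in mind when reading the panel.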
Copilot AI review requested due to automatic review settings May 30, 2026 16:37

Copilot AI left a comment

Pull request overview

Adds three per-env-step observation statistics (obs/max, obs/min, obs/mean) to PuffeRL.evaluate() so they are surfaced in W&B (as environment/obs/{max,min,mean}) to help catch clipping/normalization regressions.

Changes:

  • Append o_device.max/min/mean().item() into self.stats on each evaluate step, before the policy forward pass.
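
As a rough illustration of how the three scalars might be read when looking for the clipping or normalization regressions mentioned in the overview, the sketch below flags suspicious values. Everything in it is hypothetical: `flag_obs_range`, the `clip` limit, and the `mean_tol` threshold are assumptions chosen for the example and do not come from the PR or from PufferLib.

```python
def flag_obs_range(obs_max, obs_min, obs_mean, clip=5.0, mean_tol=1.0):
    """Return a list of flags for one set of logged observation scalars.

    clip and mean_tol are illustrative thresholds; pick values that match
    the normalization the environment is expected to apply.
    """
    flags = []
    # Values sitting at or past the expected limit suggest clipping
    # or an unbounded feature.
    if obs_max >= clip or obs_min <= -clip:
        flags.append("at_or_beyond_clip")
    # A mean far from zero suggests observations are not centered.
    if abs(obs_mean) > mean_tol:
        flags.append("mean_off_center")
    return flags


print(flag_obs_range(4.0, -2.0, 0.75))    # []
print(flag_obs_range(5.0, -5.0, 0.1))     # ['at_or_beyond_clip']
print(flag_obs_range(250.0, -3.0, 12.0))  # ['at_or_beyond_clip', 'mean_off_center']
```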

@eugenevinitsky eugenevinitsky merged commit 943da10 into emerge/temp_training May 30, 2026
14 checks passed