WIP: fix two DDP wandb logging bugs (duplicate runs + dropped eval metrics) #468
Open
eugenevinitsky wants to merge 1 commit into
1. Only rank 0 creates the run logger. Logger creation was ungated, so under
torchrun every rank called wandb.init()/NeptuneLogger and produced world_size
duplicate runs. Non-rank-0 ranks now keep logger=None (PuffeRL wraps it in a
NoLogger).
2. Eval logs at the aggregate step, not the per-rank one. Training logs at
agent_steps = dist_sum(global_step) (summed across ranks) while eval logged at
the raw per-rank global_step, which is world_size x smaller. wandb dropped the
eval metrics as non-monotonic ("step ... less than current step ... ignored"),
so validation metrics never showed up. Stash agent_steps in mean_and_log and
log eval at it (both the in-loop and final force=True maybe_run). Single-GPU is
unchanged since dist_sum returns the raw value when not distributed.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
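The first fix can be sketched as follows. This is a minimal illustration, not the code from the diff: `build_logger_if_rank0` and the `make_logger` callback are hypothetical names, and the sketch assumes the rank is read from the `RANK` environment variable that `torchrun` sets for each worker.

```python
import os


def build_logger_if_rank0(make_logger, environ=os.environ):
    """Build a run logger on rank 0 only; every other rank gets None.

    `make_logger` is a hypothetical zero-argument factory (for example, one
    that would call wandb.init()). `RANK` is set by torchrun; when it is
    missing we assume a single-process run and treat it as rank 0.
    """
    rank = int(environ.get("RANK", "0"))
    return make_logger() if rank == 0 else None


# Simulate a 4-process launch and count how many runs get created.
created = []


def fake_make_logger():
    created.append("run")
    return object()


loggers = [
    build_logger_if_rank0(fake_make_logger, {"RANK": str(r), "WORLD_SIZE": "4"})
    for r in range(4)
]
print(len(created))  # one run instead of world_size runs
```

Ranks that receive `None` are then expected to fall back to a no-op logger, which is what the description says PuffeRL does with `NoLogger`.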
Pull request overview
Fixes two DDP-only logging bugs: (1) every rank was creating its own wandb/Neptune/TB run, producing world_size duplicate runs per training job, and (2) eval metrics were being silently rejected by wandb because eval logged at the raw per-rank global_step while training logged at the rank-summed agent_steps, making the eval step non-monotonic.
Changes:
- Gate logger construction in `train()` to rank 0 only; non-rank-0 ranks pass `logger=None` and are wrapped in `NoLogger` by PuffeRL.
- Cache the `agent_steps` (rank-summed `global_step`) computed inside `mean_and_log` on `self.agent_steps`, and use it as the step for both the in-loop and final `force=True` `_eval_manager.maybe_run` calls.
- Initialize `self.agent_steps = 0` in `PuffeRL.__init__`.
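The step mismatch behind the second fix can be reproduced with a small simulation. This is a sketch under stated assumptions: `dist_sum` here is a plain-Python stand-in that sums a list of per-rank values, and `MonotonicStepLog` is a toy model of a logger that ignores entries whose step is below the latest step it has seen. Neither is the repository's or wandb's implementation.

```python
def dist_sum(per_rank_values):
    """Stand-in for a cross-rank sum; with one rank it returns the raw value."""
    return sum(per_rank_values)


class MonotonicStepLog:
    """Toy logger that drops any entry logged at a step below the current one."""

    def __init__(self):
        self.current_step = 0
        self.accepted = []
        self.dropped = []

    def log(self, name, step):
        if step < self.current_step:
            self.dropped.append((name, step))
            return False
        self.current_step = step
        self.accepted.append((name, step))
        return True


world_size = 4
global_step = 1000                                   # per-rank step counter
agent_steps = dist_sum([global_step] * world_size)   # aggregate step: 4000

log = MonotonicStepLog()
log.log("train/loss", agent_steps)    # training logs at the aggregate step
log.log("eval/return", global_step)   # old behavior: 1000 < 4000, dropped
log.log("eval/return", agent_steps)   # fixed behavior: same step axis, kept

print(log.dropped)
print(log.accepted)
```

With a single rank, `dist_sum([global_step])` equals `global_step`, so both step axes coincide and nothing is dropped, which matches the note that single-GPU behavior is unchanged.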
vcharraut
approved these changes
Jun 2, 2026
WIP. Two independent DDP-only logging bugs surfaced while running the multi-agent nightly on 4 GPUs.
1. Duplicate wandb runs (one per rank)
Logger creation in `train()` was ungated, so under `torchrun` every rank called `wandb.init()` → `world_size` duplicate runs in the same group. Now only rank 0 builds the logger; other ranks keep `logger=None` (PuffeRL wraps that in `NoLogger`). Eval was already rank-0-only, so nothing else changes.
2. Eval metrics silently dropped by wandb
Training logs at `agent_steps = dist_sum(global_step)` (summed across ranks); eval logged at the raw per-rank `global_step`, which is `world_size`× smaller. wandb rejects every eval log as non-monotonic