WIP: fix two DDP wandb logging bugs (duplicate runs + dropped eval metrics)#468

Open
eugenevinitsky wants to merge 1 commit into emerge/temp_training from ev/fix-ddp-wandb-logging

@eugenevinitsky eugenevinitsky commented Jun 2, 2026

WIP. Two independent DDP-only logging bugs surfaced while running the multi-agent nightly on 4 GPUs.

1. Duplicate wandb runs (one per rank)

Logger creation in train() was ungated, so under torchrun every rank called wandb.init(), producing world_size duplicate runs in the same group. Now only rank 0 builds the logger; other ranks keep logger=None (PuffeRL wraps that in NoLogger). Eval was already rank-0-only, so nothing else changes.
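A minimal sketch of the rank-0 gate described above, not the PR's actual diff. `build_logger` and `is_rank_zero` are hypothetical helper names; the only external fact relied on is that torchrun exports a `RANK` environment variable to each worker.

```python
import os


def is_rank_zero(environ=os.environ):
    # torchrun sets RANK per worker; if it is absent we assume a
    # single-process run, which behaves like rank 0.
    return int(environ.get("RANK", "0")) == 0


def build_logger(make_logger, environ=os.environ):
    # Hypothetical helper: only rank 0 constructs the real logger.
    # Other ranks return None, which the PR says PuffeRL wraps in NoLogger.
    return make_logger() if is_rank_zero(environ) else None


# Simulate a 4-rank torchrun launch and count how many loggers get built.
created = []


def fake_logger_factory():
    # Stands in for wandb.init() or any other logger constructor.
    created.append("run")
    return object()


loggers = [build_logger(fake_logger_factory, {"RANK": str(r)}) for r in range(4)]
```

With the gate in place the simulated launch builds one logger instead of four, and ranks 1 to 3 carry `None`.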

2. Eval metrics silently dropped by wandb

Training logs at agent_steps = dist_sum(global_step) (summed across ranks); eval logged at the raw per-rank global_step, which is world_size× smaller. wandb rejects every eval log as non-monotonic ("step ... less than current step ... ignored"), so validation metrics never showed up.
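The step mismatch can be illustrated with a toy model. `MonotonicStepSink` below is a hypothetical, simplified stand-in for a logger that ignores steps lower than the current one; it is not wandb's implementation. The numbers (4 ranks, 1000 steps per rank) are made up and assume every rank has taken the same number of steps.

```python
class MonotonicStepSink:
    """Toy logger that ignores any log whose step is below the current step."""

    def __init__(self):
        self.current_step = 0
        self.accepted = []

    def log(self, metrics, step):
        if step < self.current_step:
            return False  # dropped as non-monotonic
        self.current_step = step
        self.accepted.append((step, metrics))
        return True


world_size = 4
global_step = 1000                      # per-rank counter
agent_steps = world_size * global_step  # what a sum across ranks would give

sink = MonotonicStepSink()
train_ok = sink.log({"loss": 0.5}, step=agent_steps)
# Old behaviour: eval logs at the per-rank step, which is behind training.
eval_at_rank_step = sink.log({"eval/return": 1.0}, step=global_step)
# Fixed behaviour: eval logs at the aggregate step.
eval_at_agent_steps = sink.log({"eval/return": 1.0}, step=agent_steps)
```

In this model the eval log at the per-rank step is discarded while the one at the aggregate step is kept, which matches the symptom described above.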

1. Only rank 0 creates the run logger. Logger creation was ungated, so under
   torchrun every rank called wandb.init()/NeptuneLogger and produced world_size
   duplicate runs. Non-rank-0 ranks now keep logger=None (PuffeRL wraps it in a
   NoLogger).

2. Eval logs at the aggregate step, not the per-rank one. Training logs at
   agent_steps = dist_sum(global_step) (summed across ranks) while eval logged at
   the raw per-rank global_step, which is world_size x smaller. wandb dropped the
   eval metrics as non-monotonic ("step ... less than current step ... ignored"),
   so validation metrics never showed up. Stash agent_steps in mean_and_log and
   log eval at it (both the in-loop and final force=True maybe_run). Single-GPU is
   unchanged since dist_sum returns the raw value when not distributed.
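As a rough sketch of the second fix, assuming a trainer shaped like the one the message describes: `TrainerSketch`, `eval_step`, and the `peer_values` parameter are hypothetical, and this `dist_sum` is a pure-Python stand-in that models the other ranks' counters as a list rather than performing a real collective.

```python
def dist_sum(value, peer_values=None):
    # Stand-in for the real dist_sum: peer_values models the other ranks'
    # counters. None means "not distributed", so the raw value is returned.
    if peer_values is None:
        return value
    return value + sum(peer_values)


class TrainerSketch:
    def __init__(self):
        self.global_step = 0
        self.agent_steps = 0  # initialised so eval can read it before any log

    def mean_and_log(self, peer_values=None):
        # Stash the aggregate step so eval can reuse it later.
        self.agent_steps = dist_sum(self.global_step, peer_values)
        return self.agent_steps

    def eval_step(self):
        # Eval logs at the same step training used, not the per-rank one.
        return self.agent_steps


ddp = TrainerSketch()
ddp.global_step = 1000
ddp.mean_and_log(peer_values=[1000, 1000, 1000])  # three other ranks

single = TrainerSketch()
single.global_step = 1000
single.mean_and_log()  # not distributed
```

Under these assumptions the 4-rank trainer evaluates at the summed step, and the single-GPU trainer evaluates at its raw global_step as before.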

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 2, 2026 11:37

Copilot AI left a comment


Pull request overview

Fixes two DDP-only logging bugs: (1) every rank was creating its own wandb/Neptune/TB run, producing world_size duplicate runs per training job, and (2) eval metrics were being silently rejected by wandb because eval logged at the raw per-rank global_step while training logged at the rank-summed agent_steps, making the eval step non-monotonic.

Changes:

  • Gate logger construction in train() to rank 0 only; non-rank-0 ranks pass logger=None and are wrapped in NoLogger by PuffeRL.
  • Cache the agent_steps (rank-summed global_step) computed inside mean_and_log on self.agent_steps, and use it as the step for both the in-loop and final force=True _eval_manager.maybe_run calls.
  • Initialize self.agent_steps = 0 in PuffeRL.__init__.

