WIP: nightly_best_launch — multi-agent nightly derived from oignons2#467

Draft
eugenevinitsky wants to merge 9 commits into emerge/temp_training from nightly_best_launch


@eugenevinitsky eugenevinitsky commented Jun 1, 2026

Summary

  • Adds scripts/cluster_configs/nightly_best_launch.yaml and scripts/launch_nightly_best.sh — a multi-agent nightly mirroring the single_agent_speed_run.yaml / launch_single_agent.sh pair but with the oignons2 policy + reward + perturbation config (1600 agents across 8 CARLA towns, 3×1024 split network with gigaflow encoder, reward conditioning + randomization + partner blindness + phantom braking on, route-based [20, 60m] goals).
  • Default 3 seeds, date-stamped wandb run names (<date>_seed{0,1,2}), wandb project nightly-multi-agent, --mem 192gb, 30h wall.
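
The date-stamped naming scheme above can be sketched as follows. This is only an illustration: `nightly_run_names` is a hypothetical helper, not a function from the launch script, and the ISO date format is assumed from the run names listed in the test plan.

```python
from datetime import date


def nightly_run_names(run_date: date, num_seeds: int = 3) -> list[str]:
    """Build one wandb run name per seed index, in the <date>_seed{i} form."""
    # Assumption: the date part is ISO formatted (YYYY-MM-DD).
    return [f"{run_date.isoformat()}_seed{i}" for i in range(num_seeds)]


names = nightly_run_names(date(2026, 6, 1))
print(names)
```

Note that the suffix is the seed index (0, 1, 2), which is separate from the actual seed values listed in the test plan.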

What was derived from oignons2

  • env: max_agents_per_env=150, num_agents=1600, dynamics_model=jerk, target_type=dijkstra, scenario_length=3840, resample_frequency=38400, full perturbation + reward conditioning/randomization, oignons2 obs ranges (200m partner / 200m road_front), oignons2 obs dropouts.
  • policy: input_size=256, backbone=3×1024, actor=critic=1024, split_network=true, encoder_gigaflow=true.
  • train: total_timesteps=125B, minibatch=153600, update_epochs=3, bptt=128, compile=true, precision=bfloat16, normalize_rewards=false, checkpoint_interval=500.
  • adapted: map_dir → local pufferlib/resources/drive/binaries/carla, num_maps=8 (vs. oignons2's 16 — we ship 8 CARLA towns).

Evals

  • validation_gigaflow inline (egl render), interval 250 to keep eval ~5% of wall-clock instead of ~85%.
  • validation_replay disabled (nuPlan bins not on cluster path).
  • All behaviors_* disabled.

Test plan

  • Verify 3 seeds (9997175, 9997176, 9997177) train past first eval cycle on h200_tandon without OOM/crash
  • Confirm wandb runs land in nightly-multi-agent project with run names 2026-06-01_seed{0,1,2}
  • Verify policy architecture in saved config.yaml matches oignons2 (3×1024 split network)

🤖 Generated with Claude Code

Adds scripts/cluster_configs/nightly_best_launch.yaml and
scripts/launch_nightly_best.sh — a multi-agent nightly mirroring the
single_agent_speed_run/launch_single_agent pair but with the oignons2
(emerge/temp_training/weights/oignons2/config.yaml) policy + reward +
perturbation config: 1600 agents across 8 CARLA towns, 3x1024 split
network with gigaflow encoder, reward conditioning + randomization +
partner blindness + phantom braking on, route-based [20, 60m] goals.
Defaults wandb_project=nightly-multi-agent, 3 seeds, --mem 192gb.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Eugene Vinitsky and others added 8 commits June 1, 2026 19:49
…jkstra)

Drive.__init__ on this branch only accepts target_type in {static,dynamic};
oignons2's "dijkstra" raised at first env construction (3 seeds died at 30s).
Goals still follow the route via goal_on_lane=True (default), so dropping to
"static" preserves the chained-route semantics.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
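
The target_type failure in the commit above only surfaced at first env construction, about 30s into each job. A pre-submission check along these lines would catch it earlier. This is a sketch under assumptions: `check_target_type` and the dict-shaped config are hypothetical, and the accepted set is copied from the commit message rather than read from Drive.__init__.

```python
# Assumption: accepted values are taken from the commit message, not from Drive itself.
ACCEPTED_TARGET_TYPES = frozenset({"static", "dynamic"})


def check_target_type(env_config: dict) -> str:
    """Return the configured target_type, or raise before any job is submitted."""
    value = env_config.get("target_type")
    if value not in ACCEPTED_TARGET_TYPES:
        allowed = ", ".join(sorted(ACCEPTED_TARGET_TYPES))
        raise ValueError(f"target_type={value!r} is not supported; expected one of: {allowed}")
    return value


print(check_target_type({"target_type": "static"}))
```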
Sub-envs keep their initial map for the full run instead of cycling every
38400 steps.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ength

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Direct `puffer train puffer_drive` equivalent of nightly_best_launch.yaml for
running without SLURM/submit_cluster. Booleans are passed as Python literals
(True/False) since pufferl parses values via ast.literal_eval and the C binding
rejects lowercase yaml-style true/false. NUM_AGENTS is parameterized because
batch_size is auto (num_agents * bptt_horizon), so the on-GPU obs buffer scales
with agent count.
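
The boolean-literal point can be reproduced with the standard library alone. The sketch below shows only how `ast.literal_eval` treats the two spellings; `parse_override` and `auto_batch_size` are hypothetical helpers, and how pufferl reacts to a failed parse is not shown here. The batch-size figures reuse num_agents=1600 and bptt=128 from the PR description.

```python
import ast


def parse_override(raw: str):
    """Parse one CLI value string as a Python literal."""
    return ast.literal_eval(raw)


def auto_batch_size(num_agents: int, bptt_horizon: int) -> int:
    """Batch size when it is derived as num_agents * bptt_horizon."""
    return num_agents * bptt_horizon


print(parse_override("True"))   # a real Python bool
print(parse_override("1600"))   # an int

try:
    parse_override("true")      # lowercase yaml-style spelling
except ValueError as exc:
    print("rejected:", type(exc).__name__)

print(auto_batch_size(num_agents=1600, bptt_horizon=128))
```

Because the batch size is a product of the agent count, halving NUM_AGENTS halves the rollout buffer, which is the lever the script exposes for fitting on smaller GPUs.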

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The headless EGL render path is compile-gated on <EGL/egl.h> (DRIVE_HAS_EGL in
drive.h) and that header isn't installed here, so the binding was built without
it and render_backend=egl would fall back to Xvfb/software. Set
validation_gigaflow.render=False so the gigaflow eval still runs and logs
metrics while skipping the render pass (base.py: _render_pass is gated on
self.render).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- First positional arg is now NUM_GPUS: >1 launches DDP via
  `torchrun --standalone --nproc-per-node N -m pufferlib.pufferl train`,
  else single-GPU `puffer train` on device 0. Under DDP num_agents is
  per-rank, so `4 2048` = 2048 agents/GPU = 8192 effective.
- Flags moved into a bash array shared by both launch paths.
- cd to repo root via $0 so relative config paths (env.map_dir) resolve
  regardless of invocation cwd.
- Enable wandb (project nightly-multi-agent, group Nightly_MultiAgent)
  with a dated RUN_TAG, matching the cluster nightly config.
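
The launch-path selection and the per-rank agent arithmetic described above can be sketched like this. It is an illustration rather than the script itself (which is bash): `build_launch_command` and `effective_agents` are hypothetical names, the shared flags are treated as an opaque list, and passing `puffer_drive` as the environment name on the DDP path is an assumption.

```python
def build_launch_command(num_gpus: int, shared_flags: list[str]) -> list[str]:
    """Pick the DDP or single-GPU entry point and append the shared flags."""
    if num_gpus > 1:
        entry = [
            "torchrun", "--standalone",
            "--nproc-per-node", str(num_gpus),
            "-m", "pufferlib.pufferl", "train",
        ]
    else:
        entry = ["puffer", "train"]
    # Assumption: both paths take the same environment name and flag list.
    return entry + ["puffer_drive"] + list(shared_flags)


def effective_agents(num_gpus: int, agents_per_rank: int) -> int:
    """Under DDP num_agents is per-rank, so the aggregate scales with GPU count."""
    return num_gpus * agents_per_rank


print(" ".join(build_launch_command(4, [])))
print(effective_agents(4, 2048))
```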

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
pufferl divides train.total_timesteps by world size when LOCAL_RANK is set, so
a fixed 10B total became 2.5B/rank on 4 GPUs. Scale the total by NUM_GPUS
(PER_RANK_TIMESTEPS * NUM_GPUS) so each rank always targets 10B: 4 GPUs -> 40B
total -> 10B/rank (40B aggregate env-steps), 1 GPU -> 10B (no division).
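
The scaling above is plain arithmetic and can be checked in a few lines. The function names are hypothetical, and integer division is an assumption about how the per-rank share is computed.

```python
PER_RANK_TIMESTEPS = 10_000_000_000  # 10B target per rank, from the commit message


def scaled_total_timesteps(num_gpus: int) -> int:
    """Total to pass as train.total_timesteps so each rank still targets 10B."""
    return PER_RANK_TIMESTEPS * num_gpus


def per_rank_share(total_timesteps: int, world_size: int) -> int:
    """Share each rank ends up with once the total is divided by world size."""
    return total_timesteps // world_size


# Before the fix: a fixed 10B total split across 4 ranks.
print(per_rank_share(PER_RANK_TIMESTEPS, 4))
# After the fix: 40B total split across 4 ranks.
print(per_rank_share(scaled_total_timesteps(4), 4))
```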

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
obs_dropout structurally shrinks the road-obs width via
compute_effective_road_obs_count (kept = n * (1 - dropout)), so a policy
trained with dropout (e.g. lane 40 / boundary 48) cannot slice the clean-eval
obs (dropout forced to 0 by CLEAN_EVAL_OVERRIDES -> lane 80 / boundary 80).
The misaligned slide_idx walk read garbage into traffic_control_type/state,
and F.one_hot's scatter asserted "index out of bounds" (device-side assert),
killing eval at epoch 250.
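
To make the width mismatch concrete, the sketch below recomputes the kept counts and the resulting segment offsets. Several things here are assumptions: the dropout rates 0.5 and 0.4 are inferred from the 40/80 and 48/80 figures above, rounding is a guess at how compute_effective_road_obs_count handles fractions, and the flat lane-then-boundary layout with 4 features per slot is purely illustrative.

```python
FEATURES_PER_SLOT = 4  # hypothetical feature width, for illustration only


def kept_slots(n: int, dropout: float) -> int:
    """kept = n * (1 - dropout); rounding to an int is an assumption."""
    return round(n * (1 - dropout))


def segment_ends(lane_slots: int, boundary_slots: int) -> tuple[int, int]:
    """End offsets of the lane and boundary segments in a flat obs vector."""
    lane_end = lane_slots * FEATURES_PER_SLOT
    boundary_end = lane_end + boundary_slots * FEATURES_PER_SLOT
    return lane_end, boundary_end


train_ends = segment_ends(kept_slots(80, 0.5), kept_slots(80, 0.4))
eval_ends = segment_ends(kept_slots(80, 0.0), kept_slots(80, 0.0))

# A walk that uses the training offsets on a clean-eval observation stops
# inside the road segments instead of at the fields that follow them.
print(train_ends, eval_ends)
```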

Branch the encoder slicing on self.training: use the *_kept counts while
training and the full *_n counts during eval (the benchmark's base.py calls
policy.eval(), which sets self.training to False). The per-slot encoders +
max-pool are slot-count-agnostic, so the same weights consume both widths.
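
The slot-count-agnostic property can be shown with a pure-Python stand-in for the encoder. This is not the torch.py code from the PR: `encode_slot`, `encode_slots`, and `active_slot_count` are hypothetical, and the single linear layer with ReLU is an assumed shape for the per-slot encoder.

```python
import random


def active_slot_count(training: bool, n_full: int, n_kept: int) -> int:
    """Slice width: kept count while training, full count during eval."""
    return n_kept if training else n_full


def encode_slot(slot: list[float], weights: list[list[float]]) -> list[float]:
    """Shared per-slot encoder: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, slot))) for row in weights]


def encode_slots(slots: list[list[float]], weights: list[list[float]]) -> list[float]:
    """Encode every slot with the same weights, then max-pool over slots."""
    encoded = [encode_slot(slot, weights) for slot in slots]
    return [max(column) for column in zip(*encoded)]


rng = random.Random(0)
features, hidden = 4, 8
weights = [[rng.uniform(-1, 1) for _ in range(features)] for _ in range(hidden)]


def make_slots(count: int) -> list[list[float]]:
    return [[rng.uniform(-1, 1) for _ in range(features)] for _ in range(count)]


train_out = encode_slots(make_slots(active_slot_count(True, 80, 40)), weights)
eval_out = encode_slots(make_slots(active_slot_count(False, 80, 40)), weights)
print(len(train_out), len(eval_out))
```

The pooled output has the hidden width regardless of how many slots went in, which is why only the slicing needs to branch and the weights do not.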

Net diff of PR #453 (restore obs_slots_boundary_kept for obs dim), torch.py
only; applied directly to skip the branch's unrelated merge/render commits.
Source commits: e7c5031, c613020, e4e8d02, a62f186.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>