WIP: nightly_best_launch — multi-agent nightly derived from oignons2#467
Draft
eugenevinitsky wants to merge 9 commits into
Adds scripts/cluster_configs/nightly_best_launch.yaml and scripts/launch_nightly_best.sh — a multi-agent nightly mirroring the single_agent_speed_run/launch_single_agent pair but with the oignons2 (emerge/temp_training/weights/oignons2/config.yaml) policy + reward + perturbation config: 1600 agents across 8 CARLA towns, 3x1024 split network with gigaflow encoder, reward conditioning + randomization + partner blindness + phantom braking on, route-based [20, 60m] goals. Defaults wandb_project=nightly-multi-agent, 3 seeds, --mem 192gb. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
0693de9 to b232b7e
…jkstra)
Drive.__init__ on this branch only accepts target_type in {static,dynamic};
oignons2's "dijkstra" raised at first env construction (3 seeds died at 30s).
Goals still follow the route via goal_on_lane=True (default), so dropping to
"static" preserves the chained-route semantics.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sub-envs keep their initial map for the full run instead of cycling every 38400 steps. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ength Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Direct `puffer train puffer_drive` equivalent of nightly_best_launch.yaml for running without SLURM/submit_cluster. Booleans are passed as Python literals (True/False) since pufferl parses values via ast.literal_eval and the C binding rejects lowercase yaml-style true/false. NUM_AGENTS is parameterized because batch_size is auto (num_agents * bptt_horizon), so the on-GPU obs buffer scales with agent count. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
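The boolean-literal and batch-size points above can be illustrated with a small sketch. It calls `ast.literal_eval` directly and does not reproduce pufferl's own argument handling; `parse_literal` and `auto_batch_size` are hypothetical helper names, and the horizon of 128 is taken from the `bptt=128` setting in the PR description.

```python
import ast


def parse_literal(text):
    """Hypothetical helper: parse one CLI value with ast.literal_eval."""
    return ast.literal_eval(text)


def auto_batch_size(num_agents, bptt_horizon):
    """Batch size when it is derived as num_agents * bptt_horizon."""
    return num_agents * bptt_horizon


# "True" is a valid Python literal and parses to the bool True.
parsed_true = parse_literal("True")

# Lowercase yaml-style "true" is a bare name, not a literal, so
# ast.literal_eval raises ValueError instead of returning a bool.
try:
    parse_literal("true")
    lowercase_ok = True
except ValueError:
    lowercase_ok = False

# The rollout batch grows linearly with the agent count.
batch_1600 = auto_batch_size(1600, 128)
batch_2048 = auto_batch_size(2048, 128)
```

Because the batch is a product of the two values, raising NUM_AGENTS raises the on-GPU buffer size in the same proportion, which is why the agent count is exposed as a parameter.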
The headless EGL render path is compile-gated on <EGL/egl.h> (DRIVE_HAS_EGL in drive.h) and that header isn't installed here, so the binding was built without it and render_backend=egl would fall back to Xvfb/software. Set validation_gigaflow.render=False so the gigaflow eval still runs and logs metrics while skipping the render pass (base.py: _render_pass is gated on self.render). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- First positional arg is now NUM_GPUS: >1 launches DDP via `torchrun --standalone --nproc-per-node N -m pufferlib.pufferl train`, else single-GPU `puffer train` on device 0. Under DDP num_agents is per-rank, so `4 2048` = 2048 agents/GPU = 8192 effective.
- Flags moved into a bash array shared by both launch paths.
- cd to repo root via $0 so relative config paths (env.map_dir) resolve regardless of invocation cwd.
- Enable wandb (project nightly-multi-agent, group Nightly_MultiAgent) with a dated RUN_TAG, matching the cluster nightly config.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
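A rough Python rendering of the launch-path selection described in this commit, for readers who want to check the arithmetic. The real script is bash; `build_launch_command` and `effective_agents` are hypothetical names, the trailing `puffer_drive` environment argument on the torchrun path is an assumption, the flag shown is a placeholder rather than a real pufferl option, and device selection for the single-GPU path is not modeled.

```python
def build_launch_command(num_gpus, flags, env_name="puffer_drive"):
    """Pick the DDP or single-GPU launcher; both share the same flag list."""
    if num_gpus > 1:
        return [
            "torchrun", "--standalone", "--nproc-per-node", str(num_gpus),
            "-m", "pufferlib.pufferl", "train", env_name, *flags,
        ]
    return ["puffer", "train", env_name, *flags]


def effective_agents(num_gpus, agents_per_rank):
    """Under DDP the agent count is per rank, so the total scales with ranks."""
    return max(num_gpus, 1) * agents_per_rank


# Placeholder flag list standing in for the script's shared bash array.
shared_flags = ["--placeholder-flag", "value"]

ddp_cmd = build_launch_command(4, shared_flags)
single_cmd = build_launch_command(1, shared_flags)
total_agents = effective_agents(4, 2048)
```

Keeping the flags in one list means the two launch paths cannot drift apart, which is the same reason the script moves them into a single bash array.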
pufferl divides train.total_timesteps by world size when LOCAL_RANK is set, so a fixed 10B total became 2.5B/rank on 4 GPUs. Scale the total by NUM_GPUS (PER_RANK_TIMESTEPS * NUM_GPUS) so each rank always targets 10B: 4 GPUs -> 40B total -> 10B/rank (40B aggregate env-steps), 1 GPU -> 10B (no division). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
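The scaling above is simple arithmetic; a minimal sketch follows. `total_timesteps_for` and `per_rank_share` are hypothetical helpers that only mirror the division-by-world-size behavior described in this commit message, not pufferl's code.

```python
PER_RANK_TIMESTEPS = 10_000_000_000  # 10B target per rank, from the commit text


def total_timesteps_for(num_gpus, per_rank=PER_RANK_TIMESTEPS):
    """Total to pass as train.total_timesteps so each rank still gets per_rank."""
    return per_rank * num_gpus


def per_rank_share(total, world_size):
    """What each rank trains for once the total is divided by world size."""
    return total // world_size


# Before the fix: a fixed 10B total split over 4 ranks.
unscaled_share = per_rank_share(PER_RANK_TIMESTEPS, 4)

# After the fix: 4 GPUs -> 40B total -> 10B per rank.
scaled_total = total_timesteps_for(4)
scaled_share = per_rank_share(scaled_total, 4)

# Single GPU: the total is unchanged.
single_total = total_timesteps_for(1)
```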
obs_dropout structurally shrinks the road-obs width via compute_effective_road_obs_count (kept = n * (1 - dropout)), so a policy trained with dropout (e.g. lane 40 / boundary 48) cannot slice the clean-eval obs (dropout forced to 0 by CLEAN_EVAL_OVERRIDES -> lane 80 / boundary 80). The misaligned slide_idx walk read garbage into traffic_control_type/state, and F.one_hot's scatter asserted "index out of bounds" (device-side assert), killing eval at epoch 250. Branch the encoder slicing on self.training: use *_kept while training, full *_n during eval (policy.eval() at benchmark base.py). The per-slot encoders + max-pool are slot-count-agnostic, so the same weights consume both widths. Net diff of PR #453 (restore obs_slots_boundary_kept for obs dim), torch.py only; applied directly to skip the branch's unrelated merge/render commits. Source commits: e7c5031, c613020, e4e8d02, a62f186. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
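The width mismatch and the fix can be sketched in plain Python. This is an illustration under assumptions, not the code in torch.py: `effective_count`, `slots_to_slice`, and `max_pool` are hypothetical names, the rounding rule is assumed, and the dropout rates 0.5 and 0.4 are inferred from the 80 -> 40 and 80 -> 48 counts quoted above.

```python
def effective_count(n, dropout):
    """Slots kept under dropout: n * (1 - dropout). Rounding is an assumption."""
    return round(n * (1.0 - dropout))


def slots_to_slice(n_full, dropout, training):
    """Use the reduced width while training and the full width during eval."""
    return effective_count(n_full, dropout) if training else n_full


def max_pool(slot_features):
    """Elementwise max over per-slot feature vectors; any slot count works."""
    return [max(column) for column in zip(*slot_features)]


# Training widths (inferred dropout rates) versus clean-eval widths.
lane_train = slots_to_slice(80, 0.5, training=True)
boundary_train = slots_to_slice(80, 0.4, training=True)
lane_eval = slots_to_slice(80, 0.5, training=False)
boundary_eval = slots_to_slice(80, 0.4, training=False)

# The pooled output has the same size for 2 slots and for 3 slots.
pooled_two = max_pool([[0.1, 0.9], [0.4, 0.2]])
pooled_three = max_pool([[0.1, 0.9], [0.4, 0.2], [0.3, 0.5]])
```

Since the pooled vector has the same length regardless of how many slots feed it, only the slicing offsets need to change between training and eval, which matches the commit's claim that the same weights consume both widths.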
Summary
- Adds scripts/cluster_configs/nightly_best_launch.yaml and scripts/launch_nightly_best.sh: a multi-agent nightly mirroring the single_agent_speed_run.yaml / launch_single_agent.sh pair but with the oignons2 policy + reward + perturbation config (1600 agents across 8 CARLA towns, 3×1024 split network with gigaflow encoder, reward conditioning + randomization + partner blindness + phantom braking on, route-based [20, 60m] goals).
- 3 seeds (<date>_seed{0,1,2}), wandb project nightly-multi-agent, --mem 192gb, 30h wall.

What was derived from oignons2
- max_agents_per_env=150, num_agents=1600, dynamics_model=jerk, target_type=dijkstra, scenario_length=3840, resample_frequency=38400, full perturbation + reward conditioning/randomization, oignons2 obs ranges (200m partner / 200m road_front), oignons2 obs dropouts.
- input_size=256, backbone=3×1024, actor=critic=1024, split_network=true, encoder_gigaflow=true.
- total_timesteps=125B, minibatch=153600, update_epochs=3, bptt=128, compile=true, precision=bfloat16, normalize_rewards=false, checkpoint_interval=500.
- map_dir → local pufferlib/resources/drive/binaries/carla, num_maps=8 (vs. oignons2's 16; we ship 8 CARLA towns).

Evals
- validation_gigaflow inline (egl render), interval 250 to keep eval ~5% of wall-clock instead of ~85%.
- validation_replay disabled (nuPlan bins not on cluster path).
- behaviors_* disabled.

Test plan
- (9997175, 9997176, 9997177) train past first eval cycle on h200_tandon without OOM/crash
- nightly-multi-agent project with run names 2026-06-01_seed{0,1,2}
- config.yaml matches oignons2 (3×1024 split network)

🤖 Generated with Claude Code