[R3]: Move to new vLLM routed experts format #2487

Open
S1ro1 wants to merge 33 commits into main from feat/r3-v3-routed-experts
S1ro1 (Collaborator) commented May 13, 2026

PR is ready and verified with verifiers/renderers (we need to pin to main). However, we are waiting for vllm-project/vllm#39568 to be included in a vLLM release, expected in 0.21.1.

  • expose choices[i].routed_experts from the prime-rl vLLM token path as compact raw-uint8 JSON payloads: {"data": base64(raw_bytes), "shape": [...]}
  • stitch P/D routed experts by joining RequestOutput.prompt_routed_experts with per-completion decode experts before serializing
  • keep routed-experts data opaque through verifiers during token truncation; prime-rl decodes at the orchestrator boundary
  • preserve this branch's routed-experts source of truth: RoutedExperts(data, shape, dtype), explicit dtype maps, and _pack_routed_experts / _unpack_routed_experts for multi-turn stitching
  • update trainer packing/loading to slice, append, pad, and reconstruct the RoutedExperts transport struct with torch.frombuffer
  • pin vllm-router to 0.1.25 for the matching raw-uint8 schema and add pybase64
  • pin deps/verifiers to the cleaned production-equivalent routed-experts response path from verifiers PR [Feat] Multi-lora layer, data packing, optimizer, broadcast and scheduler #1433
  • disable vLLM async scheduling only for the NIXL routed-experts capture path, where async scheduling leaves placeholder sampled-token state during capture
  • fix rendered multi-node orchestrator args to use the student client config keys
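
The payload shape and the P/D stitching described in the list above can be illustrated with a minimal, stdlib-only sketch. It is not the code from this branch: the function names are hypothetical, the standard `base64` module stands in for pybase64, and the layout (one byte per element, row-major, token axis first) is an assumption made for illustration.

```python
import base64
from math import prod


def encode_routed_experts(raw: bytes, shape: list[int]) -> dict:
    """Wrap raw uint8 bytes and their shape in a JSON-friendly payload."""
    if len(raw) != prod(shape):
        raise ValueError(f"expected {prod(shape)} bytes for shape {shape}, got {len(raw)}")
    return {"data": base64.b64encode(raw).decode("ascii"), "shape": list(shape)}


def decode_routed_experts(payload: dict) -> tuple[bytes, list[int]]:
    """Recover the raw bytes and shape, checking that they agree."""
    raw = base64.b64decode(payload["data"])
    shape = list(payload["shape"])
    if len(raw) != prod(shape):
        raise ValueError(f"payload holds {len(raw)} bytes, shape {shape} needs {prod(shape)}")
    return raw, shape


def stitch_prompt_and_decode(prompt: dict, decode: dict) -> dict:
    """Join prompt and decode experts along the (assumed) leading token axis.

    With a row-major layout and tokens on axis 0, concatenating the raw
    bytes is equivalent to concatenating the arrays along that axis.
    """
    p_raw, p_shape = decode_routed_experts(prompt)
    d_raw, d_shape = decode_routed_experts(decode)
    if p_shape[1:] != d_shape[1:]:
        raise ValueError(f"trailing dims differ: {p_shape[1:]} vs {d_shape[1:]}")
    return encode_routed_experts(p_raw + d_raw, [p_shape[0] + d_shape[0], *p_shape[1:]])
```

Because the payload stays a plain dict of a string and a list, an intermediate layer can pass it through untouched and leave decoding to the consumer.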

Related PRs

Verification

  • uv sync --all-extras
  • uv run ruff check src/prime_rl/inference/patches.py src/prime_rl/inference/vllm/routed_experts.py
  • bash -n /beegfs/outputs/qwen3-30b-a3b-router-replay-diag-r3-v3-clean/rl.sbatch
  • 3-node Qwen3 30B A3B routed-experts validation, Slurm job 19354, output /beegfs/outputs/qwen3-30b-a3b-router-replay-diag-r3-v3-clean, node 53 excluded. Orchestrator completed 5/5 steps using the direct renderer rollout client; trainer completed steps 0-3 with finite grad norms and began step 4 before the generated script terminated remaining processes after orchestrator completion.
  • Log scan for Failed to merge, Rollout error, Aborted rollout, ERROR, Traceback, and Exception under the validation output logs returned no matches.

Note

Medium Risk
Medium risk because it changes the on-wire/transport representation of routed_experts end-to-end (inference responses, orchestrator parsing, batching/packing, and trainer tensor reconstruction), and adds a vLLM VllmConfig.__post_init__ monkey patch that could affect inference startup/validation in disaggregated setups.

Overview
Updates routed-experts handling end-to-end to a new compact format. The vLLM /inference/v1/generate path now captures per-choice routed_experts and serializes it as {"data": base64(raw uint8 bytes), "shape": [...]} (via new inference/vllm/routed_experts.py), replacing the prior numpy/list-style encoding.

Propagates the new representation through the training pipeline. The orchestrator decodes the compact payload to numpy for step stitching, then packs it into a new transport.types.RoutedExperts struct (raw bytes + shape + dtype) carried by TrainingSample/MicroBatch; trainer batching now slices/appends/pads this byte payload and reconstructs tensors with torch.frombuffer.
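
A rough sketch of what slicing, appending, and padding a byte payload along the token axis could look like is given below. Only the field names (data, shape, dtype) come from this PR; the method names, the `ITEMSIZE` map, and the zero-byte padding are assumptions, and the sketch stops at raw bytes instead of rebuilding a tensor with torch.frombuffer.

```python
from dataclasses import dataclass
from math import prod

# Hypothetical explicit dtype map: bytes per element for each dtype name.
ITEMSIZE = {"uint8": 1, "int16": 2, "int32": 4}


@dataclass(frozen=True)
class RoutedExperts:
    data: bytes
    shape: tuple[int, ...]
    dtype: str = "uint8"

    @property
    def row_bytes(self) -> int:
        """Bytes occupied by one token (one row along axis 0)."""
        return prod(self.shape[1:]) * ITEMSIZE[self.dtype]

    def slice_tokens(self, start: int, stop: int) -> "RoutedExperts":
        """Keep tokens in [start, stop), clamped to the available range."""
        start = max(0, start)
        stop = min(stop, self.shape[0])
        n = max(0, stop - start)
        rb = self.row_bytes
        return RoutedExperts(self.data[start * rb:(start + n) * rb], (n, *self.shape[1:]), self.dtype)

    def append(self, other: "RoutedExperts") -> "RoutedExperts":
        """Concatenate along the token axis; trailing dims and dtype must match."""
        if self.shape[1:] != other.shape[1:] or self.dtype != other.dtype:
            raise ValueError("incompatible routed-experts payloads")
        return RoutedExperts(
            self.data + other.data,
            (self.shape[0] + other.shape[0], *self.shape[1:]),
            self.dtype,
        )

    def pad_to(self, num_tokens: int) -> "RoutedExperts":
        """Pad with zero bytes up to num_tokens rows (no-op if already long enough)."""
        missing = max(0, num_tokens - self.shape[0])
        return RoutedExperts(
            self.data + bytes(missing * self.row_bytes),
            (self.shape[0] + missing, *self.shape[1:]),
            self.dtype,
        )
```

Keeping every operation at the byte level means no array library is needed until the final reconstruction step, which is the property the byte-based transport struct relies on.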

Adds safety and compatibility guardrails. RL config validation now rejects router replay when inference.kv_cache_offload is enabled, and inference installs a vLLM monkey patch to allow routed-experts capture when using the NIXL KV connector (while still rejecting unsupported PP/v2 runner cases). Dependencies are updated to add pybase64 and bump vllm-router to 0.1.25, with tests updated accordingly.
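
The config guardrail could take roughly the following form. This is an illustrative sketch only: `inference.kv_cache_offload` is named in the summary above, while the class names and the `enable_router_replay` flag are hypothetical stand-ins for whatever the real config uses.

```python
from dataclasses import dataclass, field


@dataclass
class InferenceConfig:
    kv_cache_offload: bool = False


@dataclass
class RLConfig:
    # Hypothetical flag name; the real option may be spelled differently.
    enable_router_replay: bool = False
    inference: InferenceConfig = field(default_factory=InferenceConfig)

    def __post_init__(self) -> None:
        # Fail at config-construction time instead of deep inside a run.
        if self.enable_router_replay and self.inference.kv_cache_offload:
            raise ValueError(
                "router replay is not supported when inference.kv_cache_offload is enabled"
            )
```

Rejecting the combination up front keeps the failure close to the misconfiguration rather than surfacing later as missing or mismatched routed-experts data.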

Reviewed by Cursor Bugbot for commit c13b0b3. Bugbot is set up for automated code reviews on this repo.

@S1ro1 S1ro1 force-pushed the feat/r3-v3-routed-experts branch from bf79561 to 721a874 Compare May 13, 2026 12:13
@S1ro1 S1ro1 force-pushed the feat/r3-v3-routed-experts branch from bc91c30 to e55328f Compare May 14, 2026 14:09
@S1ro1 S1ro1 force-pushed the feat/r3-v3-routed-experts branch from e55328f to 1fea38e Compare May 14, 2026 14:13
@S1ro1 S1ro1 marked this pull request as ready for review May 14, 2026 15:52
Comment thread src/prime_rl/orchestrator/trajectories.py
Comment thread src/prime_rl/trainer/rl/data.py
samsja
samsja previously approved these changes May 15, 2026
Comment thread pyproject.toml Outdated
S1ro1 and others added 6 commits May 16, 2026 03:08
* Guard checkpoint disk metrics mkdir

* Remove test_trainer_utils.py per review feedback

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Simplify ckpt disk metrics guard

Drop the rank-0 gate and the disk_usage path fallback per review feedback.
Catching FileExistsError on mkdir is sufficient: every rank that races on
mkdir either wins or harmlessly catches the BeegFS race, and shutil.disk_usage
can then operate on the now-existing ckpt_dir.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
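
The guard described in the commit message above can be sketched as follows. The function name is hypothetical; the pattern itself (every rank calls mkdir, tolerates FileExistsError, then queries disk usage) is taken from the message.

```python
import shutil
from pathlib import Path


def checkpoint_disk_usage(ckpt_dir: Path):
    """Ensure ckpt_dir exists, then report disk usage for it.

    Every rank may call this concurrently: a rank that loses the mkdir
    race simply catches FileExistsError and carries on.
    """
    try:
        ckpt_dir.mkdir(parents=True, exist_ok=True)
    except FileExistsError:
        # Another rank created the directory first; nothing left to do.
        pass
    return shutil.disk_usage(ckpt_dir)
```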
@S1ro1 S1ro1 force-pushed the feat/r3-v3-routed-experts branch 2 times, most recently from 64d3f2c to cb3c559 Compare May 19, 2026 15:46
@S1ro1 S1ro1 force-pushed the feat/r3-v3-routed-experts branch from cb3c559 to 4402d7e Compare May 19, 2026 15:53
S1ro1 added 2 commits May 21, 2026 20:45
…erts

# Conflicts:
#	pyproject.toml
#	src/prime_rl/inference/patches.py
#	src/prime_rl/inference/vllm/serving_chat_with_tokens.py
#	src/prime_rl/inference/vllm/serving_tokens.py
#	src/prime_rl/orchestrator/trajectories.py
#	src/prime_rl/trainer/batch.py
#	src/prime_rl/trainer/rl/data.py
#	tests/unit/orchestrator/test_batch.py
#	uv.lock
Comment thread src/prime_rl/trainer/batch.py
@S1ro1 S1ro1 force-pushed the feat/r3-v3-routed-experts branch from 11fe0ad to de71036 Compare May 21, 2026 21:31
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit aa1fc36.

Comment thread src/prime_rl/orchestrator/trajectories.py Outdated
@S1ro1 S1ro1 changed the title from "feat: wire r3 v3 routed experts replay" to "[R3]: Move to new vLLM routed experts format" May 22, 2026