[R3]: Move to new vLLM routed experts format by S1ro1 · Pull Request #2487 · PrimeIntellect-ai/prime-rl

S1ro1 · 2026-05-13T11:59:12Z

PR is ready: verified with verifiers/renderers (we need to pin to main). However, it is waiting for vllm-project/vllm#39568 to be included in a vLLM release, expected in 0.21.1.

- expose choices[i].routed_experts from the prime-rl vLLM token path as compact raw-uint8 JSON payloads: {"data": base64(raw_bytes), "shape": [...]}
- stitch P/D routed experts by joining RequestOutput.prompt_routed_experts with per-completion decode experts before serializing
- keep routed-experts data opaque through verifiers during token truncation; prime-rl decodes at the orchestrator boundary
- preserve this branch's routed-experts source of truth: RoutedExperts(data, shape, dtype), explicit dtype maps, and _pack_routed_experts / _unpack_routed_experts for multi-turn stitching
- update trainer packing/loading to slice, append, pad, and reconstruct the RoutedExperts transport struct with torch.frombuffer
- pin vllm-router to 0.1.25 for the matching raw-uint8 schema and add pybase64
- pin deps/verifiers to the cleaned production-equivalent routed-experts response path from verifiers PR #1433 (Clean routed experts response path)
- disable vLLM async scheduling only for the NIXL routed-experts capture path, where async scheduling leaves placeholder sampled-token state during capture
- fix rendered multi-node orchestrator args to use the student client config keys
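The compact payload format in the first item can be sketched with the standard library. This is a minimal illustration, not the code in inference/vllm/routed_experts.py: the helper names encode_routed_experts / decode_routed_experts are hypothetical, the sketch assumes every expert index fits in one unsigned byte and is already laid out row-major, and it uses the stdlib base64 module where the PR adds pybase64 (a faster implementation of the same Base64 encoding).

```python
import base64
import json
from math import prod


def encode_routed_experts(flat_ids: list[int], shape: list[int]) -> str:
    """Serialize expert indices as {"data": base64(raw uint8 bytes), "shape": [...]}.

    Hypothetical helper: assumes every index is in 0..255 and that flat_ids
    is already in row-major order for `shape`.
    """
    if len(flat_ids) != prod(shape):
        raise ValueError(f"expected {prod(shape)} values for shape {shape}, got {len(flat_ids)}")
    raw = bytes(flat_ids)  # raises ValueError if any index is outside 0..255
    return json.dumps({"data": base64.b64encode(raw).decode("ascii"), "shape": list(shape)})


def decode_routed_experts(payload: str) -> tuple[bytes, list[int]]:
    """Inverse of encode_routed_experts: return the raw uint8 bytes and the shape."""
    obj = json.loads(payload)
    raw = base64.b64decode(obj["data"])
    shape = [int(d) for d in obj["shape"]]
    if len(raw) != prod(shape):
        raise ValueError("payload size does not match its shape")
    return raw, shape
```

Base64 turns every 3 raw bytes into 4 ASCII characters, so each uint8 index costs about 1.33 characters on the wire, compared with one to three digits plus a separator per index in a JSON list of integers.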

Related PRs

- Router payload format: Feat: downstream perf improvements - bump .25 router#34
- Router release trigger: release: v0.1.25 router#35
- Verifiers payload format: [Router Replay]: Improve performance by removing Pydantic validation verifiers#1394
- Verifiers cleanup against main: Clean routed experts response path verifiers#1433

Verification

- `uv sync --all-extras`
- `uv run ruff check src/prime_rl/inference/patches.py src/prime_rl/inference/vllm/routed_experts.py`
- `bash -n /beegfs/outputs/qwen3-30b-a3b-router-replay-diag-r3-v3-clean/rl.sbatch`
- 3-node Qwen3 30B A3B routed-experts validation (Slurm job 19354, output /beegfs/outputs/qwen3-30b-a3b-router-replay-diag-r3-v3-clean, node 53 excluded). The orchestrator completed 5/5 steps using the direct renderer rollout client; the trainer completed steps 0-3 with finite grad norms and had begun step 4 when the generated script terminated the remaining processes after the orchestrator finished.
- A scan of the validation output logs for "Failed to merge", "Rollout error", "Aborted rollout", "ERROR", "Traceback", and "Exception" returned no matches.
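The log scan in the last verification item can be reproduced with a short script. A sketch under assumptions: the logs are plain-text files matching *.log somewhere under the output directory, and the scan_logs helper and its glob default are hypothetical; only the search strings come from the description. Matching is case-sensitive, like a grep without -i.

```python
from pathlib import Path

# Search strings taken from the verification notes above.
PATTERNS = ("Failed to merge", "Rollout error", "Aborted rollout", "ERROR", "Traceback", "Exception")


def scan_logs(root: Path, glob: str = "*.log") -> list[tuple[Path, int, str]]:
    """Return (file, line number, pattern) for every line that contains one of PATTERNS."""
    hits: list[tuple[Path, int, str]] = []
    for path in sorted(root.rglob(glob)):
        if not path.is_file():
            continue
        # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan.
        with path.open(errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                for pattern in PATTERNS:
                    if pattern in line:
                        hits.append((path, lineno, pattern))
    return hits
```

A clean run corresponds to `assert not scan_logs(output_dir)` for the validation output directory.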

Note

Medium Risk
Medium risk because it changes the on-wire/transport representation of routed_experts end-to-end (inference responses, orchestrator parsing, batching/packing, and trainer tensor reconstruction), and adds a vLLM VllmConfig.__post_init__ monkey patch that could affect inference startup/validation in disaggregated setups.
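The VllmConfig.__post_init__ monkey patch follows the common wrap-and-delegate pattern. The sketch below applies that pattern to a stand-in dataclass, not to vLLM: FakeVllmConfig, its fields, the "NixlConnector" string, and the error messages are all hypothetical, and the trick of temporarily hiding a flag from the original check is one possible way to relax a single rejection, not necessarily what the PR does. It shows the NIXL case being allowed while another unsupported case (here, pipeline parallelism) is still rejected.

```python
import functools
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeVllmConfig:
    """Hypothetical stand-in; vLLM's real VllmConfig is much larger."""

    kv_connector: Optional[str] = None
    enable_routed_experts: bool = False
    pipeline_parallel_size: int = 1

    def __post_init__(self) -> None:
        # Stand-in for the original validation: refuse capture with any KV connector.
        if self.enable_routed_experts and self.kv_connector is not None:
            raise ValueError("routed-experts capture is not supported with a KV connector")


def allow_routed_experts_with_nixl(cls: type) -> None:
    """Wrap cls.__post_init__ so the NIXL case passes while other rejections remain."""
    original = cls.__post_init__

    @functools.wraps(original)
    def patched(self) -> None:
        if not (self.enable_routed_experts and self.kv_connector == "NixlConnector"):
            original(self)
            return
        if self.pipeline_parallel_size > 1:
            raise ValueError("routed-experts capture with NIXL does not support pipeline parallelism")
        # Hide the flag from the original check, run it, then restore the flag.
        self.enable_routed_experts = False
        try:
            original(self)
        finally:
            self.enable_routed_experts = True

    # The dataclass-generated __init__ looks up __post_init__ at call time,
    # so replacing it on the class affects every later construction.
    cls.__post_init__ = patched
```

The trade-off of this pattern is that it depends on how the wrapped method reads its fields, so an upstream change to that method can silently change what the patch allows.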

Overview
Updates routed-experts handling end-to-end to a new compact format. The vLLM /inference/v1/generate path now captures per-choice routed_experts and serializes it as {"data": base64(raw uint8 bytes), "shape": [...]} (via new inference/vllm/routed_experts.py), replacing the prior numpy/list-style encoding.
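Before serialization, the capture path also stitches prefill/decode (P/D) output, joining RequestOutput.prompt_routed_experts with each completion's decode experts. A minimal sketch on raw byte buffers, assuming row-major uint8 arrays whose trailing dims match and whose token spans do not overlap; stitch_prompt_and_decode is a hypothetical name, and plain bytes stand in for vLLM's actual array types.

```python
from math import prod


def stitch_prompt_and_decode(
    prompt: bytes,
    prompt_shape: list[int],
    decode: bytes,
    decode_shape: list[int],
) -> tuple[bytes, list[int]]:
    """Join prompt-side and decode-side routed experts along the token axis (axis 0).

    Hypothetical helper: both buffers are row-major uint8 arrays whose trailing
    dims (for example layers x top_k) must match.
    """
    if prompt_shape[1:] != decode_shape[1:]:
        raise ValueError(f"trailing dims differ: {prompt_shape[1:]} vs {decode_shape[1:]}")
    if len(prompt) != prod(prompt_shape) or len(decode) != prod(decode_shape):
        raise ValueError("buffer length does not match its shape")
    # For a row-major layout, concatenating along axis 0 is plain byte concatenation.
    return prompt + decode, [prompt_shape[0] + decode_shape[0], *prompt_shape[1:]]


# With n > 1 completions, each choice gets the shared prompt rows plus its own decode rows.
prompt, prompt_shape = bytes([1, 2, 3, 4]), [2, 2]
per_choice = [
    stitch_prompt_and_decode(prompt, prompt_shape, d, [1, 2])
    for d in (bytes([5, 6]), bytes([7, 8]))
]
```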

Propagates the new representation through the training pipeline. The orchestrator decodes the compact payload to numpy for step stitching, then packs it into a new transport.types.RoutedExperts struct (raw bytes + shape + dtype) carried by TrainingSample/MicroBatch; trainer batching now slices/appends/pads this byte payload and reconstructs tensors with torch.frombuffer.
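The trainer-side operations (slice, append, pad, then reconstruct a typed array) can be illustrated on a small byte-backed struct. This is a hypothetical sketch: the fields mirror the RoutedExperts(data, shape, dtype) signature from the description, but the dtype table, zero padding, axis 0 being the token axis, and the method names are assumptions, and memoryview.cast stands in for torch.frombuffer to keep the example dependency-free.

```python
from dataclasses import dataclass
from math import prod

# Hypothetical dtype map: dtype name -> (memoryview format, itemsize in bytes).
DTYPES = {"uint8": ("B", 1), "int16": ("h", 2), "int32": ("i", 4)}


@dataclass(frozen=True)
class RoutedExperts:
    """Byte-backed routed-experts payload; axis 0 is the token axis (assumption)."""

    data: bytes
    shape: tuple[int, ...]
    dtype: str = "uint8"

    def __post_init__(self) -> None:
        if len(self.data) != prod(self.shape) * DTYPES[self.dtype][1]:
            raise ValueError("data length does not match shape and dtype")

    @property
    def row_bytes(self) -> int:
        # Bytes per token: product of the trailing dims times the item size.
        return prod(self.shape[1:]) * DTYPES[self.dtype][1]

    def slice(self, start: int, stop: int) -> "RoutedExperts":
        if not 0 <= start <= stop <= self.shape[0]:
            raise IndexError(f"bad token range [{start}, {stop}) for {self.shape[0]} tokens")
        rb = self.row_bytes
        return RoutedExperts(self.data[start * rb : stop * rb], (stop - start, *self.shape[1:]), self.dtype)

    def append(self, other: "RoutedExperts") -> "RoutedExperts":
        if other.dtype != self.dtype or other.shape[1:] != self.shape[1:]:
            raise ValueError("cannot append payloads with a different dtype or trailing dims")
        return RoutedExperts(self.data + other.data, (self.shape[0] + other.shape[0], *self.shape[1:]), self.dtype)

    def pad_to(self, num_tokens: int) -> "RoutedExperts":
        # Zero-fill extra token rows; the pad value is an assumption.
        extra = num_tokens - self.shape[0]
        if extra < 0:
            raise ValueError("payload is already longer than the target length")
        return RoutedExperts(self.data + bytes(extra * self.row_bytes), (num_tokens, *self.shape[1:]), self.dtype)

    def to_nested_list(self) -> list:
        # Stand-in for torch.frombuffer(...).view(shape): a zero-copy typed view.
        if prod(self.shape) == 0:
            return []  # memoryview.cast rejects zero-sized dims
        fmt, _ = DTYPES[self.dtype]
        return memoryview(self.data).cast(fmt, list(self.shape)).tolist()
```

Because every token occupies a fixed number of bytes, slicing, appending, and padding reduce to byte-range arithmetic with no decode step until the tensor is rebuilt.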

Adds safety and compatibility guardrails. RL config validation now rejects router replay when inference.kv_cache_offload is enabled, and inference installs a vLLM monkey patch to allow routed-experts capture when using the NIXL KV connector (while still rejecting unsupported PP/v2 runner cases). Dependencies are updated to add pybase64 and bump vllm-router to 0.1.25, with tests updated accordingly.
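The new RL config guard can be sketched with plain dataclasses. Hypothetical: the class and field names (InferenceConfig.kv_cache_offload, RLConfig.router_replay) only echo the wording of the description, and prime-rl's actual config models and error text may differ. The point shown is rejecting the unsupported combination at construction time, before any job starts.

```python
from dataclasses import dataclass, field


@dataclass
class InferenceConfig:
    kv_cache_offload: bool = False


@dataclass
class RLConfig:
    router_replay: bool = False
    inference: InferenceConfig = field(default_factory=InferenceConfig)

    def __post_init__(self) -> None:
        # Fail fast on the unsupported combination instead of mid-run.
        if self.router_replay and self.inference.kv_cache_offload:
            raise ValueError("router replay is not supported when inference.kv_cache_offload is enabled")
```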

Reviewed by Cursor Bugbot for commit c13b0b3. Bugbot is set up for automated code reviews on this repo.

* Guard checkpoint disk metrics mkdir
* Remove test_trainer_utils.py per review feedback

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Simplify ckpt disk metrics guard

  Drop the rank-0 gate and the disk_usage path fallback per review feedback. Catching FileExistsError on mkdir is sufficient: every rank that races on mkdir either wins or harmlessly catches the BeeGFS race, and shutil.disk_usage can then operate on the now-existing ckpt_dir.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
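The checkpoint disk-metrics guard in the squashed commit above comes down to tolerating FileExistsError from mkdir on every rank. A sketch with a hypothetical ckpt_disk_metrics helper and hypothetical metric names; Path.mkdir and shutil.disk_usage are standard-library calls.

```python
import shutil
from pathlib import Path


def ckpt_disk_metrics(ckpt_dir: Path) -> dict[str, float]:
    """Hypothetical helper: ensure ckpt_dir exists, then report its filesystem usage in GiB."""
    try:
        # exist_ok=True covers the usual case, but Path.mkdir still re-raises
        # FileExistsError when the follow-up is_dir() check fails, which is one
        # way a concurrent mkdir from another rank can surface on a shared filesystem.
        ckpt_dir.mkdir(parents=True, exist_ok=True)
    except FileExistsError:
        pass
    usage = shutil.disk_usage(ckpt_dir)
    gib = 1024**3
    return {
        "disk/total_gib": usage.total / gib,
        "disk/used_gib": usage.used / gib,
        "disk/free_gib": usage.free / gib,
    }
```

Every rank runs the same code with no rank-0 gate, matching the simplification described in the commit.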

…erts

Conflicts:
- pyproject.toml
- src/prime_rl/inference/patches.py
- src/prime_rl/inference/vllm/serving_chat_with_tokens.py
- src/prime_rl/inference/vllm/serving_tokens.py
- src/prime_rl/orchestrator/trajectories.py
- src/prime_rl/trainer/batch.py
- src/prime_rl/trainer/rl/data.py
- tests/unit/orchestrator/test_batch.py
- uv.lock

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit aa1fc36.

S1ro1 added 2 commits May 13, 2026 17:41

feat: wire r3 v3 routed experts stack

4895316

feat: reset routing caches on policy update

721a874

S1ro1 force-pushed the feat/r3-v3-routed-experts branch from bf79561 to 721a874 May 13, 2026 12:13

S1ro1 added 3 commits May 13, 2026 18:02

fix: rely on native vllm routed experts

baa6935

fix: pin routed experts dependencies for ci

18e9a7a

fix: update scheduler tests for prefix reset

90d2e3a

S1ro1 force-pushed the feat/r3-v3-routed-experts branch from bc91c30 to e55328f May 14, 2026 14:09

fix: clean routed experts replay integration

1fea38e

S1ro1 force-pushed the feat/r3-v3-routed-experts branch from e55328f to 1fea38e May 14, 2026 14:13

S1ro1 added 3 commits May 14, 2026 20:39

fix: keep routed experts transport first class

2c019e1

fix: keep routed experts on samples

803b4ae

fix: use upstream vllm nightly wheel

9092eca

S1ro1 marked this pull request as ready for review May 14, 2026 15:52

S1ro1 and others added 2 commits May 14, 2026 21:23

fix: pin latest routed experts verifiers

f49caec

Merge branch 'main' into feat/r3-v3-routed-experts

9438623

cursor Bot reviewed May 14, 2026


Comment thread src/prime_rl/orchestrator/trajectories.py

S1ro1 added 3 commits May 15, 2026 00:28

fix: pin routed experts dependencies

61a0388

fix: allow routed experts with nixl

094d233

style: format nixl patch

d6d06b4

cursor Bot reviewed May 14, 2026


Comment thread src/prime_rl/trainer/rl/data.py

samsja previously approved these changes May 15, 2026


Use raw uint8 routed experts payloads

9317cef

S1ro1 dismissed samsja’s stale review via 9317cef May 15, 2026 21:20

cursor Bot reviewed May 15, 2026


Comment thread pyproject.toml Outdated

S1ro1 and others added 6 commits May 16, 2026 03:08

Remove unrelated rlm-swe dependency

3cb8345

Pin vllm-router 0.1.25 wheel

66a2984

Keep verifiers routed experts opaque

777aae7

Forward renderer thinking preservation config

a74e7f5

Avoid duplicate routed experts in token responses

a723ac0

S1ro1 added 2 commits May 19, 2026 19:46

Pin verifiers routed experts sidecar

e2cffa1

Pin cleaned verifiers routed experts handling

0edc0c5

S1ro1 force-pushed the feat/r3-v3-routed-experts branch 2 times, most recently from 64d3f2c to cb3c559 May 19, 2026 15:46

Pin rebased verifiers routed experts handling

4402d7e

S1ro1 force-pushed the feat/r3-v3-routed-experts branch from cb3c559 to 4402d7e May 19, 2026 15:53

S1ro1 added 2 commits May 21, 2026 20:45

fix: remove unrelated prime-rl changes

7076bb1

cursor Bot reviewed May 21, 2026


Comment thread src/prime_rl/trainer/batch.py

S1ro1 force-pushed the feat/r3-v3-routed-experts branch from 11fe0ad to de71036 May 21, 2026 21:31

Merge branch 'main' into feat/r3-v3-routed-experts

aa1fc36

cursor Bot reviewed May 21, 2026


Comment thread src/prime_rl/orchestrator/trajectories.py Outdated

S1ro1 and others added 5 commits May 22, 2026 05:07

fix: pack routed experts as typed payloads

62cc96b

refactor: inline routed experts trajectory packing

ae6b8b3

fix: restore trajectory tokenization helpers

6de7fd1

refactor: simplify routed experts packing

f50ad90

Merge branch 'main' into feat/r3-v3-routed-experts

48fc98c

S1ro1 changed the title from ~~feat: wire r3 v3 routed experts replay~~ to [R3]: Move to new vLLM routed experts format May 22, 2026

Merge branch 'main' into feat/r3-v3-routed-experts

c13b0b3

S1ro1 wants to merge 33 commits into main from feat/r3-v3-routed-experts

S1ro1 commented May 13, 2026 (edited by cursor Bot)


2 participants
