[train] Make vLLMRouter the default router in the new inference codepath#1394
[train] Make vLLMRouter the default router in the new inference codepath#1394
vLLMRouter the default router in the new inference codepath#1394Conversation
Add VLLMRouter class that spawns the vllm-router Rust binary as a subprocess, providing a drop-in alternative to the Python InferenceRouter. Uses consistent_hash policy by default which respects X-Session-ID headers for session-aware routing. Configurable via `generator.inference_engine.router_type="vllm-router"` (default: "internal"). Optional dependency: `pip install skyrl[vllm-router]`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Uses setdefault so users can still override with RUST_LOG=info/debug. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Both "default" and "vllm-router" are internally managed by SkyRL. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
- Remove InferenceRouter and its tests - Remove router_type config — VLLMRouter is now the only router - Add vllm-router to skyrl-train, fsdp, and megatron deps - Remove vllm-router optional extra (no longer needed) - Update all references: main_base, benchmarks, GPU CI, test utils Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
All conflicts resolved in favor of VLLMRouter-only (HEAD). New content from main (render endpoint, session_id extraction) preserved. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
vLLMRouter the default router in the new inference codepath
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request removes the Python-based InferenceRouter implementation and standardizes the system on the Rust-based vllm-router via the VLLMRouter wrapper. Key changes include the deletion of router.py and its associated tests, updates to pyproject.toml to include vllm-router as a standard Linux dependency for training and FSDP extras, and the refactoring of CI scripts, benchmarks, and core entry points to use VLLMRouter exclusively. I have no feedback to provide.
There was a problem hiding this comment.
Devin Review found 1 potential issue.
⚠️ 1 issue in files not directly in the diff
⚠️ Config field router_type is orphaned dead code referencing deleted InferenceRouter (skyrl/train/config/config.py:481-485)
The router_type field at skyrl/train/config/config.py:481-485 is still present with its default "default" and docstring referencing InferenceRouter, but the only consumer—_create_router() in skyrl/train/entrypoints/main_base.py—was deleted in this PR. The field is now completely dead: setting router_type to any value (including "default" or "vllm-router") has no effect whatsoever, and the docstring misleadingly tells users they can choose between InferenceRouter (which no longer exists) and vllm-router. This is an incomplete migration—the config field should have been removed or updated alongside the code that consumed it.
View 4 additional findings in Devin Review.
Signed-off-by: SumanthRH <sumanthrh99@gmail.com>
|
What does this PR do?
Makes
vLLMRouterthe default router in the new inference codepath, following the migration plan in #1385Test Plan
I ran the DAPO recipe on Qwen 1.7B https://github.com/NovaSky-AI/SkyRL/blob/e3dc00d43430f5d26a6af086d9f1a0f2dd2faef4/examples/train/algorithms/dapo/run_dapo_qwen3_1.7b_aime.sh
as well as the SkyRLSQL recipe https://github.com/NovaSky-AI/SkyRL/blob/e3dc00d43430f5d26a6af086d9f1a0f2dd2faef4/examples/train/text_to_sql/run_skyrl_sql.sh
Training progresses in a stable way without generation issues
DAPO recipe: Results
SkyRLSQL Recipe: Results
TODO: