Note to Community
This roadmap is a living document, and we welcome community input and feedback. We will update this roadmap to link to specific Issues and PRs for each sub-task.
Overview
SkyRL's focus in H1 2026 will be on pushing performance for large-scale MoE async RL and improving usability via the Tinker API.
Tinker-ification
Overview: Improving SkyRL's Tinker API server and expanding the supported set of configurations and modalities.
The first step here has been the unification of the skyrl-train and skyrl-tx packages into a single skyrl package (#1145)
For a detailed list of issues, please see #1380
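To make the Tinker-ification goal concrete, the sketch below shows the shape of a Tinker-style training loop: the client drives training through a small set of RPC primitives (forward/backward, optimizer step) while the service owns the GPUs. The class and method names here are illustrative stand-ins, not the actual Tinker client surface.

```python
# Minimal in-memory stand-in for a Tinker-style training client.
# All names are illustrative; the real API is served by skyrl-tx.

class FakeTrainingClient:
    """Pretends to run forward/backward and optimizer steps remotely."""
    def __init__(self):
        self.step = 0

    def forward_backward(self, batch, loss_fn):
        # A real client would ship the batch to the service and get
        # loss/grad metadata back; here we fake a shrinking loss.
        return {"loss": 1.0 / (self.step + 1)}

    def optim_step(self, lr=1e-5):
        self.step += 1  # a real client would apply the optimizer remotely

client = FakeTrainingClient()
losses = []
for batch in [["prompt a"], ["prompt b"], ["prompt c"]]:
    out = client.forward_backward(batch, loss_fn="cross_entropy")
    client.optim_step()
    losses.append(out["loss"])

assert losses == [1.0, 0.5, 1.0 / 3]
```

The key property of this design is that the client never touches accelerator state directly, which is what lets one API server front many backend configurations and modalities.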
Large Scale MoE RL
Overview: Integrate the latest algorithmic improvements and improve the scalability of SkyRL for RL training on large MoE models with Megatron + vLLM.
For a detailed list of megatron backend specific tasks, see #1392
- Support R3 - Rollout Routing Replay ([skyrl-train] Enable routing replay in SkyRL #815)
- Optimize data transfer between training and inference for MoE training
- Improve the megatron training backend (SkyRL Megatron Development Tracker H1 2026 #1392)
- Provide robust support (LoRA + full finetuning) for RL training on large-scale MoE models (Kimi K2.5, GLM-4.7/5, Qwen3.5-397B-A17B, etc.)
- Provide E2E recipes for async RL on 300B+ parameter models
- Support Ray RDT for P2P weight syncing ([skyrl-train] Integrate Ray Core RDT for weight syncing #977)
- Improve support and provide recipes for MoE + LoRA with the megatron backend
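Rollout Routing Replay (R3, #815) targets the router mismatch between inference and training: precision or kernel differences between vLLM and Megatron can flip top-k expert selection, so the expert choices made at rollout time are recorded and replayed during the training forward pass. A minimal pure-Python sketch of the idea (all names are illustrative):

```python
# Toy sketch of Rollout Routing Replay (R3): record the experts chosen
# during rollout and reuse them during training so the training-side
# MoE dispatch matches inference exactly.

def route(logits, top_k=2, replay=None):
    """Pick top-k experts per token, or replay a recorded choice."""
    if replay is not None:
        return replay
    ranked = sorted(range(len(logits)), key=lambda e: -logits[e])
    return ranked[:top_k]

# Rollout: inference-side router logits (e.g. bf16 kernels in vLLM).
rollout_logits = [0.30, 0.29, 0.10, 0.05]
recorded = route(rollout_logits)

# Training: slightly perturbed logits (e.g. a different-precision
# recompute) would flip the top-2 ordering without replay.
train_logits = [0.29, 0.30, 0.10, 0.05]
fresh = route(train_logits)
replayed = route(train_logits, replay=recorded)

assert fresh != recorded        # naive recompute diverges from rollout
assert replayed == recorded     # replayed dispatch matches rollout
```

In a real MoE layer the recorded routing would be carried alongside the trajectory data, which is part of why optimizing train/inference data transfer (above) matters for MoE.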
Step-Wise + Async RL
- Continue to improve support for fully async RL at scale
- Improve support for step-wise training ([train][StepWise] Feature parity of step-wise training and default (non step-wise) training, and improvements #1278) and provide training examples for self-summarization
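The decoupled loop that fully async RL implies can be sketched with asyncio: rollout workers keep generating while the trainer consumes, with a bounded policy-version lag controlling how off-policy the data may get. The staleness policy and all names below are illustrative, not SkyRL's implementation.

```python
import asyncio

MAX_STALENESS = 2  # accept rollouts at most 2 policy versions old

async def rollout_worker(queue, state, n=8):
    for prompt in range(n):
        await asyncio.sleep(0)                 # simulate generation latency
        await queue.put({"prompt": prompt, "version": state["version"]})

async def trainer(queue, state, n=8):
    used = []
    for _ in range(n):
        sample = await queue.get()
        if state["version"] - sample["version"] > MAX_STALENESS:
            continue                           # too off-policy: drop it
        used.append(sample)
        if len(used) % 2 == 0:                 # one optimizer step per
            state["version"] += 1              # two samples bumps the policy
    return used

async def main():
    state = {"version": 0}
    queue = asyncio.Queue(maxsize=4)           # backpressure on rollouts
    _, used = await asyncio.gather(rollout_worker(queue, state),
                                   trainer(queue, state))
    return state, used

state, used = asyncio.run(main())
assert used and state["version"] == len(used) // 2
```

The bounded queue doubles as an implicit staleness control: rollout workers block once the trainer falls too far behind.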
Simplify the Inference Stack
Overview: Simplify the inference engine stack in SkyRL to standardize around HTTP, allowing seamless integration with high-performance routing and scaling layers for large-scale RL. Introduce native RL APIs into vLLM to improve ease of use for RL frameworks like SkyRL.
- Introduce a new HTTP-based inference server codepath
- Introduce native weight syncing APIs into vLLM ([RFC]: Native Weight Syncing APIs vllm-project/vllm#31848)
- Improve pause/resume semantics in vLLM for wide-EP settings ([RFC]: Extending vLLM Pause/Resume API to support in-flight requests and DPEP scenarios vllm-project/vllm#32103)
- Integrate native weight syncing APIs into SkyRL
- Migrate fully to the new inference codepath
- Improve support for wide-EP serving
- Provide support for P/D deployments
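An HTTP-standardized stack makes weight syncing a small pause/update/resume handshake between the trainer and each inference server. The sketch below illustrates that handshake; the `/pause`, `/update_weights`, and `/resume` endpoints are hypothetical placeholders, not vLLM's actual API (the linked RFCs propose the real surface).

```python
# Sketch of an HTTP weight-sync handshake. The endpoints are
# hypothetical; the transport is injected so it can be requests.post,
# aiohttp, or (as here) a fake for testing.

class InferenceClient:
    def __init__(self, post):
        self.post = post  # callable(path, json_body)

    def sync_weights(self, version, handles):
        # 1. Pause scheduling (in-flight requests drain or are preempted).
        self.post("/pause", {"mode": "drain"})
        # 2. Ship weight *handles* (e.g. NCCL/RDMA buffer descriptors),
        #    not tensors, so the bulk transfer itself stays P2P.
        self.post("/update_weights", {"version": version,
                                      "handles": handles})
        # 3. Resume serving with the new policy version.
        self.post("/resume", {})

calls = []
client = InferenceClient(lambda path, body: calls.append((path, body)))
client.sync_weights(3, ["buf://gpu0/0x1000"])
assert [p for p, _ in calls] == ["/pause", "/update_weights", "/resume"]
```

Keeping the control plane on HTTP while the data plane stays P2P is what lets the same codepath sit behind routers and autoscalers in wide-EP or P/D deployments.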
Agent Framework Integrations
Overview: Improve integrations with agent frameworks like Harbor
- Improve step-wise training in SkyRL with custom generators [train][StepWise] Feature parity of step-wise training and default (non step-wise) training, and improvements #1278
- [Harbor] Support agents beyond Terminus 2 [Harbor] Support agents beyond terminus 2 such as open hands #1184
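A custom generator for step-wise training essentially turns an agent-framework episode into per-step training records rather than one trajectory-level reward. The interface below is an illustrative sketch under that assumption, not SkyRL's actual Generator API.

```python
# Hypothetical step-wise generator hook: one record per agent step,
# so each step can carry its own reward and be trained on independently.

from dataclasses import dataclass

@dataclass
class StepRecord:
    observation: str
    action: str
    reward: float
    done: bool

def run_episode(agent, env, max_steps=4):
    """Roll one episode, emitting a StepRecord per environment step."""
    obs, steps = env.reset(), []
    for _ in range(max_steps):
        action = agent(obs)
        next_obs, reward, done = env.step(action)
        steps.append(StepRecord(obs, action, reward, done))
        obs = next_obs
        if done:
            break
    return steps

class ToyEnv:
    """Terminates after two steps, with a unit reward per step."""
    def __init__(self):
        self.t = 0
    def reset(self):
        return "start"
    def step(self, action):
        self.t += 1
        return f"obs{self.t}", 1.0, self.t >= 2

steps = run_episode(lambda obs: f"act({obs})", ToyEnv())
assert len(steps) == 2 and steps[-1].done
```

An agent framework like Harbor would sit where `agent` does here, with the framework's own tool-use loop inside each `action`.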
VLM Support
Overview: Support VLM training in SkyRL.
For more details, please see #1200
Improve entrypoint experience in SkyRL
Overview: Improve usability of SkyRL by providing a more Pythonic instantiation experience and improving the CLI
- Provide pythonic configuration and instantiation ([train] Pythonic Configs 1/N - Introduce configuration dataclasses for the current YAML and migrate tests #1001 , [train] Pythonic Configs 2/N: Switch to new dataclasses in entrypoint scripts; Change instantiation signatures #1187 )
- Improve CLI friction points with the SkyRL-train backend [tinker] Implement better validation for CLI parameters with the skyrl-train backend #1279
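The Pythonic-configuration direction can be illustrated with plain dataclasses: configuration lives in typed Python objects instead of YAML, and overrides become ordinary attribute assignment instead of CLI dot-paths. The class and field names below are illustrative, not SkyRL's real config schema.

```python
# Hypothetical sketch of dataclass-based configuration (cf. #1001, #1187).

from dataclasses import dataclass, field

@dataclass
class TrainerConfig:
    policy_model: str = "Qwen/Qwen2.5-7B-Instruct"
    train_batch_size: int = 256
    lr: float = 1e-6

@dataclass
class GeneratorConfig:
    backend: str = "vllm"
    sampling: dict = field(default_factory=lambda: {"temperature": 1.0})

@dataclass
class ExperimentConfig:
    trainer: TrainerConfig = field(default_factory=TrainerConfig)
    generator: GeneratorConfig = field(default_factory=GeneratorConfig)

# Overrides are ordinary Python, checked by the type system and IDE:
cfg = ExperimentConfig()
cfg.trainer.lr = 2e-6
assert cfg.generator.backend == "vllm" and cfg.trainer.lr == 2e-6
```

Because each nested config uses `default_factory`, instances don't share mutable state, so mutating one experiment's config can't leak into another.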