feat: Add SkyRL backend for unified trainer by jeewoo-lee · Pull Request #407 · rllm-org/rllm

jeewoo-lee · 2026-02-28T01:31:52Z

Summary

Adds SkyRLBackend, SkyRLEngine, data adapters, launcher, and Hydra config
Improves lazy imports in rollout/__init__.py with try/except guards so missing backend deps don't crash on import

New files

| Path | Description |
| --- | --- |
| `experimental/skyrl/skyrl_backend.py` | `BackendProtocol` implementation for SkyRL |
| `experimental/skyrl/skyrl_launcher.py` | Ray-based launcher for SkyRL workers |
| `experimental/skyrl/data_adapter.py` | Converts between rLLM Episodes and SkyRL `TrainingInputBatch` |
| `experimental/skyrl/transform.py` | Episode-to-DataProto transforms for SkyRL |
| `experimental/skyrl/skyrl_metrics_utils.py` | SkyRL-specific metric computation |
| `experimental/rollout/skyrl_engine.py` | Adapts SkyRL's `InferenceEngineClient` to rLLM's `RolloutEngine` |
| `experimental/config/rllm/backend/skyrl.yaml` | Hydra config bridging SkyRL-native keys to rLLM-common keys |
| `experimental/test/test_skyrl_*.py/.sh` | Test scripts for simple math and solver-judge |

Modified files

pyproject.toml — adds [skyrl] optional dependency group
rollout/__init__.py — adds SkyRLEngine to lazy imports, wraps all engine/type imports in
try/except guards
.gitignore — adds /skyrl and claude/*
config/unified.yaml — adds skyrl config reference

jeewoo-lee · 2026-02-28T01:33:29Z

@listar2000 I have made a PR for SkyRL. Thank you!

listar2000 · 2026-03-01T00:51:00Z

rllm/experimental/config/unified.yaml

I've lately changed this pattern a bit -- now put this package-dependent searchpath into your skyrl.yaml -- maybe go to verl.yaml for reference. This prevents import warning for non-skyrl users.

Also make sure you check that the script runs okay with this change.

listar2000

Thanks for the great work @jeewoo-lee -- please see my comments (there are quite a few -- but overall I'm very positive about these contributions. This PR is mergeable once these comments are addressed).

listar2000 · 2026-03-01T00:54:19Z

rllm/experimental/rollout/__init__.py

Have you confirmed that we have to do the below in order to eliminate import error?

I've installed a fresh rllm on a new machine and want to train with Tinker -- turns out there's no issue with the existing code. So I think we can revert these changes (unless the error happened to you during a fresh installation).
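For context on the lazy-import approach under discussion, here is a minimal sketch of attribute-level lazy loading using a module `__getattr__` (PEP 562). It is not the code in `rollout/__init__.py`: the `_LAZY_EXPORTS` mapping, the `make_lazy_module` helper, and the `demo_rollout` module name are hypothetical, and standard-library targets stand in for backend-specific engine classes.

```python
import importlib
import types

# Hypothetical mapping: exported name -> (module path, attribute name).
# Standard-library targets stand in for backend-specific engine classes.
_LAZY_EXPORTS = {
    "OrderedDict": ("collections", "OrderedDict"),
    "MissingEngine": ("not_installed_backend_pkg", "MissingEngine"),
}


def make_lazy_module(name, exports):
    """Build a module whose attributes are imported on first access (PEP 562)."""
    module = types.ModuleType(name)

    def __getattr__(attr):
        if attr not in exports:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        target_module, target_attr = exports[attr]
        # The import only happens here, so an uninstalled backend fails
        # when its engine is requested, not when the package is imported.
        value = getattr(importlib.import_module(target_module), target_attr)
        setattr(module, attr, value)  # cache so later lookups skip __getattr__
        return value

    module.__getattr__ = __getattr__
    return module


rollout = make_lazy_module("demo_rollout", _LAZY_EXPORTS)
ordered_dict_cls = rollout.OrderedDict  # triggers the import of `collections`
```

In a real package the `__getattr__` function would sit at the top level of `__init__.py` rather than being attached to a synthetic module. With this pattern, importing the package never touches backends the user has not installed, which is why extra try/except guards may be unnecessary.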

listar2000 · 2026-03-01T00:56:46Z

rllm/experimental/rollout/skyrl_engine.py

+from pathlib import Path
+
+# Add skyrl-train to Python path
+skyrl_train_path = Path(__file__).parent.parent.parent.parent / "skyrl" / "skyrl-train"


Any reason that we might need these hard-coded paths (instead of doing normal import)? I'm worried that the users might not install the skyrl as source code and put them in the location you specify.
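One possible alternative to path manipulation, sketched here under assumptions: import the optional package normally and convert a failure into an `ImportError` that tells the user how to install it. The helper name `require_backend` and the install hint string are hypothetical; only the `skyrl_train` package name and the `[skyrl]` extra come from this PR.

```python
import importlib


def require_backend(module_name, install_hint):
    """Import an optional backend module or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this backend but could not be imported. "
            f"Install it with: {install_hint}"
        ) from exc


# Hypothetical usage inside an engine module (the hint text is an assumption):
# skyrl_train = require_backend("skyrl_train", "the [skyrl] optional dependency group")
```

This keeps resolution in the hands of the normal import system, so it works whether SkyRL was installed from source, from a wheel, or in editable mode.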

listar2000 · 2026-03-01T01:00:33Z

rllm/experimental/skyrl/skyrl_launcher.py

+        else:
+            return None
+
+    def _convert_rllm_dataset_to_skyrl_file(self, rllm_dataset: Dataset | None) -> str | None:


Although it seems to work -- this function is a bit strange in the sense that rLLM actually creates two .parquet files for each dataset: one regular and the other one for Verl. Are both incompatible for SkyRL to use? Also isn't the construction of messages handled by the workflow side?

Also these functions (utilities) should not be in the launcher file.

listar2000 · 2026-03-01T01:03:43Z

rllm/experimental/skyrl/skyrl_metrics_utils.py

Do you have any recent wandb training log? Just want to verify that the logged metrics match the other backends (esp Tinker).

listar2000 · 2026-03-01T01:04:39Z

rllm/experimental/skyrl/transform.py

+from typing import TYPE_CHECKING
+
+from rllm.agents.agent import TrajectoryGroup
+from rllm.engine.rollout import ModelOutput


Use the ModelOutput in experimental.rollout.

listar2000 · 2026-03-01T01:05:42Z

rllm/experimental/test/test_skyrl_simple_math.py

+    - trainer.algorithm.advantage_estimator: Advantage estimator (grpo, gae, rloo, reinforce++)
+    - algorithm.use_rllm: Whether to use rLLM-native advantage computation (default: false)
+    - rllm.stepwise_advantage.enable: Enable stepwise advantage computation
+    - rllm.stepwise_advantage.mode: Stepwise mode (broadcast or per_step)


`per_step` is now deprecated -- maybe also check your code for related parts (this logic is no longer needed).
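To illustrate what remains once `per_step` is gone, here is a rough sketch of broadcast-style advantages. It assumes that "broadcast" means a single trajectory-level advantage is copied to every step of that trajectory, and it uses a GRPO-style group normalization with the population standard deviation. The function name, the `eps` value, and the normalization details are assumptions, not rLLM's implementation.

```python
from statistics import mean, pstdev


def broadcast_group_advantages(group_rewards, steps_per_trajectory, eps=1e-6):
    """Normalize rewards within one group, then copy each trajectory's
    advantage onto all of its steps."""
    mu = mean(group_rewards)
    sigma = pstdev(group_rewards)
    # One scalar advantage per trajectory (assumed GRPO-style normalization).
    advantages = [(r - mu) / (sigma + eps) for r in group_rewards]
    # Broadcast: every step in a trajectory shares that trajectory's advantage.
    return [[adv] * n for adv, n in zip(advantages, steps_per_trajectory)]


# Two trajectories in one group: the first (3 steps) succeeded, the second (2 steps) failed.
# Each step of the first gets roughly +1, each step of the second roughly -1.
per_step_advantages = broadcast_group_advantages([1.0, 0.0], [3, 2])
```

Because every step inherits the same value, there is no per-step credit assignment to configure, which is consistent with dropping the `mode` option.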

listar2000 · 2026-03-01T01:08:16Z

Btw make sure you pull from main first as I've merged a few other PRs lately @jeewoo-lee

…t tracking.py to upstream
- Move hydra searchpath (verl + skyrl) into unified.yaml for centralized config. Hydra searchpath entries are resolved lazily, so pkg://skyrl_train.config won't error for non-skyrl users; it only matters when rllm/backend=skyrl is selected.
- Add logger config bridging in skyrl.yaml
- Add Modal launcher (modal_run.py) and training script for solver-judge workflow
- Add dataset fallback loading in test_skyrl_solver_judge.py
- Revert tracking.py to upstream (restore batch uploads, session URLs, eval logging)

jeewoo-lee · 2026-03-14T02:08:58Z

Hi @listar2000, I made the changes:

- Unified config searchpath: Kept pkg://skyrl_train.config in unified.yaml. Hydra searchpath entries are resolved lazily, so the skyrl entry won't cause errors for non-skyrl users.
- Import guard in rollout/__init__.py: Removed the try/except wrappers. Imports are now direct and any ImportError propagates naturally.
- Hard-coded paths in skyrl_engine.py: Replaced all sys.path.insert() / pathlib hacks with standard Python imports. If SkyRL dependencies are missing, a clear ImportError with installation instructions is raised.
- Dataset conversion: Added _is_skyrl_compatible_dataset() check in data_adapter.py. Compatible parquet files are reused directly; conversion only happens when necessary.
- Utility functions in launcher: Moved all metric utilities out of skyrl_launcher.py into a dedicated skyrl_metrics_utils.py, following the same pattern as tinker_metrics_utils.py.
- Wandb training log: Will follow up with a wandb run link once training completes on Modal.
- ModelOutput import in transform.py: Fixed to import from rllm.experimental.rollout as suggested.
- per_step deprecation: Removed all per_step / stepwise_advantage logic. Only broadcast mode is used now.
- Pull from main: Merged latest upstream/main.
- Modal integration: Added modal_run.py (Modal launcher for building and running rLLM + SkyRL on cloud GPUs) and run_skyrl_solver_judge_modal.sh (solver-judge training script for Modal).

listar2000 and others added 30 commits December 16, 2025 23:00

add non-intrusive experimental branch

fefee7c

change gitignore to resume tracking non-root verl folders

ee401f3

turn unified API to async

1165a34

refactor configurations for multiple backends

494bc02

fix hanging after error and some tinker bugs

1bd261f

unified trainer and added skyrl to .gitignore

5c313ed

started implementing skyrl for unified trainer

3228bac

implementing skyrl for unified trainer. working on figuring out weigh…

5dec525

…t sync logic

fix hanging error after tinker training end and tracking issue

5904925

add launcher and new verl test

6aa934d

bugfix for verl backend

ff04e1f

verl backend bugfix

7411289

working on SkyRLExp

f0ce107

Refactired skyrl_launcher to use internal initialize_ray() method

068604e

implemented skyrl backend. dataloader, advantage computation, and con…

dc77a41

…fig now added. need to verify this in runtime

skyrl specific logger removed because unified trainer handles it

5306912

refactored to inherit RayPPOTrainer

6b18f87

Resolved async and sync mismatch

9664d63

add dependency merge with aarondev

f51f571

Merge branch 'unified-trainer' of https://github.com/jeewoo-lee/rllm …

0931db8

…into unified-trainer

made some small changes

0cb915c

added claude to gitignore

10068b2

Merge branch 'unified-trainer' of github.com:jeewoo-lee/rllm into uni…

bfe94ce

…fied-trainer Getting changes Aaron made to fix bugs.

created data adapter to address key mismatch

0ac3bc4

fixing data adapter to preserve message

cbcf6e7

terminating ray before training

0e429ae

changing skyrl config to fix recursion issue

8b446d8

Skyrl-integration. before fixes

451357f

Integrate skyrl backend for training

731f54f

Successfully integrated skyrl to run training. Remaining tasks: - Make rollout engine accept validate=True - Use rllm advantage in rollout engine

added skyrl dependencies. Moved skyrl dependency requirements into ex…

8b9bb01

…perimental/skyrl/

jeewoo-lee and others added 21 commits February 2, 2026 01:30

cleaned up dependency installations and deleted skyrl_trainer

9e2bc27

merged skyrl integration

f18a755

Fixed formatting

db4bbb6

created solver-judge example. Refactored to remove rllm_generator. Mo…

5b2673e

…ved skyrl_engine into experimental folder

fixing reward issue

02ce885

fixed trajectory grouping

5911065

changed test_skyrl_solver_judge.sh

0815ba5

Pass pre-computed prompt_ids instead of raw messages

6e69c11

fixed metrics bug

74eef9e

changed config to match other tests

60c261d

changed configt

33eaabb

added comments to the QwenParser

bf4ebe5

changed engine import path to experimental

01ced95

refactored to code to follow experimental.rollout pattern

4b944ec

encapsulated metrics

3f303c6

fixed import errors so that we only need to install each backend

46d3208

merged skyrl changes with upstream

4ef9cef

try to run pre-commit

bbcbcb7

fixed formatting

19f72b1

Revert verl_backend.py, base.yaml, and unified_workflow_engine.py to

c019496

upstream/main to keep the PR focused on SkyRL integration only.

Revert metrics.py to upstream version

2266e2c

listar2000 reviewed Mar 1, 2026

listar2000 requested changes Mar 1, 2026

jeewoo-lee added 4 commits March 8, 2026 15:52

dealt with pr comments

1ec6cfc

Merge remote-tracking branch 'upstream/main' into pr-patch

a9463a7

Merge remote-tracking branch 'upstream/main' into pr-patch

83bb975

# Conflicts: # .gitignore

jeewoo-lee wants to merge 66 commits into rllm-org:main from jeewoo-lee:main
