Skip to content

feat: Add SkyRL backend for unified trainer#407

Open
jeewoo-lee wants to merge 66 commits intorllm-org:mainfrom
jeewoo-lee:main
Open

feat: Add SkyRL backend for unified trainer#407
jeewoo-lee wants to merge 66 commits intorllm-org:mainfrom
jeewoo-lee:main

Conversation

@jeewoo-lee
Copy link
Contributor

Summary

  • Adds SkyRLBackend, SkyRLEngine, data adapters, launcher, and Hydra config
  • Improves lazy imports in rollout/init.py with try/except guards so missing backend deps
    don't crash on import

New files

Path Description
experimental/skyrl/skyrl_backend.py BackendProtocol implementation for SkyRL
experimental/skyrl/skyrl_launcher.py Ray-based launcher for SkyRL workers
experimental/skyrl/data_adapter.py Converts between rLLM Episodes and SkyRL TrainingInputBatch
experimental/skyrl/transform.py Episode-to-DataProto transforms for SkyRL
experimental/skyrl/skyrl_metrics_utils.py SkyRL-specific metric computation
experimental/rollout/skyrl_engine.py Adapts SkyRL's InferenceEngineClient to rLLM's RolloutEngine
experimental/config/rllm/backend/skyrl.yaml Hydra config bridging SkyRL-native keys to rLLM-common keys
experimental/test/test_skyrl_*.py/.sh Test scripts for simple math and solver-judge

Modified files

  • pyproject.toml — adds [skyrl] optional dependency group
  • rollout/__init__.py — adds SkyRLEngine to lazy imports, wraps all engine/type imports in
    try/except guards
  • .gitignore — adds /skyrl and claude/*
  • config/unified.yaml — adds skyrl config reference

listar2000 and others added 30 commits December 16, 2025 23:00
…fig now added. need to verify this in runtime
…fied-trainer

Getting changes Aaron made to fix bugs.
Successfully integrated skyrl to run training. Remaining tasks:
- Make rollout engine accept validate=True
- Use rllm advantage in rollout engine
@jeewoo-lee
Copy link
Contributor Author

@listar2000 I have made PR for skyrl. Thank you!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've lately changed this pattern a bit -- now put this package-dependent searchpath into your skyrl.yaml -- maybe go to verl.yaml for reference. This prevents import warning for non-skyrl users.

Also make sure you check that the script runs Okay with this change.

Copy link
Collaborator

@listar2000 listar2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the great work @jeewoo-lee -- please see my comments (there are quite a few -- but overall I'm very positive about these contributions. This PR is mergeable once these comments are addressed).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you confirmed that we have to do the below in order to eliminate import error?

I've installed a fresh rllm on a new machine and want to train with Tinker -- turns out there's no issue with the existing code. So I think we can revert these changes (unless the error happened to you during a fresh installation).

from pathlib import Path

# Add skyrl-train to Python path
skyrl_train_path = Path(__file__).parent.parent.parent.parent / "skyrl" / "skyrl-train"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason that we might need these hard-coded paths (instead of doing normal import)? I'm worried that the users might not install the skyrl as source code and put them in the location you specify.

else:
return None

def _convert_rllm_dataset_to_skyrl_file(self, rllm_dataset: Dataset | None) -> str | None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite seem to work -- this function is a bit strange in the sense that rLLM actually creates two .parquet file for each dataset: one regular and the other one for Verl. Are both imcompatible for SkyRL to use? Also isn't the construction of messages handled by the workflow side?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also these functions (utilities) should not be in the launcher file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any recent wandb training log? Just want to verify that the logged metrics match the other backends (esp Tinker).

from typing import TYPE_CHECKING

from rllm.agents.agent import TrajectoryGroup
from rllm.engine.rollout import ModelOutput
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the ModelOutput in experimental.rollout.

- trainer.algorithm.advantage_estimator: Advantage estimator (grpo, gae, rloo, reinforce++)
- algorithm.use_rllm: Whether to use rLLM-native advantage computation (default: false)
- rllm.stepwise_advantage.enable: Enable stepwise advantage computation
- rllm.stepwise_advantage.mode: Stepwise mode (broadcast or per_step)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per_step is now deprecated -- maybe also check your code for related parts (no need for these logic).

@listar2000
Copy link
Collaborator

Btw make sure you pull from main first as I've merged a few other PRs lately @jeewoo-lee

…t tracking.py to upstream

- Move hydra searchpath (verl + skyrl) into unified.yaml for centralized config.
  Hydra searchpath entries are resolved lazily, so pkg://skyrl_train.config
  won't error for non-skyrl users — it only matters when rllm/backend=skyrl is selected.
- Add logger config bridging in skyrl.yaml
- Add Modal launcher (modal_run.py) and training script for solver-judge workflow
- Add dataset fallback loading in test_skyrl_solver_judge.py
- Revert tracking.py to upstream (restore batch uploads, session URLs, eval logging)
@jeewoo-lee
Copy link
Contributor Author

jeewoo-lee commented Mar 14, 2026

Hi @listar2000, I made the changes:

  1. Unified config searchpath — Kept pkg://skyrl_train.config in unified.yaml. Hydra searchpath entries are resolved lazily, so the skyrl entry won't cause errors for non-skyrl users.
  2. Import guard in rollout/__init__.py — Removed the try/except wrappers. Imports are now direct and any ImportError propagates naturally.
  3. Hard-coded paths in skyrl_engine.py — Replaced all sys.path.insert() / pathlib hacks with standard Python imports. If SkyRL dependencies are missing, a clear ImportError with installation instructions is raised.
  4. Dataset conversion — Added _is_skyrl_compatible_dataset() check in data_adapter.py. Compatible parquet files are reused directly; conversion only happens when necessary.
  5. Utility functions in launcher — Moved all metric utilities out of skyrl_launcher.py into a dedicated skyrl_metrics_utils.py, following the same pattern as tinker_metrics_utils.py.
  6. Wandb training log — Will follow up with a wandb run link once training completes on Modal.
  7. ModelOutput import in transform.py — Fixed to import from rllm.experimental.rollout as suggested.
  8. per_step deprecation — Removed all per_step / stepwise_advantage logic. Only broadcast mode is used now.
  9. Pull from main — Merged latest upstream/main.
  10. Modal integration — Added modal_run.py (Modal launcher for building and running rLLM + SkyRL on cloud GPUs) and run_skyrl_solver_judge_modal.sh (solver-judge training script for Modal).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants