Commits
872 commits
dcc70bf
docs: remove checkpoint 0 column from training environments table
Jan 27, 2026
c226525
docs: restructure results with run_id as header for each run
Jan 27, 2026
c934485
docs: add v0.2.2 analysis comparing baseline to current run mk6nr5ij
Jan 27, 2026
29cda8e
feat(fleet): add Tinker backend for Fleet task training (#63)
dzorlu Jan 28, 2026
a55a407
chore: use #fleet-training-runs-test for Tinker workflow (#64)
dzorlu Jan 28, 2026
fb9a279
fix(ci): install OpenEnv with [fleet] extra for MCP support (#65)
dzorlu Jan 28, 2026
eada50c
fix(tinker): use Qwen3-VL-30B-A3B-Instruct model (#66)
dzorlu Jan 28, 2026
d25f5c0
fix(tinker): ensure tokenizer returns list of ints for ModelInput (#67)
dzorlu Jan 28, 2026
f71302b
fix(tinker): correct sample() API signature (#68)
dzorlu Jan 28, 2026
4494721
fix(tinker): use correct SampledSequence attribute names (#69)
dzorlu Jan 28, 2026
ce51dd0
fix(tinker): add max_input_length check to prevent context overflow (…
dzorlu Jan 28, 2026
ccacf88
fix(tinker): filter overlong sequences before training (#71)
dzorlu Jan 28, 2026
8b6338f
perf: parallelize rollout collection with asyncio.gather (#72)
dzorlu Jan 28, 2026
17f65db
fix(ci): default to 8B model and fix task count syntax error (#73)
dzorlu Jan 28, 2026
f150b51
fix(ci): add cancelled notification and default to 50 steps (#74)
dzorlu Jan 28, 2026
9fffc78
feat: add progress logging during rollout collection (#75)
dzorlu Jan 28, 2026
fdcaedc
fix(tinker): MCP timeout fixes and logging improvements (#76)
dzorlu Jan 28, 2026
3f9031c
fix(tinker): reduce max concurrent rollouts from 4 to 2 (#77)
dzorlu Jan 28, 2026
9924f5d
feat(tinker): add tqdm progress bar to training loop (#78)
dzorlu Jan 28, 2026
289a40c
docs: clarify PR workflow - user merges after review (#79)
dzorlu Jan 28, 2026
63e2d0e
docs: add Fleet training runbook (#80)
dzorlu Jan 28, 2026
abe78b7
fix(tinker): revert to standard runner and reduce concurrency to 2 (#81)
dzorlu Jan 28, 2026
46a7b64
fix: use ThreadPoolExecutor for MCP connection isolation (#82)
dzorlu Jan 29, 2026
bd4667b
fix: increase Fleet instance TTL from 10 min to 30 min (#83)
dzorlu Jan 29, 2026
572cea8
fix: update Slack channel to #fleet-training-runs (#84)
dzorlu Jan 29, 2026
55f4805
fix(tinker): increase TTL to 2 hours and fix Pydantic attribute acces…
dzorlu Jan 29, 2026
c7c51f3
feat(tinker): add turn-level logging with timing (#86)
dzorlu Jan 29, 2026
e28c10b
feat(fleet): add step timing logs to FleetTaskEnv (#87)
dzorlu Jan 29, 2026
b2e682b
fix(tinker): use async sampling to avoid event loop blocking (#88)
dzorlu Jan 29, 2026
37bbc0e
fix(tinker): sample_async returns result directly, not a future (#89)
dzorlu Jan 29, 2026
0191d96
chore: remove per-turn console logging (timing still logged to WandB)…
dzorlu Jan 29, 2026
e9060c0
fix(wandb): add commit=True to force immediate sync (#91)
dzorlu Jan 29, 2026
61117b5
feat: v0.3 dataset split - add ticketmaster to eval for trace analysi…
dzorlu Jan 29, 2026
0f48875
fix: disable checkpointing by default (fixes 409 conflict) (#93)
dzorlu Jan 29, 2026
55a8226
feat: add data_version input with S3 validation (#94)
dzorlu Jan 29, 2026
763f345
Gradient Based Metrics to Evaluate Quality of RL Training (#96)
bulb-fleet Feb 3, 2026
7e9da64
New Task for: Gradient Metrics for Qwen3 training. (#97)
bulb-fleet Feb 3, 2026
d2e4a95
refac: reduce duration of qwen3 training. (#98)
bulb-fleet Feb 3, 2026
33fd7d4
Fix: Appropriate detaching of Gradient Histograms post Pdist calculat…
bulb-fleet Feb 4, 2026
bcc0540
Fix: Collate distributed tensors to full_tensors() for histogram gene…
bulb-fleet Feb 4, 2026
358437e
Extra Gradient Metrics Tuned for Interpretability. (#101)
bulb-fleet Feb 4, 2026
2e018ad
Tune: Flatten Cumulative Update Histogram for wandb plotting convenie…
bulb-fleet Feb 4, 2026
ace3a35
docs: merge model_issues.md and project.md into experiments.md (#103)
dzorlu Feb 4, 2026
6c6775a
feat: unify metric aggregation between SkyRL and Tinker (#95)
dzorlu Feb 4, 2026
27d23be
Metric: Effective Rank as a proxy to Capacity Utilization. (#104)
bulb-fleet Feb 4, 2026
fd62b51
New Task for Qwen-2.5-1.5B model. (#106)
bulb-fleet Feb 4, 2026
581f2d9
fix: reduce gpu_memory_utilization for v0.3 OOM (#107)
dzorlu Feb 4, 2026
dec64b0
fix: accumulate gmetrics across the training run. (#109)
bulb-fleet Feb 4, 2026
67d8fe1
Bulb/metrics accum strategy (#110)
bulb-fleet Feb 4, 2026
c434a67
Bulb/metrics accum strategy (#112)
bulb-fleet Feb 4, 2026
68cd834
use _local_tensor instead of full_tensor to support distributed gradi…
bulb-fleet Feb 4, 2026
9b20534
Bulb/metrics accum strategy (#114)
bulb-fleet Feb 4, 2026
b69e2e4
feat: add 20% max per-environment training cap (v0.3.1) (#111)
dzorlu Feb 4, 2026
6b8eab8
fix: Handle token-level rewards in compute_reward_metrics (#115)
dzorlu Feb 5, 2026
b4ce252
fix: Return zero-reward trajectory on env init failure instead of cra…
dzorlu Feb 5, 2026
279acb8
perf: reduce eval set size (50 prompts max, eval_interval=40) (#117)
dzorlu Feb 5, 2026
8788577
feat: log env init failure count and percentage (#118)
dzorlu Feb 5, 2026
0184dde
fix: return rollout_logprobs=[0.0] for failed trajectories (#119)
dzorlu Feb 5, 2026
7fa4aac
feat: add per-trajectory timeout to prevent stuck trajectories (#120)
dzorlu Feb 5, 2026
0a56e36
fix: robust parse_tool_call handling for edge cases (#121)
dzorlu Feb 5, 2026
cf51178
fix: gmetric tracking is policy-only + optimize memory, move gstats s…
bulb-fleet Feb 6, 2026
ae4f9a7
feat: add hybrid environment sampling for training batches (#122)
dzorlu Feb 6, 2026
40147ca
fix: consistent reward format for timeout/failed trajectories (#124)
dzorlu Feb 6, 2026
f3412f4
feat: increase batch size and GPUs for hybrid env sampling (#125)
dzorlu Feb 6, 2026
d70a515
fix: use batch_sampler for HybridEnvSampler in DataLoader (#126)
dzorlu Feb 6, 2026
4798c3f
fix: use standard DataLoader for hybrid sampling (StatefulDataLoader …
dzorlu Feb 6, 2026
cdc2df7
Reduce eval_interval and increase job timeout for longer training run…
dzorlu Feb 6, 2026
646ee9a
Enable use_hybrid_env_sampling to ensure every env is sampled every b…
dzorlu Feb 6, 2026
9493069
Fix: catch ValueError in parse_tool_call for large integers (#130)
dzorlu Feb 6, 2026
e4eddc2
Add Hyperbolic and Nebius cloud providers for GPU provisioning (#135)
dzorlu Feb 8, 2026
2b90215
Fix cloud providers: remove Hyperbolic, fix Nebius credentials (#136)
dzorlu Feb 8, 2026
4351969
Add memory requirements to filter out low-spec GPU instances (#137)
dzorlu Feb 8, 2026
d2445a0
fix: remove H100:8 fallback from Qwen3-8B training task (#138)
dzorlu Feb 8, 2026
e0a7e83
feat: add separate 8-GPU task YAML for B200:8 providers (#139)
dzorlu Feb 8, 2026
b3df087
fix: add all providers to 8-GPU task YAML (#140)
dzorlu Feb 8, 2026
35fe0b9
feat: add H200 as fallback to B200 in GPU task YAMLs (#141)
dzorlu Feb 8, 2026
fcfe5b4
fix: always log eval_before_train at step 0 (#144)
dzorlu Feb 8, 2026
b1baf2d
fix: make trajectory timeout work for blocking calls (#143)
dzorlu Feb 8, 2026
6e654a1
fix: avoid GHA 6-hour job timeout by detaching immediately (#145)
dzorlu Feb 8, 2026
ee780c8
fix: Handle cluster not found error, show who triggered, add concurrency
Feb 8, 2026
4362981
Revert "fix: Handle cluster not found error, show who triggered, add …
Feb 8, 2026
8861af9
fix: Handle cluster not found, show who triggered, add concurrency (#…
dzorlu Feb 8, 2026
144f0c6
debug: Add verbose output for sky status --refresh
Feb 8, 2026
42b7806
feat: Use self-hosted EC2 runner for training workflows (#148)
dzorlu Feb 8, 2026
dbe7e05
fix(fleet): inject env_variables into system prompt (#149)
dzorlu Feb 8, 2026
9a56cdb
fix: update argparse default for eval_ratio to 0.20 (#150)
dzorlu Feb 8, 2026
41ff330
feat: add Slack notification for cancelled training runs (#151)
dzorlu Feb 8, 2026
5c32293
feat(fleet): add context management tools for long trajectories (#152)
dzorlu Feb 8, 2026
c3111e2
docs: add eval trajectory analysis instructions to CLAUDE.md
Feb 9, 2026
e70301c
Update trajectory metrics: Error Rate → Success Rate
Feb 9, 2026
b114c9c
Remove error pattern documentation requirement
Feb 9, 2026
51e7e29
Restore error pattern documentation as separate section
Feb 9, 2026
48cef9c
Allow concurrent training runs for experimentation (#154)
dzorlu Feb 9, 2026
ac11daf
Add runner pool setup script and documentation (#155)
dzorlu Feb 9, 2026
2027f19
Add enable_context_tools to config schema (#156)
dzorlu Feb 9, 2026
68960ae
Fix enable_context_tools config and chat history sync (#157)
dzorlu Feb 9, 2026
cf30696
Add nebius to SkyPilot install in runner setup
Feb 9, 2026
6260f45
Update runner pool docs with new fleet-runner-1 instance
Feb 9, 2026
a042da6
docs: Mark fleet-runner-1 as Active
Feb 9, 2026
0ab06c0
docs: Update runner pool with instance IDs and types
Feb 9, 2026
1b1be26
fix: Reduce max_input_length to 40K for context tools headroom
Feb 9, 2026
f169d7d
fix: Add length check after re-tokenization in step-wise mode
Feb 9, 2026
7459349
Revert max_input_length to 48K
Feb 9, 2026
ad62881
Revert step-wise length check commits (will re-apply as PR)
Feb 9, 2026
aa159c0
fix: Add length check after re-tokenization in step-wise mode (#158)
dzorlu Feb 9, 2026
6566551
Update training-runbook.md
dzorlu Feb 9, 2026
1b83034
fix: Return StepWiseOutput in early returns when step_wise_trajectori…
dzorlu Feb 9, 2026
02b48d3
docs: Add variance_per_prompt analysis instructions to CLAUDE.md (#160)
dzorlu Feb 10, 2026
c283271
fix: Add database exploration hint for fostgres environment (#161)
dzorlu Feb 10, 2026
6fba0ce
fix: Require final answer before <done> signal (#162)
dzorlu Feb 10, 2026
e598dfc
fix: Reduce batch size to fix OOM in v0.3.5 (#163)
dzorlu Feb 10, 2026
9a5de3c
feat: Add step-wise training YAML config (#164)
dzorlu Feb 10, 2026
30d3998
fix: Increase disk_size for step-wise training (#167)
dzorlu Feb 11, 2026
475c288
feat: Robust self-hosted runner infrastructure (#166)
dzorlu Feb 11, 2026
cba020f
Add VL task option to workflow and task YAML (#172)
dzorlu Feb 12, 2026
a0084f4
Fix runner health check: add actions:read permission
Feb 12, 2026
e22bae0
Use GH_PAT for runner health check (requires PAT with repo scope)
Feb 12, 2026
07932f4
Refactor health check to use AWS EC2 status instead of GitHub API
Feb 12, 2026
4654da3
Fix health check: use portable shell syntax instead of bash associati…
Feb 12, 2026
201b098
Debug health check
Feb 12, 2026
d815d61
Fix health check: handle errors and show stderr
Feb 12, 2026
85ad887
Clean up health check debug code
Feb 12, 2026
38f9987
Make runner health check more robust
Feb 12, 2026
df52e01
Add Qwen3-32B training config (#171)
dzorlu Feb 12, 2026
72585ff
fix(32b): use B200:8 to fix OOM during backward pass (#175)
dzorlu Feb 13, 2026
d09c690
fix(32b): Increase disk_size and limit checkpoint retention (#176)
dzorlu Feb 13, 2026
fcd9394
Add xLAM-2-70b-fc-r training config (#177)
dzorlu Feb 14, 2026
9e2b322
Fix xLAM-70b OOM: reduce policy_mini_batch_size 16→8 (#178)
dzorlu Feb 14, 2026
a0872d9
Add GLM-4.7-Flash training config (#179)
dzorlu Feb 14, 2026
f684f87
fix(glm-4.7-flash): install transformers from source for glm4_moe_lit…
dzorlu Feb 14, 2026
e482327
feat: support comma-separated env_keys filter in GHA workflows (#181)
dzorlu Feb 14, 2026
53c6bb8
Fix GLM-4.7-Flash: use vLLM nightly for glm4_moe_lite support (#182)
dzorlu Feb 14, 2026
f5b9677
Fix GLM-4.7-Flash: add vLLM + transformers to uv run --with (#183)
dzorlu Feb 14, 2026
aa2816b
fix(glm-4.7-flash): use vLLM nightly for transformers 5.x compatibili…
dzorlu Feb 14, 2026
a16bff9
fix(glm-4.7-flash): install all deps in setup, remove --isolated (#185)
dzorlu Feb 14, 2026
245e13b
fix(glm-4.7-flash): install torchvision after vLLM nightly (#186)
dzorlu Feb 14, 2026
6ff1150
fix(glm-4.7-flash): use CUDA 12.8 index for B200 Blackwell torchvisio…
dzorlu Feb 15, 2026
1a1f869
fix(glm-4.7-flash): install torch+torchvision together from nightly/c…
dzorlu Feb 15, 2026
969ee05
fix(glm-4.7-flash): uninstall torchvision to avoid nms error (#189)
dzorlu Feb 15, 2026
81cc8f3
fix(glm-4.7-flash): install vLLM last to preserve PyTorch ABI (#190)
dzorlu Feb 15, 2026
82724b5
fix(glm-4.7-flash): uninstall torchvision AFTER vLLM (#191)
dzorlu Feb 15, 2026
ed4f7d2
fix(glm-4.7-flash): install vLLM first, transformers after (#192)
dzorlu Feb 15, 2026
556f618
debug(glm-4.7-flash): add diagnostics to debug ABI error (#193)
dzorlu Feb 15, 2026
278b65f
fix(glm-4.7-flash): add --index-strategy unsafe-best-match for vLLM (…
dzorlu Feb 15, 2026
5351176
Revert "fix(glm-4.7-flash): add --index-strategy unsafe-best-match fo…
dzorlu Feb 15, 2026
f3859fb
fix(glm-4.7-flash): install vLLM from source instead of nightly wheel…
dzorlu Feb 15, 2026
6375011
fix(32b): use main branch for workdir instead of feat/vl-multimodal-s…
dzorlu Feb 15, 2026
afc9ffc
fix(glm-4.7-flash): revert to nightly wheels (source build needs nvcc…
dzorlu Feb 15, 2026
a1125bc
fix(glm-4.7-flash): use official vLLM install from HuggingFace model …
dzorlu Feb 15, 2026
7c2a923
feat(glm-4.7-flash): enable preserved thinking for multi-turn tool us…
dzorlu Feb 15, 2026
62c6833
fix(glm-4.7-flash): add --upgrade flag to vLLM install (#203)
dzorlu Feb 15, 2026
0bafa00
fix(glm-4.7-flash): use uv sync --inexact to keep vLLM (#204)
dzorlu Feb 15, 2026
220c077
fix(glm-4.7-flash): use uv pip install instead of uv sync (#205)
dzorlu Feb 15, 2026
b3245f8
feat(glm-4.7-flash): increase GPU count from 4 to 8 (#206)
dzorlu Feb 15, 2026
9c9001a
fix(glm-4.7-flash): use correct GPU names per provider (#207)
dzorlu Feb 15, 2026
28d9954
fix(glm-4.7-flash): remove Nebius (fabric config issues) (#208)
dzorlu Feb 15, 2026
87d86ea
fix(glm-4.7-flash): use --no-deps to prevent PyTorch downgrade (#209)
dzorlu Feb 15, 2026
b4751eb
fix(glm-4.7-flash): don't use 'uv run' in run phase (#210)
dzorlu Feb 15, 2026
7d88143
fix(glm-4.7-flash): remove 'uv run' from setup PREPARE_CMD (#211)
dzorlu Feb 15, 2026
7f8437c
fix(glm-4.7-flash): install missing deps (datasets, ray, etc.) (#212)
dzorlu Feb 15, 2026
ab9a26c
fix(glm-4.7-flash): remove 'ray stop' (kills SkyPilot's Ray) (#213)
dzorlu Feb 15, 2026
76d748a
fix(glm-4.7-flash): add ALL deps from pyproject.toml (#214)
dzorlu Feb 15, 2026
d4d52cf
fix(glm47-flash): install skyrl-gym from local path not PyPI (#216)
dzorlu Feb 15, 2026
36b9495
fix(glm47-flash): use + prefix for new Hydra config keys (#217)
dzorlu Feb 15, 2026
1ab3248
fix(glm47-flash): add flash-attn and reorganize setup steps (#218)
dzorlu Feb 15, 2026
920846d
fix(glm47-flash): remove FLASH_ATTENTION_SKIP_CUDA_BUILD flag (#219)
dzorlu Feb 15, 2026
bc96c5b
fix(glm47-flash): override pyproject.toml SKIP_CUDA_BUILD setting (#220)
dzorlu Feb 15, 2026
3110b5f
debug(glm47-flash): comprehensive flash-attn build debugging (#221)
dzorlu Feb 15, 2026
a924de3
fix(glm): skip flash-attn, use FLASHINFER attention backend (#222)
dzorlu Feb 15, 2026
3e21f7e
fix(glm): install flash-attn Python utils for SkyRL training (#223)
dzorlu Feb 15, 2026
183bf20
fix(vllm): add compatibility for vLLM nightly API restructuring (#224)
dzorlu Feb 15, 2026
5f2de66
fix(vllm): correct protocol import paths for vLLM nightly (#225)
dzorlu Feb 15, 2026
9a6b122
fix(vllm): correct ErrorResponse and ErrorInfo import paths for vLLM …
dzorlu Feb 15, 2026
b445d22
fix: Add flash_attn.ops.triton.rotary stub for vLLM rotary embeddings…
dzorlu Feb 15, 2026
da67753
fix(glm-flash): Add package metadata to flash_attn stub (#228)
dzorlu Feb 15, 2026
7cf5a97
fix(glm-flash): Use PEP 440 compliant version for flash_attn stub (#229)
dzorlu Feb 16, 2026
856c77b
fix(glm-flash): Use SDPA attention instead of flash_attention_2 (#230)
dzorlu Feb 16, 2026
84c1cd4
fix(glm-flash): Use + prefix for Hydra config key (#231)
dzorlu Feb 16, 2026
8582ce2
fix: enable S3 checkpoint resume for cross-VM training recovery (#232)
dzorlu Feb 16, 2026
daef1ad
feat(runners): auto-recover unhealthy instances via EC2 stop/start (#…
dzorlu Feb 16, 2026
8c17ee2
fix(glm-flash): Override attn_implementation via model_config_kwargs …
dzorlu Feb 16, 2026
bb9434d
Remove W&B run resume to avoid overwriting existing run data (#235)
dzorlu Feb 16, 2026
858231b
fix(runners): increase recovery timeout to 10min, skip busy runners (…
dzorlu Feb 16, 2026
d2711f4
Add tool error tracking by environment in wandb (#237)
dzorlu Feb 16, 2026
4b9def1
Fix runner health self-healing: add cooldown-based retry, fix permiss…
dzorlu Feb 16, 2026
2db6681
feat: alert Slack when tool_error_rate exceeds 50% during training (#…
dzorlu Feb 16, 2026
a5d2005
fix: prevent concurrent Runner Health Check runs (#241)
dzorlu Feb 17, 2026
3e12526
fix: parallelize runner recovery and fix busy runner 403 (#243)
dzorlu Feb 17, 2026
cab11e1
fix: add disk cleanup to prevent runners from becoming impaired (#244)
dzorlu Feb 17, 2026
5216573
fix: update runner specs to t3.xlarge and add swap setup (#245)
dzorlu Feb 17, 2026
b4bbab7
feat: update DATA_VERSION to v4 for all Fleet training tasks
Feb 19, 2026
9614803
feat(dataset): add difficulty filter for training
Feb 19, 2026
d4bb9d7
fix(dataset): exclude dropbox env and zillow tasks missing CURRENT_DATE
Feb 19, 2026
75533b6
feat(32b): higher LR (1e-5) + disable KL loss for PPO clipping (#246)
dzorlu Feb 19, 2026
035abda
Add SWE Task Generator Integration for GRPO Training (#133)
erranlli Feb 19, 2026
9806c94
feat(workflow): add difficulty filter dropdown to training UI
Feb 19, 2026
f559ec5
feat: add retry mechanism for cluster provisioning (#247)
dzorlu Feb 19, 2026
4968ec7
feat: Add per-env error metrics and logging (#248)
dzorlu Feb 19, 2026
1d3b19e
fix: Bypass ThreadPoolExecutor for async-capable envs to prevent zomb…
dzorlu Feb 22, 2026
67c08da
fix: Add Ray binary permission fix for RunPod (#251)
dzorlu Feb 24, 2026
3671b83
fix: move Ray permission fix to setup section (before provisioning) (…
dzorlu Feb 24, 2026
c27494a
fix: increase 8B disk_size from 200GB to 500GB to prevent worker OOM …
dzorlu Feb 25, 2026
9f0fc21
fix: Increase max_prompt_length from 512 to 2048 (#256)
dzorlu Feb 26, 2026
51b19c9
Fix: tool results dropped when response contains error: null (#257)
dzorlu Feb 26, 2026
ed662e5
feat: Wire LOGFIRE_TOKEN to fleet env telemetry (#258)
dzorlu Feb 26, 2026
cfbf016
Switch OpenEnv branch to deniz/fleet-logfire (#259)
dzorlu Feb 26, 2026
9e3eb36
fix: Install openenv[fleet] extra to include logfire dependency (#260)
dzorlu Feb 26, 2026
4d69faa
fix: Enable PrimeIntellect provider + fix H200 GPU naming (#261)
dzorlu Feb 26, 2026
a91afcf
revert: Fix Ray worker env by reverting openenv[fleet] syntax (#262)
dzorlu Feb 26, 2026
6ec0333
fix: Pass RUN_ID to SkyPilot so Slack and WandB use same run name (#264)
dzorlu Feb 26, 2026
875ceca
feat: Add SkyPilot task YAML for Harbor GRPO training (Qwen3-8B) (#265)
dzorlu Feb 27, 2026
3e32eac
feat: Add harbor-grpo-qwen3-8b to training workflow dropdown
Feb 27, 2026
e20a8f9
Revert "feat: Add harbor-grpo-qwen3-8b to training workflow dropdown"
Feb 27, 2026
ffd87bd
feat: Add harbor-grpo-qwen3-8b to training workflow dropdown (#266)
dzorlu Feb 27, 2026
47e4aaa
fix: Pass DAYTONA_API_KEY to SkyPilot for Harbor training (#267)
dzorlu Feb 27, 2026
ce99abf
fix: Add harbor dependency and optional extra to pyproject.toml (#268)
dzorlu Feb 27, 2026
02abb6f
fix: Relax datasets pin to >=4.0.0 for harbor compatibility (#269)
dzorlu Feb 27, 2026
f0f743f
fix: Install harbor via pip separately to avoid datasets (#270)
dzorlu Feb 27, 2026
21d395f
revert: Pin datasets==4.0.0 (revert PR #269) (#271)
dzorlu Feb 27, 2026
9f6de1d
fix: Add harbor examples from upstream and use --no-deps for pip inst…
dzorlu Feb 27, 2026
50e61e1
Update training workflow default to v5 dataset (6,618 tasks)
Mar 2, 2026
8bb4a52
Fix v5 task count in workflow description (5,674 not 6,618)
Mar 2, 2026
1dcce3a
fix: Extend max_input_length to 64K and guard against overlong prompt…
dzorlu Mar 2, 2026
ea39eeb
fix: Reduce micro batch sizes for 64K context to avoid OOM (#276)
dzorlu Mar 3, 2026
475514f
update v5 task count in workflow description
Mar 3, 2026
07bd6e0
update v5 dataset count in workflow description (5,654 tasks)
Mar 3, 2026
2b53b8b
Remove modality dropdown from GHA workflow; derive from task YAML
Mar 3, 2026
9dac892
feat: modality-aware instance TTL default (CUA: 1800s, tool_use: 600s)
Mar 3, 2026
edcbad1
Increase fleet instance TTL from 600s to 900s to reduce 502s (#278)
dzorlu Mar 4, 2026
b9ca043
Add GCP spot H200 as fallback compute option (#282)
dzorlu Mar 5, 2026
ab584bb
feat: Qwen3.5-9B as default tool-use training model (#281)
dzorlu Mar 9, 2026
84e523d
Move all storage paths from root overlay to /workspace volume (#285)
dzorlu Mar 9, 2026
125a6d8
feat: wire Fleet trace upload into eval flow (#286)
dzorlu Mar 9, 2026
4584e2b
feat: thread partial_reward flag to OpenEnv (#287)
dzorlu Mar 11, 2026
9d6f36f
chore: update training config for v54 eval run
Mar 11, 2026
badd26c
feat: Use verifier reward for context-overflow trajectories (#290)
dzorlu Mar 12, 2026
88a2d79
fix: raise ulimit for open files in training run (#289)
dzorlu Mar 13, 2026
700ca68
Update dataset to v54 (patched verifiers) (#291)
dzorlu Mar 14, 2026
3edaac2
fix: pass@n requires full success (>= 1.0) (#293)
dzorlu Mar 16, 2026
00f7e24
feat: shared training scripts, multi-node, Qwen3.5-35B-A3B (#297)
dzorlu Mar 17, 2026
ee3afe7
fix: point task YAML workdir refs to main (#298)
dzorlu Mar 17, 2026
78bc5cc
fix: pin transformers==5.1.0 and disable enforce_eager for Qwen3.5 (#…
dzorlu Mar 17, 2026
3c5596e
fix: use transformers==5.3.0 for Qwen3.5-MoE support (#300)
dzorlu Mar 17, 2026
c2a6ddc
fix: auto-detect data root (/workspace vs $HOME) for cross-cloud comp…
dzorlu Mar 18, 2026
9e5110d
Add hint-augmented rollouts to rescue GRPO signal (#294)
dzorlu Mar 18, 2026
059c86e
fix: GCP gIB/RDMA handling + GKE spot pools + remove num_nodes input
dzorlu Mar 18, 2026
3e44a8f
fix: install skypilot[kubernetes] on runner for GKE support
Mar 18, 2026
d4820af
fix: add GKE kubeconfig setup for kubernetes cloud support
Mar 18, 2026
0699515
35B: partial_reward + 96K context + v55 (#305)
dzorlu Mar 19, 2026
761b582
Pin causal-conv1d>=1.6.0 to fix build failure with torch 2.10 + CUDA …
Mar 20, 2026
5 changes: 0 additions & 5 deletions .env

This file was deleted.

12 changes: 12 additions & 0 deletions .gemini/config.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
have_fun: false
code_review:
  disable: false
  comment_severity_threshold: MEDIUM
  max_review_comments: -1
  pull_request_opened:
    help: false
    # disable PR summaries
    summary: false
    code_review: true
  include_drafts: false
ignore_patterns: []
89 changes: 89 additions & 0 deletions .github/workflows/cpu_ci.yaml
@@ -0,0 +1,89 @@
name: SkyRL

on:
  push:
    branches:
      - main
      - rc/**
  pull_request:

permissions:
  checks: write # for status checks to appear
  contents: read

# Cancel runs for previous commits on the same branch
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  check_code_quality:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        # This is the version of the action for setting up Python, not the Python version.
        uses: actions/setup-python@v5
        with:
          # Semantic version range syntax or exact version of a Python version
          python-version: '3.12'
          cache: 'pip'
      - name: Install the latest version of uv
        uses: astral-sh/setup-uv@v6
        with:
          activate-environment: true
      - name: Run pre-commit hooks
        run: uv pip install pre-commit; pre-commit run --all-files --config .pre-commit-config.yaml

  skyrl_tests:
    needs: check_code_quality
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
        working-directory: ./skyrl-train

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        # This is the version of the action for setting up Python, not the Python version.
        uses: actions/setup-python@v5
        with:
          # Semantic version range syntax or exact version of a Python version
          python-version: '3.12'
          cache: 'pip'
      - name: Install the latest version of uv
        uses: astral-sh/setup-uv@v6
        with:
          activate-environment: true
      - name: Install skyrl-train
        run: uv sync --frozen --extra dev # installs from lock file
      - name: Run cpu tests
        run: uv run --frozen pytest tests/cpu/

  skyrl_gym_tests:
    needs: check_code_quality
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
        working-directory: ./skyrl-gym

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        # This is the version of the action for setting up Python, not the Python version.
        uses: actions/setup-python@v5
        with:
          # Semantic version range syntax or exact version of a Python version
          python-version: '3.12'
          cache: 'pip'
      - name: Install the latest version of uv
        uses: astral-sh/setup-uv@v6
        with:
          activate-environment: true
      - name: Install skyrl-gym
        run: uv sync --frozen --extra dev # installs from lock file
      - name: Run cpu tests
        run: uv run --frozen pytest tests/
67 changes: 67 additions & 0 deletions .github/workflows/cpu_skyrl_tx.yaml
@@ -0,0 +1,67 @@
name: SkyRL-tx-CPU

on:
  push:
    branches: [ main ]
    paths:
      - 'skyrl-tx/**'
      - '.github/workflows/cpu_skyrl_tx.yaml'
  pull_request:
    paths:
      - 'skyrl-tx/**'
      - '.github/workflows/cpu_skyrl_tx.yaml'
  workflow_dispatch:

permissions:
  checks: write # for status checks to appear
  contents: read

# Cancel runs for previous commits on the same branch
concurrency:
  group: skyrl-tx-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  skyrl_tx_tests:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
        working-directory: ./skyrl-tx
    steps:
      - uses: actions/checkout@v4
      - name: Install uv
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
      # - name: Check if reference docs are up to date
      #   run: |
      #     uv run --extra dev typer tx/run/main.py utils docs --name tx --output docs/reference.md && git diff --exit-code docs/reference.md
      # - name: Test docs
      #   run: |
      #     uv run --extra dev mkdocs build --strict
      - name: Run lint
        run: |
          uvx ruff check
      # - name: Run type checking
      #   run: |
      #     uv run --extra tinker --extra dev ty check
      - name: Run pytest
        run: |
          uv run --extra tinker --extra dev pytest --forked -s tests
      - name: Run a single training step
        run: |
          uv run tx train --model pcmoritz/qwen3-tiny-test --dataset mahiatlinux/TinyStories-GPT4-V2-50K-SUBSET --output-dir /tmp --batch-size 2 --max-steps 1 --optimizer-args '{"learning_rate": 0.002, "weight_decay": 0.1}'
          uv run tx train --model pcmoritz/qwen3-tiny-test --dataset mahiatlinux/TinyStories-GPT4-V2-50K-SUBSET --output-dir /tmp --batch-size 2 --max-steps 1 --optimizer-args '{"learning_rate": 0.002, "weight_decay": 0.1}' --tp-size 2
      - name: Run a single fine-tuning step on a chat dataset
        run: |
          uv run --with jinja2 tx train --model pcmoritz/qwen3-tiny-test --dataset smangrul/ultrachat-feedback-10k-chatml --output-dir /tmp --batch-size 2 --max-steps 1 --loader tx.loaders.chat --load-checkpoint-path /tmp
      - name: Run a single fine-tuning step with Qwen3 MoE
        run: |
          uv run --with huggingface_hub hf download trl-internal-testing/tiny-Qwen3MoeForCausalLM --local-dir /tmp/qwen3_moe
          uv run --with jinja2 tx train --model trl-internal-testing/tiny-Qwen3MoeForCausalLM --dataset smangrul/ultrachat-feedback-10k-chatml --output-dir /tmp --batch-size 2 --max-steps 1 --loader tx.loaders.chat --load-checkpoint-path /tmp/qwen3_moe
      - name: Test experiment tracker integration
        run: |
          WANDB_MODE=offline WANDB_API_KEY=dummy uv run --with wandb tx train --model pcmoritz/qwen3-tiny-test --dataset mahiatlinux/TinyStories-GPT4-V2-50K-SUBSET --output-dir /tmp --batch-size 2 --max-steps 1 --tracker wandb --tracker-args '{"name": "Qwen3-8B", "project": "tx"}'
      - name: Run engine benchmarks
        run: |
          uv run --extra tinker --extra dev python benchmarks/benchmark_engine.py --base-model trl-internal-testing/tiny-Qwen3ForCausalLM --backend-config '{"max_lora_adapters": 3, "max_lora_rank": 1}' --num-warmup-steps 1 --num-steps 1 --num-requests 1 --seq-len 8 --sample-max-tokens 16
50 changes: 50 additions & 0 deletions .github/workflows/gpu_ci.yaml
@@ -0,0 +1,50 @@
name: SkyRL-GPU

on:
  push:
    branches:
      - main
    paths:
      - 'skyrl-train/**'
      - '!skyrl-train/docs/**'
      - '!skyrl-train/examples/**'
      - '.github/workflows/**'
  workflow_dispatch:

permissions:
  checks: write # for status checks to appear
  contents: read

jobs:
  skyrl_tests:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
        working-directory: ./skyrl-train

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        # This is the version of the action for setting up Python, not the Python version.
        uses: actions/setup-python@v5
        with:
          # Semantic version range syntax or exact version of a Python version
          python-version: '3.12'
          cache: 'pip'
      - name: Install the latest version of uv
        uses: astral-sh/setup-uv@v6
        with:
          activate-environment: true
      - name: Install basic dependencies
        run: uv pip install anyscale==0.24.79 typer==0.9.0
      # Run tests
      - name: GPU tests
        env:
          ANYSCALE_CLI_TOKEN: ${{ secrets.ANYSCALE_CLI_TOKEN }}
          ANYSCALE_HOST: https://console.anyscale.com
        run: |
          anyscale job submit -f ci/anyscale_gpu_ci.yaml --timeout 10000
          anyscale job wait --cloud sky-anyscale-aws-us-east-1 --name skyrl-train-gpu-ci --timeout 10000
55 changes: 55 additions & 0 deletions .github/workflows/gpu_ci_megatron.yaml
@@ -0,0 +1,55 @@
name: SkyRL-GPU-Megatron

on:
  push:
    branches:
      - main
    paths:
      - 'skyrl-train/skyrl_train/workers/**'
      - 'skyrl-train/skyrl_train/distributed/megatron/**'
      - 'skyrl-train/skyrl_train/config/**'
      - 'skyrl-train/skyrl_train/trainer.py'
      - 'skyrl-train/tests/gpu/gpu_ci/**'
      - 'skyrl-train/pyproject.toml'
      - '!skyrl-train/docs/**'
      - '!skyrl-train/examples/**'
      - '.github/workflows/**'
  workflow_dispatch:

permissions:
  checks: write # for status checks to appear
  contents: read

jobs:
  skyrl_tests_megatron:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
        working-directory: ./skyrl-train

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        # This is the version of the action for setting up Python, not the Python version.
        uses: actions/setup-python@v5
        with:
          # Semantic version range syntax or exact version of a Python version
          python-version: '3.12'
          cache: 'pip'
      - name: Install the latest version of uv
        uses: astral-sh/setup-uv@v6
        with:
          activate-environment: true
      - name: Install basic dependencies
        run: uv pip install anyscale==0.24.79 typer==0.9.0
      # Run tests
      - name: GPU tests
        env:
          ANYSCALE_CLI_TOKEN: ${{ secrets.ANYSCALE_CLI_TOKEN }}
          ANYSCALE_HOST: https://console.anyscale.com
        run: |
          anyscale job submit -f ci/anyscale_gpu_ci_megatron.yaml --timeout 5000
          anyscale job wait --cloud sky-anyscale-aws-us-east-1 --name skyrl-train-gpu-ci-megatron --timeout 5000
47 changes: 47 additions & 0 deletions .github/workflows/gpu_e2e_ci.yaml
@@ -0,0 +1,47 @@
name: SkyRL-GPU-E2E-CI

on:
  schedule:
    - cron: '5 8 * * *' # Every day at 08:05 UTC (~00:05 PST / ~01:05 PDT)
  workflow_dispatch:

permissions:
  checks: write # for status checks to appear
  contents: read

jobs:
  skyrl_gpu_e2e_test:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
        working-directory: ./skyrl-train

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        # This is the version of the action for setting up Python, not the Python version.
        uses: actions/setup-python@v5
        with:
          # Semantic version range syntax or exact version of a Python version
          python-version: '3.12'
          cache: 'pip'
      - name: Install the latest version of uv
        uses: astral-sh/setup-uv@v6
        with:
          activate-environment: true
      - name: Install basic dependencies
        run: uv pip install anyscale==0.24.79 typer==0.9.0
      - name: Install envsubst
        run: sudo apt-get update && sudo apt-get install -y gettext-base
      - name: Basic convergence test
        env:
          ANYSCALE_CLI_TOKEN: ${{ secrets.ANYSCALE_CLI_TOKEN }}
          ANYSCALE_HOST: https://console.anyscale.com
          WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}
        run: |
          envsubst < ci/anyscale_gpu_e2e_test.yaml > ci/anyscale_gpu_e2e_test_envsubst.yaml
          anyscale job submit -f ci/anyscale_gpu_e2e_test_envsubst.yaml --timeout 4500
          anyscale job wait --cloud sky-anyscale-aws-us-east-1 --name skyrl-train-gpu-e2e-test --timeout 4500
          rm -f ci/anyscale_gpu_e2e_test_envsubst.yaml
47 changes: 47 additions & 0 deletions .github/workflows/gpu_e2e_ci_megatron.yaml
@@ -0,0 +1,47 @@
name: SkyRL-GPU-E2E-CI-Megatron

on:
  schedule:
    - cron: '5 8 * * *' # Every day at 08:05 UTC (~00:05 PST / ~01:05 PDT)
  workflow_dispatch:

permissions:
  checks: write # for status checks to appear
  contents: read

jobs:
  skyrl_gpu_e2e_test_megatron:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash
        working-directory: ./skyrl-train

    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        # This is the version of the action for setting up Python, not the Python version.
        uses: actions/setup-python@v5
        with:
          # Semantic version range syntax or exact version of a Python version
          python-version: '3.12'
          cache: 'pip'
      - name: Install the latest version of uv
        uses: astral-sh/setup-uv@v6
        with:
          activate-environment: true
      - name: Install basic dependencies
        run: uv pip install anyscale==0.24.79 typer==0.9.0
      - name: Install envsubst
        run: sudo apt-get update && sudo apt-get install -y gettext-base
      - name: Basic convergence test
        env:
          ANYSCALE_CLI_TOKEN: ${{ secrets.ANYSCALE_CLI_TOKEN }}
          ANYSCALE_HOST: https://console.anyscale.com
          WANDB_API_KEY: ${{ secrets.WANDB_API_KEY }}
        run: |
          envsubst < ci/anyscale_gpu_e2e_test_megatron.yaml > ci/anyscale_gpu_e2e_test_megatron_envsubst.yaml
          anyscale job submit -f ci/anyscale_gpu_e2e_test_megatron_envsubst.yaml --timeout 4500
          anyscale job wait --cloud sky-anyscale-aws-us-east-1 --name skyrl-train-gpu-e2e-test-megatron --timeout 4500
          rm -f ci/anyscale_gpu_e2e_test_megatron_envsubst.yaml
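Both E2E workflows inject `WANDB_API_KEY` into the Anyscale job YAML via `envsubst` before submitting. A minimal local sketch of that substitution step, runnable without Anyscale access (the template path and its contents are illustrative, not from the repo):

```shell
# Write a tiny template that references an environment variable,
# mirroring how the CI job YAML references ${WANDB_API_KEY}.
cat > /tmp/job_template.yaml <<'EOF'
name: demo-job
env:
  WANDB_API_KEY: ${WANDB_API_KEY}
EOF

# envsubst (from gettext-base) replaces ${WANDB_API_KEY} with the
# value present in the environment at invocation time.
WANDB_API_KEY=dummy-key envsubst < /tmp/job_template.yaml > /tmp/job_resolved.yaml

cat /tmp/job_resolved.yaml
```

The resolved file contains the secret in plain text, which is why the workflows delete the `*_envsubst.yaml` copy with `rm -f` immediately after the job finishes.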