Hyperparameter optimization sweep feature by RhizoNymph · Pull Request #2502 · PrimeIntellect-ai/prime-rl

RhizoNymph · 2026-05-15T04:01:54Z

Summary

Adds a hyperparameter sweep framework to prime-rl. The new sweep entrypoint materializes and launches studies for rl or sft target configs using grid search, seeded random search, or Optuna, with support for local execution, SLURM submission, and shared-trainer LoRA waves.

Sweep configs are TOML files that define the target entrypoint, base config(s), strategy, scheduler, optional objective / early stopping settings, and dotted-path [parameters] overrides into the target config:

uv run sweep @ examples/sweep/grid_local.toml
uv run sweep @ examples/sweep/optuna_pruner_asha.toml
uv run sweep @ examples/sweep/grid_multi_run_lora.toml

This is intentionally a large PR because the core sweep loop, materialization layer, schedulers, docs, examples, and tests are tightly connected. That said, I'm happy to split this into separate staged PRs if that would make review easier.

Each piece of the sweep infrastructure was also exercised end to end and completed an optimization sweep successfully. SLURM features were tested on a local 3-node cluster where each node has a 3090. Local parallel runs were tested on a node rented from Prime Intellect with 4 A6000s.

What's Included

Sweep framework

Adds src/prime_rl/sweep/ with the controller, materializer, search expansion, Optuna loops, scheduler integrations, early stopping, metric attribution, reproducibility metadata, and run-control helpers.
Writes stable per-trial artifacts under the study output directory, including overrides.toml, resolved.toml, command.txt, status.json, and a study-level manifest.json.
Validates resolved rl / sft configs before launch so bad parameter paths or invalid sampled values fail during materialization instead of after jobs have been submitted.
Supports resume invariants for existing studies, including checks for drift in entrypoint, objective, strategy, scheduler, and parameter ordering.
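The resume checks listed above could look roughly like the following sketch. The manifest field names and the `find_drift` / `check_resume` helpers are hypothetical; only the file name manifest.json and the list of checked properties come from the description.

```python
import json
from pathlib import Path

# Assumed manifest keys; the real manifest.json layout may differ.
INVARIANT_FIELDS = ("entrypoint", "objective", "strategy", "scheduler", "parameter_order")


def find_drift(manifest: dict, current: dict) -> list[str]:
    """Return the invariant fields whose values differ between manifest and current."""
    return [name for name in INVARIANT_FIELDS if manifest.get(name) != current.get(name)]


def check_resume(study_dir: Path, current: dict) -> None:
    """Raise ValueError if an existing study was created with different settings."""
    manifest_path = study_dir / "manifest.json"
    if not manifest_path.exists():
        return  # no manifest yet: treat as a fresh study
    manifest = json.loads(manifest_path.read_text())
    drift = find_drift(manifest, current)
    if drift:
        raise ValueError(f"cannot resume study, settings changed: {', '.join(drift)}")
```

Because `parameter_order` is compared as a list, reordering the same parameters counts as drift, which matches the parameter-ordering check mentioned above.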

Strategies and schedulers

Adds grid and random search over choice, uniform, log-uniform, and integer-uniform parameter spaces.
Adds Optuna support behind the new hpo extra, including TPE/random/CMA-ES/QMC samplers, SQLite-backed resume, and median / ASHA / Hyperband pruners.
Adds local sequential and local parallel scheduling, with explicit static GPU groups for parallel local runs.
Adds SLURM submission for static sweeps plus parallel SLURM Optuna ask/tell support.
Adds multi_run_lora scheduling for RL sweeps that share one trainer across multiple LoRA-backed orchestrator runs.
Adds patience and threshold early stopping for local and shared-trainer LoRA flows.
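To make the grid and seeded random strategies concrete, here is a small standalone sketch over the four parameter space kinds named above. The spec dictionary format, the type strings, and the function names are assumptions for illustration and are not taken from src/prime_rl/sweep/.

```python
import itertools
import math
import random

# Hypothetical search space; the spec format is an assumption for this sketch.
SPACE = {
    "optimizer": {"type": "choice", "values": ["adamw", "sgd"]},
    "dropout": {"type": "uniform", "low": 0.0, "high": 0.3},
    "lr": {"type": "log_uniform", "low": 1e-6, "high": 1e-3},
    "rank": {"type": "int_uniform", "low": 4, "high": 64},
}


def expand_grid(choices: dict[str, list]) -> list[dict]:
    """Cartesian product of per-parameter value lists, in declaration order."""
    names = list(choices)
    combos = itertools.product(*(choices[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def sample_one(spec: dict, rng: random.Random):
    """Draw one value from a single parameter spec."""
    kind = spec["type"]
    if kind == "choice":
        return rng.choice(spec["values"])
    if kind == "uniform":
        return rng.uniform(spec["low"], spec["high"])
    if kind == "log_uniform":
        # Sample uniformly in log space, then map back.
        return math.exp(rng.uniform(math.log(spec["low"]), math.log(spec["high"])))
    if kind == "int_uniform":
        return rng.randint(spec["low"], spec["high"])  # both ends inclusive
    raise ValueError(f"unknown parameter type: {kind!r}")


def sample_random(space: dict, num_trials: int, seed: int) -> list[dict]:
    """Seeded random search: the same seed always yields the same trials."""
    rng = random.Random(seed)
    return [
        {name: sample_one(spec, rng) for name, spec in space.items()}
        for _ in range(num_trials)
    ]
```

Using a private `random.Random(seed)` instance instead of the global generator keeps sampling reproducible even if other code draws random numbers in the same process.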

Entrypoints and launch plumbing

Adds sweep as the user-facing sweep entrypoint.
Adds rl-multi-run for shared-trainer LoRA waves, where a single trainer and inference stack serve multiple pre-materialized orchestrator run directories.
Extracts shared launch primitives from rl.py into src/prime_rl/entrypoints/launch.py, including GPU mapping, subprocess startup, monitor threads, and supervised shutdown.

Metrics and monitoring

Adds FileMonitor, which writes step-indexed JSONL metrics when PRIME_RL_SWEEP_METRICS_JSONL is set.
Routes sweep trial metrics into metrics.jsonl so the controller can read final objectives and Optuna pruners can tail intermediate values.
Keeps non-sweep runs unchanged because the file monitor is only attached when the sweep env var is present.
Updates SFT training to save a final summary so SFT objectives can be attributed consistently.
Adds a WANDB_SHARED_PRIMARY=1 override so rl-multi-run can keep the launcher process as the primary W&B run owner.

SLURM and runtime fixes

Serializes uv sync --all-extras in SLURM templates with flock and sets UV_NO_SYNC=1 after sync so concurrent sweep jobs sharing one venv do not clobber each other.
Updates multi-node RL SLURM teardown to write a .training_complete sentinel and self-cancel after the trainer exits, allowing sweep controllers to distinguish expected self-teardown from an external cancellation.
Fixes the orchestrator advantage union annotation to avoid duplicate-subcommand warnings from Tyro's union expansion.
Adds the hpo extra with optuna>=4.
Pins CUDA runtime / NVRTC packages needed by vLLM's CUDA 13-linked extension on x86_64 systems where only the driver and Torch's bundled CUDA runtime are present.
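The sentinel-based distinction between self-teardown and external cancellation might be checked along these lines. The `classify_cancellation` helper and its return values are hypothetical; only the `.training_complete` file name comes from the description.

```python
from pathlib import Path

SENTINEL_NAME = ".training_complete"  # file name taken from the PR description


def classify_cancellation(output_dir: Path) -> str:
    """Classify a cancelled SLURM job by whether the trainer left its sentinel.

    If the sentinel exists, the job cancelled itself after training finished;
    otherwise the cancellation came from outside (user, admin, or preemption).
    """
    if (output_dir / SENTINEL_NAME).exists():
        return "self_teardown"
    return "external_cancel"
```

A controller could then mark a `self_teardown` trial as completed and read its objective, while treating `external_cancel` as a failed or interrupted trial.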

Docs, examples, and skills

Adds docs/sweeps.md covering config structure, strategies, schedulers, objectives, pruning, early stopping, artifacts, resume behavior, and failure handling.
Links the sweep docs from the docs index / Mint config and adds a sweep section to docs/entrypoints.md.
Adds sweep examples under examples/sweep/ for local, SLURM, Optuna, pruning, resume, shared-trainer LoRA, early stopping, RL, and SFT workflows.
Updates the config and entrypoint skills with sweep parameter-path rules, restrictions, and resume invariants.

Coverage

Adds unit coverage for sweep config validation, search expansion, materialization, controller dispatch, scheduler behavior, Optuna ask/tell flows, pruning, shared-trainer LoRA reconciliation, early stopping, metrics parsing, and file-monitor semantics.

RhizoNymph added 6 commits May 14, 2026 20:24

fix: avoid tyro duplicate-subcommand warning on advantage field

95d2347

feat: add step-indexed FileMonitor sidecar for sweep metrics

b0dc8fb

chore: serialize uv sync in sbatch templates to prevent shared-venv race

eb28ae5

refactor: extract launch helpers and add rl-multi-run entrypoint

f4c70ca

feat: add hyperparameter sweep framework

d928993

docs: hyperparameter sweep examples and documentation

489e329

RhizoNymph wants to merge 6 commits into PrimeIntellect-ai:main from RhizoNymph:feat/hpo-sweeps
