Hyperparameter optimization sweep feature #2502

Draft
RhizoNymph wants to merge 6 commits into PrimeIntellect-ai:main from RhizoNymph:feat/hpo-sweeps
@RhizoNymph

Summary

Adds a hyperparameter sweep framework to prime-rl. The new sweep entrypoint materializes and launches studies for rl or sft target configs using grid search, seeded random search, or Optuna, with support for local execution, SLURM submission, and shared-trainer LoRA waves.

Sweep configs are TOML files that define the target entrypoint, base config(s), strategy, scheduler, optional objective / early stopping settings, and dotted-path [parameters] overrides into the target config:

uv run sweep @ examples/sweep/grid_local.toml
uv run sweep @ examples/sweep/optuna_pruner_asha.toml
uv run sweep @ examples/sweep/grid_multi_run_lora.toml
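
As a rough illustration of what a dotted-path override into a target config means, the sketch below applies overrides to a nested dictionary and rejects unknown paths. It is not the PR's materializer: the function name `apply_overrides` and the config keys (`trainer.optim.lr`, `seed`) are hypothetical and chosen only for the example.

```python
from copy import deepcopy
from typing import Any


def apply_overrides(config: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of `config` with each dotted-path override applied.

    Unknown paths raise KeyError so that a typo fails before anything is launched.
    """
    resolved = deepcopy(config)
    for path, value in overrides.items():
        node = resolved
        *parents, leaf = path.split(".")
        for key in parents:
            # Every intermediate segment must name an existing section.
            if not isinstance(node.get(key), dict):
                raise KeyError(f"unknown config section in path: {path!r}")
            node = node[key]
        if leaf not in node:
            raise KeyError(f"unknown config key in path: {path!r}")
        node[leaf] = value
    return resolved


# Hypothetical target config and override, for illustration only.
base = {"trainer": {"optim": {"lr": 1e-5}}, "seed": 0}
resolved = apply_overrides(base, {"trainer.optim.lr": 3e-5})
```

Copying the base config before mutation keeps each trial's resolved config independent of the others.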

This is intentionally a large PR because the core sweep loop, materialization layer, schedulers, docs, examples, and tests are tightly connected. That said, I'm happy to split this into separate staged PRs if that would make review easier.

Each piece of the sweep infrastructure was also exercised end to end and completed an optimization sweep successfully. SLURM features were tested on a local 3-node cluster with one 3090 per node. Local parallel runs were tested on a rented Prime Intellect node with 4 A6000s.

What's Included

Sweep framework

  • Adds src/prime_rl/sweep/ with the controller, materializer, search expansion, Optuna loops, scheduler integrations, early stopping, metric attribution, reproducibility metadata, and run-control helpers.
  • Writes stable per-trial artifacts under the study output directory, including overrides.toml, resolved.toml, command.txt, status.json, and a study-level manifest.json.
  • Validates resolved rl / sft configs before launch so bad parameter paths or invalid sampled values fail during materialization instead of after jobs have been submitted.
  • Supports resume invariants for existing studies, including checks for drift in entrypoint, objective, strategy, scheduler, and parameter ordering.
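
The resume check in the last bullet can be pictured as a comparison between the settings recorded for an existing study and the settings of the config being resumed. The sketch below is an assumption about the general shape of such a check, not the PR's code: `INVARIANT_KEYS`, `find_drift`, and the dictionary layout are hypothetical.

```python
from typing import Any

# Hypothetical names for the fields whose drift blocks a resume.
INVARIANT_KEYS = ("entrypoint", "objective", "strategy", "scheduler", "parameter_order")


def find_drift(stored: dict[str, Any], current: dict[str, Any]) -> list[str]:
    """Return the invariant fields whose values differ between the two configs."""
    return [key for key in INVARIANT_KEYS if stored.get(key) != current.get(key)]


stored = {
    "entrypoint": "rl",
    "objective": {"metric": "reward/mean", "direction": "maximize"},
    "strategy": "grid",
    "scheduler": "local",
    "parameter_order": ["lr", "batch_size"],
}
# Same parameters, different order: list comparison is order-sensitive,
# so this counts as drift.
current = dict(stored, parameter_order=["batch_size", "lr"])

drift = find_drift(stored, current)
if drift:
    print(f"refusing to resume, drift in: {', '.join(drift)}")
```

Parameter ordering matters in a check like this because trial indices in a grid depend on the order in which parameters are expanded.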

Strategies and schedulers

  • Adds grid and random search over choice, uniform, log-uniform, and integer-uniform parameter spaces.
  • Adds Optuna support behind the new hpo extra, including TPE/random/CMA-ES/QMC samplers, SQLite-backed resume, and median / ASHA / Hyperband pruners.
  • Adds local sequential and local parallel scheduling, with explicit static GPU groups for parallel local runs.
  • Adds SLURM submission for static sweeps plus parallel SLURM Optuna ask/tell support.
  • Adds multi_run_lora scheduling for RL sweeps that share one trainer across multiple LoRA-backed orchestrator runs.
  • Adds patience and threshold early stopping for local and shared-trainer LoRA flows.
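
To make the grid and seeded random strategies concrete, here is a minimal standard-library sketch of both over the four parameter-space kinds listed above. The spec format (`type`, `values`, `low`, `high`) and the function names are assumptions for illustration and may not match the sweep config schema in the PR.

```python
import itertools
import math
import random
from typing import Any


def expand_grid(choices: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Cartesian product of choice parameters, in insertion order."""
    names = list(choices)
    combos = itertools.product(*(choices[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def sample_random(space: dict[str, dict[str, Any]], num_trials: int, seed: int) -> list[dict[str, Any]]:
    """Draw `num_trials` points; the same seed always yields the same trials."""
    rng = random.Random(seed)  # private generator, independent of global state
    trials = []
    for _ in range(num_trials):
        trial: dict[str, Any] = {}
        for name, spec in space.items():
            kind = spec["type"]
            if kind == "choice":
                trial[name] = rng.choice(spec["values"])
            elif kind == "uniform":
                trial[name] = rng.uniform(spec["low"], spec["high"])
            elif kind == "log_uniform":
                # Sample uniformly in log space, then map back.
                log_value = rng.uniform(math.log(spec["low"]), math.log(spec["high"]))
                trial[name] = math.exp(log_value)
            elif kind == "int_uniform":
                trial[name] = rng.randint(spec["low"], spec["high"])  # inclusive bounds
            else:
                raise ValueError(f"unknown parameter type: {kind!r}")
        trials.append(trial)
    return trials


# Hypothetical parameter names, for illustration only.
grid = expand_grid({"lr": [1e-5, 3e-5], "batch_size": [32, 64]})  # 2 x 2 = 4 trials
space = {
    "lr": {"type": "log_uniform", "low": 1e-6, "high": 1e-3},
    "warmup_steps": {"type": "int_uniform", "low": 0, "high": 100},
    "scheduler": {"type": "choice", "values": ["cosine", "linear"]},
    "dropout": {"type": "uniform", "low": 0.0, "high": 0.2},
}
random_trials = sample_random(space, num_trials=5, seed=0)
```

Seeding a private `random.Random` instance is what makes a random study reproducible across reruns.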

Entrypoints and launch plumbing

  • Adds sweep as the user-facing sweep entrypoint.
  • Adds rl-multi-run for shared-trainer LoRA waves, where a single trainer and inference stack serve multiple pre-materialized orchestrator run directories.
  • Extracts shared launch primitives from rl.py into src/prime_rl/entrypoints/launch.py, including GPU mapping, subprocess startup, monitor threads, and supervised shutdown.

Metrics and monitoring

  • Adds FileMonitor, which writes step-indexed JSONL metrics when PRIME_RL_SWEEP_METRICS_JSONL is set.
  • Routes sweep trial metrics into metrics.jsonl so the controller can read final objectives and Optuna pruners can tail intermediate values.
  • Keeps non-sweep runs unchanged because the file monitor is only attached when the sweep env var is present.
  • Updates SFT training to save a final summary so SFT objectives can be attributed consistently.
  • Adds a WANDB_SHARED_PRIMARY=1 override so rl-multi-run can keep the launcher process as the primary W&B run owner.

SLURM and runtime fixes

  • Serializes uv sync --all-extras in SLURM templates with flock and sets UV_NO_SYNC=1 after sync so concurrent sweep jobs sharing one venv do not clobber each other.
  • Updates multi-node RL SLURM teardown to write a .training_complete sentinel and self-cancel after the trainer exits, allowing sweep controllers to distinguish expected self-teardown from an external cancellation.
  • Fixes the orchestrator advantage union annotation to avoid duplicate-subcommand warnings from Tyro's union expansion.
  • Adds the hpo extra with optuna>=4.
  • Pins CUDA runtime / NVRTC packages needed by vLLM's CUDA 13-linked extension on x86_64 systems where only the driver and Torch's bundled CUDA runtime are present.

Docs, examples, and skills

  • Adds docs/sweeps.md covering config structure, strategies, schedulers, objectives, pruning, early stopping, artifacts, resume behavior, and failure handling.
  • Links the sweep docs from the docs index / Mint config and adds a sweep section to docs/entrypoints.md.
  • Adds sweep examples under examples/sweep/ for local, SLURM, Optuna, pruning, resume, shared-trainer LoRA, early stopping, RL, and SFT workflows.
  • Updates the config and entrypoint skills with sweep parameter-path rules, restrictions, and resume invariants.

Coverage

  • Adds unit coverage for sweep config validation, search expansion, materialization, controller dispatch, scheduler behavior, Optuna ask/tell flows, pruning, shared-trainer LoRA reconciliation, early stopping, metrics parsing, and file-monitor semantics.
