Hyperparameter optimization sweep feature #2502
Draft
RhizoNymph wants to merge 6 commits into
Summary
Adds a hyperparameter sweep framework to prime-rl. The new `sweep` entrypoint materializes and launches studies for `rl` or `sft` target configs using grid search, seeded random search, or Optuna, with support for local execution, SLURM submission, and shared-trainer LoRA waves.
Sweep configs are TOML files that define the target entrypoint, base config(s), strategy, scheduler, optional objective / early stopping settings, and dotted-path `[parameters]` overrides into the target config.
This is intentionally a large PR because the core sweep loop, materialization layer, schedulers, docs, examples, and tests are tightly connected. That said, I'm happy to split this into separate staged PRs if that would make review easier.
Each piece of the sweep infrastructure was also exercised end to end and successfully completed an optimization sweep. SLURM features were tested on a local 3-node cluster where each node has a 3090. Local parallel runs were tested on a rented Prime Intellect node with 4 A6000s.
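To make the dotted-path override idea concrete, here is a minimal sketch of grid expansion over a nested config. It is an illustration only: `set_dotted`, `expand_grid`, `materialize`, and the example keys are hypothetical names, not the PR's API, and rejecting unknown paths with `KeyError` is an assumption of this sketch.

```python
import copy
import itertools
from typing import Any


def set_dotted(config: dict[str, Any], path: str, value: Any) -> None:
    """Set config["a"]["b"] = value for the dotted path "a.b".

    Unknown keys raise KeyError so a mistyped path fails before launch
    (an assumption of this sketch, not a statement about prime-rl).
    """
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        node = node[key]  # KeyError if an intermediate table is missing
    if leaf not in node:
        raise KeyError(f"unknown parameter path: {path}")
    node[leaf] = value


def expand_grid(parameters: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Return one {dotted_path: value} override dict per grid point."""
    paths = list(parameters)
    value_lists = [parameters[p] for p in paths]
    return [dict(zip(paths, combo)) for combo in itertools.product(*value_lists)]


def materialize(base: dict[str, Any], parameters: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Apply every grid point to a deep copy of the base config."""
    resolved_configs = []
    for overrides in expand_grid(parameters):
        resolved = copy.deepcopy(base)  # never mutate the shared base config
        for path, value in overrides.items():
            set_dotted(resolved, path, value)
        resolved_configs.append(resolved)
    return resolved_configs


# Hypothetical base config and parameter table, for illustration only.
base_config = {"optimizer": {"lr": 1e-4, "weight_decay": 0.0}, "model": {"lora_rank": 8}}
parameters = {"optimizer.lr": [1e-5, 1e-4], "model.lora_rank": [8, 16, 32]}

trials = materialize(base_config, parameters)
print(len(trials))  # 2 * 3 = 6 grid points
```

Each trial gets its own deep copy, so the base config stays untouched and a bad path surfaces while the trials are being built rather than after anything has been launched.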
What's Included
Sweep framework
- `src/prime_rl/sweep/` with the controller, materializer, search expansion, Optuna loops, scheduler integrations, early stopping, metric attribution, reproducibility metadata, and run-control helpers.
- […] `overrides.toml`, `resolved.toml`, `command.txt`, `status.json`, and a study-level `manifest.json`.
- […] `rl` / `sft` configs before launch so bad parameter paths or invalid sampled values fail during materialization instead of after jobs have been submitted.
Strategies and schedulers
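Seeded random search, one of the strategies named in the summary, comes down to drawing every trial from a private random generator so that a study can be regenerated from its seed. A minimal sketch, assuming a hypothetical search-space format (the `type`, `low`, `high`, and `values` keys and the `sample_trials` helper are illustrative, not the PR's schema):

```python
import math
import random
from typing import Any


def sample_trials(space: dict[str, dict[str, Any]], n_trials: int, seed: int) -> list[dict[str, Any]]:
    """Draw n_trials override dicts; the same seed always yields the same list."""
    rng = random.Random(seed)  # private generator, global random state is untouched
    trials = []
    for _ in range(n_trials):
        overrides: dict[str, Any] = {}
        for path, spec in space.items():  # dicts keep insertion order, so draws are stable
            kind = spec["type"]
            if kind == "choice":
                overrides[path] = rng.choice(spec["values"])
            elif kind == "uniform":
                overrides[path] = rng.uniform(spec["low"], spec["high"])
            elif kind == "loguniform":
                # Sample uniformly in log space, then map back.
                log_value = rng.uniform(math.log(spec["low"]), math.log(spec["high"]))
                overrides[path] = math.exp(log_value)
            else:
                raise ValueError(f"unsupported distribution for {path}: {kind}")
        trials.append(overrides)
    return trials


# Hypothetical search space keyed by dotted parameter paths.
space = {
    "optimizer.lr": {"type": "loguniform", "low": 1e-6, "high": 1e-3},
    "model.lora_rank": {"type": "choice", "values": [8, 16, 32]},
}

first = sample_trials(space, n_trials=4, seed=0)
second = sample_trials(space, n_trials=4, seed=0)
print(first == second)  # True: identical seed, identical study
```

Sampling the learning rate in log space spreads trials evenly across orders of magnitude, which is the usual choice for scale-like hyperparameters.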
- […] `hpo` extra, including TPE/random/CMA-ES/QMC samplers, SQLite-backed resume, and median / ASHA / Hyperband pruners.
- `multi_run_lora` scheduling for RL sweeps that share one trainer across multiple LoRA-backed orchestrator runs.
Entrypoints and launch plumbing
- `sweep` as the user-facing sweep entrypoint.
- `rl-multi-run` for shared-trainer LoRA waves, where a single trainer and inference stack serve multiple pre-materialized orchestrator run directories.
- […] `rl.py` into `src/prime_rl/entrypoints/launch.py`, including GPU mapping, subprocess startup, monitor threads, and supervised shutdown.
Metrics and monitoring
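The metrics flow in this section (a step-indexed JSONL file that the sweep controller reads back) can be sketched roughly as follows. Only the `PRIME_RL_SWEEP_METRICS_JSONL` variable name comes from the PR; the record layout, `log_metrics`, and `read_final` are hypothetical and are not `FileMonitor`'s actual code.

```python
import json
import os
from pathlib import Path
from typing import Optional

ENV_VAR = "PRIME_RL_SWEEP_METRICS_JSONL"  # variable name taken from the PR description


def log_metrics(step: int, metrics: dict[str, float]) -> None:
    """Append one JSON record per step; do nothing when the variable is unset."""
    target = os.environ.get(ENV_VAR)
    if not target:
        return
    path = Path(target)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        # Assumed record layout: {"step": <int>, "<metric>": <value>, ...}
        f.write(json.dumps({"step": step, **metrics}) + "\n")


def read_final(path: Path, metric: str) -> Optional[float]:
    """Return the value of `metric` at the highest step that reports it."""
    if not path.exists():
        return None
    best_step, value = -1, None
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a partially written trailing line
            step = record.get("step", -1)
            if metric in record and step >= best_step:
                best_step, value = step, record[metric]
    return value
```

Appending one line per step keeps the file readable while training is still running, which is what lets a pruner follow intermediate values without waiting for the run to finish.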
- `FileMonitor`, which writes step-indexed JSONL metrics when `PRIME_RL_SWEEP_METRICS_JSONL` is set.
- […] `metrics.jsonl` so the controller can read final objectives and Optuna pruners can tail intermediate values.
- `WANDB_SHARED_PRIMARY=1` override so `rl-multi-run` can keep the launcher process as the primary W&B run owner.
SLURM and runtime fixes
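The sentinel idea in this section, where a job that cancels itself after training should not be counted as an external cancellation, can be illustrated with a small classifier. This is a sketch under assumptions: the `.training_complete` file name comes from the PR, while `TrialOutcome`, `classify`, and the scheduler state strings (modelled on SLURM's `COMPLETED` / `CANCELLED` / `FAILED`) are hypothetical.

```python
from enum import Enum
from pathlib import Path


class TrialOutcome(Enum):
    COMPLETED = "completed"
    CANCELLED = "cancelled"
    FAILED = "failed"


SENTINEL_NAME = ".training_complete"  # file name taken from the PR description


def classify(scheduler_state: str, run_dir: Path) -> TrialOutcome:
    """Map a scheduler job state plus the sentinel file to a trial outcome."""
    finished = (run_dir / SENTINEL_NAME).exists()
    state = scheduler_state.strip().upper()
    if state == "COMPLETED":
        return TrialOutcome.COMPLETED
    if state.startswith("CANCELLED"):
        # A cancelled job that left the sentinel behind tore itself down
        # after the trainer exited, so it is treated as a success.
        return TrialOutcome.COMPLETED if finished else TrialOutcome.CANCELLED
    return TrialOutcome.FAILED
```

Matching on the `CANCELLED` prefix is deliberate, since some scheduler reports append who issued the cancellation to the state string.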
- […] `uv sync --all-extras` in SLURM templates with `flock` and sets `UV_NO_SYNC=1` after sync so concurrent sweep jobs sharing one venv do not clobber each other.
- […] `.training_complete` sentinel and self-cancel after the trainer exits, allowing sweep controllers to distinguish expected self-teardown from an external cancellation.
- […] `advantage` union annotation to avoid duplicate-subcommand warnings from Tyro's union expansion.
- […] `hpo` extra with `optuna>=4`.
Docs, examples, and skills
- `docs/sweeps.md` covering config structure, strategies, schedulers, objectives, pruning, early stopping, artifacts, resume behavior, and failure handling.
- […] `docs/entrypoints.md`.
- `examples/sweep/` for local, SLURM, Optuna, pruning, resume, shared-trainer LoRA, early stopping, RL, and SFT workflows.
Coverage