Add strict package loaders for v1 tasksets and harnesses #1429
Approvability
Verdict: Needs human review. This PR introduces new public APIs.
TasksetInput: TypeAlias = Taskset
HarnessInput: TypeAlias = Harness | None
def resolve_taskset(value: object) -> Taskset:
    if isinstance(value, Taskset):
        return value
Why do we ever need to call resolve on an already-resolved object? It should only resolve config -> obj.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 4141235.
[options]
exclude-newer = "0001-01-01T00:00:00Z" # This has no effect and is included for backwards compatibility when using relative exclude-newer values.
exclude-newer-span = "P7D"
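
For readers unfamiliar with the relative form, here is a rough sketch of what a seven-day span could mean in practice. It assumes the span is a day-only ISO 8601 duration measured back from resolution time, and that releases uploaded after the resulting cutoff are skipped. `cutoff_from_span` and `is_allowed` are hypothetical helpers written for illustration; they are not part of uv.

```python
import re
from datetime import datetime, timedelta, timezone


def cutoff_from_span(span: str, now: datetime) -> datetime:
    """Turn a day-only ISO 8601 duration such as "P7D" into a cutoff timestamp."""
    match = re.fullmatch(r"P(\d+)D", span)
    if match is None:
        # Assumption: only whole-day spans are handled in this sketch.
        raise ValueError(f"unsupported span: {span!r}")
    return now - timedelta(days=int(match.group(1)))


def is_allowed(uploaded_at: datetime, cutoff: datetime) -> bool:
    """Assumed rule: a release is eligible only if uploaded before the cutoff."""
    return uploaded_at < cutoff


now = datetime(2025, 1, 8, tzinfo=timezone.utc)
cutoff = cutoff_from_span("P7D", now)
print(cutoff.isoformat())
print(is_allowed(datetime(2025, 1, 5, tzinfo=timezone.utc), cutoff))
```

The point of the window is that a package published in the last few days cannot enter the lock until it has aged past the cutoff.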

Description
Adds strict v1 component package loaders for tasksets and harnesses. This first pass assumes component packages are already installed/importable; automatic installation belongs in a follow-up preflight layer.
The canonical environment package pattern is:
`EnvConfig` is the package-loading envelope: it keeps `taskset` and `harness` as config objects while preserving `[eval.taskset]` and `[eval.harness]` package fields from TOML/CLI until component loading. Package authors write typed loaders such as `load_taskset(config: MyTasksetConfig)` and `load_harness(config: MyHarnessConfig)`; Verifiers imports the package, reads the loader signature, validates the explicit config data into that typed config class, and calls the loader with the typed object.
Tasksets and harnesses load independently. If a taskset/harness pair needs to coordinate, the pair should click together at `vf.Env(taskset=..., harness=...)` / runtime resolution or fail there with a local error. Harness loaders do not receive tasksets.
Includes:
- `vf.load_environment`, including `user/package-name` and bare `package-name` normalization
- `id`/`taskset_id` and `id`/`harness_id` aliases with conflict checks
- `taskset_id`/`harness_id`
- uv `exclude-newer = "7 days"` lock protection, with Prime-published package exceptions kept in `[tool.uv.exclude-newer-package]`
Follow-Up: Auto Install
The install follow-up should live before config validation attempts package imports:
- Add a preflight resolver that reads `[eval.taskset]`, `[eval.harness]`, and CLI overrides without validating the full typed config.
- Have `vf-eval` call that resolver before `vf.load_environment(...)`; Prime CLI can call the same resolver or delegate to the same installer path.
- Keep `vf.load_environment`, `vf.load_taskset`, and `vf.load_harness` strict and import-only after preflight. They should not embed Prime API/package-install side effects inside Pydantic validators or loader signature inspection.
Testing
- `uv run pytest tests/test_v1_config_extension.py tests/test_v1_rlm_swe.py tests/test_eval_cli.py -q`
- `uv run ruff check .`
- `uv run ty check verifiers`
- `uv run pre-commit run --all-files`
- `b0d378d0` passed
Note
Add `vf.load_taskset` and `vf.load_harness` strict package loaders for v1 environments
- Adds `vf.load_taskset` and `vf.load_harness` as top-level APIs in verifiers/v1/env.py and exports them via `verifiers/__init__.py`. Each loader accepts a typed config or string package id, dynamically imports the component module, validates the loader signature, coerces config to the package-specific type, and returns a typed component instance.
- `EnvConfig` in verifiers/v1/config.py now resolves `taskset_id`/`harness_id` during validation by importing the component package, inferring the config type from the loader signature, and coercing the config data accordingly, without calling the loaders at validation time.
- `Env.__init__` now requires `Taskset`/`Harness` instances (not config objects) in `taskset=`/`harness=` parameters; passing a config directly to those parameters now raises `TypeError`.
- A `--v1` flag is added to the `prime env init` CLI; generated environment templates now include `load_taskset`, `load_harness`, and a `load_environment(config: vf.EnvConfig)` that wires both components.
- `config` parameter presence, annotation, and signature shape on package-provided loaders.
- `Env(taskset=config, ...)` and `load_harness(None)` call sites are now breaking errors; all callers must pass prebuilt component instances or use the new loader APIs.
Changes since #1429 opened
- `prepare_base_env_config` function within `verifiers/utils/env_utils.py` [b0d378d]
Macroscope summarized 4141235.
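
To make the loader contract in this summary concrete, here is a minimal standard-library sketch of signature-driven config coercion. It is not the Verifiers implementation: `MyTasksetConfig`, `MyTaskset`, `infer_config_type`, and `call_loader` are hypothetical names, dataclasses stand in for the package's typed config classes, and the dynamic import step is left out so the example stays self-contained.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable, Dict, get_type_hints


@dataclass
class MyTasksetConfig:  # hypothetical package-specific config
    split: str = "train"
    limit: int = 10


@dataclass
class MyTaskset:  # hypothetical component returned by the loader
    config: MyTasksetConfig


def load_taskset(config: MyTasksetConfig) -> MyTaskset:
    """A loader as a package author might write it."""
    return MyTaskset(config=config)


def infer_config_type(loader: Callable[..., Any]) -> type:
    """Check the loader's signature shape and return the type annotated on `config`."""
    params = list(inspect.signature(loader).parameters.values())
    if len(params) != 1 or params[0].name != "config":
        raise TypeError(f"{loader.__name__} must take exactly one parameter named 'config'")
    if params[0].default is not inspect.Parameter.empty:
        raise TypeError(f"{loader.__name__}: 'config' must not have a default")
    hints = get_type_hints(loader)
    config_type = hints.get("config")
    if not isinstance(config_type, type):
        raise TypeError(f"{loader.__name__}: 'config' needs a concrete class annotation")
    return config_type


def call_loader(loader: Callable[..., Any], data: Dict[str, Any]) -> Any:
    """Build the typed config from raw data, then call the loader with it."""
    config_type = infer_config_type(loader)
    config = config_type(**data)  # unknown keys raise TypeError here
    return loader(config)


taskset = call_loader(load_taskset, {"split": "test", "limit": 3})
print(taskset)
```

Splitting `infer_config_type` from `call_loader` mirrors the distinction drawn above: the config type can be inferred during validation without running the loader, and the loader is only called when the component is actually built.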
Note
High Risk
High risk because it changes the v1 environment construction contract (e.g., `Env` now requires concrete `Taskset`/`Harness` objects and introduces strict package-ID based loading), which can break existing environment packages and TOML-driven runs if loaders/signatures/configs don’t match.
Overview
Adds strict v1 component-package loading for `Taskset` and `Harness`. Introduces public `vf.load_taskset`/`vf.load_harness` APIs (exported from `verifiers` and `verifiers.v1`) that import component packages by ID, validate loader signatures, coerce config into the loader’s typed `config` model, and return instantiated components.
Shifts the canonical env authoring pattern to independent component loaders. `EnvConfig` now treats `[eval.taskset]`/`[eval.harness]` as unresolved package envelopes (supporting `id` aliases) and resolves the concrete child config types via loader annotations during validation (without calling loaders). `vf.Env` initialization is tightened to accept only prebuilt `Taskset`/`Harness` objects (or `config=` which is resolved via the new loaders).
Updates templates/docs/skills and `prime env init` (new `--v1` flag) to generate the new `load_taskset`/`load_harness` + `load_environment(config: vf.EnvConfig)` shape, updates TOML parsing/tests to include component `id`s, adds loader functions to bundled v1 packages, and adjusts several environments to require non-optional typed `config` parameters. Also tweaks RLM SWE task env var handling and restores global `uv` `exclude-newer = "7 days"` configuration (with lockfile updates).
Reviewed by Cursor Bugbot for commit b0d378d.
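
The `id` alias handling mentioned in the overview can be pictured with a small sketch. It assumes a conflict means both keys are present with different values, and that either key alone is accepted. `resolve_component_id` is a hypothetical helper, not a function from this PR.

```python
from typing import Any, Mapping, Optional


def resolve_component_id(table: Mapping[str, Any], long_key: str) -> Optional[str]:
    """Return the package id from `id` or `long_key`, rejecting conflicting values."""
    short = table.get("id")
    long = table.get(long_key)
    if short is not None and long is not None and short != long:
        raise ValueError(f"conflicting ids: id={short!r} vs {long_key}={long!r}")
    return short if short is not None else long


# An [eval.taskset] table given as a plain dict (package names are made up).
print(resolve_component_id({"id": "user/my-taskset"}, "taskset_id"))
print(resolve_component_id({"taskset_id": "my-taskset"}, "taskset_id"))
```

Raising on a mismatch, rather than silently preferring one key, keeps the error local to the table that caused it.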