Add strict package loaders for v1 tasksets and harnesses #1429

Merged
willccbb merged 12 commits into main from codex/add-taskset-and-harness-loaders on May 22, 2026
Conversation

Member

@willccbb willccbb commented May 21, 2026

Description

Adds strict v1 component package loaders for tasksets and harnesses. This first pass assumes component packages are already installed/importable; automatic installation belongs in a follow-up preflight layer.

The canonical environment package pattern is:

import verifiers as vf


def load_environment(config: vf.EnvConfig) -> vf.Env:
    taskset = vf.load_taskset(config.taskset)
    harness = vf.load_harness(config.harness)
    return vf.Env(taskset=taskset, harness=harness)

EnvConfig is the package-loading envelope: it keeps taskset and harness as config objects while preserving [eval.taskset] and [eval.harness] package fields from TOML/CLI until component loading. Package authors write typed loaders such as load_taskset(config: MyTasksetConfig) and load_harness(config: MyHarnessConfig); Verifiers imports the package, reads the loader signature, validates the explicit config data into that typed config class, and calls the loader with the typed object.

Tasksets and harnesses load independently. If a taskset/harness pair needs to coordinate, the pair should click together at vf.Env(taskset=..., harness=...) / runtime resolution or fail there with a local error. Harness loaders do not receive tasksets.
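The typed-loader flow described above (read the loader signature, build the typed config from explicit data, call the loader) can be sketched with the standard library alone. This is an illustrative sketch, not the Verifiers implementation: `MyTasksetConfig` is a dataclass standing in for a real typed config model, `call_typed_loader` is a hypothetical helper name, and the loader returns a plain dict instead of a real taskset.

```python
import inspect
from dataclasses import dataclass
from typing import Any, Callable, get_type_hints


@dataclass
class MyTasksetConfig:
    """Hypothetical typed config a taskset package might declare."""

    split: str = "train"
    limit: int = 10


def load_taskset(config: MyTasksetConfig) -> dict[str, Any]:
    """Hypothetical package loader; a dict stands in for a real taskset."""
    return {"split": config.split, "limit": config.limit}


def call_typed_loader(loader: Callable[..., Any], data: dict[str, Any]) -> Any:
    """Read the `config` annotation, build that type from `data`, call the loader."""
    params = list(inspect.signature(loader).parameters)
    if params != ["config"]:
        raise TypeError(f"{loader.__name__} must take exactly one 'config' parameter")
    config_type = get_type_hints(loader).get("config")
    if config_type is None:
        raise TypeError(f"{loader.__name__} must annotate 'config'")
    # Dataclass construction raises TypeError on unknown keys (assumed stand-in
    # for real config validation).
    return loader(config_type(**data))


taskset = call_typed_loader(load_taskset, {"split": "test", "limit": 5})
print(taskset)  # {'split': 'test', 'limit': 5}
```

The point of the design is that the package author only writes the typed loader; the caller never needs to know which config class a given package uses.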

Includes:

  • package ID discovery matching vf.load_environment, including user/package-name and bare package-name normalization
  • symmetric id / taskset_id and id / harness_id aliases with conflict checks
  • full package ID propagation into taskset_id / harness_id
  • strict loader-signature failures for missing, untyped, broad, wrong-family, optional, or extra-parameter config annotations
  • v1 init templates and create-environments skill guidance using the canonical independent taskset+harness loader pattern
  • restored global uv exclude-newer = "7 days" lock protection, with Prime-published package exceptions kept in [tool.uv.exclude-newer-package]
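As a rough illustration of the ID handling in the list above, the sketch below merges the `id` / `<family>_id` aliases with a conflict check and derives an importable module name from a package ID. Both function names are hypothetical, and the normalization rule (drop an optional `user/` prefix, replace hyphens with underscores) is an assumption about what "matching vf.load_environment" means, not a confirmed detail of this PR.

```python
from typing import Any, Optional


def resolve_component_id(data: dict[str, Any], family: str) -> Optional[str]:
    """Merge the `id` and `<family>_id` aliases, failing if they disagree."""
    short = data.get("id")
    long = data.get(f"{family}_id")
    if short is not None and long is not None and short != long:
        raise ValueError(f"conflicting ids: id={short!r} vs {family}_id={long!r}")
    # The full package ID is returned unchanged so it can be propagated.
    return long if long is not None else short


def module_name_for(package_id: str) -> str:
    """Assumed normalization: drop an optional 'user/' prefix, map '-' to '_'."""
    name = package_id.rsplit("/", 1)[-1]
    return name.replace("-", "_")


full_id = resolve_component_id({"id": "user/my-taskset"}, "taskset")
print(full_id, module_name_for(full_id))  # user/my-taskset my_taskset
```

Keeping the full ID and the module name as separate values lets the full `user/package-name` form survive in `taskset_id` / `harness_id` while only the import step sees the normalized name.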

Follow-Up: Auto Install

The install follow-up should run before config validation attempts any package imports:

  • Add a shared preflight resolver that extracts only IDs from the top-level env, [eval.taskset], [eval.harness], and CLI overrides without validating the full typed config.
  • Have vf-eval call that resolver before vf.load_environment(...); Prime CLI can call the same resolver or delegate to the same installer path.
  • Keep vf.load_environment, vf.load_taskset, and vf.load_harness strict and import-only after preflight. They should not embed Prime API/package-install side effects inside Pydantic validators or loader signature inspection.
  • Install env/taskset/harness packages from their IDs, then re-enter the current typed validation flow unchanged.
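A preflight resolver along these lines might look like the sketch below. It walks an already-parsed TOML dict and pulls out only the package IDs, ignoring every other field so that nothing is validated or imported. The function name, the `env_id` key, and the shape of the `overrides` mapping are all assumptions made for illustration; the PR does not specify them.

```python
from typing import Any, Optional


def collect_package_ids(
    raw: dict[str, Any], overrides: Optional[dict[str, str]] = None
) -> dict[str, str]:
    """Extract env/taskset/harness IDs from raw config without typed validation."""
    eval_table = raw.get("eval", {})
    ids: dict[str, str] = {}

    # Hypothetical key for the top-level environment ID.
    env_id = eval_table.get("env_id")
    if isinstance(env_id, str):
        ids["env"] = env_id

    for family in ("taskset", "harness"):
        table = eval_table.get(family)
        if isinstance(table, dict):
            # Accept either alias; conflict checks are left to later validation.
            value = table.get(f"{family}_id", table.get("id"))
            if isinstance(value, str):
                ids[family] = value

    # CLI overrides (assumed to be keyed by family) win over file values.
    ids.update(overrides or {})
    return ids


raw_config = {
    "eval": {
        "env_id": "user/my-env",
        "num_examples": "not-validated-here",
        "taskset": {"id": "user/my-taskset", "limit": 5},
        "harness": {"harness_id": "my-harness"},
    }
}
print(collect_package_ids(raw_config, {"harness": "other/harness"}))
```

An installer could then iterate over the returned IDs, install whatever is missing, and hand control back to the strict, import-only loaders.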

Testing

  • uv run pytest tests/test_v1_config_extension.py tests/test_v1_rlm_swe.py tests/test_eval_cli.py -q
  • uv run ruff check .
  • uv run ty check verifiers
  • uv run pre-commit run --all-files
  • pre-commit/pre-push hooks: Ruff, Semgrep v1 policy, generated AGENTS check, and ty CI parity
  • GitHub Actions on b0d378d0 passed

Note

Add vf.load_taskset and vf.load_harness strict package loaders for v1 environments

  • Introduces vf.load_taskset and vf.load_harness as top-level APIs in verifiers/v1/env.py and exports them via verifiers/__init__.py. Each loader accepts a typed config or string package id, dynamically imports the component module, validates the loader signature, coerces config to the package-specific type, and returns a typed component instance.
  • EnvConfig in verifiers/v1/config.py now resolves taskset_id/harness_id during validation by importing the component package, inferring the config type from the loader signature, and coercing the config data accordingly, without calling the loaders at validation time.
  • Env.__init__ now requires Taskset/Harness instances (not config objects) in taskset=/harness= parameters; passing a config directly to those parameters now raises TypeError.
  • A new --v1 flag is added to the prime env init CLI; generated environment templates now include load_taskset, load_harness, and a load_environment(config: vf.EnvConfig) that wires both components.
  • Component loader validation in verifiers/v1/utils/component_utils.py enforces strict config parameter presence, annotation, and signature shape on package-provided loaders.
  • Risk: Env(taskset=config, ...) and load_harness(None) call sites are now breaking errors; all callers must pass prebuilt component instances or use the new loader APIs.
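The strict signature checks summarized above could be approximated as follows. This is a sketch under stated assumptions rather than the code in verifiers/v1/utils/component_utils.py: `TasksetConfig` and `HarnessConfig` are empty stand-in base classes, `loader_config_type` is a hypothetical name, and the error messages are invented for illustration.

```python
import inspect
from typing import Any, Callable, Optional, get_type_hints


class TasksetConfig:
    """Stand-in base class for taskset configs (hypothetical)."""


class HarnessConfig:
    """Stand-in base class for harness configs (hypothetical)."""


def loader_config_type(loader: Callable[..., Any], family_base: type) -> type:
    """Return the loader's config class, or raise TypeError for a loose signature."""
    names = list(inspect.signature(loader).parameters)
    if "config" not in names:
        raise TypeError("loader must accept a 'config' parameter")
    if names != ["config"]:
        raise TypeError("loader must not accept extra parameters")
    annotation = get_type_hints(loader).get("config")
    if annotation is None:
        raise TypeError("'config' must be annotated")
    if annotation in (Any, object, dict):
        raise TypeError("'config' annotation is too broad")
    if not isinstance(annotation, type):
        raise TypeError("'config' must be a concrete class, not Optional or a union")
    if not issubclass(annotation, family_base):
        raise TypeError(f"'config' must subclass {family_base.__name__}")
    return annotation


class MyTasksetConfig(TasksetConfig):
    pass


def load_taskset(config: MyTasksetConfig) -> object:
    return object()


def load_optional(config: Optional[MyTasksetConfig] = None) -> object:
    return object()


print(loader_config_type(load_taskset, TasksetConfig).__name__)  # MyTasksetConfig
```

Calling `loader_config_type(load_optional, TasksetConfig)` or `loader_config_type(load_taskset, HarnessConfig)` raises `TypeError`, mirroring the optional and wrong-family failure cases.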

Changes since #1429 opened

  • Removed automatic harness ID injection in prepare_base_env_config function within verifiers/utils/env_utils.py [b0d378d]
  • Updated test suite to validate that harness components are not implicitly bound to environment packages [b0d378d]

Macroscope summarized 4141235.


Note

High Risk
High risk because it changes the v1 environment construction contract (e.g., Env now requires concrete Taskset/Harness objects and introduces strict package-ID based loading), which can break existing environment packages and TOML-driven runs if loaders/signatures/configs don’t match.

Overview
Adds strict v1 component-package loading for Taskset and Harness. Introduces public vf.load_taskset/vf.load_harness APIs (exported from verifiers and verifiers.v1) that import component packages by ID, validate loader signatures, coerce config into the loader’s typed config model, and return instantiated components.

Shifts the canonical env authoring pattern to independent component loaders. EnvConfig now treats [eval.taskset]/[eval.harness] as unresolved package envelopes (supporting id aliases) and resolves the concrete child config types via loader annotations during validation (without calling loaders). vf.Env initialization is tightened to accept only prebuilt Taskset/Harness objects (or config= which is resolved via the new loaders).

Updates templates/docs/skills and prime env init (new --v1 flag) to generate the new load_taskset/load_harness + load_environment(config: vf.EnvConfig) shape, updates TOML parsing/tests to include component ids, adds loader functions to bundled v1 packages, and adjusts several environments to require non-optional typed config parameters. Also tweaks RLM SWE task env var handling and restores global uv exclude-newer = "7 days" configuration (with lockfile updates).

Reviewed by Cursor Bugbot for commit b0d378d.

Comment thread verifiers/scripts/init.py
Comment thread verifiers/scripts/init.py
Comment thread verifiers/scripts/init.py

macroscopeapp Bot commented May 21, 2026

Approvability

Verdict: Needs human review

This PR introduces new public APIs (vf.load_taskset, vf.load_harness) and changes the v1 environment construction pattern. The new component loader infrastructure, runtime behavior changes in environment variable handling, and open design questions from reviewers warrant human review.

@willccbb willccbb requested review from mikasenghaas and xeophon May 21, 2026 08:09
Comment thread verifiers/v1/env.py Outdated
Comment on lines +22 to +23
TasksetInput: TypeAlias = Taskset
HarnessInput: TypeAlias = Harness | None
Member

not sure we need those?

Comment thread verifiers/v1/env.py Outdated
Comment on lines 167 to 169
def resolve_taskset(value: object) -> Taskset:
    if isinstance(value, Taskset):
        return value
Member

why do we ever need to call resolve on an alr resolved object? would only resolve config -> obj

Comment thread verifiers/v1/env.py Outdated
Comment thread verifiers/v1/env.py Outdated
Comment thread verifiers/v1/env.py Outdated
Comment thread docs/environments.md
Member

@mikasenghaas mikasenghaas left a comment

nice, yea i like

Comment thread docs/reference.md Outdated
Comment thread verifiers/v1/env.py Outdated
Comment thread verifiers/v1/env.py Outdated
Comment thread verifiers/v1/config.py
@willccbb willccbb force-pushed the codex/add-taskset-and-harness-loaders branch from 4b83ce0 to c28641f on May 22, 2026 03:10
Comment thread verifiers/v1/utils/component_utils.py
@willccbb willccbb force-pushed the codex/add-taskset-and-harness-loaders branch from c28641f to 43cd779 on May 22, 2026 03:13
Comment thread verifiers/v1/env.py
Comment thread verifiers/v1/config.py
@willccbb willccbb force-pushed the codex/add-taskset-and-harness-loaders branch from 43cd779 to c13ef08 on May 22, 2026 03:21
Comment thread verifiers/scripts/init.py
@willccbb willccbb force-pushed the codex/add-taskset-and-harness-loaders branch from ed462be to e1748d7 on May 22, 2026 06:24
Comment thread environments/rlm_swe_v1/rlm_swe_v1.py
@willccbb willccbb force-pushed the codex/add-taskset-and-harness-loaders branch from e1748d7 to 47dd4d9 on May 22, 2026 06:40
Comment thread verifiers/v1/env.py
@willccbb willccbb force-pushed the codex/add-taskset-and-harness-loaders branch from 47dd4d9 to ba77fb7 on May 22, 2026 06:52

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 4141235.

Comment thread verifiers/utils/env_utils.py Outdated
Comment thread uv.lock Outdated

[options]
exclude-newer = "0001-01-01T00:00:00Z" # This has no effect and is included for backwards compatibility when using relative exclude-newer values.
exclude-newer-span = "P7D"
Member

we want to have this in

@willccbb willccbb merged commit 56bcc0d into main May 22, 2026
15 checks passed
3 participants