Overview

Verifiers is our library for creating environments to train and evaluate LLMs.

Environments contain everything required to run and evaluate a model on a particular task:

  • A dataset of task inputs
  • A harness for the model (tools, sandboxes, context management, etc.)
  • A reward function or rubric to score the model's performance
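The three parts can be pictured with a toy, library-free sketch. Nothing below is Verifiers API: `DATASET`, `echo_model`, `run_harness`, `exact_match`, and `evaluate` are hypothetical names chosen for illustration, and the "model" is a stub that reverses the last word of its prompt.

```python
# Toy illustration only; none of these names come from Verifiers.

# 1. Dataset: task inputs paired with reference answers.
DATASET = [
    {"question": "Reverse abc.", "answer": "cba"},
    {"question": "Reverse xyz.", "answer": "zyx"},
]


def echo_model(prompt: str) -> str:
    """Stand-in for an LLM call: reverses the last word of the prompt."""
    word = prompt.rstrip(".").split()[-1]
    return word[::-1]


# 2. Harness: decides how the model is invoked for each task.
def run_harness(row: dict[str, str]) -> str:
    return echo_model(row["question"])


# 3. Reward: scores the model output against the reference answer.
def exact_match(completion: str, answer: str) -> float:
    return 1.0 if completion.strip() == answer else 0.0


def evaluate() -> float:
    """Average reward over the dataset."""
    scores = [exact_match(run_harness(row), row["answer"]) for row in DATASET]
    return sum(scores) / len(scores)
```

In a real environment the stub would be replaced by an actual model call, and the harness could add tools, sandboxes, or context management around it.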

Environments can be used for training models with reinforcement learning (RL), evaluating capabilities, generating synthetic data, experimenting with agent harnesses, and more.

Verifiers is tightly integrated with the Environments Hub, as well as our training framework prime-rl and our Hosted Training platform.

Getting Started

Ensure you have uv installed, as well as the prime CLI tool:

# install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
# install the prime CLI
uv tool install prime
# log in to the Prime Intellect platform
prime login

To set up a new workspace for developing environments, do:

# run from your workspace directory, e.g. ~/dev/my-lab
prime lab setup

This sets up a Python project if needed (with uv init), installs verifiers (with uv add verifiers), creates the recommended workspace structure, and downloads useful starter files:

configs/
├── endpoints.toml      # OpenAI-compatible API endpoint configuration
├── rl/                 # Example configs for Hosted Training
├── eval/               # Example multi-environment eval configs
└── gepa/               # Example configs for prompt optimization
.prime/
└── skills/             # Bundled workflow skills for create/browse/review/eval/GEPA/train/brainstorm
environments/
└── AGENTS.md           # Documentation for AI coding agents
AGENTS.md               # Top-level documentation for AI coding agents
CLAUDE.md               # Claude-specific pointer to AGENTS.md

Alternatively, add verifiers to an existing project:

uv add verifiers && prime lab setup --skip-install

Environments built with Verifiers are self-contained Python modules. To initialize a fresh v1 environment template, do:

prime env init my-env --v1 # creates a new template in ./environments/my_env

This creates a new module called my_env containing a basic environment template:

environments/my_env/
├── my_env.py           # Main implementation
├── pyproject.toml      # Dependencies and metadata
└── README.md           # Documentation

Environment modules should expose a load_environment function that returns an environment object. For simple legacy environments, load_environment can still construct and return the environment directly:

# my_env.py
import verifiers as vf

def load_environment(dataset_name: str = 'gsm8k') -> vf.Environment:
    dataset = vf.load_example_dataset(dataset_name) # 'question'

    # Reward: 1.0 if the final message exactly matches the reference answer.
    async def correct_answer(completion, answer) -> float:
        completion_ans = completion[-1]['content']
        return 1.0 if completion_ans == answer else 0.0

    rubric = vf.Rubric(funcs=[correct_answer])
    env = vf.SingleTurnEnv(dataset=dataset, rubric=rubric)
    return env
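Exact string equality in correct_answer penalizes harmless differences such as trailing whitespace or letter case. One possible variant normalizes both sides before comparing. In the sketch below, `normalize` and `normalized_match` are hypothetical helpers, not part of Verifiers, and the message format (a list of dicts with a `content` key) is taken from the example above.

```python
import asyncio


def normalize(text: str) -> str:
    """Lowercase, trim, and collapse internal whitespace."""
    return " ".join(text.lower().split())


async def normalized_match(completion: list[dict[str, str]], answer: str) -> float:
    """Score 1.0 when the final message matches the answer after normalization."""
    final_message = completion[-1]["content"]
    return 1.0 if normalize(final_message) == normalize(answer) else 0.0


# Local check outside any environment.
messages = [{"role": "assistant", "content": "  42 \n"}]
score = asyncio.run(normalized_match(messages, "42"))
```

A function like this could presumably be passed to vf.Rubric(funcs=[...]) in place of correct_answer, assuming the rubric supplies `completion` and `answer` by name as in the example above.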

For new environments with reusable tasksets, toolsets, custom programs, or custom harnesses, use the v1 Taskset/Harness path:

# my_env.py
import verifiers as vf

@vf.reward(weight=1.0)
async def contains_answer(task, state) -> float:
    return float(task["answer"] in str(state.get("completion") or ""))

class MyTasksetConfig(vf.TasksetConfig):
    split: str = "train"


class MyTaskset(vf.Taskset[MyTasksetConfig]):
    _default_rewards = (contains_answer,)

    def rows(self) -> list[dict[str, object]]:
        rows = [
            {
                "prompt": [{"role": "user", "content": "Reverse abc."}],
                "answer": "cba",
                "split": "train",
                "max_turns": 1,
            }
        ]
        return [row for row in rows if row["split"] == self.config.split]


class MyHarnessConfig(vf.HarnessConfig):
    pass


class MyHarness(vf.Harness[MyHarnessConfig]):
    pass


def load_taskset(config: MyTasksetConfig) -> MyTaskset:
    return MyTaskset(config=config)


def load_harness(config: MyHarnessConfig) -> MyHarness:
    return MyHarness(config=config)


def load_environment(config: vf.EnvConfig) -> vf.Env:
    taskset = vf.load_taskset(config.taskset)
    harness = vf.load_harness(config.harness)
    return vf.Env(taskset=taskset, harness=harness)
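The reward and the split filter in this example are ordinary Python, so their logic can be checked without instantiating an environment. The sketch below copies both, drops the `@vf.reward` decorator, and replaces `self.config.split` with a plain argument. Treating `state` as a plain dict, the second "eval" row, and the names `ROWS` and `rows_for_split` are assumptions made for the sake of a standalone check.

```python
import asyncio

ROWS = [
    {
        "prompt": [{"role": "user", "content": "Reverse abc."}],
        "answer": "cba",
        "split": "train",
        "max_turns": 1,
    },
    {  # hypothetical extra row, added only to exercise the filter
        "prompt": [{"role": "user", "content": "Reverse xyz."}],
        "answer": "zyx",
        "split": "eval",
        "max_turns": 1,
    },
]


async def contains_answer(task: dict, state: dict) -> float:
    """Same body as the README reward, without the @vf.reward decorator."""
    return float(task["answer"] in str(state.get("completion") or ""))


def rows_for_split(split: str) -> list[dict]:
    """Same filter as MyTaskset.rows, with the split passed explicitly."""
    return [row for row in ROWS if row["split"] == split]


train_rows = rows_for_split("train")
hit = asyncio.run(contains_answer(train_rows[0], {"completion": "The answer is cba."}))
miss = asyncio.run(contains_answer(train_rows[0], {}))  # no completion recorded
```

Because the reward is a substring test, it also fires when the answer appears inside a longer, otherwise wrong completion; a stricter comparison may be preferable for tasks where that matters.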

See BYO Harness for the advanced v1 taskset/harness API. Reusable v1 taskset and harness packages live under verifiers.v1.packages while the API stabilizes, and are re-exported from verifiers.v1 for normal use. For example, Harbor task directories can run through the bundled OpenCode CLI harness with:

env = vf.Env(
    taskset=vf.HarborTaskset(config=vf.HarborTasksetConfig()),
    harness=vf.OpenCode(config=vf.OpenCodeConfig()),
)

To run a local evaluation with any OpenAI-compatible model, do:

prime eval run my-env -m openai/gpt-5-nano # run and save eval results locally

Evaluations use Prime Inference by default; configure your own API endpoints in ./configs/endpoints.toml.

View local evaluation results in the terminal UI:

prime eval view

The TUI opens a single run browser (environment -> model -> run). Press Enter on a run to open rollout details, b to go back, tab to cycle panes, e and x to expand or collapse history, pageup and pagedown to scroll history, and c for Copy Mode.

To publish the environment to the Environments Hub, do:

prime env push my-env # equivalent to --path ./environments/my_env

To run an evaluation directly from the Environments Hub, do:

prime eval run primeintellect/math-python

Documentation

Environments: Create datasets, rubrics, and custom multi-turn interaction protocols.

BYO Harness: Build v1 Taskset/Harness environments with custom tools, sandboxes, users, and custom programs.

Evaluation: Evaluate models using your environments.

Training: Train models in your environments with reinforcement learning.

Development: Contributing to verifiers.

API Reference: Understanding the API and data structures.

FAQs: Other frequently asked questions.