Inspect Flow

Workflow orchestration for Inspect AI that enables you to run evaluations at scale with repeatability and maintainability.

Why Inspect Flow?

As evaluation workflows grow more complex (multiple tasks, multiple models, varying parameters), managing these experiments becomes challenging. Inspect Flow addresses this by providing:

Declarative Configuration: Define complex evaluations with tasks, models, and parameters in type-safe schemas
Repeatable & Shareable: Encapsulated definitions of tasks, models, configurations, and Python dependencies ensure experiments can be reliably repeated and shared
Incremental Execution: Add new models, tasks, or configurations to existing results without re-running completed work
Parameter Sweeping: Matrix patterns for systematic exploration across tasks, models, and hyperparameters
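Conceptually, incremental execution means that only the task-model combinations without existing results are scheduled. The plain-Python sketch below illustrates that idea. It is not Flow's implementation. The `completed` set and the `pending_runs` helper are hypothetical stand-ins for results that Flow would find in existing logs.

```python
from itertools import product

# Hypothetical record of finished work; in practice this would be
# derived from results already present in the log directory.
completed = {
    ("inspect_evals/gpqa_diamond", "openai/gpt-5"),
}

def pending_runs(tasks, models, done):
    """Return the (task, model) pairs that still need to run, skipping finished ones."""
    return [pair for pair in product(tasks, models) if pair not in done]

todo = pending_runs(
    ["inspect_evals/gpqa_diamond"],
    ["openai/gpt-5", "openai/gpt-5-mini"],  # gpt-5-mini is newly added
    completed,
)
print(todo)  # [('inspect_evals/gpqa_diamond', 'openai/gpt-5-mini')]
```

Adding a model to the list schedules only the new pairs. Pairs that already have results are left untouched.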

Inspect Flow is designed for researchers and engineers running systematic AI evaluations who need to scale beyond ad-hoc scripts.

Getting Started

Prerequisites

Before using Inspect Flow, you should:

Be familiar with Inspect AI
Have an existing Inspect evaluation, or use one from inspect-evals

Installation

pip install inspect-flow

Basic Example

FlowJob is the main entrypoint for defining evaluation runs. At its core, it takes a list of tasks to run. Here's a simple example, saved as config.py, that runs two evaluations:

from inspect_flow import FlowJob, FlowTask

FlowJob(
    log_dir="logs",
    tasks=[
        FlowTask(
            name="inspect_evals/gpqa_diamond",
            model="openai/gpt-4o",
        ),
        FlowTask(
            name="inspect_evals/mmlu_0_shot",
            model="openai/gpt-4o",
        ),
    ],
)

To run the evaluations, execute the following command in your shell. It creates a virtual environment for this job run and installs the dependencies. Task and model dependencies (such as the inspect-evals and openai Python packages) are inferred and installed automatically.

flow run config.py

This will run both tasks and display progress in your terminal.
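One way to picture dependency inference is to map the prefix of each task and model name to a Python package. The sketch below is hypothetical: the `PACKAGE_FOR_PREFIX` table and the `infer_dependencies` helper are illustrative assumptions, and Flow's real inference rules may differ.

```python
# Hypothetical mapping from name prefixes to PyPI packages; Flow's
# actual rules may differ and cover many more providers.
PACKAGE_FOR_PREFIX = {
    "inspect_evals": "inspect-evals",
    "openai": "openai",
}

def infer_dependencies(names):
    """Guess the packages needed for task/model names of the form 'prefix/name'."""
    packages = []
    for name in names:
        prefix = name.split("/", 1)[0]
        package = PACKAGE_FOR_PREFIX.get(prefix)
        if package and package not in packages:  # keep order, drop duplicates
            packages.append(package)
    return packages

print(infer_dependencies([
    "inspect_evals/gpqa_diamond",
    "inspect_evals/mmlu_0_shot",
    "openai/gpt-4o",
]))  # ['inspect-evals', 'openai']
```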

Matrix Functions

Often you'll want to evaluate multiple tasks across multiple models. Rather than manually defining every combination, use tasks_matrix to generate all task-model pairs (this example is saved as matrix.py):

from inspect_flow import FlowJob, tasks_matrix

FlowJob(
    log_dir="logs",
    tasks=tasks_matrix(
        task=[
            "inspect_evals/gpqa_diamond",
            "inspect_evals/mmlu_0_shot",
        ],
        model=[
            "openai/gpt-5",
            "openai/gpt-5-mini",
        ],
    ),
)

To preview the expanded config before running it, and confirm that it is the one you intend to run, use the following command:

flow config matrix.py

This command outputs the expanded configuration showing all 4 task-model combinations (2 tasks × 2 models).

log_dir: logs
dependencies:
- inspect-evals
tasks:
- name: inspect_evals/gpqa_diamond
  model:
    name: openai/gpt-5
- name: inspect_evals/gpqa_diamond
  model:
    name: openai/gpt-5-mini
- name: inspect_evals/mmlu_0_shot
  model:
    name: openai/gpt-5
- name: inspect_evals/mmlu_0_shot
  model:
    name: openai/gpt-5-mini
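The expansion above is a Cartesian product: every task is paired with every model, in the same task-major order the config output shows. A plain-Python sketch of that idea follows. The `expand_tasks_matrix` helper is hypothetical and not part of inspect_flow.

```python
from itertools import product

def expand_tasks_matrix(tasks, models):
    """Pair every task with every model (task-major order) as plain dicts."""
    return [
        {"name": task, "model": {"name": model}}
        for task, model in product(tasks, models)
    ]

expanded = expand_tasks_matrix(
    ["inspect_evals/gpqa_diamond", "inspect_evals/mmlu_0_shot"],
    ["openai/gpt-5", "openai/gpt-5-mini"],
)
for entry in expanded:
    print(entry["name"], entry["model"]["name"])
```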

tasks_matrix and models_matrix can be nested across multiple levels of matrices, which enables sophisticated parameter sweeping. For example, to explore different reasoning efforts across models, you can use the models_matrix function.
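The nesting idea can be sketched in plain Python: first expand each model into one variant per reasoning effort, then cross those variants with the tasks. This does not use inspect_flow. The `config`/`reasoning_effort` keys and the effort values are illustrative assumptions; see the Reference for the actual models_matrix signature.

```python
from itertools import product

tasks = ["inspect_evals/gpqa_diamond", "inspect_evals/mmlu_0_shot"]
models = ["openai/gpt-5", "openai/gpt-5-mini"]
efforts = ["low", "high"]  # illustrative reasoning-effort levels

# Inner matrix: one model variant per (model, effort) pair.
model_variants = [
    {"name": model, "config": {"reasoning_effort": effort}}
    for model, effort in product(models, efforts)
]

# Outer matrix: every task paired with every model variant.
runs = [{"name": task, "model": variant} for task, variant in product(tasks, model_variants)]

print(len(runs))  # 8 = 2 tasks x 2 models x 2 efforts
```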

For even more concise parameter sweeping, use configs_matrix to generate configuration variants.
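A configuration matrix can be pictured as a grid that expands lists of values into every combination of settings. The sketch below is a plain-Python analogue only. The `config_grid` helper and the `temperature`/`max_tokens` keys are illustrative assumptions, not inspect_flow's API.

```python
from itertools import product

def config_grid(**params):
    """Expand keyword lists of values into every combination of config settings."""
    keys = list(params)
    return [dict(zip(keys, values)) for values in product(*params.values())]

variants = config_grid(temperature=[0.0, 0.7], max_tokens=[1024, 4096])
for variant in variants:
    print(variant)
```

Each resulting dict is one configuration variant. Two values for each of two settings yield four variants.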

Run evaluations

To run the config:

flow run matrix.py

When complete, you'll find a link to the logs at the bottom of the task results summary.

To view logs interactively, run:

inspect view

Learning More

See the following articles to learn more about using Flow:

Flow Concepts: Flow type system, config structure and basics.
Reference: Detailed documentation on the Flow Python API and CLI commands.

Development

To work on development of Inspect Flow, clone the repository and set up the development environment with uv (uv sync installs the project in editable mode):

git clone https://github.com/meridianlabs-ai/inspect_flow
cd inspect_flow
uv sync
source .venv/bin/activate

Optionally install pre-commit hooks via

make hooks

Run linting, formatting, and tests via

make check
make test
