Geval

Decision orchestration and reconciliation for AI changes.

You bring all kinds of signals and your rules. Geval orchestrates and reconciles them into one outcome. No brain — just your rules applied, every time.


Demo video

Watch the Geval demo on YouTube: a walkthrough of how Geval turns signals and policy rules into PASS, REQUIRE_APPROVAL, or BLOCK.


Try it in under a minute

1. Download (pick your OS):

# Linux
curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-linux-x86_64 -o geval && chmod +x geval

# macOS (Apple Silicon)
curl -sSL https://github.com/geval-labs/geval/releases/latest/download/geval-macos-aarch64 -o geval && chmod +x geval

# Windows (PowerShell) — see note below
Invoke-WebRequest -Uri https://github.com/geval-labs/geval/releases/latest/download/geval-windows-x86_64.exe -OutFile geval.exe

Windows: Open PowerShell as Administrator (right‑click → Run as administrator). Then run the download command and .\geval.exe demo.

2. Run the demo (no files needed):

./geval demo          # Linux / macOS — use ./ so you run this binary, not another "geval" in PATH
.\geval.exe demo      # Windows (same folder as geval.exe)

You get a report and one outcome: PASS, REQUIRE_APPROVAL, or BLOCK — produced by the demo contract and signals. Use in CI →

No binary for your OS? Build from source.

If you see "unknown command 'init'" or "required option '--eval'" — you're running a different program named geval (e.g. from npm or another install). Use the binary from Releases or build from source and run it with ./geval (or put it first in your PATH).

Start from a template (like create-react-app)

Inside your project, run the same binary you downloaded (e.g. ./geval). Your codebase is not changed except for one new folder:

./geval init          # or: /path/to/geval init

This creates a .geval folder with:

  • contract.yaml — Names your release gate, versions it, and lists policy files to evaluate.
  • policies/ — Two starter files with descriptive names (safety-and-blocking.yaml, quality-and-approval.yaml); edit rules to match your metrics.
  • signals.json — Example pipeline metrics; replace with your real signal names and values.
  • README.md — What each file is for and how to run checks.

Then run:

./geval check --contract .geval/contract.yaml --signals .geval/signals.json

Use a different folder: ./geval init my-rules. Overwrite existing files: ./geval init --force.

Updating

Use the same download commands and replace your old binary with the new one. Check the version with ./geval --version.


Use Geval with your own signals and contract

You need a contract (one YAML file that references one or more policy files) and a signals file. Geval evaluates each policy against the same signals, then combines the per-policy outcomes according to the contract's combination rule (e.g. all must pass, or any block blocks). Use geval init for a template with a contract and two policies, or create the files yourself by following the steps below.

All kinds of signals: Not every signal needs a score. You can mix: entries with a numeric value, and entries with no value (presence-only). Use a rule with operator: presence to match “this metric exists.” Details →

Step 1: Your signals (data file)

A list of evidence: what you measured, observed, or flagged. Each item has a metric (name). Value is optional — use it for scores; omit it for “this happened” (presence-only).

Example — save as mydata.json:

{
  "signals": [
    { "metric": "accuracy", "value": 0.94 },
    { "metric": "engagement_drop", "value": 0.02 }
  ]
}

You can add labels like component or system if you need them. Full example →
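If your pipeline already produces metrics in code, the signals file can be generated rather than written by hand. The following is a minimal Python sketch, assuming only the layout shown in the example above (a top-level "signals" list whose entries have a "metric" and an optional "value"). The helper name make_signal and the metric human_review_flag are hypothetical, and labels are left out because their exact layout is not shown in this section.

```python
import json


def make_signal(metric, value=None):
    """Build one signal entry; leave out "value" for a presence-only signal."""
    entry = {"metric": metric}
    if value is not None:  # a value of 0 is still a real value and is kept
        entry["value"] = value
    return entry


signals = {
    "signals": [
        make_signal("accuracy", 0.94),
        make_signal("engagement_drop", 0.02),
        # Hypothetical presence-only entry: no score, it only records that this happened.
        make_signal("human_review_flag"),
    ]
}

# Print the document; redirect the output to mydata.json to use it with geval check.
print(json.dumps(signals, indent=2))
```

Using `is not None` rather than a truthiness test matters here: a measured value of 0 should still be written, while a missing value marks the entry as presence-only.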

Step 2: Your contract and policies

A contract is a YAML file that lists one or more policy files and a combination rule (how to merge their outcomes). Each policy file contains rules, and each rule has a priority that is unique within that policy. A rule has the form: when [condition on signals], then [pass / block / require_approval].

Prefer a form instead of writing YAML by hand? Use config.geval.io to generate Geval-compatible contract.yaml and policy files (download or copy), then validate with geval validate-contract and run geval check as below.

Example contract — save as contract.yaml:

name: my-gate
version: "1.0.0"
combine: worst_case
policies:
  - path: policy.yaml

Example policy — save as policy.yaml (path relative to the contract file):

name: quality
version: "1.0.0"
policy:
  rules:
    - priority: 1
      name: block_bad_engagement
      when:
        metric: engagement_drop
        operator: ">"
        threshold: 0
      then:
        action: block
    - priority: 2
      name: allow_good_accuracy
      when:
        metric: accuracy
        operator: ">="
        threshold: 0.9
      then:
        action: pass

Combine (worst_case): any block wins; otherwise any require_approval wins; otherwise the result is pass. Rule priorities must be unique within a policy, and 1 is the highest. Geval records every rule that matches, and the matching rule with the highest priority (lowest number) decides that policy's outcome. Operators: >, <, >=, <=, ==, presence. Actions: pass, block, require_approval.
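To make the priority rule concrete, here is a rough Python sketch of how a single policy could be evaluated against the example signals. It is an illustration of the behaviour described above, not Geval's implementation. Two points are assumptions: a rule matches if any signal with that metric satisfies the condition, and a policy with no matching rule yields pass. The function names are hypothetical.

```python
import operator

# Comparison operators listed above; "presence" is handled separately.
COMPARATORS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}


def rule_matches(rule, signals):
    """Return True if some signal for the rule's metric satisfies its condition."""
    when = rule["when"]
    for signal in signals:
        if signal["metric"] != when["metric"]:
            continue
        if when["operator"] == "presence":
            return True  # the metric exists; any value is ignored
        compare = COMPARATORS[when["operator"]]
        if "value" in signal and compare(signal["value"], when["threshold"]):
            return True
    return False


def evaluate_policy(rules, signals, default="pass"):
    """Collect every matching rule, then let the lowest priority number decide."""
    matched = [rule for rule in rules if rule_matches(rule, signals)]
    if not matched:
        return default, matched  # assumption: no match means pass
    winner = min(matched, key=lambda rule: rule["priority"])
    return winner["then"]["action"], matched


# The two rules from policy.yaml and the signals from mydata.json above.
rules = [
    {"priority": 1, "name": "block_bad_engagement",
     "when": {"metric": "engagement_drop", "operator": ">", "threshold": 0},
     "then": {"action": "block"}},
    {"priority": 2, "name": "allow_good_accuracy",
     "when": {"metric": "accuracy", "operator": ">=", "threshold": 0.9},
     "then": {"action": "pass"}},
]
signals = [
    {"metric": "accuracy", "value": 0.94},
    {"metric": "engagement_drop", "value": 0.02},
]

action, matched = evaluate_policy(rules, signals)
print(action)  # block
```

Both rules match these signals, but block_bad_engagement has priority 1, so the policy resolves to block even though accuracy is above its threshold.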

Full example → and policy →

Step 3: Run Geval

./geval check --contract contract.yaml --signals mydata.json

(Windows: .\geval.exe check --contract contract.yaml --signals mydata.json)

Step 4: Read the outcome

  • PASS — Every policy passed (or combined rule says go).
  • REQUIRE_APPROVAL — At least one policy requires approval.
  • BLOCK — At least one policy blocks.
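The worst_case combination behind these three outcomes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Geval's code: the function name combine_worst_case is hypothetical, and raising an error for an empty list of policy outcomes is a choice made for the sketch.

```python
# Severity order for worst_case: block beats require_approval, which beats pass.
SEVERITY = {"pass": 0, "require_approval": 1, "block": 2}


def combine_worst_case(policy_outcomes):
    """Return the most severe outcome among the per-policy outcomes, upper-cased."""
    if not policy_outcomes:
        raise ValueError("at least one policy outcome is required")
    worst = max(policy_outcomes, key=lambda outcome: SEVERITY[outcome])
    return worst.upper()


print(combine_worst_case(["pass", "pass"]))                       # PASS
print(combine_worst_case(["pass", "require_approval"]))           # REQUIRE_APPROVAL
print(combine_worst_case(["require_approval", "block", "pass"]))  # BLOCK
```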

To see per-policy results and the combined decision:

./geval explain --contract contract.yaml --signals mydata.json

To validate the contract and all referenced policies:

./geval validate-contract contract.yaml
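The constraints stated in Step 2 (unique priorities within a policy, the listed operators and actions) are easy to check yourself before running the command. The Python sketch below works on a policy that has already been loaded into a dictionary and is not a description of what validate-contract does internally. The function name find_policy_problems is hypothetical, and requiring a threshold for every operator except presence is an assumption based on the example policy.

```python
ALLOWED_OPERATORS = {">", "<", ">=", "<=", "==", "presence"}
ALLOWED_ACTIONS = {"pass", "block", "require_approval"}


def find_policy_problems(policy):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    seen_priorities = {}
    for rule in policy["policy"]["rules"]:
        name = rule.get("name", "<unnamed>")
        priority = rule.get("priority")
        if priority in seen_priorities:
            problems.append(
                f"priority {priority} is used by both {seen_priorities[priority]} and {name}"
            )
        else:
            seen_priorities[priority] = name

        when = rule.get("when", {})
        op = when.get("operator")
        if op not in ALLOWED_OPERATORS:
            problems.append(f"{name}: unknown operator {op!r}")
        elif op != "presence" and "threshold" not in when:
            # Assumption: comparison operators need a threshold to compare against.
            problems.append(f"{name}: operator {op!r} has no threshold")

        action = rule.get("then", {}).get("action")
        if action not in ALLOWED_ACTIONS:
            problems.append(f"{name}: unknown action {action!r}")
    return problems


# A deliberately broken policy: duplicate priority and an action that is not listed.
broken = {
    "name": "quality",
    "version": "1.0.0",
    "policy": {"rules": [
        {"priority": 1, "name": "a",
         "when": {"metric": "accuracy", "operator": ">=", "threshold": 0.9},
         "then": {"action": "pass"}},
        {"priority": 1, "name": "b",
         "when": {"metric": "engagement_drop", "operator": ">", "threshold": 0},
         "then": {"action": "warn"}},
    ]},
}

for problem in find_policy_problems(broken):
    print(problem)
```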

The problem | What Geval is | Commands | Docs


The problem

You have many signals: scores, A/B results, human reviews, flags. You change a model or a prompt. Then what?

  • One signal says “better.”
  • Another says “worse.”
  • Someone asks: “Do we ship?”

Today that call happens in chat or a meeting. Hard to repeat. Hard to audit. You don't need a system that "decides" for you — you need orchestration and reconciliation: one place to define rules, one place to feed all your signals (not just numbers), and one deterministic outcome every time.


What Geval is

Geval is a decision orchestration and reconciliation engine. It does not make decisions. It has no brain. You provide:

  1. Your signals (one file) — any kind: scores, presence-only, flags, labels. Non-uniform is fine.
  2. Your rules (a contract that references one or more policy files), e.g. “If engagement drops, block. If accuracy is below X, need approval.”

Geval orchestrates the run and reconciles your signals against your rules in order. Same inputs + same rules = same outcome. It returns:

| Outcome | Meaning |
| --- | --- |
| PASS | No rule matched a block or require-approval. Good to go. |
| REQUIRE_APPROVAL | A rule matched; it says a person must approve first. |
| BLOCK | A rule matched; it says don’t ship. Fix first. |

Each run is recorded: which rules, which signals, when. So you can always answer: “Why did we ship?” and “Who approved?” — without any black box.


Commands

Run with ./geval (or ensure this repo’s binary is the one in your PATH):

| Command | What it does |
| --- | --- |
| ./geval demo | Run the built-in example. Try this first. |
| ./geval init | Create .geval/ with contract and policies. Edit and run. |
| ./geval check --contract <file> --signals <file> | Evaluate contract → one outcome (PASS / REQUIRE_APPROVAL / BLOCK) |
| ./geval explain --contract <file> --signals <file> | Per-policy results and combined decision report |
| ./geval validate-contract <file> | Validate contract and all referenced policies |
| ./geval approve / ./geval reject | Record a person’s approval or rejection |

Documentation

| Guide | Description |
| --- | --- |
| Demo video (YouTube) | Walkthrough of Geval |
| Config generator (web) | Fill in forms → download contract.yaml and policies |
| Architecture | Contract = multiple policies + combine rule; module layout |
| Signals and rules | Non-uniform signals (scores, presence-only, mix); how rules use them |
| Signal assumptions | What we assume; what input forms we accept (number, string, trace, object) |
| Versioning | Contract, policy, and signals versioning; nothing unversioned |
| Extending | How to add a combination rule or change behavior; process and conventions |
| GitHub Actions | Use Geval in CI |
| Examples | Sample data and rules files |
| Customer demo (feature story) | Signals, policies, rules, and PASS/BLOCK/approval narrative for demos |
| Installation | Install, PATH, build from source |
| Developer workflow | PRs, check, approve/reject |
| Auditing | How decisions are recorded |

Contributing

Contributions are welcome; see CONTRIBUTING.md. To build from source, see Installation.


License

MIT © Geval Contributors


Website | Demo video | Config generator | Releases | GitHub