
Proposal: verified_skills mode — deterministic, auditable skill evolution (no GPU required) #55

@nutstrut


Hi — I’ve been working on a verification layer for agent execution and evolution, and MetaClaw’s skills_only mode is a perfect fit for this.

This proposal adds deterministic verification to skill promotion without requiring RL or GPU compute.

Proposal: verified_skills mode — GPU-free skill evolution with deterministic improvement verification

Summary

MetaClaw's skills_only mode is excellent for operators without GPU access. However, skill promotion currently relies on LLM judgment, which is non-deterministic and leaves no auditable record. This proposal adds a verified_skills mode that gates skill promotion on a deterministic verdict backed by a cryptographically signed receipt. It requires no GPU and no cloud RL, and it produces a trustworthy, auditable history of skill evolution.


The Gap

In skills_only mode, the evolution loop works like this:

Conversation ends
    ↓
LLM analyzes session
    ↓
New skills extracted and summarized
    ↓
Skills promoted to permanent library

The problem: there is no deterministic check that a promoted skill actually improved performance. Promotion is gated on LLM self-evaluation, which is:

  • Subjective and non-deterministic (the same inputs can produce different verdicts)
  • Unverifiable by third parties
  • Not auditable across environments
  • Vulnerable to regression (a skill that hurts performance can be promoted)

In RL mode, weight updates provide a learning signal — but require cloud compute. In skills_only mode, there is no equivalent verification mechanism at all.


The Proposal: verified_skills mode

Add a new operating mode that sits between skills_only and rl:

Mode                  | GPU | Cloud | Verified | Auditable
----------------------|-----|-------|----------|----------
skills_only           | No  | No    | No       | No
verified_skills (new) | No  | No    | Yes      | Yes
rl                    | No  | Yes   | Partial  | No
madmax                | No  | Yes   | Partial  | No

verified_skills adds one step to the existing skills_only loop:

Conversation ends
    ↓
LLM analyzes session (existing)
    ↓
New skills extracted (existing)
    ↓
NEW: Spec defined — "what does improvement look like?"
    ↓
NEW: SettlementWitness verifies deterministically
         PASS → skill promoted with receipt_id attached
         FAIL → skill rejected, counter-evidence logged
         INDETERMINATE → flagged for human review
    ↓
NEW: receipt_id stored in skill metadata
    ↓
Full audit trail: every promoted skill carries a signed verification receipt
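The PASS / FAIL / INDETERMINATE branching in the loop above can be sketched as a small pure function. This is a minimal sketch, not MetaClaw code: `Verdict`, `GateDecision`, and `gate_skill` are hypothetical names, and the sketch assumes the verdict strings are exactly the three listed in the diagram. An unknown verdict raises instead of falling through, so a malformed response can never promote a skill.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(str, Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    INDETERMINATE = "INDETERMINATE"


@dataclass
class GateDecision:
    action: str  # "promote", "reject", or "review"
    receipt_id: str  # kept for every outcome, including rejections
    notes: list[str] = field(default_factory=list)


def gate_skill(skill_name: str, verdict: str, receipt_id: str) -> GateDecision:
    """Map a verifier verdict to a promotion action (hypothetical helper)."""
    v = Verdict(verdict)  # raises ValueError on any unexpected verdict string
    if v is Verdict.PASS:
        return GateDecision("promote", receipt_id)
    if v is Verdict.FAIL:
        # The FAIL receipt is retained as counter-evidence, not discarded.
        return GateDecision("reject", receipt_id, [f"{skill_name}: FAIL logged as counter-evidence"])
    return GateDecision("review", receipt_id, [f"{skill_name}: flagged for human review"])
```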

What SettlementWitness Is

SettlementWitness is a stateless verification oracle for agent workflows. It evaluates whether an output matches a specification and returns a cryptographically signed receipt (SAR — Settlement Attestation Receipt).

Key properties relevant to MetaClaw:

  • Deterministic — identical inputs always produce identical verdicts
  • Ed25519 signed — receipts are cryptographically verifiable
  • Offline verifiable — no callbacks required after receipt is issued
  • Stateless — no session state, no dependencies
  • Free during adoption — no cost for integration
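Determinism only helps an auditor who can show that two calls had identical inputs. One way to record that on the client side, sketched here under the assumption that request payloads are plain JSON-serializable dicts, is to store a hash of the canonicalized request next to each receipt. `payload_fingerprint` is a hypothetical helper; sorted-key compact JSON is a simplification of a full canonicalization scheme, and nothing here claims to reproduce how SettlementWitness derives its `receipt_id`.

```python
import hashlib
import json


def payload_fingerprint(payload: dict) -> str:
    """Return a SHA-256 hex digest of the payload in a canonical JSON form.

    Keys are sorted and separators are compact, so payloads that differ only
    in key order or whitespace produce the same fingerprint.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```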

The verification call is simple:

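import time

import httpx

# Placeholder inputs (hypothetical values); in MetaClaw these would come from
# the skill-extraction and evaluation steps.
skill_name = "handle_api_rate_limits"
skill_content = "<extracted skill text>"
criteria = "<evaluation criteria for this skill>"
evaluation_result = "<observed result against those criteria>"
wallet = "your_wallet"
timestamp = int(time.time())

response = httpx.post(
    "https://defaultverifier.com/settlement-witness",
    json={
        "task_id": f"skill-evolution-{skill_name}-{timestamp}",
        "agent_id": f"{wallet}:metaclaw",
        "spec": {
            "improvement_type": "skill_promotion",
            "skill_name": skill_name,
            "expected": "skill improves agent performance on defined criteria"
        },
        "output": {
            "skill_name": skill_name,
            "skill_content": skill_content,
            "evaluation_criteria": criteria,
            "evaluation_result": evaluation_result
        }
    },
    timeout=30.0,  # don't let a slow verifier stall the evolution loop
)
response.raise_for_status()  # fail closed: an HTTP error must never promote a skill

receipt = response.json()
verdict = receipt["receipt_v0_1"]["verdict"]  # PASS | FAIL | INDETERMINATE
receipt_id = receipt["receipt_v0_1"]["receipt_id"]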
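Promotion should also fail closed when the response body is not shaped as expected. A minimal validation sketch, assuming the `receipt_v0_1` / `verdict` / `receipt_id` layout used in the call above (the authoritative schema is the SAR v0.1 spec); `parse_receipt` and `ParsedReceipt` are hypothetical names:

```python
from typing import Any, NamedTuple

VALID_VERDICTS = {"PASS", "FAIL", "INDETERMINATE"}


class ParsedReceipt(NamedTuple):
    verdict: str
    receipt_id: str


def parse_receipt(body: dict[str, Any]) -> ParsedReceipt:
    """Extract verdict and receipt_id, raising ValueError on any malformed body."""
    receipt = body.get("receipt_v0_1")
    if not isinstance(receipt, dict):
        raise ValueError("response has no receipt_v0_1 object")
    verdict = receipt.get("verdict")
    receipt_id = receipt.get("receipt_id")
    if verdict not in VALID_VERDICTS:
        raise ValueError(f"unexpected verdict: {verdict!r}")
    if not isinstance(receipt_id, str) or not receipt_id:
        raise ValueError("missing receipt_id")
    return ParsedReceipt(verdict, receipt_id)
```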

Skill Metadata with Receipt

When a skill is promoted under verified_skills mode, its metadata includes the verification receipt:

{
  "skill_name": "handle_api_rate_limits",
  "version": "1.0",
  "promoted_at": "2026-04-01T08:00:00Z",
  "verified": true,
  "receipt_id": "sha256:14be931e638ef93d043edc0c3feaf37bcbab33691b25997fefcef1b9b9062d00",
  "verifier_kid": "sar-prod-ed25519-02",
  "verdict": "PASS",
  "promotion_source": "verified_skills"
}

Unverified skills (promoted in skills_only mode) remain valid — this is fully backward compatible.
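Backward compatibility can be handled at load time by giving every receipt field a default. This sketch assumes skill metadata is stored as JSON objects like the one above; `SkillMetadata` and `load_skill_metadata` are hypothetical, and treating a missing `promotion_source` as `skills_only` is an assumption about what legacy records look like.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class SkillMetadata:
    skill_name: str
    version: Optional[str] = None
    promoted_at: Optional[str] = None
    verified: bool = False  # legacy skills_only records are unverified
    receipt_id: Optional[str] = None
    verifier_kid: Optional[str] = None
    verdict: Optional[str] = None
    promotion_source: str = "skills_only"  # assumption: absent field means legacy


def load_skill_metadata(raw: str) -> SkillMetadata:
    """Parse skill metadata JSON; fields this sketch doesn't know are ignored."""
    data = json.loads(raw)
    known = {f.name for f in fields(SkillMetadata)}
    return SkillMetadata(**{k: v for k, v in data.items() if k in known})
```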


Rollback Logic

verified_skills mode enables something skills_only cannot: safe rollback.

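from datetime import datetime, timezone

ROLLBACK_THRESHOLD = 3  # mirrors verification.rollback_threshold in the config

def maybe_rollback(skill, fail_receipts):
    # If a previously promoted skill later receives FAIL verdicts
    # across N sessions, trigger rollback.
    # fail_receipts: receipt_ids of those FAIL verdicts (counter-evidence).
    fail_count = len(fail_receipts)
    if skill.status != "reverted" and fail_count >= ROLLBACK_THRESHOLD:
        skill.status = "reverted"
        skill.revert_reason = f"Failed verification {fail_count} times"
        skill.reverted_at = datetime.now(timezone.utc).isoformat()
        skill.counter_evidence = list(fail_receipts)  # log counter-evidence receipts
    return skill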

This turns MetaClaw's skill library from append-only to self-correcting.


Why This Matters for Academic Rigor

MetaClaw's technical report describes skill evolution as a core contribution. The current implementation has a reproducibility gap: skill promotion decisions are made by an LLM judge whose outputs are non-deterministic across environments.

verified_skills mode closes this gap:

  • Every promotion verdict is reproducible: the same spec and output payload always yield the same verdict
  • Every receipt can be independently verified by any third party
  • Skill evolution history becomes a cryptographically auditable record
  • Results are comparable across environments and deployments

This directly strengthens the empirical claims in the technical report.


Implementation Scope

The change is contained and non-breaking:

New config option:

mode: verified_skills  # new option alongside skills_only, rl, madmax
verification:
  endpoint: "https://defaultverifier.com/settlement-witness"
  agent_id: "your_wallet:metaclaw"
  rollback_threshold: 3  # fail count before rollback triggers
  require_pass_for_promotion: true
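A sketch of how these options might be validated, assuming the YAML has already been parsed into a dict. `VerificationConfig`, `load_config`, and `MODES` are hypothetical and not taken from `metaclaw/config.py`; the defaults mirror the values shown above, and `agent_id` is treated as required in verified_skills mode.

```python
from dataclasses import dataclass

MODES = {"skills_only", "verified_skills", "rl", "madmax"}


@dataclass(frozen=True)
class VerificationConfig:
    endpoint: str = "https://defaultverifier.com/settlement-witness"
    agent_id: str = ""
    rollback_threshold: int = 3  # FAIL count before rollback triggers
    require_pass_for_promotion: bool = True


def load_config(raw: dict) -> tuple[str, VerificationConfig]:
    """Validate the mode and verification block of an already-parsed config."""
    mode = raw.get("mode")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # Unknown keys under `verification` raise TypeError from the dataclass.
    cfg = VerificationConfig(**(raw.get("verification") or {}))
    if mode == "verified_skills":
        if not cfg.agent_id:
            raise ValueError("verification.agent_id is required in verified_skills mode")
        if cfg.rollback_threshold < 1:
            raise ValueError("verification.rollback_threshold must be >= 1")
    return mode, cfg
```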

New dependency:

httpx  # likely already installed as a dependency of the proxy

No GPU. No cloud training backend. No Tinker API key required.

Files affected:

  • metaclaw/config.py — add verified_skills mode + verification config
  • metaclaw/skills/ — add receipt metadata to skill storage format
  • metaclaw/trainer.py or equivalent — add verification gate before promotion
  • README.md — document new mode in the mode comparison table

Integration Reference

A working reference implementation is available:

  • Live endpoint: https://defaultverifier.com/settlement-witness
  • Public key registry: https://defaultverifier.com/.well-known/sar-keys.json
  • TypeScript SDK: npm install sar-sdk ([sarprotocol.org](https://sarprotocol.org))
  • Spec: https://defaultverifier.com/spec/sar-v0.1
  • MCP server: https://defaultverifier.com/mcp

The endpoint is live, deterministic, and free to call. No API key required.
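Offline verification starts by resolving a receipt's `verifier_kid` (for example `sar-prod-ed25519-02`) to a public key from the registry. The sketch below assumes the registry is JWKS-shaped, with Ed25519 keys encoded as in RFC 8037 (`"crv": "Ed25519"`, base64url `x`); that layout is an assumption, so check the actual `sar-keys.json` before relying on it. `find_verifier_key` is a hypothetical helper, and checking the signature itself with an Ed25519 library is out of scope here.

```python
import base64


def find_verifier_key(registry: dict, kid: str) -> bytes:
    """Return the raw 32-byte Ed25519 public key for `kid`.

    ASSUMPTION: registry looks like {"keys": [{"kid", "kty", "crv", "x"}, ...]}.
    """
    for key in registry.get("keys", []):
        if key.get("kid") == kid:
            if key.get("crv") != "Ed25519":
                raise ValueError(f"key {kid!r} is not an Ed25519 key")
            x = key["x"]
            # base64url without padding: restore '=' padding before decoding
            raw = base64.urlsafe_b64decode(x + "=" * (-len(x) % 4))
            if len(raw) != 32:
                raise ValueError(f"key {kid!r} is not 32 bytes")
            return raw
    raise KeyError(f"no key with kid {kid!r} in registry")
```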


What This Enables for MetaClaw Users

For operators without GPU access:

  • Full skill evolution capability without cloud compute costs
  • Verified improvement history they can trust and audit

For researchers:

  • Reproducible skill promotion decisions
  • Cryptographic audit trail for empirical claims
  • Cross-environment comparability

For the ecosystem:

  • Promoted skills carry receipt_id — any downstream system can verify
  • Skill libraries become portable, trustworthy artifacts
  • Third parties can audit MetaClaw evolution history independently
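A third-party audit can start with a structural pass over the library before any signatures are checked. A minimal sketch over metadata dicts in the format shown earlier; `audit_library` is hypothetical, and it only confirms that verified_skills entries carry a PASS verdict, a `receipt_id`, and a `verifier_kid`, not that the receipts are genuine.

```python
def audit_library(skills: list[dict]) -> list[str]:
    """Return names of skills that claim verification but lack the evidence for it."""
    problems = []
    for s in skills:
        if s.get("promotion_source") != "verified_skills":
            continue  # legacy skills_only entries are valid without receipts
        has_evidence = (
            s.get("verified") is True
            and s.get("verdict") == "PASS"
            and bool(s.get("receipt_id"))
            and bool(s.get("verifier_kid"))
        )
        if not has_evidence:
            problems.append(s["skill_name"])
    return problems
```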

Relationship to RL Mode

verified_skills is not a replacement for RL — it's a complement:

verified_skills  → behavioral improvement via verified skill injection
                   no weight updates, no cloud, full verification

rl / madmax      → weight updates via cloud training
                   + optional SAR verification of weight update outcomes

A future verified_rl mode could add SAR verification gates to weight updates as well — only applying updates that produce PASS outcomes across a validation set.
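The verified_rl gate described above reduces to a strict all-PASS rule over the validation set. A hypothetical sketch (`should_apply_update` is not an existing MetaClaw function); treating an empty validation set as a rejection is a design choice, so an update is never applied without evidence:

```python
from collections.abc import Iterable


def should_apply_update(validation_verdicts: Iterable[str]) -> bool:
    """Apply a candidate weight update only if every validation task returned PASS."""
    verdicts = list(validation_verdicts)
    return bool(verdicts) and all(v == "PASS" for v in verdicts)
```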


Offer

Happy to:

  • Contribute a reference implementation PR
  • Provide test fixtures and sample receipts for validation
  • Coordinate with the AIMING Lab team on spec alignment

This feels like a natural extension of MetaClaw’s architecture — bringing reproducibility and auditability to skill evolution.
The verification infrastructure is live and production-ready. Integration is a contained addition to the existing skill promotion flow.

