verify: post-audit via non-Claude agent (excuse-check + repro, notify-don't-block) #67

## Problem

Verify mode's evidence is author-controlled: the orchestrator runs VP steps, summarises results, and writes `verification-evidence/*.json` itself. The stricter triage rules committed in `c331312` raise the policy bar but don't close the forgery hole: a lying or lazy orchestrator can still type `"status": "pass"` and nothing catches it.

The same hole applies to agentic VP steps (e.g. "log in and confirm X renders"): they can't be wrapped in a bash call to capture an exit code, so their evidence is inherently narrative.

## Design — Two-phase post-audit, non-Claude, notify-don't-block

Verify mode runs as-is. After evidence is written, a `kata vp-audit` command runs an external (non-Claude) agent in two phases:

```
verify mode runs as-is → evidence committed → kata vp-audit fires
                                              ├── Phase A: excuse check (cheap, text only)
                                              │     non-Claude LLM reads evidence + transcript + diff
                                              │     flags: skip/partial, infra-gap terminals, <3-attempt fails
                                              └── Phase B: full repro (only if A clean)
                                                    non-Claude agent with tools
                                                    runs each VP step end-to-end
                                                    on any failure → log + notify, do NOT block
```
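The control flow in the diagram can be sketched as follows. This is a minimal illustration, not the planned implementation: the type names, the flag labels, and the `runVpAudit` function are all hypothetical, and the two phases are injected as callbacks so the sketch stays independent of any provider.

```typescript
// Hypothetical shapes; the issue does not define a schema for audit results.
type Flag = "skip" | "partial" | "infra-gap" | "under-3-attempts";

interface PhaseAResult {
  flags: { stepId: string; flag: Flag; note: string }[];
}

interface ReproResult {
  stepId: string;
  ok: boolean;
  output: string;
}

interface AuditReport {
  phaseA: PhaseAResult;
  phaseB: ReproResult[] | "skipped"; // "skipped" when Phase A raised any flag
  notices: string[]; // surfaced to the human; never used to block
}

// Phase A always runs; Phase B runs only when Phase A is clean.
async function runVpAudit(
  excuseCheck: () => Promise<PhaseAResult>,
  repro: () => Promise<ReproResult[]>,
): Promise<AuditReport> {
  const phaseA = await excuseCheck();
  if (phaseA.flags.length > 0) {
    return {
      phaseA,
      phaseB: "skipped",
      notices: phaseA.flags.map(
        (f) => `Phase A: ${f.stepId} flagged ${f.flag}: ${f.note}`,
      ),
    };
  }
  const phaseB = await repro();
  const notices = phaseB
    .filter((r) => !r.ok)
    .map((r) => `Phase B: ${r.stepId} failed repro`);
  return { phaseA, phaseB, notices };
}
```

The function only ever returns a report. Nothing in it can fail the session, which is the notify-don't-block property in code form.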

### Why this shape

- **No new trust plumbing** — audit is an observer, not a gate. No sealed.json split, no PreToolUse write-block, no stop-condition changes.
- **Phase A is ~one API call** — catches ~80% of cop-outs for the cost of one `kata review`.
- **Phase B gated on A** — don't pay for repro when the evidence already admits skips.
- **Notify-don't-block tolerates flakiness** — a flaky repro doesn't deadlock a session. Discrepancies surface to the human.
- **Non-Claude required** — removes shared-model sycophancy/jailbreak surface.

## Surface

- `src/commands/vp-audit.ts` — new command, two-phase runner
- `batteries/prompts/vp-audit-excuse-check.md` — Phase A prompt (non-Claude, reads evidence + transcript + diff)
- `batteries/prompts/vp-audit-repro.md` — Phase B prompt (non-Claude agent with full tools)
- `.kata/verification-evidence/{id}.audit.json` — audit report (Phase A verdict + Phase B run summary)
- `.kata/audit-notices/{issue}.md` — local notice, surfaced by `kata status` on next invocation
- GitHub issue comment via gh-axi when session is issue-backed
- `kata.yaml` — `verify.audit.provider` (default: gemini), `verify.audit.required` (bool, default false; advisory at first)
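As a rough sketch of how the config keys and file paths above might be resolved in `src/commands/vp-audit.ts`: only `verify.audit.provider`, `verify.audit.required`, their defaults, and the two path patterns come from the list. The function names, and the assumption that a Claude provider is identified by a name starting with "claude", are hypothetical.

```typescript
// Mirrors the verify.audit block of kata.yaml after YAML parsing.
interface AuditConfig {
  provider: string; // default from the issue: "gemini"
  required: boolean; // default from the issue: false (advisory)
}

const DEFAULT_AUDIT_CONFIG: AuditConfig = { provider: "gemini", required: false };

// Fill in defaults and enforce the non-Claude requirement.
// ASSUMPTION: provider names for Claude start with "claude".
function resolveAuditConfig(raw: Partial<AuditConfig> | undefined): AuditConfig {
  const provider = raw?.provider ?? DEFAULT_AUDIT_CONFIG.provider;
  const required = raw?.required ?? DEFAULT_AUDIT_CONFIG.required;
  if (provider.trim().toLowerCase().startsWith("claude")) {
    throw new Error(`verify.audit.provider must be non-Claude, got "${provider}"`);
  }
  return { provider, required };
}

// Path helpers matching the patterns in the Surface list.
function auditReportPath(evidenceId: string): string {
  return `.kata/verification-evidence/${evidenceId}.audit.json`;
}

function auditNoticePath(issue: string | number): string {
  return `.kata/audit-notices/${issue}.md`;
}
```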

## Triggering

Runs as the final close step in verify mode (after evidence commit, before push). Fires even when the overall verdict is "pass", since that is exactly when a second opinion is most useful.

## Tool parity for Phase B

Repro requires provider-agnostic tooling. Paths:

1. **MCP** (preferred) — audit agent connects to the same MCP servers the orchestrator uses. Agent-mail is already MCP. chrome-devtools-axi could expose MCP.
2. **Bash-only common denominator** — repro restricted to shell + curl + headless scripts. Works today without new infra.
3. **Per-step opt-out** — VP step can mark `audit: none` to skip repro if tooling genuinely doesn't exist for that provider.

Start with (2), grow into (1).
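The per-step opt-out in path (3) could be applied before repro starts, along these lines. The `VpStep` shape and the `run` field are assumptions made for illustration; only the `audit: none` marker comes from the list above.

```typescript
// Hypothetical VP step shape for the bash-only path.
interface VpStep {
  id: string;
  run: string; // ASSUMPTION: a shell command that exercises the step
  audit?: "none"; // per-step opt-out
}

// Split steps into those Phase B should reproduce and those that opted out.
// Opted-out steps are returned rather than dropped so the audit report can
// list them and the skip stays visible to the human.
function partitionForRepro(steps: VpStep[]): { repro: VpStep[]; optedOut: VpStep[] } {
  const repro: VpStep[] = [];
  const optedOut: VpStep[] = [];
  for (const step of steps) {
    if (step.audit === "none") {
      optedOut.push(step);
    } else {
      repro.push(step);
    }
  }
  return { repro, optedOut };
}
```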

## State bleed

Orchestrator's run leaves dirty state (db rows, files, service calls). Repro hits that dirty state. Mitigations in priority order:

- Fresh git worktree per repro (cheap)
- Fresh fixture copy if `eval-fixture` declared (kata harness already does this)
- Dev-server restart between orchestrator and repro
- Document idempotency as a VP step authoring rule
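The first mitigation, a fresh git worktree per repro, might look like the sketch below. It only builds the `git` argument lists so that the plan can be inspected before anything runs. The directory naming scheme and the function name are hypothetical.

```typescript
interface WorktreePlan {
  path: string;
  create: string[]; // argv for `git`, to run before repro
  cleanup: string[]; // argv for `git`, to run after repro
}

// Plan a detached worktree at the commit that holds the evidence, so the
// repro starts from committed files instead of the orchestrator's dirty tree.
// ASSUMPTION: worktrees live under baseDir as vp-audit-<evidenceId>.
function planReproWorktree(baseDir: string, evidenceId: string, commit: string): WorktreePlan {
  const path = `${baseDir}/vp-audit-${evidenceId}`;
  return {
    path,
    create: ["worktree", "add", "--detach", path, commit],
    cleanup: ["worktree", "remove", "--force", path],
  };
}
```

A caller could pass `plan.create` and `plan.cleanup` to `execFileSync("git", ...)` from `node:child_process`. A worktree isolates files only; db rows and service state still need the other mitigations in the list.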

## Notification format

On any Phase A flag or Phase B repro failure:

- Append to `.kata/audit-notices/{issue}.md` with step id, phase, finding, raw output excerpt
- If issue-backed: `gh-axi issue comment {n}` with summary + link to audit.json
- `kata status` surfaces unread audit notices as a banner
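One possible rendering of a single notice entry, carrying the four fields named above (step id, phase, finding, raw output excerpt). The markdown layout, the truncation limit, and the names are assumptions. The issue does not fix a notice format.

```typescript
interface AuditFinding {
  stepId: string;
  phase: "A" | "B";
  finding: string;
  rawOutput: string;
}

// Render one finding as a markdown section to append to the notice file.
// The raw output is truncated and indented four spaces so markdown treats
// it as a literal block.
function formatNotice(f: AuditFinding, maxExcerpt = 400): string {
  const excerpt =
    f.rawOutput.length > maxExcerpt
      ? f.rawOutput.slice(0, maxExcerpt) + "\n[truncated]"
      : f.rawOutput;
  const indented = excerpt.split("\n").map((line) => "    " + line);
  return [`### ${f.stepId} (Phase ${f.phase})`, "", f.finding, "", ...indented, ""].join("\n");
}
```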

## Non-goals

- Hard-blocking close on audit failures (explicitly advisory — user decides)
- Per-step seal tier config (keep it uniform, simpler)
- Two-way integration with external CI (out of scope — local audit only)

## Relation to prior work

- Follows from stricter triage rules in commit c331312 (issue-less task TK-ef6e-0420)
- Sibling to #64 / PR #65 two-agent pattern — this is the verify-side equivalent
- Uses existing `kata review --provider=gemini` plumbing for Phase A
- Uses existing `kata verify-run` spawn pattern (adapted for non-Claude) for Phase B

## Open questions

1. Phase A required vs advisory on first ship? Recommendation: advisory, promote to required after a shakedown period.
2. Default provider — gemini or openai? Gemini is already referenced in codebase.
3. Audit retry policy on transient provider failures — N=2 retries, then skip with notice?
4. Should repro run in parallel with `kata close` or strictly after? Parallel hides latency but complicates push ordering.
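To make question 3 concrete, here is a sketch of the "N=2 retries, then skip with notice" option. It is one candidate policy, not a decision. The names are hypothetical, and the sketch treats every error as transient and omits backoff, which a real version would probably refine.

```typescript
type AuditOutcome<T> =
  | { kind: "ok"; value: T; attempts: number }
  | { kind: "skipped"; notice: string; attempts: number };

// Try the provider call once plus `retries` more times. If every attempt
// fails, return a "skipped" outcome carrying a notice instead of throwing,
// so a provider outage never blocks the session.
async function withRetries<T>(call: () => Promise<T>, retries = 2): Promise<AuditOutcome<T>> {
  const maxAttempts = retries + 1;
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return { kind: "ok", value: await call(), attempts: attempt };
    } catch (err) {
      lastError = err;
    }
  }
  const reason = lastError instanceof Error ? lastError.message : String(lastError);
  return {
    kind: "skipped",
    notice: `audit skipped after ${maxAttempts} attempts: ${reason}`,
    attempts: maxAttempts,
  };
}
```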
