From 78ffbef6b3c7cb4cdfb68004da81d8d1207efefa Mon Sep 17 00:00:00 2001
From: Boden Fuller
Date: Sat, 6 Jun 2026 03:29:29 -0400
Subject: [PATCH 1/2] chore(standards): require guard-test fixtures to use the real persisted shape (ag-jwtx #test-standard-names-real-shape-requirement)

Skip/dedup/consumed/idempotency/regression guard tests must build fixtures by round-tripping a real persisted sample (production writer -> reader) or asserting against a checked-in real example -- never a hand-built in-memory constructor that sets a marker at a granularity the on-disk format never emits.

Encodes the ag-mjlg / PR #652 false-green: a guard test set `consumed` at item-level while next-work.jsonl marks it at batch-level, so CI passed on a shape production can't produce.

- skills/standards/references/test-pyramid.md: new "Fixture Fidelity" section
- skills/standards/references/go.md + .claude/rules/go.md: Test-Conventions bullet
- skills/pre-mortem/references/mandatory-checks.md: two flags in the Test Pyramid Coverage Check (Mandatory)

Closes-scenario: ag-jwtx#test-standard-names-real-shape-requirement
Bounded-context: BC1-Corpus
Evidence: skills/standards/references/test-pyramid.md
---
 .claude/rules/go.md                              |  1 +
 skills/pre-mortem/references/mandatory-checks.md |  2 ++
 skills/standards/references/go.md                |  1 +
 skills/standards/references/test-pyramid.md      | 15 +++++++++++++++
 4 files changed, 19 insertions(+)

diff --git a/.claude/rules/go.md b/.claude/rules/go.md
index 6d44843c0..62f732bbb 100644
--- a/.claude/rules/go.md
+++ b/.claude/rules/go.md
@@ -29,6 +29,7 @@ Or equivalently: `cd cli && make build && make test`
 - Prefer table-driven tests for multi-case functions.
 - Test low-level functions directly; don't depend on external CLIs (`bd`, `ao`) in tests.
 - **Prefer L2 integration tests** that call a command/workflow entry point over L1 tests that mock dependencies.
+- **Guard-test fixtures must use the real persisted shape.** Skip/dedup/consumed/idempotency/regression guard tests must build fixtures by round-tripping a real persisted sample (serialize with the production writer, read back with the production reader) or asserting against a checked-in real example — never a hand-built in-memory constructor that sets a marker at a granularity the on-disk format never produces (e.g. `consumed` at the item level when `next-work.jsonl` marks it at the batch level). A fixture of a shape production can't emit gives a false green (ag-mjlg / PR #652). Full rationale: `skills/standards/references/test-pyramid.md` → "Fixture Fidelity".
 
 ## Error Handling
 
diff --git a/skills/pre-mortem/references/mandatory-checks.md b/skills/pre-mortem/references/mandatory-checks.md
index 677ed8411..db26dd864 100644
--- a/skills/pre-mortem/references/mandatory-checks.md
+++ b/skills/pre-mortem/references/mandatory-checks.md
@@ -87,6 +87,8 @@ Check each issue in the plan:
 | Does every feature/bug issue include L1 (unit) tests? | Yes | severity=significant: "Missing unit tests for feature/bug issue" |
 | Do cross-module changes include L2 (integration) tests? | Yes | severity=moderate: "Missing integration tests for cross-module change" |
 | Are L4+ levels deferred to human gate (not agent-planned)? | Yes | severity=low: "Agent planning L4+ tests — these require human-defined scenarios" |
+| For any skip/dedup/consumed/idempotency/regression guard test, does the fixture round-trip the **real persisted shape** (not a hand-built in-memory constructor)? | Yes | severity=significant: "Guard-test fixture uses a shape production never emits — false-green risk (cf. ag-mjlg / PR #652)" |
+| Does any guard marker (`consumed`/`skip`/`dedup`) get set at the granularity the on-disk artifact uses (batch/parent/envelope vs item)? | Yes | severity=significant: "Guard marker set at item-level when persisted artifact marks it at batch-level" |
 
 Add to each judge's prompt when test pyramid check is active:
 
diff --git a/skills/standards/references/go.md b/skills/standards/references/go.md
index 450ab2378..da71e43af 100644
--- a/skills/standards/references/go.md
+++ b/skills/standards/references/go.md
@@ -264,6 +264,7 @@ func TestClassifyServeArg(t *testing.T) {
 - **Assert exact expected values:** Use `== expected`, never `!= wrong`. (See Exact Assertion Rule above.)
 - **Table-driven tests** preferred for multi-case functions. (See example above.)
 - **Test low-level functions directly;** don't depend on external CLIs (`bd`, `ao`) in tests. (See CI-Safe Test Pattern above.)
+- **Guard-test fixtures must use the real persisted shape.** Skip/dedup/consumed/idempotency/regression guard tests must round-trip a real persisted sample (production writer → production reader) or assert against a checked-in real example — never a hand-built in-memory constructor that sets a marker at a granularity the on-disk format never emits (e.g. `consumed` at item-level when `next-work.jsonl` marks it at batch-level). A fixture of a shape production can't produce gives a false green (ag-mjlg / PR #652). Full rationale: `test-pyramid.md` → "Fixture Fidelity".
 
 ### Benchmark Tests (BF7)
 
diff --git a/skills/standards/references/test-pyramid.md b/skills/standards/references/test-pyramid.md
index d967cf5ab..f0deeaf3f 100644
--- a/skills/standards/references/test-pyramid.md
+++ b/skills/standards/references/test-pyramid.md
@@ -407,6 +407,21 @@ After L0–L3 coverage is complete, run bug-finding levels:
 | `/plan` (format changes) | Flag BF8 backward compat — add old format as fixture before changing |
 | `/pre-mortem` (security) | Verify BF9 tests planned for code handling secrets or user input |
 
+## Fixture Fidelity — guard tests must use the real persisted shape
+
+> **Proven 2026-05-31 (ag-mjlg / PR #652):** a next-work materialize over-creation bug (44 beads planned vs 16 expected) shipped green because `TestNextWorkMaterialize_SkipsConsumed` set `consumed` at the **per-item** level while the real `next-work.jsonl` marks `consumed` at the **batch** level. The guard test exercised a data shape production never emits, so CI passed on a fixture that could not catch the bug.
+
+**The standard:** regression, idempotency, skip, dedup, and consumed/already-done guard tests MUST build their fixtures from the **real persisted data shape** — round-trip an actual on-disk sample (write → read back through the production serializer/parser), or assert against a checked-in real example. Do NOT hand-construct the in-memory struct via a convenient constructor when production reads the value from disk: the constructor lets you set fields at a granularity (per-item vs batch-level, flat vs nested) that the persisted format never produces, giving a false green.
+
+**Why it bites agents specifically:** an agent writing a guard test reaches for the cheapest fixture — the in-memory constructor — because it compiles fastest and reads cleanest. That fixture silently diverges from the persisted shape, and the test then "passes" by asserting on a state the system can't reach. The skip/dedup logic under test reads the persisted shape, so the test proves nothing about the real path.
+
+**How to comply:**
+- Round-trip the fixture: serialize a sample with the production writer, read it back with the production reader, then assert. If that path doesn't exist yet, write the sample to a temp file in the real on-disk format and load it through the production loader.
+- Match the marker granularity exactly: if production records `consumed`/`skip`/`dedup` state at the batch (or parent, or envelope) level, set it there — never only at the item level.
+- Prefer a checked-in real sample (a trimmed copy of an actual artifact) over a synthetic one for the canonical happy/skip cases.
+
+**Pre-mortem / checklist flag:** when reviewing a plan or diff that adds a skip/dedup/consumed/idempotency guard test, FLAG any fixture built only from an in-memory constructor or that sets the guard marker at item-level when the persisted artifact marks it at batch-level. Require the fixture to round-trip the real persisted shape before the test counts as coverage. (See `.claude/rules/go.md` → "Guard-test fixtures must use the real persisted shape.")
+
 ## Coverage Assessment Template
 
 Used by `/post-mortem` and `/vibe` to assess test shape health:

From aed6b6899333d59a65f4a75b46ec68f4288c0ab5 Mon Sep 17 00:00:00 2001
From: Bo
Date: Sat, 6 Jun 2026 08:26:22 -0400
Subject: [PATCH 2/2] chore(standards): regen codex twin hashes for pre-mortem + standards (ag-jwtx)

The references edits in this PR (pre-mortem/references/mandatory-checks.md, standards/references/{go,test-pyramid}.md) drifted the codex twin hashes for both skills. Regenerated via scripts/regen-codex-hashes.sh --only pre-mortem,standards. Fixes the skill-gates 'codex hashes (no drift)' check.

Bounded-context: BC2-Skills
---
 skills-codex/.agentops-manifest.json             | 4 ++--
 skills-codex/pre-mortem/.agentops-generated.json | 2 +-
 skills-codex/standards/.agentops-generated.json  | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/skills-codex/.agentops-manifest.json b/skills-codex/.agentops-manifest.json
index 56d20df8c..0702e9b51 100644
--- a/skills-codex/.agentops-manifest.json
+++ b/skills-codex/.agentops-manifest.json
@@ -943,7 +943,7 @@
 {
 "name": "pre-mortem",
 "source_skill": "skills/pre-mortem",
- "source_hash": "54b97fbb1ff8722c5488e83004b16d4c6652678976544034d23c1959a415f62a",
+ "source_hash": "4ed46f30063d690c14dd30c7a52bbbb3f1d8d9c90ceebb89be305c3bdbd91598",
 "generated_hash": "f98f6d4ad93124bdbe08cf728f218626e59ae4538621b03351138f593ee17b42"
 },
 {
@@ -1093,7 +1093,7 @@
 {
 "name": "standards",
 "source_skill": "skills/standards",
- "source_hash": "f9046b2822250f82670fd93ccdf60deb96a81bd1299150a5c74ddc052a05d1eb",
+ "source_hash": "561316a559d131d6ff9c141dbd2d4b160cf7330a774edf2150c31e864bae6a22",
 "generated_hash": "f6a7cadda9928b92bb1a59da3a34c6b9d5aa9132d5bc17a1bd770ebd3efc220c"
 },
 {
diff --git a/skills-codex/pre-mortem/.agentops-generated.json b/skills-codex/pre-mortem/.agentops-generated.json
index 98244f58f..4e67ed019 100644
--- a/skills-codex/pre-mortem/.agentops-generated.json
+++ b/skills-codex/pre-mortem/.agentops-generated.json
@@ -2,6 +2,6 @@
 "generator": "manual-maintained",
 "source_skill": "skills/pre-mortem",
 "layout": "modular",
- "source_hash": "54b97fbb1ff8722c5488e83004b16d4c6652678976544034d23c1959a415f62a",
+ "source_hash": "4ed46f30063d690c14dd30c7a52bbbb3f1d8d9c90ceebb89be305c3bdbd91598",
 "generated_hash": "f98f6d4ad93124bdbe08cf728f218626e59ae4538621b03351138f593ee17b42"
 }
diff --git a/skills-codex/standards/.agentops-generated.json b/skills-codex/standards/.agentops-generated.json
index 8088b8a8c..1433a55d0 100644
--- a/skills-codex/standards/.agentops-generated.json
+++ b/skills-codex/standards/.agentops-generated.json
@@ -2,6 +2,6 @@
 "generator": "manual-maintained",
 "source_skill": "skills/standards",
 "layout": "modular",
- "source_hash": "f9046b2822250f82670fd93ccdf60deb96a81bd1299150a5c74ddc052a05d1eb",
+ "source_hash": "561316a559d131d6ff9c141dbd2d4b160cf7330a774edf2150c31e864bae6a22",
 "generated_hash": "f6a7cadda9928b92bb1a59da3a34c6b9d5aa9132d5bc17a1bd770ebd3efc220c"
 }