From 78ffbef6b3c7cb4cdfb68004da81d8d1207efefa Mon Sep 17 00:00:00 2001
From: Boden Fuller <boden.fuller@gmail.com>
Date: Sat, 6 Jun 2026 03:29:29 -0400
Subject: [PATCH 1/2] chore(standards): require guard-test fixtures to use the
 real persisted shape (ag-jwtx #test-standard-names-real-shape-requirement)

Skip/dedup/consumed/idempotency/regression guard tests must build fixtures
by round-tripping a real persisted sample (production writer -> reader) or
asserting against a checked-in real example -- never a hand-built in-memory
constructor that sets a marker at a granularity the on-disk format never
emits. Encodes the ag-mjlg / PR #652 false green: a guard test set `consumed`
at the item level while next-work.jsonl marks it at the batch level, so CI
passed on a shape production can't produce.

- skills/standards/references/test-pyramid.md: new "Fixture Fidelity" section
- skills/standards/references/go.md + .claude/rules/go.md: Test-Conventions bullet
- skills/pre-mortem/references/mandatory-checks.md: two flags in the
  Test Pyramid Coverage Check (Mandatory)

Closes-scenario: ag-jwtx#test-standard-names-real-shape-requirement
Bounded-context: BC1-Corpus
Evidence: skills/standards/references/test-pyramid.md
---
 .claude/rules/go.md                              |  1 +
 skills/pre-mortem/references/mandatory-checks.md |  2 ++
 skills/standards/references/go.md                |  1 +
 skills/standards/references/test-pyramid.md      | 15 +++++++++++++++
 4 files changed, 19 insertions(+)

diff --git a/.claude/rules/go.md b/.claude/rules/go.md
index 6d44843c0..62f732bbb 100644
--- a/.claude/rules/go.md
+++ b/.claude/rules/go.md
@@ -29,6 +29,7 @@ Or equivalently: `cd cli && make build && make test`
 - Prefer table-driven tests for multi-case functions.
 - Test low-level functions directly; don't depend on external CLIs (`bd`, `ao`) in tests.
 - **Prefer L2 integration tests** that call a command/workflow entry point over L1 tests that mock dependencies.
+- **Guard-test fixtures must use the real persisted shape.** Skip/dedup/consumed/idempotency/regression guard tests must build fixtures by round-tripping a real persisted sample (serialize with the production writer, read back with the production reader) or asserting against a checked-in real example — never a hand-built in-memory constructor that sets a marker at a granularity the on-disk format never produces (e.g. `consumed` at the item level when `next-work.jsonl` marks it at the batch level). A fixture of a shape production can't emit gives a false green (ag-mjlg / PR #652). Full rationale: `skills/standards/references/test-pyramid.md` → "Fixture Fidelity".
 
 ## Error Handling
 
diff --git a/skills/pre-mortem/references/mandatory-checks.md b/skills/pre-mortem/references/mandatory-checks.md
index 677ed8411..db26dd864 100644
--- a/skills/pre-mortem/references/mandatory-checks.md
+++ b/skills/pre-mortem/references/mandatory-checks.md
@@ -87,6 +87,8 @@ Check each issue in the plan:
 | Does every feature/bug issue include L1 (unit) tests? | Yes | severity=significant: "Missing unit tests for feature/bug issue" |
 | Do cross-module changes include L2 (integration) tests? | Yes | severity=moderate: "Missing integration tests for cross-module change" |
 | Are L4+ levels deferred to human gate (not agent-planned)? | Yes | severity=low: "Agent planning L4+ tests — these require human-defined scenarios" |
+| For any skip/dedup/consumed/idempotency/regression guard test, does the fixture round-trip the **real persisted shape** (not a hand-built in-memory constructor)? | Yes | severity=significant: "Guard-test fixture uses a shape production never emits — false-green risk (cf. ag-mjlg / PR #652)" |
+| Is every guard marker (`consumed`/`skip`/`dedup`) set at the granularity the on-disk artifact uses (batch/parent/envelope vs item)? | Yes | severity=significant: "Guard marker set at item-level when persisted artifact marks it at batch-level" |
 
 Add to each judge's prompt when test pyramid check is active:
 
diff --git a/skills/standards/references/go.md b/skills/standards/references/go.md
index 450ab2378..da71e43af 100644
--- a/skills/standards/references/go.md
+++ b/skills/standards/references/go.md
@@ -264,6 +264,7 @@ func TestClassifyServeArg(t *testing.T) {
 - **Assert exact expected values:** Use `== expected`, never `!= wrong`. (See Exact Assertion Rule above.)
 - **Table-driven tests** preferred for multi-case functions. (See example above.)
 - **Test low-level functions directly;** don't depend on external CLIs (`bd`, `ao`) in tests. (See CI-Safe Test Pattern above.)
+- **Guard-test fixtures must use the real persisted shape.** Skip/dedup/consumed/idempotency/regression guard tests must round-trip a real persisted sample (production writer → production reader) or assert against a checked-in real example — never a hand-built in-memory constructor that sets a marker at a granularity the on-disk format never emits (e.g. `consumed` at item-level when `next-work.jsonl` marks it at batch-level). A fixture of a shape production can't produce gives a false green (ag-mjlg / PR #652). Full rationale: `test-pyramid.md` → "Fixture Fidelity".
 
 ### Benchmark Tests (BF7)
 
diff --git a/skills/standards/references/test-pyramid.md b/skills/standards/references/test-pyramid.md
index d967cf5ab..f0deeaf3f 100644
--- a/skills/standards/references/test-pyramid.md
+++ b/skills/standards/references/test-pyramid.md
@@ -407,6 +407,21 @@ After L0–L3 coverage is complete, run bug-finding levels:
 | `/plan` (format changes) | Flag BF8 backward compat — add old format as fixture before changing |
 | `/pre-mortem` (security) | Verify BF9 tests planned for code handling secrets or user input |
 
+## Fixture Fidelity — guard tests must use the real persisted shape
+
+> **Proven 2026-05-31 (ag-mjlg / PR #652):** a next-work materialize over-creation bug (44 beads planned vs 16 expected) shipped green because `TestNextWorkMaterialize_SkipsConsumed` set `consumed` at the **per-item** level while the real `next-work.jsonl` marks `consumed` at the **batch** level. The guard test exercised a data shape production never emits, so CI passed on a fixture that could not catch the bug.
+
+**The standard:** regression, idempotency, skip, dedup, and consumed/already-done guard tests MUST build their fixtures from the **real persisted data shape** — round-trip an actual on-disk sample (write → read back through the production serializer/parser), or assert against a checked-in real example. Do NOT hand-construct the in-memory struct via a convenient constructor when production reads the value from disk: the constructor lets you set fields at a granularity (per-item vs batch-level, flat vs nested) that the persisted format never produces, giving a false green.
+
+**Why it bites agents specifically:** an agent writing a guard test reaches for the cheapest fixture — the in-memory constructor — because it compiles fastest and reads cleanest. That fixture silently diverges from the persisted shape, and the test then "passes" by asserting on a state the system can't reach. The skip/dedup logic under test reads the persisted shape, so the test proves nothing about the real path.
+
+**How to comply:**
+- Round-trip the fixture: serialize a sample with the production writer, read it back with the production reader, then assert. If no production writer exists yet, write the sample to a temp file in the real on-disk format and load it through the production loader.
+- Match the marker granularity exactly: if production records `consumed`/`skip`/`dedup` state at the batch (or parent, or envelope) level, set it there — never only at the item level.
+- Prefer a checked-in real sample (a trimmed copy of an actual artifact) over a synthetic one for the canonical happy/skip cases.
+
+**Pre-mortem / checklist flag:** when reviewing a plan or diff that adds a skip/dedup/consumed/idempotency/regression guard test, FLAG any fixture that is built only from an in-memory constructor, or that sets the guard marker at the item level when the persisted artifact marks it at the batch level. Require the fixture to round-trip the real persisted shape before the test counts as coverage. (See `.claude/rules/go.md` → "Guard-test fixtures must use the real persisted shape.")
+
 ## Coverage Assessment Template
 
 Used by `/post-mortem` and `/vibe` to assess test shape health:

From aed6b6899333d59a65f4a75b46ec68f4288c0ab5 Mon Sep 17 00:00:00 2001
From: Bo <boden.fuller@gmail.com>
Date: Sat, 6 Jun 2026 08:26:22 -0400
Subject: [PATCH 2/2] chore(standards): regen codex twin hashes for pre-mortem
 + standards (ag-jwtx)

The references edits in this PR (pre-mortem/references/mandatory-checks.md,
standards/references/{go,test-pyramid}.md) drifted the codex twin hashes
for both skills. Regenerated via scripts/regen-codex-hashes.sh --only
pre-mortem,standards. Fixes the skill-gates 'codex hashes (no drift)' check.

Bounded-context: BC2-Skills
---
 skills-codex/.agentops-manifest.json             | 4 ++--
 skills-codex/pre-mortem/.agentops-generated.json | 2 +-
 skills-codex/standards/.agentops-generated.json  | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/skills-codex/.agentops-manifest.json b/skills-codex/.agentops-manifest.json
index 56d20df8c..0702e9b51 100644
--- a/skills-codex/.agentops-manifest.json
+++ b/skills-codex/.agentops-manifest.json
@@ -943,7 +943,7 @@
     {
       "name": "pre-mortem",
       "source_skill": "skills/pre-mortem",
-      "source_hash": "54b97fbb1ff8722c5488e83004b16d4c6652678976544034d23c1959a415f62a",
+      "source_hash": "4ed46f30063d690c14dd30c7a52bbbb3f1d8d9c90ceebb89be305c3bdbd91598",
       "generated_hash": "f98f6d4ad93124bdbe08cf728f218626e59ae4538621b03351138f593ee17b42"
     },
     {
@@ -1093,7 +1093,7 @@
     {
       "name": "standards",
       "source_skill": "skills/standards",
-      "source_hash": "f9046b2822250f82670fd93ccdf60deb96a81bd1299150a5c74ddc052a05d1eb",
+      "source_hash": "561316a559d131d6ff9c141dbd2d4b160cf7330a774edf2150c31e864bae6a22",
       "generated_hash": "f6a7cadda9928b92bb1a59da3a34c6b9d5aa9132d5bc17a1bd770ebd3efc220c"
     },
     {
diff --git a/skills-codex/pre-mortem/.agentops-generated.json b/skills-codex/pre-mortem/.agentops-generated.json
index 98244f58f..4e67ed019 100644
--- a/skills-codex/pre-mortem/.agentops-generated.json
+++ b/skills-codex/pre-mortem/.agentops-generated.json
@@ -2,6 +2,6 @@
   "generator": "manual-maintained",
   "source_skill": "skills/pre-mortem",
   "layout": "modular",
-  "source_hash": "54b97fbb1ff8722c5488e83004b16d4c6652678976544034d23c1959a415f62a",
+  "source_hash": "4ed46f30063d690c14dd30c7a52bbbb3f1d8d9c90ceebb89be305c3bdbd91598",
   "generated_hash": "f98f6d4ad93124bdbe08cf728f218626e59ae4538621b03351138f593ee17b42"
 }
diff --git a/skills-codex/standards/.agentops-generated.json b/skills-codex/standards/.agentops-generated.json
index 8088b8a8c..1433a55d0 100644
--- a/skills-codex/standards/.agentops-generated.json
+++ b/skills-codex/standards/.agentops-generated.json
@@ -2,6 +2,6 @@
   "generator": "manual-maintained",
   "source_skill": "skills/standards",
   "layout": "modular",
-  "source_hash": "f9046b2822250f82670fd93ccdf60deb96a81bd1299150a5c74ddc052a05d1eb",
+  "source_hash": "561316a559d131d6ff9c141dbd2d4b160cf7330a774edf2150c31e864bae6a22",
   "generated_hash": "f6a7cadda9928b92bb1a59da3a34c6b9d5aa9132d5bc17a1bd770ebd3efc220c"
 }