boshu2 · Jun 6, 2026
@@ -29,6 +29,7 @@ Or equivalently: `cd cli && make build && make test`
 - Prefer table-driven tests for multi-case functions.
 - Test low-level functions directly; don't depend on external CLIs (`bd`, `ao`) in tests.
 - **Prefer L2 integration tests** that call a command/workflow entry point over L1 tests that mock dependencies.
+- **Guard-test fixtures must use the real persisted shape.** Skip/dedup/consumed/idempotency/regression guard tests must build fixtures by round-tripping a real persisted sample (serialize with the production writer, read back with the production reader) or asserting against a checked-in real example — never a hand-built in-memory constructor that sets a marker at a granularity the on-disk format never produces (e.g. `consumed` at the item level when `next-work.jsonl` marks it at the batch level). A fixture of a shape production can't emit gives a false green (ag-mjlg / PR #652). Full rationale: `skills/standards/references/test-pyramid.md` → "Fixture Fidelity".
 
 ## Error Handling
 

@@ -943,7 +943,7 @@
     {
       "name": "pre-mortem",
       "source_skill": "skills/pre-mortem",
-      "source_hash": "54b97fbb1ff8722c5488e83004b16d4c6652678976544034d23c1959a415f62a",
+      "source_hash": "4ed46f30063d690c14dd30c7a52bbbb3f1d8d9c90ceebb89be305c3bdbd91598",
       "generated_hash": "f98f6d4ad93124bdbe08cf728f218626e59ae4538621b03351138f593ee17b42"
     },
     {
@@ -1093,7 +1093,7 @@
     {
       "name": "standards",
       "source_skill": "skills/standards",
-      "source_hash": "f9046b2822250f82670fd93ccdf60deb96a81bd1299150a5c74ddc052a05d1eb",
+      "source_hash": "561316a559d131d6ff9c141dbd2d4b160cf7330a774edf2150c31e864bae6a22",
       "generated_hash": "f6a7cadda9928b92bb1a59da3a34c6b9d5aa9132d5bc17a1bd770ebd3efc220c"
     },
     {

@@ -2,6 +2,6 @@
   "generator": "manual-maintained",
   "source_skill": "skills/pre-mortem",
   "layout": "modular",
-  "source_hash": "54b97fbb1ff8722c5488e83004b16d4c6652678976544034d23c1959a415f62a",
+  "source_hash": "4ed46f30063d690c14dd30c7a52bbbb3f1d8d9c90ceebb89be305c3bdbd91598",
   "generated_hash": "f98f6d4ad93124bdbe08cf728f218626e59ae4538621b03351138f593ee17b42"
 }
@@ -2,6 +2,6 @@
   "generator": "manual-maintained",
   "source_skill": "skills/standards",
   "layout": "modular",
-  "source_hash": "f9046b2822250f82670fd93ccdf60deb96a81bd1299150a5c74ddc052a05d1eb",
+  "source_hash": "561316a559d131d6ff9c141dbd2d4b160cf7330a774edf2150c31e864bae6a22",
   "generated_hash": "f6a7cadda9928b92bb1a59da3a34c6b9d5aa9132d5bc17a1bd770ebd3efc220c"
 }
@@ -87,6 +87,8 @@ Check each issue in the plan:
 | Does every feature/bug issue include L1 (unit) tests? | Yes | severity=significant: "Missing unit tests for feature/bug issue" |
 | Do cross-module changes include L2 (integration) tests? | Yes | severity=moderate: "Missing integration tests for cross-module change" |
 | Are L4+ levels deferred to human gate (not agent-planned)? | Yes | severity=low: "Agent planning L4+ tests — these require human-defined scenarios" |
+| For any skip/dedup/consumed/idempotency/regression guard test, does the fixture round-trip the **real persisted shape** (not a hand-built in-memory constructor)? | Yes | severity=significant: "Guard-test fixture uses a shape production never emits — false-green risk (cf. ag-mjlg / PR #652)" |
+| Is every guard marker (`consumed`/`skip`/`dedup`) set at the granularity the on-disk artifact uses (batch/parent/envelope vs item)? | Yes | severity=significant: "Guard marker set at item-level when persisted artifact marks it at batch-level" |
 
 Add to each judge's prompt when test pyramid check is active:
 

@@ -264,6 +264,7 @@ func TestClassifyServeArg(t *testing.T) {
 - **Assert exact expected values:** Use `== expected`, never `!= wrong`. (See Exact Assertion Rule above.)
 - **Table-driven tests** preferred for multi-case functions. (See example above.)
 - **Test low-level functions directly;** don't depend on external CLIs (`bd`, `ao`) in tests. (See CI-Safe Test Pattern above.)
+- **Guard-test fixtures must use the real persisted shape.** Skip/dedup/consumed/idempotency/regression guard tests must round-trip a real persisted sample (production writer → production reader) or assert against a checked-in real example — never a hand-built in-memory constructor that sets a marker at a granularity the on-disk format never emits (e.g. `consumed` at item-level when `next-work.jsonl` marks it at batch-level). A fixture of a shape production can't produce gives a false green (ag-mjlg / PR #652). Full rationale: `test-pyramid.md` → "Fixture Fidelity".
 
 ### Benchmark Tests (BF7)
 

@@ -407,6 +407,21 @@ After L0–L3 coverage is complete, run bug-finding levels:
 | `/plan` (format changes) | Flag BF8 backward compat — add old format as fixture before changing |
 | `/pre-mortem` (security) | Verify BF9 tests planned for code handling secrets or user input |
 
+## Fixture Fidelity — guard tests must use the real persisted shape
+
+> **Proven 2026-05-31 (ag-mjlg / PR #652):** a next-work materialize over-creation bug (44 beads planned vs 16 expected) shipped green because `TestNextWorkMaterialize_SkipsConsumed` set `consumed` at the **per-item** level while the real `next-work.jsonl` marks `consumed` at the **batch** level. The guard test exercised a data shape production never emits, so CI passed on a fixture that could not catch the bug.
+
+**The standard:** regression, idempotency, skip, dedup, and consumed/already-done guard tests MUST build their fixtures from the **real persisted data shape** — round-trip an actual on-disk sample (write → read back through the production serializer/parser), or assert against a checked-in real example. Do NOT hand-construct the in-memory struct via a convenient constructor when production reads the value from disk: the constructor lets you set fields at a granularity (per-item vs batch-level, flat vs nested) that the persisted format never produces, giving a false green.
+
+**Why it bites agents specifically:** an agent writing a guard test reaches for the cheapest fixture — the in-memory constructor — because it compiles fastest and reads cleanest. That fixture silently diverges from the persisted shape, and the test then "passes" by asserting on a state the system can't reach. The skip/dedup logic under test reads the persisted shape, so the test proves nothing about the real path.
+
+**How to comply:**
+- Round-trip the fixture: serialize a sample with the production writer, read it back with the production reader, then assert. If that path doesn't exist yet, write the sample to a temp file in the real on-disk format and load it through the production loader.
+- Match the marker granularity exactly: if production records `consumed`/`skip`/`dedup` state at the batch (or parent, or envelope) level, set it there — never only at the item level.
+- Prefer a checked-in real sample (a trimmed copy of an actual artifact) over a synthetic one for the canonical happy/skip cases.
+
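The "How to comply" steps can be sketched in Go roughly as follows. This is a minimal illustration, not the project's code: the `Batch` and `Item` types, the `writeBatches`/`readBatches` helpers, and the `pendingItems` guard are all hypothetical stand-ins, and the real `next-work.jsonl` schema and production writer/reader are not shown in this document. The only detail taken from the text is that `consumed` is recorded at the batch level.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
)

// Item and Batch are HYPOTHETICAL stand-ins for the persisted shape.
// Note that Item has no consumed field: the on-disk format cannot express
// a per-item marker, so a fixture cannot set one by accident.
type Item struct {
	Title string `json:"title"`
}

type Batch struct {
	Consumed bool   `json:"consumed"` // marker lives at the batch level
	Items    []Item `json:"items"`
}

// writeBatches stands in for the production writer (one JSON object per line).
func writeBatches(batches []Batch) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode appends a newline after each value
	for _, b := range batches {
		if err := enc.Encode(b); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// readBatches stands in for the production reader.
func readBatches(data []byte) ([]Batch, error) {
	var out []Batch
	sc := bufio.NewScanner(bytes.NewReader(data))
	for sc.Scan() {
		line := bytes.TrimSpace(sc.Bytes())
		if len(line) == 0 {
			continue
		}
		var b Batch
		if err := json.Unmarshal(line, &b); err != nil {
			return nil, err
		}
		out = append(out, b)
	}
	return out, sc.Err()
}

// pendingItems is the hypothetical guard under test: items that belong to a
// consumed batch are skipped.
func pendingItems(batches []Batch) []Item {
	var out []Item
	for _, b := range batches {
		if b.Consumed {
			continue
		}
		out = append(out, b.Items...)
	}
	return out
}

// roundTripFixture builds the fixture through writer -> reader, so the test
// only ever sees a shape the persisted format can actually hold.
func roundTripFixture() ([]Batch, error) {
	data, err := writeBatches([]Batch{
		{Consumed: true, Items: []Item{{Title: "a"}, {Title: "b"}}},
		{Consumed: false, Items: []Item{{Title: "c"}}},
	})
	if err != nil {
		return nil, err
	}
	return readBatches(data)
}

func main() {
	batches, err := roundTripFixture()
	if err != nil {
		panic(err)
	}
	pending := pendingItems(batches)
	fmt.Println(len(pending), pending[0].Title) // prints: 1 c
}
```

In a real guard test the same pattern would call the production writer and loader (or load a checked-in trimmed sample) in place of the stand-in helpers, and assert the exact expected count with `==`.
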
+**Pre-mortem / checklist flag:** when reviewing a plan or diff that adds a skip/dedup/consumed/idempotency guard test, FLAG any fixture that is built only from an in-memory constructor, or that sets the guard marker at item-level when the persisted artifact marks it at batch-level. Require the fixture to round-trip the real persisted shape before the test counts as coverage. (See `.claude/rules/go.md` → "Guard-test fixtures must use the real persisted shape.")
+
 ## Coverage Assessment Template
 
 Used by `/post-mortem` and `/vibe` to assess test shape health: