Merged
1 change: 1 addition & 0 deletions .claude/rules/go.md
Original file line number Diff line number Diff line change
@@ -29,6 +29,7 @@ Or equivalently: `cd cli && make build && make test`
- Prefer table-driven tests for multi-case functions.
- Test low-level functions directly; don't depend on external CLIs (`bd`, `ao`) in tests.
- **Prefer L2 integration tests** that call a command/workflow entry point over L1 tests that mock dependencies.
- **Guard-test fixtures must use the real persisted shape.** Skip/dedup/consumed/idempotency/regression guard tests must build fixtures by round-tripping a real persisted sample (serialize with the production writer, read back with the production reader) or by asserting against a checked-in real example. Never use a hand-built in-memory constructor that sets a marker at a granularity the on-disk format never produces (e.g. `consumed` at the item level when `next-work.jsonl` marks it at the batch level). A fixture whose shape production cannot emit yields a false green (ag-mjlg / PR #652). Full rationale: `skills/standards/references/test-pyramid.md` → "Fixture Fidelity".

## Error Handling

4 changes: 2 additions & 2 deletions skills-codex/.agentops-manifest.json
@@ -943,7 +943,7 @@
{
"name": "pre-mortem",
"source_skill": "skills/pre-mortem",
"source_hash": "54b97fbb1ff8722c5488e83004b16d4c6652678976544034d23c1959a415f62a",
"source_hash": "4ed46f30063d690c14dd30c7a52bbbb3f1d8d9c90ceebb89be305c3bdbd91598",
"generated_hash": "f98f6d4ad93124bdbe08cf728f218626e59ae4538621b03351138f593ee17b42"
},
{
@@ -1093,7 +1093,7 @@
{
"name": "standards",
"source_skill": "skills/standards",
"source_hash": "f9046b2822250f82670fd93ccdf60deb96a81bd1299150a5c74ddc052a05d1eb",
"source_hash": "561316a559d131d6ff9c141dbd2d4b160cf7330a774edf2150c31e864bae6a22",
"generated_hash": "f6a7cadda9928b92bb1a59da3a34c6b9d5aa9132d5bc17a1bd770ebd3efc220c"
},
{
2 changes: 1 addition & 1 deletion skills-codex/pre-mortem/.agentops-generated.json
@@ -2,6 +2,6 @@
"generator": "manual-maintained",
"source_skill": "skills/pre-mortem",
"layout": "modular",
"source_hash": "54b97fbb1ff8722c5488e83004b16d4c6652678976544034d23c1959a415f62a",
"source_hash": "4ed46f30063d690c14dd30c7a52bbbb3f1d8d9c90ceebb89be305c3bdbd91598",
"generated_hash": "f98f6d4ad93124bdbe08cf728f218626e59ae4538621b03351138f593ee17b42"
}
2 changes: 1 addition & 1 deletion skills-codex/standards/.agentops-generated.json
@@ -2,6 +2,6 @@
"generator": "manual-maintained",
"source_skill": "skills/standards",
"layout": "modular",
"source_hash": "f9046b2822250f82670fd93ccdf60deb96a81bd1299150a5c74ddc052a05d1eb",
"source_hash": "561316a559d131d6ff9c141dbd2d4b160cf7330a774edf2150c31e864bae6a22",
"generated_hash": "f6a7cadda9928b92bb1a59da3a34c6b9d5aa9132d5bc17a1bd770ebd3efc220c"
}
2 changes: 2 additions & 0 deletions skills/pre-mortem/references/mandatory-checks.md
@@ -87,6 +87,8 @@ Check each issue in the plan:
| Does every feature/bug issue include L1 (unit) tests? | Yes | severity=significant: "Missing unit tests for feature/bug issue" |
| Do cross-module changes include L2 (integration) tests? | Yes | severity=moderate: "Missing integration tests for cross-module change" |
| Are L4+ levels deferred to human gate (not agent-planned)? | Yes | severity=low: "Agent planning L4+ tests — these require human-defined scenarios" |
| For any skip/dedup/consumed/idempotency/regression guard test, does the fixture round-trip the **real persisted shape** (not a hand-built in-memory constructor)? | Yes | severity=significant: "Guard-test fixture uses a shape production never emits — false-green risk (cf. ag-mjlg / PR #652)" |
| Is every guard marker (`consumed`/`skip`/`dedup`) set at the granularity the on-disk artifact uses (batch/parent/envelope vs. item)? | Yes | severity=significant: "Guard marker set at item-level when persisted artifact marks it at batch-level" |

Add to each judge's prompt when test pyramid check is active:

1 change: 1 addition & 0 deletions skills/standards/references/go.md
@@ -264,6 +264,7 @@ func TestClassifyServeArg(t *testing.T) {
- **Assert exact expected values:** Use `== expected`, never `!= wrong`. (See Exact Assertion Rule above.)
- **Table-driven tests** preferred for multi-case functions. (See example above.)
- **Test low-level functions directly;** don't depend on external CLIs (`bd`, `ao`) in tests. (See CI-Safe Test Pattern above.)
- **Guard-test fixtures must use the real persisted shape.** Skip/dedup/consumed/idempotency/regression guard tests must round-trip a real persisted sample (production writer → production reader) or assert against a checked-in real example. Never use a hand-built in-memory constructor that sets a marker at a granularity the on-disk format never emits (e.g. `consumed` at item-level when `next-work.jsonl` marks it at batch-level). A fixture whose shape production cannot emit yields a false green (ag-mjlg / PR #652). Full rationale: `test-pyramid.md` → "Fixture Fidelity".

### Benchmark Tests (BF7)

15 changes: 15 additions & 0 deletions skills/standards/references/test-pyramid.md
@@ -407,6 +407,21 @@ After L0–L3 coverage is complete, run bug-finding levels:
| `/plan` (format changes) | Flag BF8 backward compat — add old format as fixture before changing |
| `/pre-mortem` (security) | Verify BF9 tests planned for code handling secrets or user input |

## Fixture Fidelity — guard tests must use the real persisted shape

> **Proven 2026-05-31 (ag-mjlg / PR #652):** a next-work materialize over-creation bug (44 beads planned vs 16 expected) shipped green because `TestNextWorkMaterialize_SkipsConsumed` set `consumed` at the **per-item** level while the real `next-work.jsonl` marks `consumed` at the **batch** level. The guard test exercised a data shape production never emits, so CI passed on a fixture that could not catch the bug.

**The standard:** regression, idempotency, skip, dedup, and consumed/already-done guard tests MUST build their fixtures from the **real persisted data shape** — round-trip an actual on-disk sample (write → read back through the production serializer/parser), or assert against a checked-in real example. Do NOT hand-construct the in-memory struct via a convenient constructor when production reads the value from disk: the constructor lets you set fields at a granularity (per-item vs batch-level, flat vs nested) that the persisted format never produces, giving a false green.

**Why it bites agents specifically:** an agent writing a guard test reaches for the cheapest fixture — the in-memory constructor — because it compiles fastest and reads cleanest. That fixture silently diverges from the persisted shape, and the test then "passes" by asserting on a state the system can't reach. The skip/dedup logic under test reads the persisted shape, so the test proves nothing about the real path.

**How to comply:**
- Round-trip the fixture: serialize a sample with the production writer, read it back with the production reader, then assert. If that path doesn't exist yet, write the sample to a temp file in the real on-disk format and load it through the production loader.
- Match the marker granularity exactly: if production records `consumed`/`skip`/`dedup` state at the batch (or parent, or envelope) level, set it there — never only at the item level.
- Prefer a checked-in real sample (a trimmed copy of an actual artifact) over a synthetic one for the canonical happy/skip cases.

**Pre-mortem / checklist flag:** when reviewing a plan or diff that adds a skip/dedup/consumed/idempotency guard test, FLAG any fixture that is built only from an in-memory constructor, or that sets the guard marker at item-level when the persisted artifact marks it at batch-level. Require the fixture to round-trip the real persisted shape before the test counts as coverage. (See `.claude/rules/go.md` → "Guard-test fixtures must use the real persisted shape.")

## Coverage Assessment Template

Used by `/post-mortem` and `/vibe` to assess test shape health: