🤖 AI text below 🤖
From PR-6.75's cumulative adversarial review (LOW-6, follow-up recommendation).
The trigger-ordering parity sweep in crates/engine/src/game/triggers_ordering_parity_tests.rs carries the PR-6.75 soundness evidence: the under-prompt direction gate (for rows with decision_old && !decision_new, unexplained is asserted empty), the mutation-proven non-vacuity floors, and the DOCUMENTED_OVER_PROMPT ledger completeness asserts. All of that runs at full strength only under FORGE_TEST_FULL_DB=1. Default CI executes the fixture-subset sweep, whose floors are dormant because fixture counts sit below the full-DB minima by design.
Consequence: a future change that introduces a same-event/batch under-prompt or silently shrinks a classifier population would pass default CI and only be caught by a manual full-DB run.
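To make the gap concrete, here is a minimal sketch of how a direction gate and an env-gated floor could interact. It is not the code in triggers_ordering_parity_tests.rs: the names Row, unexplained_under_prompts, floor_is_active and check_floor are hypothetical, and the numbers are made up. Only the decision_old && !decision_new condition and the FORGE_TEST_FULL_DB variable come from the description above.

```rust
/// Hypothetical sweep row; the field names mirror the gate condition only.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    name: &'static str,
    decision_old: bool, // old logic prompted for trigger ordering
    decision_new: bool, // new logic prompts for trigger ordering
}

/// Under-prompt direction: the old logic prompted, the new one does not.
/// The gate asserts that this collection is empty.
fn unexplained_under_prompts(rows: &[Row]) -> Vec<&Row> {
    rows.iter()
        .filter(|r| r.decision_old && !r.decision_new)
        .collect()
}

/// Assumed gating: floors are enforced only when the full DB is requested.
fn floor_is_active(full_db_env: Option<&str>) -> bool {
    full_db_env == Some("1")
}

/// A dormant floor never fails, which is why a shrunken population
/// can slip through the fixture-subset run.
fn check_floor(population: usize, floor: usize, active: bool) -> Result<(), String> {
    if active && population < floor {
        return Err(format!("population {population} fell below floor {floor}"));
    }
    Ok(())
}

fn main() {
    let rows = [
        Row { name: "a", decision_old: true, decision_new: true },
        Row { name: "b", decision_old: true, decision_new: false }, // under-prompt
        Row { name: "c", decision_old: false, decision_new: true }, // over-prompt direction
    ];
    let bad: Vec<&str> = unexplained_under_prompts(&rows)
        .iter()
        .map(|r| r.name)
        .collect();
    println!("under-prompts: {bad:?}");

    let env = std::env::var("FORGE_TEST_FULL_DB").ok();
    let active = floor_is_active(env.as_deref());
    println!("floor check: {:?}", check_floor(12, 100, active));
}
```

In this sketch, a population of 12 against a floor of 100 passes silently unless FORGE_TEST_FULL_DB=1 is set. That is the failure mode a scheduled full-DB job would close.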
Recommendation: add a scheduled (nightly) or release-gate CI job that runs
FORGE_TEST_FULL_DB=1 cargo test -p engine ordering_parity_sweep
against freshly generated card-data, treating any unexplained row, floor trip, or ledger-completeness failure as blocking. Corpus-churn tolerance is already built into the floors (measured −5%), so routine card-data drift should not flap the gate; genuine drift beyond tolerance is exactly what should surface for re-adjudication.
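The tolerance arithmetic can be illustrated with a small sketch. It assumes that "measured −5%" means each floor sits 5% below the population measured when the floor was set. The helper names (floor_with_tolerance, gate, Verdict) and the population of 1000 are hypothetical, not values from the sweep.

```rust
/// Hypothetical helper: a floor set `tolerance_pct` percent below the
/// measured population (integer arithmetic, rounds down).
fn floor_with_tolerance(measured: usize, tolerance_pct: usize) -> usize {
    measured * 100usize.saturating_sub(tolerance_pct) / 100
}

#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    Blocked,
}

/// Assumed gate behavior: routine drift inside the tolerance passes,
/// anything below the floor blocks and needs re-adjudication.
fn gate(current: usize, measured: usize) -> Verdict {
    if current >= floor_with_tolerance(measured, 5) {
        Verdict::Pass
    } else {
        Verdict::Blocked
    }
}

fn main() {
    let measured = 1000; // made-up population size
    println!("floor = {}", floor_with_tolerance(measured, 5));
    println!("980 -> {:?}", gate(980, measured)); // within tolerance
    println!("900 -> {:?}", gate(900, measured)); // beyond tolerance
}
```

With these made-up numbers the floor is 950, so a drop to 980 passes and a drop to 900 blocks.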
Refs: #5072 (PR-6.75, where the sweep + floors + ledger landed).
🤖 Generated with Claude Code