feat: rfc-0016 phase 6 - soak + drift detection + class proposals (AISDLC-284) #524
Draft
deefactorial wants to merge 3 commits into
…SDLC-284)

Closes the calibration loop for RFC-0016. Implements all 6 ACs corpus-driven (not calendar-gated) per maintainer directive 2026-05-01:

- bias-drift.ts: EstimateBiasOverCorrected event when overall mean miss >0 and last >=3 consecutive records flip to <=0 (AC#1)
- digest.ts + cli-estimate digest: weekly calibration digest with per-class stats including stageACoverageRate and promotionReady flag (AC#2, AC#3)
- class-proposals.ts + cli-estimate-classes review/promote: class proposal management with auto-promotion at >=3 same-shape proposals (AC#4, AC#5)
- docs/operations/estimation-promotion.md: promotion criteria documented (AC#6)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
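The AC#1 rule in the commit message above (overall mean miss >0, last >=3 consecutive records <=0) can be sketched as follows. This is a minimal illustration, not the code in bias-drift.ts: the `CalibrationRecord` shape, the `bucketMiss` field name, and the function name `detectOverCorrection` are assumptions made for the example.

```typescript
// Hypothetical record shape; the real calibration record type is not shown in this PR.
interface CalibrationRecord {
  taskId: string;
  ts: string; // ISO-8601 timestamp, so lexicographic order equals chronological order
  bucketMiss: number; // signed miss; > 0 is treated as an overestimate
}

interface DriftSketchResult {
  overCorrected: boolean;
  overallMean: number;
  tailLength: number; // number of trailing records with bucketMiss <= 0
}

const MIN_TAIL = 3; // ">=3 consecutive records" from the commit message

function detectOverCorrection(records: CalibrationRecord[]): DriftSketchResult {
  if (records.length === 0) {
    return { overCorrected: false, overallMean: 0, tailLength: 0 };
  }
  // Sort a copy so the caller's array is not mutated.
  const sorted = [...records].sort((a, b) => a.ts.localeCompare(b.ts));
  const overallMean =
    sorted.reduce((sum, r) => sum + r.bucketMiss, 0) / sorted.length;

  // Count how many of the most recent records are consecutively <= 0.
  let tailLength = 0;
  for (let i = sorted.length - 1; i >= 0 && sorted[i].bucketMiss <= 0; i--) {
    tailLength++;
  }

  return {
    overCorrected: overallMean > 0 && tailLength >= MIN_TAIL,
    overallMean,
    tailLength,
  };
}
```

The sketch only reports the condition; emitting the EstimateBiasOverCorrected event would be a separate step.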
Auto-generated by .husky/pre-push (scripts/check-task-moved.sh). Task file(s) moved from backlog/tasks/ to backlog/completed/. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…y (AISDLC-284)

Two reviewer-flagged major bugs:

1. class-proposals.ts parseClassesYaml was a stub that only extracted class names, setting empty structures. On repeated autoPromote calls (promote class A, then promote class B), the second YAML write serialised the previously-promoted classes with blank definitions/exemplars/synonyms.
   Fix: replaced the stub with a full state-machine parser that recovers the complete per-class structure (definition + exemplars + anti_patterns + synonyms) from the controlled YAML format emitted by serializeClassesYaml. Added a regression test: promote docs-rewrite then promote infra-rebuild; verify docs-rewrite structure is intact in the resulting YAML.
2. bias-drift.ts JSDoc promised idempotency but no mechanism existed. detectBiasDrift unconditionally called writeEvent on every invocation with overCorrected=true, producing duplicate EstimateBiasOverCorrected entries in events.jsonl when the digest and manual CLI calls ran against the same data.
   Fix: compute a windowSignature (SHA-256 of sorted taskId@ts tuples of the tail window), scan _orchestrator/events-*.jsonl before emitting, and short-circuit with alreadyEmitted=true when a matching event exists. Added alreadyEmitted + windowSignature fields to DriftCheckResult and two tests covering idempotency and signature change on window extension.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
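The idempotency fix in item 2 above can be sketched roughly as below. This is an illustration under assumptions, not the code in bias-drift.ts: the newline separator, the hex encoding, the `EmittedEvent` shape, and the helper names `parseJsonl` and `alreadyEmitted` are hypothetical. Only the idea (SHA-256 over sorted taskId@ts tuples, then a scan of existing events before emitting) comes from the commit message.

```typescript
import { createHash } from "node:crypto";

// Hypothetical minimal shapes for the sketch.
interface TailRecord {
  taskId: string;
  ts: string;
}

interface EmittedEvent {
  type: string;
  windowSignature?: string;
}

// SHA-256 over the sorted "taskId@ts" tuples of the tail window.
// Sorting makes the signature independent of input order.
function windowSignature(tail: TailRecord[]): string {
  const tuples = tail.map((r) => `${r.taskId}@${r.ts}`).sort();
  return createHash("sha256").update(tuples.join("\n")).digest("hex");
}

// Parse JSONL text (one JSON object per line), skipping blank lines.
function parseJsonl(text: string): EmittedEvent[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as EmittedEvent);
}

// True when an event for this exact window has already been written,
// in which case the caller would skip the write.
function alreadyEmitted(events: EmittedEvent[], signature: string): boolean {
  return events.some(
    (e) =>
      e.type === "EstimateBiasOverCorrected" &&
      e.windowSignature === signature,
  );
}
```

Because the signature covers every record in the tail window, extending the window by one record yields a new signature, so a longer run of over-corrected records can still produce a fresh event.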
Summary
Implements RFC-0016 Phase 6, the calibration-loop-closure phase. All 6 acceptance criteria are met in a corpus-driven way (NOT calendar-gated), per maintainer directive 2026-05-01.
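As an illustration of what corpus-driven gating means here, the check below takes only statistics computed from the calibration corpus and no date or elapsed-time input. It is a sketch under assumptions: the thresholds are the ones this PR description lists for `docs/operations/estimation-promotion.md`, while the `ClassStats` shape and the `isPromotionReady` name are hypothetical and may not match `digest.ts`.

```typescript
// Hypothetical per-class stats shape; field names mirror the PR text.
interface ClassStats {
  n: number; // number of calibration records for the class
  oneBucketMissRate: number;
  threeBucketMissRate: number;
  stageACoverageRate: number;
}

// Thresholds as stated in this PR description:
// n>=50, oneBucketMissRate>=0.95, threeBucketMissRate<0.05, stageACoverageRate>0.70
function isPromotionReady(stats: ClassStats): boolean {
  return (
    stats.n >= 50 &&
    stats.oneBucketMissRate >= 0.95 &&
    stats.threeBucketMissRate < 0.05 &&
    stats.stageACoverageRate > 0.7
  );
}
```

Under this reading, a class becomes promotion-ready as soon as its corpus satisfies all four conditions, regardless of how long the soak has been running.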
What shipped:
- `bias-drift.ts`: `detectBiasDrift()` scans calibration records and emits `EstimateBiasOverCorrected` when the overall mean bucket miss was positive (historical overestimate bias) but the last ≥3 consecutive records are all ≤0 (AC#1)
- `digest.ts`: `generateDigest()` produces a `CalibrationDigest` with per-class stats including `stageACoverageRate`, `promotionReady` flags, Q6 calibration state tokens, and `formatDigestText()` for TUI/Slack surfaces (AC#2, AC#3)
- `cli-estimate digest` subcommand: exposes the digest via CLI with `--format json|table` (AC#3)
- `class-proposals.ts`: full proposal lifecycle: read → cluster by normalised name → auto-promote when ≥3 same-shape proposals accumulate to `.ai-sdlc/estimate-classes.yaml` (AC#4, AC#5)
- `cli-estimate-classes` CLI: `review`, `promote`, `list` subcommands; degrade-open when `AI_SDLC_ESTIMATION_CALIBRATION` flag is off (AC#4, AC#5)
- `docs/operations/estimation-promotion.md`: documents the Phase 6 promotion criteria: n≥50, oneBucketMissRate≥0.95, threeBucketMissRate<0.05, stageACoverageRate>0.70 (AC#6)
- `OrchestratorEventType` extended with `EstimateBiasOverCorrected`

Acceptance criteria
- `EstimateBiasOverCorrected` event emitted on detection (`bias-drift.ts`, 6 tests)
- … (`digest.ts`, `cli-estimate digest`, 17 tests)
- … cli-estimates (`queryStageACoverage`, `stageACoverageRate` in digest, `cli-estimate show <class> --check-drift`)
- `cli-estimate-classes review` lists pending class proposals (10 tests for CLI)
- … (`autoPromote()`, 17 tests)
- … `docs/operations/estimation-promotion.md`

Test plan
- `pnpm build`: clean (all packages including `@ai-sdlc/pipeline-cli`)
- `pnpm test`: 50 new tests passing (bias-drift: 6, digest: 17, class-proposals: 17, estimate-classes CLI: 10); 1 pre-existing flaky failure in `loop.test.ts` (AISDLC-2 ordering race, unrelated to this PR's changes; only the `events.ts` union type was modified)
- `pnpm lint`: clean
- `pnpm format:check`: clean
- … `loop.test.ts` failure

Notes
- `loop.test.ts` flaky failure (drains a 5-task fixture queue end-to-end with maxConcurrent=1) is pre-existing and unrelated to AISDLC-284 changes (my only orchestrator change is adding one type name to the `OrchestratorEventType` union)

🤖 Generated with Claude Code