diff --git a/CLAUDE.md b/CLAUDE.md
index 1fa78bc..bfbc697 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -27,7 +27,7 @@ Product north star: `code-oz` is a **repo-native agentic SDLC runtime** (market
 Gate-file writes are produced only by orchestrator-owned primitives in `src/state/gates.ts` / `src/state/run.ts`: phase-approval primitives (`writeGate`, `approveGate`, `approveReviewTaskGate`) for `GATE__PASSED.json`, and intervention/control primitives (`writeNeedsInterventionGate`, `writePauseGate`, `writeStopGate`, all routed through `writeControlGate`) for `NEEDS_INTERVENTION.json` / `PAUSE.json` / `STOP.json`. These intervention writers are invoked by orchestrator-owned phase modules (`src/phases/*.ts`, `src/providers/invoke.ts`, `src/worktree/load-or-create-run-worktree.ts`) when a phase or wrapper refuses; they are part of the orchestrator surface, not external surfaces. MCP tool calls, hook invocations, and any future external integration surfaces cannot write gate files, canonical artifacts, or `events.jsonl` directly; only orchestrator-owned event-emission primitives append events. If a future external surface accepts write requests from clients, those requests are recorded as *advisory request files* under `.code-oz/state/runs//requests/.json` (same shape as `NEEDS_INTERVENTION.json`, a request the next phase preflight may consider) and never bypass gate validation; external consumers may read events and submit advisory requests, but they never own gate or event-log writes. Pinned 2026-05-10 from Mimir comparison (`docs/comparison/11-mimir/SYNTHESIS.md` § "C-MIMIR-1").
 2. **Cross-family review at REVIEW gate.** REVIEW agent must be a different provider family than BUILD. Pass file paths, not curated summaries. (ARIS lesson)
 3. **3-source verification before any code.** Spec + reference code + library docs. PLAN cannot pass without `SOURCE_CHECK.md`. (maestro lesson)
-4. **Opus default; warn on downgrade.** `claude-opus-4-7` is the primary model; downgrading requires explicit config. (maestro session-55 lesson)
+4. **Opus default; warn on downgrade.** `claude-opus-4-8` is the primary model; downgrading requires explicit config. (maestro session-55 lesson)
 5. **Wave-based execution + grep verification** between phases catches pattern blindness.
 6. **Hard cap on review loops:** max 4 rounds, exit on score≥6 + verdict=ready. (ARIS)
 7. **Artifact contracts in plain Markdown** (`SPEC.md`, `PLAN.md`, `SOURCE_CHECK.md`, `BUILD_REPORT.md`, `VERIFY.md`, `REVIEW.md`, `AUDIT.md`) — never JSON serialization for inter-phase handoffs.
diff --git a/docs/contracts/BUILD.md b/docs/contracts/BUILD.md
index 47ef314..5251391 100644
--- a/docs/contracts/BUILD.md
+++ b/docs/contracts/BUILD.md
@@ -164,7 +164,7 @@ REVIEW-driven shape (M9 commit 10):
 ```yaml
 provider: claude
-modelPolicy: { primary: claude-opus-4-7, fallback: claude-sonnet-4-6 }
+modelPolicy: { primary: claude-opus-4-8, fallback: claude-sonnet-4-6 }
 permissions:
   read: ['.code-oz/artifacts/SPEC.md', '.code-oz/artifacts/PLAN.md',
          '.code-oz/artifacts/SOURCE_CHECK.md', '.code-oz/artifacts/HYPOTHESES.md',
diff --git a/docs/contracts/COMPANY.md b/docs/contracts/COMPANY.md
index ea342c2..f55e591 100644
--- a/docs/contracts/COMPANY.md
+++ b/docs/contracts/COMPANY.md
@@ -77,12 +77,12 @@ company:
     provider: codex
     model: gpt-5.5
   lead:
-    model: claude-opus-4-7
+    model: claude-opus-4-8
   reviewer:
     provider: gemini
 ```
-In this example the resolved values are: `ba` runs on `codex` with model `gpt-5.5`; `lead` keeps its frontmatter provider (`claude`) but runs on `claude-opus-4-7`; `reviewer` runs on `gemini`. The third row will fail load-time validation today because `gemini` is not eligible for any phase in v0.1 (`capabilityOf('gemini').eligiblePhases === []`); the resolved-provider eligibility check raises `loader_provider_phase_not_eligible` before the run starts.
+In this example the resolved values are: `ba` runs on `codex` with model `gpt-5.5`; `lead` keeps its frontmatter provider (`claude`) but runs on `claude-opus-4-8`; `reviewer` runs on `gemini`. The third row will fail load-time validation today because `gemini` is not eligible for any phase in v0.1 (`capabilityOf('gemini').eligiblePhases === []`); the resolved-provider eligibility check raises `loader_provider_phase_not_eligible` before the run starts.
 ## Validation rules
@@ -197,13 +197,13 @@ version: '0.12.0-alpha.0'
 profile: greenfield
 company:
   ba:
-    model: claude-opus-4-7
+    model: claude-opus-4-8
   lead:
     provider: codex
     model: gpt-5.5
   builder:
     provider: claude
-    model: claude-opus-4-7
+    model: claude-opus-4-8
   verifier:
     provider: claude
   reviewer:
@@ -211,19 +211,19 @@ company:
     model: gpt-5.5
   scientist:
     provider: claude
-    model: claude-opus-4-7
+    model: claude-opus-4-8
 ```
 Resolved registry after `bootstrap({ cwd, config })`:
 | Role | Frontmatter provider | Resolved provider | Resolved model |
 |---|---|---|---|
-| `ba` | claude | claude | claude-opus-4-7 |
+| `ba` | claude | claude | claude-opus-4-8 |
 | `lead` | claude | codex | gpt-5.5 |
-| `builder` | claude | claude | claude-opus-4-7 |
+| `builder` | claude | claude | claude-opus-4-8 |
 | `verifier` | claude | claude | persona frontmatter value |
 | `reviewer` | codex | codex | gpt-5.5 |
-| `scientist` | claude | claude | claude-opus-4-7 |
+| `scientist` | claude | claude | claude-opus-4-8 |
 Cross-family REVIEW holds: `reviewer` (codex) and `builder` (claude) resolve to different families. Eligibility holds: every (provider, phase) pair is in `capabilityOf(provider).eligiblePhases`. The `lead` override changes the debate-opposing-family calculation: if the bundled `lead.md` declares `opposingProviders: ['codex']`, the resolved family (`codex`) now appears in its own debate list — the post-override check raises `schema_invalid_permissions` at load time, before any debate call.
diff --git a/docs/contracts/REVIEW_PANEL.md b/docs/contracts/REVIEW_PANEL.md
index 90c8ce5..88a07e9 100644
--- a/docs/contracts/REVIEW_PANEL.md
+++ b/docs/contracts/REVIEW_PANEL.md
@@ -211,7 +211,7 @@ The canonical `REVIEW.md` parser dispatches on the H2 heading line: an exact `##
   - id: reviewer-C
     providerId: claude
     providerFamily: claude
-    modelPolicy: claude-opus-4-7
+    modelPolicy: claude-opus-4-8
     role: advisory
     score: 9
     verdict: ready
diff --git a/docs/contracts/SCIENTIST.md b/docs/contracts/SCIENTIST.md
index b2ff63f..8dc51ed 100644
--- a/docs/contracts/SCIENTIST.md
+++ b/docs/contracts/SCIENTIST.md
@@ -105,7 +105,7 @@ If `retroSeedDefine` is `true`, DEFINE's gate-preflight runs the same sidecar va
 ```yaml
 provider: claude # cross-family with Lead's claude default is acceptable in M6;
 # M7 may flip Scientist to a Codex-family default to widen blind-spot coverage.
-modelPolicy: { primary: claude-opus-4-7, fallback: claude-sonnet-4-6 }
+modelPolicy: { primary: claude-opus-4-8, fallback: claude-sonnet-4-6 }
 permissions:
   read: ['.code-oz/artifacts/SPEC.md', '.code-oz/artifacts/PLAN.md',
          '.code-oz/artifacts/HYPOTHESES.md', '.code-oz/artifacts/OPEN_QUESTIONS.md']
diff --git a/docs/contracts/VERIFY.md b/docs/contracts/VERIFY.md
index f9a7fbb..31deca7 100644
--- a/docs/contracts/VERIFY.md
+++ b/docs/contracts/VERIFY.md
@@ -128,7 +128,7 @@ Populated only when `Verdict.Verdict = fail`. Mirrors [`BUILD.md`](./BUILD.md)
 ```yaml
 provider: claude
-modelPolicy: { primary: claude-opus-4-7, fallback: claude-sonnet-4-6 }
+modelPolicy: { primary: claude-opus-4-8, fallback: claude-sonnet-4-6 }
 permissions:
   read: ['.code-oz/artifacts/SPEC.md', '.code-oz/artifacts/PLAN.md',
          '.code-oz/artifacts/BUILD_REPORT.md',
diff --git a/docs/references/agent-skill-format.md b/docs/references/agent-skill-format.md
index 8df682a..045e681 100644
--- a/docs/references/agent-skill-format.md
+++ b/docs/references/agent-skill-format.md
@@ -84,7 +84,7 @@ name: ba-discovery
 type: agent # agent | skill | phase | gate | hook
 phase: define # define | plan | build | verify | review | ship | audit
 provider: claude # claude | codex | gemini | fake
-model: claude-opus-4-7 # optional; falls back to provider default
+model: claude-opus-4-8 # optional; falls back to provider default
 modelPolicy: opus-default # opus-default | strict-opus | any
 permissions:
   read: '*'
diff --git a/docs/references/budgets.md b/docs/references/budgets.md
index 637211e..ad14621 100644
--- a/docs/references/budgets.md
+++ b/docs/references/budgets.md
@@ -46,7 +46,7 @@ budgets:
     maxWallTimeMinutes: 240 # since run_started.ts
     softWarnAtRatio: 0.75 # emit budget_warning at 75% of cap
     priceTable:
-      claude:claude-opus-4-7:
+      claude:claude-opus-4-8:
         inputPerMTok: 5
         outputPerMTok: 25
   byRole:
@@ -83,15 +83,15 @@ call has no role — see "Role-identity binding" below), only per-phase
 ### `budgets.global.priceTable` (M6, extended in M13)
 Operator-configured per-model rates for advisory dollar telemetry.
-Keyed by `:` (e.g., `claude:claude-opus-4-7`). Values
+Keyed by `:` (e.g., `claude:claude-opus-4-8`). Values
 are non-negative finite numbers (NaN, Infinity, negative all reject).
 Default `priceTable` populates Claude shipped models per
-`platform.claude.com/docs/en/about-claude/pricing` (lookup 2026-05-01):
+`platform.claude.com/docs/en/about-claude/pricing` (lookup 2026-05-29; Opus 4.8 $5/$25 unchanged from 4.7):
 | Model | inputPerMTok | outputPerMTok |
 |---|---|---|
-| `claude-opus-4-7` | $5 | $25 |
+| `claude-opus-4-8` | $5 | $25 |
 | `claude-sonnet-4-6` | $3 | $15 |
 | `claude-haiku-4-5-20251001` | $1 | $5 |
diff --git a/docs/references/provider-contract.md b/docs/references/provider-contract.md
index 0956d4b..d35ba03 100644
--- a/docs/references/provider-contract.md
+++ b/docs/references/provider-contract.md
@@ -221,7 +221,7 @@ When neither resolves, the field is omitted on the event. The wrapper never emit
 - **`agent_invoked.costEstimateUSD?`** — pre-call upper-bound. Combines input from `prepared.metrics.tokensEstimate` (4-chars/token bound) with output from `req.maxOutputTokens ?? 0`. The output default is a known underestimate when `maxOutputTokens` is unset; advisory only.
 - **`agent_completed.costActualUSD?`** — post-call dollar actual with **output-tokens-only semantics** (Codex scope correction): today's Claude adapter reads `usage.output_tokens` and the xAI adapter reads `usage.completion_tokens`. Neither is full request cost. Operators reading this field as full invoice will understate spend.
-`config.budgets.global.priceTable` ships Claude shipped-model defaults (Opus 4.7 = $5/$25, Sonnet 4.6 = $3/$15, Haiku 4.5 = $1/$5; dated source comment). xAI / Codex / Gemini / Fake stay omitted per the rotting-data discipline (Codex Q4-bis lock). Operator overrides via `.code-oz/config.yaml budgets.global.priceTable`.
+`config.budgets.global.priceTable` ships Claude shipped-model defaults (Opus 4.8 = $5/$25, Sonnet 4.6 = $3/$15, Haiku 4.5 = $1/$5; dated source comment). xAI / Codex / Gemini / Fake stay omitted per the rotting-data discipline (Codex Q4-bis lock). Operator overrides via `.code-oz/config.yaml budgets.global.priceTable`.
 See [`docs/references/budgets.md`](./budgets.md) for the user-facing surface and the cap-layer cascade.
diff --git a/src/config/schema.ts b/src/config/schema.ts
index 0bf999d..2418159 100644
--- a/src/config/schema.ts
+++ b/src/config/schema.ts
@@ -105,7 +105,7 @@ export interface GlobalBudget extends PhaseBudget {
   softWarnAtRatio: number
   /**
    * M6 (rule 19, optional): per-model price table for dollar telemetry. Keys
-   * are `:` (e.g. `claude:claude-opus-4-7`). Values are the
+   * are `:` (e.g. `claude:claude-opus-4-8`). Values are the
    * per-MTok prices from platform.claude.com. Telemetry only — never used
    * for budget enforcement.
    *
@@ -311,7 +311,7 @@ export const DEFAULT_CONFIG: CodeOzConfig = {
   profile: 'greenfield',
   defaultProvider: 'claude',
   models: {
-    primary: 'claude-opus-4-7',
+    primary: 'claude-opus-4-8',
     reviewer: 'gpt-5.5',
   },
   budgets: {
@@ -333,9 +333,9 @@ export const DEFAULT_CONFIG: CodeOzConfig = {
       // Gemini is a stub, Fake is the offline test runtime. Operator
       // overrides via `.code-oz/config.yaml budgets.global.priceTable`.
       // Source: https://platform.claude.com/docs/en/about-claude/pricing
-      // Lookup date: 2026-05-01
+      // Lookup date: 2026-05-29 (Opus 4.8 is the default; $5/$25 unchanged from 4.7)
       priceTable: Object.freeze({
-        'claude:claude-opus-4-7': Object.freeze({ inputPerMTok: 5, outputPerMTok: 25 }),
+        'claude:claude-opus-4-8': Object.freeze({ inputPerMTok: 5, outputPerMTok: 25 }),
         'claude:claude-sonnet-4-6': Object.freeze({ inputPerMTok: 3, outputPerMTok: 15 }),
         'claude:claude-haiku-4-5-20251001': Object.freeze({
           inputPerMTok: 1,
diff --git a/src/providers/capabilities.ts b/src/providers/capabilities.ts
index 8cbaa6b..8164468 100644
--- a/src/providers/capabilities.ts
+++ b/src/providers/capabilities.ts
@@ -89,7 +89,7 @@ export const DEFAULT_CAPABILITY_BY_ID: Readonly {
     expect(config.version).toBe('0.21.1-alpha.0')
     expect(config.defaultProvider).toBe('claude')
-    expect(config.models.primary).toBe('claude-opus-4-7')
+    expect(config.models.primary).toBe('claude-opus-4-8')
     expect(config.permissions.allowEscapeHatch).toBe(false)
   })
diff --git a/tests/config-budgets.test.ts b/tests/config-budgets.test.ts
index a3c177e..d6b4710 100644
--- a/tests/config-budgets.test.ts
+++ b/tests/config-budgets.test.ts
@@ -26,10 +26,10 @@ describe('budgets.global extension', () => {
     // Codex Q4-bis lock: per-model Claude prices live in priceTable
     // (model-level), NOT in DEFAULT_CAPABILITY_BY_ID.claude.costPerMTok
     // (provider-level — has no model dimension). Source:
-    // https://platform.claude.com/docs/en/about-claude/pricing (2026-05-01).
+    // https://platform.claude.com/docs/en/about-claude/pricing (2026-05-29).
     const t = DEFAULT_CONFIG.budgets.global.priceTable
     expect(t).toBeDefined()
-    expect(t!['claude:claude-opus-4-7']).toEqual({ inputPerMTok: 5, outputPerMTok: 25 })
+    expect(t!['claude:claude-opus-4-8']).toEqual({ inputPerMTok: 5, outputPerMTok: 25 })
     expect(t!['claude:claude-sonnet-4-6']).toEqual({ inputPerMTok: 3, outputPerMTok: 15 })
     expect(t!['claude:claude-haiku-4-5-20251001']).toEqual({ inputPerMTok: 1, outputPerMTok: 5 })
   })
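
Reviewer note: the `priceTable` rows renamed in this diff are per-million-token rates used for advisory dollar telemetry. The sketch below shows one way such rates could turn token counts into a dollar figure. The names `PriceEntry`, `PRICE_TABLE`, and `estimateCostUSD` are hypothetical and do not come from the repository; only the key format and the two rates are copied from the diff. Returning `undefined` for an unknown key is an assumption, chosen to match the documented behaviour of omitting the field when no price resolves.

```typescript
// Hypothetical illustration only; not the code in src/config/schema.ts.
interface PriceEntry {
  inputPerMTok: number
  outputPerMTok: number
}

// Rates copied from the default priceTable values in the diff.
const PRICE_TABLE: Readonly<Record<string, PriceEntry>> = Object.freeze({
  'claude:claude-opus-4-8': Object.freeze({ inputPerMTok: 5, outputPerMTok: 25 }),
  'claude:claude-sonnet-4-6': Object.freeze({ inputPerMTok: 3, outputPerMTok: 15 }),
})

// Advisory estimate in USD. Yields undefined when the key has no price entry
// (assumed behaviour), so a caller can omit the field rather than report zero.
function estimateCostUSD(
  key: string,
  inputTokens: number,
  outputTokens: number,
): number | undefined {
  const entry = PRICE_TABLE[key]
  if (entry === undefined) return undefined
  // Multiply first, divide once, to keep rounding error to a single division.
  return (inputTokens * entry.inputPerMTok + outputTokens * entry.outputPerMTok) / 1_000_000
}
```

Under these assumptions, 1,000,000 input tokens plus 200,000 output tokens on `claude:claude-opus-4-8` comes to $5 + $5 = $10, and a lookup under the retired `claude:claude-opus-4-7` key resolves to nothing, which is why the tests and docs in this diff rename the key together.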