Merged
2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -27,7 +27,7 @@ Product north star: `code-oz` is a **repo-native agentic SDLC runtime** (market
Gate-file writes are produced only by orchestrator-owned primitives in `src/state/gates.ts` / `src/state/run.ts`: phase-approval primitives (`writeGate`, `approveGate`, `approveReviewTaskGate`) for `GATE_<PHASE>_PASSED.json`, and intervention/control primitives (`writeNeedsInterventionGate`, `writePauseGate`, `writeStopGate`, all routed through `writeControlGate`) for `NEEDS_INTERVENTION.json` / `PAUSE.json` / `STOP.json`. These intervention writers are invoked by orchestrator-owned phase modules (`src/phases/*.ts`, `src/providers/invoke.ts`, `src/worktree/load-or-create-run-worktree.ts`) when a phase or wrapper refuses; they are part of the orchestrator surface, not external surfaces. MCP tool calls, hook invocations, and any future external integration surfaces cannot write gate files, canonical artifacts, or `events.jsonl` directly; only orchestrator-owned event-emission primitives append events. If a future external surface accepts write requests from clients, those requests are recorded as *advisory request files* under `.code-oz/state/runs/<runId>/requests/<request-id>.json` (same shape as `NEEDS_INTERVENTION.json`, a request the next phase preflight may consider) and never bypass gate validation; external consumers may read events and submit advisory requests, but they never own gate or event-log writes. Pinned 2026-05-10 from Mimir comparison (`docs/comparison/11-mimir/SYNTHESIS.md` § "C-MIMIR-1").
2. **Cross-family review at REVIEW gate.** REVIEW agent must be a different provider family than BUILD. Pass file paths, not curated summaries. (ARIS lesson)
3. **3-source verification before any code.** Spec + reference code + library docs. PLAN cannot pass without `SOURCE_CHECK.md`. (maestro lesson)
- 4. **Opus default; warn on downgrade.** `claude-opus-4-7` is the primary model; downgrading requires explicit config. (maestro session-55 lesson)
+ 4. **Opus default; warn on downgrade.** `claude-opus-4-8` is the primary model; downgrading requires explicit config. (maestro session-55 lesson)
5. **Wave-based execution + grep verification** between phases catches pattern blindness.
6. **Hard cap on review loops:** max 4 rounds, exit on score≥6 + verdict=ready. (ARIS)
7. **Artifact contracts in plain Markdown** (`SPEC.md`, `PLAN.md`, `SOURCE_CHECK.md`, `BUILD_REPORT.md`, `VERIFY.md`, `REVIEW.md`, `AUDIT.md`) — never JSON serialization for inter-phase handoffs.
2 changes: 1 addition & 1 deletion docs/contracts/BUILD.md
@@ -164,7 +164,7 @@ REVIEW-driven shape (M9 commit 10):

```yaml
provider: claude
- modelPolicy: { primary: claude-opus-4-7, fallback: claude-sonnet-4-6 }
+ modelPolicy: { primary: claude-opus-4-8, fallback: claude-sonnet-4-6 }
permissions:
read: ['.code-oz/artifacts/SPEC.md', '.code-oz/artifacts/PLAN.md',
'.code-oz/artifacts/SOURCE_CHECK.md', '.code-oz/artifacts/HYPOTHESES.md',
16 changes: 8 additions & 8 deletions docs/contracts/COMPANY.md
@@ -77,12 +77,12 @@ company:
provider: codex
model: gpt-5.5
lead:
- model: claude-opus-4-7
+ model: claude-opus-4-8
reviewer:
provider: gemini
```

- In this example the resolved values are: `ba` runs on `codex` with model `gpt-5.5`; `lead` keeps its frontmatter provider (`claude`) but runs on `claude-opus-4-7`; `reviewer` runs on `gemini`. The third row will fail load-time validation today because `gemini` is not eligible for any phase in v0.1 (`capabilityOf('gemini').eligiblePhases === []`); the resolved-provider eligibility check raises `loader_provider_phase_not_eligible` before the run starts.
+ In this example the resolved values are: `ba` runs on `codex` with model `gpt-5.5`; `lead` keeps its frontmatter provider (`claude`) but runs on `claude-opus-4-8`; `reviewer` runs on `gemini`. The third row will fail load-time validation today because `gemini` is not eligible for any phase in v0.1 (`capabilityOf('gemini').eligiblePhases === []`); the resolved-provider eligibility check raises `loader_provider_phase_not_eligible` before the run starts.

## Validation rules

@@ -197,33 +197,33 @@ version: '0.12.0-alpha.0'
profile: greenfield
company:
ba:
- model: claude-opus-4-7
+ model: claude-opus-4-8
lead:
provider: codex
model: gpt-5.5
builder:
provider: claude
- model: claude-opus-4-7
+ model: claude-opus-4-8
verifier:
provider: claude
reviewer:
provider: codex
model: gpt-5.5
scientist:
provider: claude
- model: claude-opus-4-7
+ model: claude-opus-4-8
```

Resolved registry after `bootstrap({ cwd, config })`:

| Role | Frontmatter provider | Resolved provider | Resolved model |
|---|---|---|---|
- | `ba` | claude | claude | claude-opus-4-7 |
+ | `ba` | claude | claude | claude-opus-4-8 |
| `lead` | claude | codex | gpt-5.5 |
- | `builder` | claude | claude | claude-opus-4-7 |
+ | `builder` | claude | claude | claude-opus-4-8 |
| `verifier` | claude | claude | persona frontmatter value |
| `reviewer` | codex | codex | gpt-5.5 |
- | `scientist` | claude | claude | claude-opus-4-7 |
+ | `scientist` | claude | claude | claude-opus-4-8 |

Cross-family REVIEW holds: `reviewer` (codex) and `builder` (claude) resolve to different families. Eligibility holds: every (provider, phase) pair is in `capabilityOf(provider).eligiblePhases`. The `lead` override changes the debate-opposing-family calculation: if the bundled `lead.md` declares `opposingProviders: ['codex']`, the resolved family (`codex`) now appears in its own debate list — the post-override check raises `schema_invalid_permissions` at load time, before any debate call.

2 changes: 1 addition & 1 deletion docs/contracts/REVIEW_PANEL.md
@@ -211,7 +211,7 @@ The canonical `REVIEW.md` parser dispatches on the H2 heading line: an exact `##
- id: reviewer-C
providerId: claude
providerFamily: claude
- modelPolicy: claude-opus-4-7
+ modelPolicy: claude-opus-4-8
role: advisory
score: 9
verdict: ready
2 changes: 1 addition & 1 deletion docs/contracts/SCIENTIST.md
@@ -105,7 +105,7 @@ If `retroSeedDefine` is `true`, DEFINE's gate-preflight runs the same sidecar va
```yaml
provider: claude # cross-family with Lead's claude default is acceptable in M6;
# M7 may flip Scientist to a Codex-family default to widen blind-spot coverage.
- modelPolicy: { primary: claude-opus-4-7, fallback: claude-sonnet-4-6 }
+ modelPolicy: { primary: claude-opus-4-8, fallback: claude-sonnet-4-6 }
permissions:
read: ['.code-oz/artifacts/SPEC.md', '.code-oz/artifacts/PLAN.md',
'.code-oz/artifacts/HYPOTHESES.md', '.code-oz/artifacts/OPEN_QUESTIONS.md']
2 changes: 1 addition & 1 deletion docs/contracts/VERIFY.md
@@ -128,7 +128,7 @@ Populated only when `Verdict.Verdict = fail`. Mirrors [`BUILD.md`](./BUILD.md)

```yaml
provider: claude
- modelPolicy: { primary: claude-opus-4-7, fallback: claude-sonnet-4-6 }
+ modelPolicy: { primary: claude-opus-4-8, fallback: claude-sonnet-4-6 }
permissions:
read: ['.code-oz/artifacts/SPEC.md', '.code-oz/artifacts/PLAN.md',
'.code-oz/artifacts/BUILD_REPORT.md',
2 changes: 1 addition & 1 deletion docs/references/agent-skill-format.md
@@ -84,7 +84,7 @@ name: ba-discovery
type: agent # agent | skill | phase | gate | hook
phase: define # define | plan | build | verify | review | ship | audit
provider: claude # claude | codex | gemini | fake
- model: claude-opus-4-7 # optional; falls back to provider default
+ model: claude-opus-4-8 # optional; falls back to provider default
modelPolicy: opus-default # opus-default | strict-opus | any
permissions:
read: '*'
8 changes: 4 additions & 4 deletions docs/references/budgets.md
@@ -46,7 +46,7 @@ budgets:
maxWallTimeMinutes: 240 # since run_started.ts
softWarnAtRatio: 0.75 # emit budget_warning at 75% of cap
priceTable:
- claude:claude-opus-4-7:
+ claude:claude-opus-4-8:
inputPerMTok: 5
outputPerMTok: 25
byRole:
@@ -83,15 +83,15 @@ call has no role — see "Role-identity binding" below), only per-phase
### `budgets.global.priceTable` (M6, extended in M13)

Operator-configured per-model rates for advisory dollar telemetry.
- Keyed by `<provider>:<model>` (e.g., `claude:claude-opus-4-7`). Values
+ Keyed by `<provider>:<model>` (e.g., `claude:claude-opus-4-8`). Values
are non-negative finite numbers (NaN, Infinity, negative all reject).

Default `priceTable` populates Claude shipped models per
- `platform.claude.com/docs/en/about-claude/pricing` (lookup 2026-05-01):
+ `platform.claude.com/docs/en/about-claude/pricing` (lookup 2026-05-29; Opus 4.8 $5/$25 unchanged from 4.7):

| Model | inputPerMTok | outputPerMTok |
|---|---|---|
- | `claude-opus-4-7` | $5 | $25 |
+ | `claude-opus-4-8` | $5 | $25 |
| `claude-sonnet-4-6` | $3 | $15 |
| `claude-haiku-4-5-20251001` | $1 | $5 |

2 changes: 1 addition & 1 deletion docs/references/provider-contract.md
@@ -221,7 +221,7 @@ When neither resolves, the field is omitted on the event. The wrapper never emit
- **`agent_invoked.costEstimateUSD?`** — pre-call upper-bound. Combines input from `prepared.metrics.tokensEstimate` (4-chars/token bound) with output from `req.maxOutputTokens ?? 0`. The output default is a known underestimate when `maxOutputTokens` is unset; advisory only.
- **`agent_completed.costActualUSD?`** — post-call dollar actual with **output-tokens-only semantics** (Codex scope correction): today's Claude adapter reads `usage.output_tokens` and the xAI adapter reads `usage.completion_tokens`. Neither is full request cost. Operators reading this field as full invoice will understate spend.

- `config.budgets.global.priceTable` ships Claude shipped-model defaults (Opus 4.7 = $5/$25, Sonnet 4.6 = $3/$15, Haiku 4.5 = $1/$5; dated source comment). xAI / Codex / Gemini / Fake stay omitted per the rotting-data discipline (Codex Q4-bis lock). Operator overrides via `.code-oz/config.yaml budgets.global.priceTable`.
+ `config.budgets.global.priceTable` ships Claude shipped-model defaults (Opus 4.8 = $5/$25, Sonnet 4.6 = $3/$15, Haiku 4.5 = $1/$5; dated source comment). xAI / Codex / Gemini / Fake stay omitted per the rotting-data discipline (Codex Q4-bis lock). Operator overrides via `.code-oz/config.yaml budgets.global.priceTable`.

See [`docs/references/budgets.md`](./budgets.md) for the user-facing surface and the cap-layer cascade.

8 changes: 4 additions & 4 deletions src/config/schema.ts
@@ -105,7 +105,7 @@ export interface GlobalBudget extends PhaseBudget {
softWarnAtRatio: number
/**
* M6 (rule 19, optional): per-model price table for dollar telemetry. Keys
- * are `<provider>:<model>` (e.g. `claude:claude-opus-4-7`). Values are the
+ * are `<provider>:<model>` (e.g. `claude:claude-opus-4-8`). Values are the
* per-MTok prices from platform.claude.com. Telemetry only — never used
* for budget enforcement.
*
@@ -311,7 +311,7 @@ export const DEFAULT_CONFIG: CodeOzConfig = {
profile: 'greenfield',
defaultProvider: 'claude',
models: {
- primary: 'claude-opus-4-7',
+ primary: 'claude-opus-4-8',
reviewer: 'gpt-5.5',
},
budgets: {
@@ -333,9 +333,9 @@ export const DEFAULT_CONFIG: CodeOzConfig = {
// Gemini is a stub, Fake is the offline test runtime. Operator
// overrides via `.code-oz/config.yaml budgets.global.priceTable`.
// Source: https://platform.claude.com/docs/en/about-claude/pricing
- // Lookup date: 2026-05-01
+ // Lookup date: 2026-05-29 (Opus 4.8 is the default; $5/$25 unchanged from 4.7)
priceTable: Object.freeze({
- 'claude:claude-opus-4-7': Object.freeze({ inputPerMTok: 5, outputPerMTok: 25 }),
+ 'claude:claude-opus-4-8': Object.freeze({ inputPerMTok: 5, outputPerMTok: 25 }),
P2: Retain the 4.7 price entry for pinned configs

When a project explicitly pins `claude-opus-4-7` via `models.primary`, `company.<role>.model`, or agent frontmatter but relies on the default budget table, this replacement makes USD telemetry disappear for that still-supported model: `estimateCostUSD`/`actualCostUSD` resolve only the exact `claude:<model>` key, and Claude has no provider-level `costPerMTok` fallback. Since the comment says the 4.8 price is unchanged from 4.7, keeping both keys preserves `costEstimateUSD`/`costActualUSD` for users who downgrade or haven't migrated yet.
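
A minimal sketch of the suggestion, assuming the exact-key lookup the comment describes. `ModelPrice`, `lookupPrice`, and `estimateOutputCostUSD` are hypothetical names for illustration, not the project's actual helpers; only the price-table keys and rates come from the diff.

```typescript
// Illustrative sketch only: the type and function names here are hypothetical.
interface ModelPrice {
  readonly inputPerMTok: number
  readonly outputPerMTok: number
}

// Opus 4.7 and 4.8 share one rate per the diff comment ($5 in / $25 out per MTok).
const OPUS_PRICE: ModelPrice = Object.freeze({ inputPerMTok: 5, outputPerMTok: 25 })

const priceTable: Readonly<Record<string, ModelPrice>> = Object.freeze({
  'claude:claude-opus-4-8': OPUS_PRICE,
  // Retained so configs that still pin 4.7 keep dollar telemetry.
  'claude:claude-opus-4-7': OPUS_PRICE,
  'claude:claude-sonnet-4-6': Object.freeze({ inputPerMTok: 3, outputPerMTok: 15 }),
})

// Exact `<provider>:<model>` lookup with no provider-level fallback.
function lookupPrice(provider: string, model: string): ModelPrice | undefined {
  return priceTable[`${provider}:${model}`]
}

// Returns undefined (telemetry omitted) when the exact key is missing.
function estimateOutputCostUSD(
  provider: string,
  model: string,
  outputTokens: number,
): number | undefined {
  const price = lookupPrice(provider, model)
  return price === undefined ? undefined : (outputTokens / 1_000_000) * price.outputPerMTok
}
```

With both keys present, a run pinned to `claude-opus-4-7` still resolves a price; dropping the 4.7 key would make the same call return `undefined`, which is the telemetry gap the comment flags.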

'claude:claude-sonnet-4-6': Object.freeze({ inputPerMTok: 3, outputPerMTok: 15 }),
'claude:claude-haiku-4-5-20251001': Object.freeze({
inputPerMTok: 1,
2 changes: 1 addition & 1 deletion src/providers/capabilities.ts
@@ -89,7 +89,7 @@ export const DEFAULT_CAPABILITY_BY_ID: Readonly<Record<ProviderId, ProviderCapab
authSource: 'claude-cli-oauth' as const,
eligiblePhases: ALL_PHASES,
// costPerMTok / rateLimits omitted: per-provider granularity is
- // M13's decision. Opus 4.7 / Sonnet 4.6 / Haiku 4.5 prices are
+ // M13's decision. Opus 4.8 / Sonnet 4.6 / Haiku 4.5 prices are
// model-level, not provider-level.
}),
codex: Object.freeze({
2 changes: 1 addition & 1 deletion tests/cli-init.test.ts
@@ -55,7 +55,7 @@ describe('code-oz init', () => {

expect(config.version).toBe('0.21.1-alpha.0')
expect(config.defaultProvider).toBe('claude')
- expect(config.models.primary).toBe('claude-opus-4-7')
+ expect(config.models.primary).toBe('claude-opus-4-8')
expect(config.permissions.allowEscapeHatch).toBe(false)
})

4 changes: 2 additions & 2 deletions tests/config-budgets.test.ts
@@ -26,10 +26,10 @@ describe('budgets.global extension', () => {
// Codex Q4-bis lock: per-model Claude prices live in priceTable
// (model-level), NOT in DEFAULT_CAPABILITY_BY_ID.claude.costPerMTok
// (provider-level — has no model dimension). Source:
- // https://platform.claude.com/docs/en/about-claude/pricing (2026-05-01).
+ // https://platform.claude.com/docs/en/about-claude/pricing (2026-05-29).
const t = DEFAULT_CONFIG.budgets.global.priceTable
expect(t).toBeDefined()
- expect(t!['claude:claude-opus-4-7']).toEqual({ inputPerMTok: 5, outputPerMTok: 25 })
+ expect(t!['claude:claude-opus-4-8']).toEqual({ inputPerMTok: 5, outputPerMTok: 25 })
expect(t!['claude:claude-sonnet-4-6']).toEqual({ inputPerMTok: 3, outputPerMTok: 15 })
expect(t!['claude:claude-haiku-4-5-20251001']).toEqual({ inputPerMTok: 1, outputPerMTok: 5 })
})