Model fallback & graceful degradation for token/rate limit exhaustion

## Problem

Overstory currently has **no fallback or retry mechanism** when an agent's underlying LLM provider hits token limits, rate limits (HTTP 429), or quota exhaustion. The system operates as a process-level orchestrator and delegates all API-level concerns to the CLI runtime (e.g., Claude Code). This creates several failure modes:

| Scenario | Current Behavior | Impact |
|----------|-----------------|--------|
| API rate limit (429) | CLI retries internally; if stalled >5min, watchdog nudges → kills | Agent work lost, no respawn |
| Context window exhaustion | CLI does internal compaction; Overstory unaware | Agent may lose context without checkpointing |
| Quota/billing limit hit | Agent process dies | Watchdog marks zombie → no respawn, no model switch |
| Provider outage | Agent stalls indefinitely | Killed by watchdog after `zombieThresholdMs` |

When running at scale (10-25 concurrent agents), a single provider rate limit can cascade — multiple agents hit the ceiling simultaneously, stall, and get killed by the watchdog with no recovery.

## Proposed Solution

### 1. Model Fallback Chain in `config.yaml`

Allow defining ordered fallback models per role:

```yaml
models:
  coordinator:
    primary: opus
    fallback: [sonnet, haiku]
  lead:
    primary: opus
    fallback: [sonnet]
  scout:
    primary: haiku
    fallback: [gemini-2.5-flash]  # cross-runtime fallback
  builder:
    primary: sonnet
    fallback: [minimax-m2.5]
```

When `resolveModel()` in `src/agents/manifest.ts` is called, it would return the primary model. On detected failure, the system retries with the next model in the chain.

### 2. Failure Detection Layer

Add API-level awareness between the runtime adapter and the watchdog:

- **Runtime adapters** (`src/runtimes/*.ts`) could expose a `detectFailure()` method (similar to existing `detectReady()`) that parses tmux pane output for known error patterns:
  - `rate_limit`, `429`, `overloaded`, `quota exceeded`, `context_length_exceeded`
  - Provider-specific error signatures
- **Watchdog integration**: Before escalating to kill, check if the failure is recoverable via model fallback

### 3. Automatic Agent Respawn with Degraded Model

When the watchdog detects a token/rate limit failure:

1. Save agent checkpoint (if not already saved)
2. Kill the current session
3. Respawn with the next fallback model from the chain
4. Restore from checkpoint
5. Log the degradation event for observability

### 4. Per-Agent Token Budget (Optional)

Allow setting soft/hard token limits per agent role:

```yaml
budgets:
  scout:
    softLimit: 50000    # warn at 50k tokens
    hardLimit: 100000   # force checkpoint + stop at 100k
  builder:
    softLimit: 200000
    hardLimit: 500000
```

This would require real-time token counting from transcript parsing (building on existing `src/metrics/transcript.ts`).

## Alternatives Considered

- **Do nothing**: Rely on CLI-level retry. Works for transient 429s but fails for quota exhaustion and provides no cross-provider fallback.
- **External proxy (LiteLLM/OpenRouter)**: Route all API calls through a proxy that handles fallback. Adds infrastructure complexity but works without Overstory changes. However, this doesn't solve the checkpoint/respawn problem.
- **Prompt-only checkpointing**: Current approach — agents are *instructed* to checkpoint. Unreliable since it depends on the LLM following instructions.

## Context

Analysis was done by reading Overstory CLI source code (`src/watchdog/`, `src/runtimes/`, `src/agents/`, `src/config.ts`). Confirmed that:

- `resolveModel()` is a static one-time lookup at spawn
- Watchdog operates on OS signals only (tmux alive/dead, PID alive/dead)
- No `fallback`, `retry`, `budget`, or `degradation` keys exist in the config schema
- Checkpoint system is prompt-level only, not code-enforced

This becomes especially important for multi-provider setups (e.g., Anthropic for coordinators, Fireworks for scouts, OpenAI for builders) where different providers have different rate limits and failure modes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Model fallback & graceful degradation for token/rate limit exhaustion #67

Problem

Proposed Solution

1. Model Fallback Chain in `config.yaml`

2. Failure Detection Layer

3. Automatic Agent Respawn with Degraded Model

4. Per-Agent Token Budget (Optional)

Alternatives Considered

Context

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Scenario	Current Behavior	Impact
API rate limit (429)	CLI retries internally; if stalled >5min, watchdog nudges → kills	Agent work lost, no respawn
Context window exhaustion	CLI does internal compaction; Overstory unaware	Agent may lose context without checkpointing
Quota/billing limit hit	Agent process dies	Watchdog marks zombie → no respawn, no model switch
Provider outage	Agent stalls indefinitely	Killed by watchdog after `zombieThresholdMs`

Uh oh!

Model fallback & graceful degradation for token/rate limit exhaustion #67

Description

Problem

Proposed Solution

1. Model Fallback Chain in config.yaml

2. Failure Detection Layer

3. Automatic Agent Respawn with Degraded Model

4. Per-Agent Token Budget (Optional)

Alternatives Considered

Context

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions

1. Model Fallback Chain in `config.yaml`