model_context_window_exceeded is treated as a retriable transient error #503

@vsumner-spacebot

Bug Description

When a worker hits the model's context window limit (model_context_window_exceeded), Spacebot treats it as a transient API error and retries with exponential backoff (5, 10, 20, 40, 80 seconds). Since the context is the same size on each retry, it fails every time, wasting ~3 minutes of retries and consuming API credits.

Reproduction

  1. Spawn a worker that runs commands with large output (e.g., journalctl with hundreds of lines)
  2. Worker accumulates enough tool results to exceed the model context window
  3. The API returns stop_reason: model_context_window_exceeded with empty content
  4. Spacebot retries 5 times with the same bloated context, failing each time

Expected Behavior

  • model_context_window_exceeded should be detected as non-retriable — this is a context length problem, not a transient API error
  • On context overflow, either: (a) truncate/summarize conversation history and retry once, or (b) fail fast
  • No exponential backoff for a condition that won't resolve by waiting
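
A minimal sketch of the expected classification, in Rust. `ProviderError`, `classify_stop_reason` and `call_with_retry` are hypothetical names, not Spacebot's actual API. The idea is that a context overflow maps to a non-retriable error, so the retry loop returns it immediately instead of sleeping.

```rust
use std::time::Duration;

/// Hypothetical error type for illustration; Spacebot's real error types are not shown in this issue.
#[derive(Debug, Clone, PartialEq)]
enum ProviderError {
    /// Rate limits, 5xx responses, timeouts: may succeed if we wait.
    Transient,
    /// The request is larger than the model accepts; waiting cannot fix it.
    ContextWindowExceeded,
}

impl ProviderError {
    fn is_retriable(&self) -> bool {
        matches!(self, ProviderError::Transient)
    }
}

/// Maps a `stop_reason` string to an error when it signals one.
fn classify_stop_reason(stop_reason: &str) -> Option<ProviderError> {
    match stop_reason {
        "model_context_window_exceeded" => Some(ProviderError::ContextWindowExceeded),
        _ => None,
    }
}

/// Calls `call`, retrying transient errors with exponential backoff
/// (`base_delay`, then 2x, 4x, ...) up to `max_retries` times.
/// Non-retriable errors are returned immediately without sleeping.
/// `sleep` is injected so the policy can be exercised without real waiting.
fn call_with_retry<T>(
    max_retries: u32,
    base_delay: Duration,
    mut call: impl FnMut() -> Result<T, ProviderError>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, ProviderError> {
    let mut attempt: u32 = 0;
    loop {
        match call() {
            Ok(value) => return Ok(value),
            // Fail fast: a context overflow will not resolve by waiting.
            Err(err) if !err.is_retriable() => return Err(err),
            // Transient error, but the retry budget is spent.
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                sleep(base_delay * 2u32.pow(attempt));
                attempt += 1;
            }
        }
    }
}
```

With `max_retries = 5` and a 5-second base delay, a transient error still gets the 5, 10, 20, 40, 80 second schedule, while `model_context_window_exceeded` returns after a single call with no sleep.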

Actual Behavior

  • Treated as a transient error, retried 5 times with delays of 5, 10, 20, 40, 80 seconds
  • All retries fail with the same error
  • Total wasted time: ~155 seconds
  • Total wasted API calls: 5
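
The ~155-second figure is the sum of the backoff sleeps, a geometric series with a 5-second base delay doubled on each of the 5 retries:

```math
T_{\text{sleep}} = \sum_{i=0}^{4} 5 \cdot 2^{i}\ \text{s} = 5\,(2^{5} - 1)\ \text{s} = 155\ \text{s}
```

This counts only the sleeps; the latency of the failed calls themselves comes on top, which is consistent with the roughly 3-minute delay described under Impact.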

Log Evidence

WARN spacebot::llm::model: unexpected empty assistant_content from Anthropic stop_reason="model_context_window_exceeded"
WARN spacebot::agent::worker: transient provider error, backing off and retrying attempt=1..5 delay_secs=5..80
ERROR spacebot::agent::worker: worker transient error retries exhausted retries=5

Impact

  • Wastes API credits on guaranteed-to-fail retries
  • Adds ~3 minutes of delay before the user gets the failure
  • Can cascade: the main agent then tries to retrigger with the same bloated context

Suggested Fix

  1. Match on stop_reason == "model_context_window_exceeded" and classify as non-retriable
  2. Optionally: implement automatic context truncation (summarize oldest messages) and retry once
  3. Alternatively: add a max_input_tokens check before API calls to proactively prevent overflow
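
The pre-flight check and truncation from items 2 and 3 could look roughly like this. `Message`, `estimate_tokens` and `fit_to_budget` are hypothetical; the 4-characters-per-token estimate is an assumed heuristic rather than the model's real tokenizer, and the sketch drops the oldest messages instead of summarizing them. A real implementation would also need to keep each tool call together with its tool result when trimming, since providers generally reject a tool result whose call is missing.

```rust
/// Hypothetical message shape; Spacebot's actual history type is not shown in the issue.
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: &'static str,
    content: String,
}

/// Rough token estimate (~4 characters per token, rounded up per message).
/// This is an assumed heuristic, not the provider's tokenizer, so leave headroom.
fn estimate_tokens(messages: &[Message]) -> usize {
    messages.iter().map(|m| (m.content.len() + 3) / 4).sum()
}

/// Pre-flight check before an API call: while the history exceeds
/// `max_input_tokens`, drop the oldest message after the first one
/// (assumed to hold the task instructions). Returns `None` when even the
/// first message alone is too large, so the caller can fail fast.
fn fit_to_budget(mut messages: Vec<Message>, max_input_tokens: usize) -> Option<Vec<Message>> {
    while estimate_tokens(&messages) > max_input_tokens {
        if messages.len() <= 1 {
            return None;
        }
        // Oldest message after the task prompt.
        messages.remove(1);
    }
    Some(messages)
}
```

In a worker loop this would run before each provider call; a `None` result means the worker should surface the context-overflow error instead of retrying.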

Environment

  • Spacebot version: 0.3.3
  • Model: zai_anthropic/glm-5-turbo
  • Observed when workers run journalctl with large output
