model_context_window_exceeded is treated as retriable transient error #503
Open
Bug Description
When a worker hits the model's context window limit (model_context_window_exceeded), Spacebot treats it as a transient API error and retries with exponential backoff (5, 10, 20, 40, 80 seconds). Because the context is the same size on every retry, each attempt fails, wasting about 155 seconds (roughly 3 minutes) of backoff and consuming API credits.
Reproduction
- Spawn a worker that performs commands with large output (e.g., journalctl with hundreds of lines)
- Worker accumulates enough tool results to exceed the model context window
- The API returns stop_reason: model_context_window_exceeded with empty content
- Spacebot retries 5 times with the same bloated context, failing each time
Expected Behavior
- model_context_window_exceeded should be detected as non-retriable: this is a context length problem, not a transient API error
- On context overflow, either: (a) truncate/summarize conversation history and retry once, or (b) fail fast
- No exponential backoff for a condition that won't resolve by waiting
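
A minimal sketch of the decision logic described above, in case it helps the discussion. All names here (FailureKind, NextStep, classify, next_step) are hypothetical and not taken from Spacebot's source; the 5-second base and 5-retry cap mirror the delays in this report. Treating every other stop reason as transient is an assumption that simply mirrors the current behavior.

```rust
/// Hypothetical classification of a failed provider response.
/// These names are illustrative, not Spacebot's actual types.
#[derive(Debug, PartialEq, Eq)]
pub enum FailureKind {
    /// May resolve by waiting (rate limit, overload, network blip).
    Transient,
    /// The request is too large for the model; waiting cannot help.
    ContextOverflow,
}

/// What the worker loop should do after a failure.
#[derive(Debug, PartialEq, Eq)]
pub enum NextStep {
    RetryAfterSecs(u64),
    TruncateAndRetryOnce,
    FailFast,
}

const MAX_RETRIES: u32 = 5; // matches retries=5 in the logs
const BASE_DELAY_SECS: u64 = 5; // matches the first observed delay

/// Classify by stop_reason; anything else is assumed transient here.
pub fn classify(stop_reason: Option<&str>) -> FailureKind {
    match stop_reason {
        Some("model_context_window_exceeded") => FailureKind::ContextOverflow,
        _ => FailureKind::Transient,
    }
}

/// Decide the next step. `attempt` is 1-based; `already_truncated` records
/// whether the history was shortened once already for this request.
pub fn next_step(kind: &FailureKind, attempt: u32, already_truncated: bool) -> NextStep {
    match kind {
        // Option (a) already tried and it still overflowed: give up.
        FailureKind::ContextOverflow if already_truncated => NextStep::FailFast,
        // Option (a): shrink the history and retry exactly once, no backoff.
        FailureKind::ContextOverflow => NextStep::TruncateAndRetryOnce,
        FailureKind::Transient if attempt > MAX_RETRIES => NextStep::FailFast,
        // Exponential backoff: 5, 10, 20, 40, 80 seconds.
        FailureKind::Transient => {
            NextStep::RetryAfterSecs(BASE_DELAY_SECS << attempt.saturating_sub(1))
        }
    }
}
```

For option (b), fail fast only, the ContextOverflow arms would collapse into a single arm returning NextStep::FailFast.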
Actual Behavior
- Treated as transient error, retried 5 times with delays of 5, 10, 20, 40, 80 seconds
- All retries fail with the same error
- Total wasted time: ~155 seconds
- Total wasted API calls: 5
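
For reference, the 155-second figure is just the sum of the doubling delays. A small sketch of that arithmetic follows; the function names are hypothetical and the 5-second base is taken from the observed delays, not from Spacebot's configuration.

```rust
/// Delay before retry `attempt` (1-based), doubling from a 5-second base.
/// Only meant for small attempt counts like the 5 retries in this report.
pub fn backoff_delay_secs(attempt: u32) -> u64 {
    5u64 << attempt.saturating_sub(1)
}

/// Total time spent waiting across `retries` attempts.
pub fn total_backoff_secs(retries: u32) -> u64 {
    (1..=retries).map(backoff_delay_secs).sum()
}
```

With 5 retries this gives 5 + 10 + 20 + 40 + 80 = 155 seconds of pure waiting, not counting the time each failed API call takes.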
Log Evidence
WARN spacebot::llm::model: unexpected empty assistant_content from Anthropic stop_reason="model_context_window_exceeded"
WARN spacebot::agent::worker: transient provider error, backing off and retrying attempt=1..5 delay_secs=5..80
ERROR spacebot::agent::worker: worker transient error retries exhausted retries=5
Impact
- Wastes API credits on guaranteed-to-fail retries
- Adds ~3 minutes of delay before the user gets the failure
- Can cascade: the main agent then tries to retrigger with the same bloated context
Suggested Fix
- Match on stop_reason == "model_context_window_exceeded" and classify as non-retriable
- Optionally: implement automatic context truncation (summarize oldest messages) and retry once
- Alternatively: add a max_input_tokens check before API calls to proactively prevent overflow
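
A rough sketch of what the pre-flight check could look like. Everything here is an assumption for illustration: the Message shape, the function names, and the 4-characters-per-token estimate (a crude heuristic, not a real tokenizer). It also drops the oldest messages rather than summarizing them, which is simpler than the summarization suggested above.

```rust
/// Hypothetical conversation message; not Spacebot's actual type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Crude estimate: about 4 characters per token, rounded up per message.
/// A real check would use the provider's tokenizer or token-counting API.
pub fn estimate_tokens(messages: &[Message]) -> usize {
    messages
        .iter()
        .map(|m| (m.content.chars().count() + 3) / 4)
        .sum()
}

/// Drop the oldest messages after the first one (assumed to hold the task or
/// system prompt) until the estimate fits within `max_input_tokens`.
/// Returns Ok(number dropped), or Err(estimated tokens) if the history
/// cannot be shrunk any further and still does not fit.
pub fn fit_to_budget(
    messages: &mut Vec<Message>,
    max_input_tokens: usize,
) -> Result<usize, usize> {
    let mut dropped = 0;
    while estimate_tokens(messages) > max_input_tokens {
        // Keep at least the first message and the most recent one.
        if messages.len() <= 2 {
            return Err(estimate_tokens(messages));
        }
        messages.remove(1);
        dropped += 1;
    }
    Ok(dropped)
}
```

Calling fit_to_budget before each API request would turn the Err case into an immediate, non-retriable failure. A real implementation would also need to keep tool call and tool result messages paired when dropping history.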
Environment
- Spacebot version: 0.3.3
- Model: zai_anthropic/glm-5-turbo
- Observed when workers run journalctl with large output