fix(agent-runner): deliver budget/billing error turns instead of dropping them #2759

Open
assapin wants to merge 2 commits into nanocoai:main from assapin:fix/budget-error-surfaced-to-user

assapin commented Jun 14, 2026

Type of Change

  • Feature skill
  • Utility skill
  • Operational/container skill
  • Fix - bug fix or security fix to source code
  • Simplification
  • Documentation

Description

Closes #2751.

What: Budget/token-exhausted LLM turns (e.g. an Anthropic 403 billing_error, directly or via the OneCLI gateway) now reach the user instead of being silently dropped.

Why: In streaming mode the Agent SDK doesn't throw on a budget/billing error; it yields a terminal result with is_error: true and no <message> envelope. translateEvents discarded is_error, and dispatchResultText treated the unwrapped notice as scratchpad and dropped it. The poll-loop then pushed the re-wrap nudge, which started a new turn that hit the same error, re-hammering the gateway until idle-kill. Net effect: the user got silence.

How it works

  • providers/claude.ts — surface is_error on the result event, and fall back to the error subtype's errors[] for the message text (error subtypes carry no result).
  • poll-loop.ts — when a result has no <message> blocks and is_error, deliver the notice verbatim to the originating channel (deliverErrorResult, the same write the catch block already uses) and skip the re-wrap nudge. The happy path and the genuine wrap-mistake nudge are untouched.
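
The two changes above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the type shapes, `translateResult`, `extractMessages`, and `decide` are hypothetical names, `errors` is assumed to be a list of strings, and the `<message>` envelope is assumed to be a plain tag pair.

```typescript
// Hypothetical shape of the SDK's terminal result message (assumption).
interface SdkResultMessage {
  type: "result";
  subtype: string; // e.g. "success" or an error subtype
  is_error: boolean;
  result?: string; // absent on error subtypes, per the PR description
  errors?: string[]; // assumed to carry the error text on error subtypes
}

// Hypothetical runner-side event after translation.
interface ResultEvent {
  type: "result";
  text: string;
  isError: boolean;
}

// Step 1 (providers/claude.ts): keep is_error, and fall back to errors[]
// for the text when the message carries no `result`.
function translateResult(msg: SdkResultMessage): ResultEvent {
  const text = msg.result ?? (msg.errors ?? []).join("\n");
  return { type: "result", text, isError: msg.is_error };
}

type Action =
  | { kind: "deliver"; messages: string[] }
  | { kind: "deliverError"; notice: string }
  | { kind: "nudge" };

// Collect the bodies of <message>...</message> blocks (assumed envelope format).
function extractMessages(text: string): string[] {
  const re = /<message\b[^>]*>([\s\S]*?)<\/message>/g;
  const out: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) out.push(m[1].trim());
  return out;
}

// Step 2 (poll-loop.ts): an unwrapped result is delivered verbatim when it is
// an error, and only nudged when it is a genuine wrap mistake.
function decide(event: ResultEvent): Action {
  const messages = extractMessages(event.text);
  if (messages.length > 0) return { kind: "deliver", messages };
  if (event.isError) return { kind: "deliverError", notice: event.text };
  return { kind: "nudge" };
}
```

Under these assumptions, an error result with no envelope maps to `deliverError`, while a non-error unwrapped result still maps to `nudge`, which mirrors the regression guard described in the tests.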

How it was tested

  • TDD unit tests (bun test): assert the error notice is delivered to the channel with no nudge, and that a normal unwrapped result still nudges (regression guard).
  • Full suite green: container 111 pass, host 472 pass, both typechecks + prettier.
  • Verified live on a real agent image + SDK with ANTHROPIC_BASE_URL pointed at a local 403 billing_error mock: the notice is delivered to the channel and the retry loop is gone.

Dependency

Requires the gateway to emit a standard Anthropic 403 so the SDK sets is_error, e.g. OneCLI returning 403 {"type":"error","error":{"type":"billing_error","message":"…"}} instead of a fabricated HTTP 200. Detailed in #2751. (A 429 does not work here: it is retryable, the SDK auto-retries it, and the runner only logs rate-limit events, so the user would still see silence.)
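
For illustration, a gateway could build that response along these lines. This is only a sketch of the error body quoted above: `budgetExhaustedResponse` and the `GatewayResponse` type are hypothetical names, not OneCLI's actual code, and the message text is a placeholder.

```typescript
// Error envelope as quoted in the PR description.
interface AnthropicErrorBody {
  type: "error";
  error: { type: string; message: string };
}

// Hypothetical framework-neutral response shape (assumption).
interface GatewayResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Build a 403 billing_error response instead of a fabricated HTTP 200.
function budgetExhaustedResponse(message: string): GatewayResponse {
  const payload: AnthropicErrorBody = {
    type: "error",
    error: { type: "billing_error", message },
  };
  return {
    status: 403,
    headers: { "content-type": "application/json" },
    body: JSON.stringify(payload),
  };
}
```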

…ping them

A turn that ends in a non-retryable provider error (e.g. an Anthropic
403 billing_error) comes back from the streaming SDK as a result with
is_error=true and no <message> envelope. dispatchResultText treated it
as scratchpad and dropped it, then the poll-loop pushed a re-wrap nudge
-> new turn -> same error, re-hammering the gateway until idle-kill. The
user saw silence.

- providers/claude.ts: surface is_error on the result event, and fall
  back to errors[] for the message text (error subtypes carry no result).
- poll-loop.ts: when a result has no <message> blocks and is_error, deliver
  the notice verbatim to the originating channel and skip the nudge.

Verified live (real agent image + SDK, 403 mock): the notice is delivered
to the channel and the retry loop is gone.

Refs nanocoai#2751

Labels

PR: Fix Bug fix

Development

Successfully merging this pull request may close these issues.

Budget-exhausted LLM turns are silently dropped — user gets no reply
