
fix(providers): detect truncated Anthropic and OpenAI Responses streams #33

Merged
ethanndickson merged 4 commits into coder_2_33 from ethan/anthropic-require-message-stop
May 7, 2026


@ethanndickson ethanndickson commented May 7, 2026

Background

This started from a Coder agent observability report: a workspace agent's chat appeared to hang and then surfaced as "context canceled" in coderd logs, with no upstream error visible to the user. Root cause: a mitmproxy sitting between Coder and Anthropic was occasionally closing the upstream SSE response cleanly, mid-stream, before the response was complete. The fantasy Anthropic adapter treated the clean EOF as a successful Finish, committing the partial assistant text as the final answer. Because no error surfaces, chatretry.Retry never engages, the chatd retry budget goes unused, and the failure is invisible to operators.

This PR now applies the same terminal-event invariant to the OpenAI Responses API path too. Responses streaming has semantic lifecycle events; OpenAI documents response.completed as "emitted when the model response is complete," response.incomplete as "emitted when a response finishes as incomplete," and response.failed as the failed terminal event. Before this PR, the Responses adapter had the same shape as the Anthropic bug: after stream.Next() stopped, stream.Err() == nil meant Finish, even if no terminal lifecycle event had arrived.

Closes CODAGT-325

Fix

Anthropic Messages streaming

The Anthropic Messages streaming protocol explicitly guarantees message_stop as the final SSE event of every successful stream (in two places on the same docs page: the "Event types" flow list and the "Full HTTP Stream response" section, each ending with "A final message_stop event"). Tool use, extended thinking, web search, and server tools all preserve this invariant; only the documented event: error failure mode replaces it.

Today, this adapter has an empty case "message_stop": arm and treats Stream.Err() == nil || errors.Is(err, io.EOF) as a successful Finish unconditionally. The Anthropic Go SDK does not enforce the terminal event either: in packages/ssestream/ssestream.go, Stream[T].Next returns false on a clean EOF and leaves Stream.Err() == nil. Application code must add the gate.
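To illustrate why a clean EOF is indistinguishable from success at the transport level, here is a minimal sketch that uses the standard library's bufio.Scanner as a stand-in for the SDK stream (it is not the SDK's code). The readEvents helper and the sample body are assumptions made for illustration; the only behavior relied on is that Scanner.Err() returns nil when the input simply ends.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// readEvents collects SSE event names from body and returns the scanner's
// error. bufio.Scanner reports a clean end of input as a nil error, which is
// the same shape the PR describes for the SDK stream. (Hypothetical helper.)
func readEvents(body string) ([]string, error) {
	var events []string
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		if name, ok := strings.CutPrefix(sc.Text(), "event: "); ok {
			events = append(events, name)
		}
	}
	return events, sc.Err()
}

func main() {
	// A body that was cut off after a content delta: message_stop never arrives.
	truncated := "event: message_start\ndata: {}\n\n" +
		"event: content_block_delta\ndata: {}\n\n"

	events, err := readEvents(truncated)
	// err is nil here even though the stream is incomplete, so the caller
	// has to inspect the events themselves to detect truncation.
	fmt.Println(events, err)
}
```

Nothing in the returned error distinguishes the truncated body from a complete one, which is why the check has to live in application code.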

This PR tracks whether message_stop arrived during the SSE loop. On clean EOF without it, yield a StreamPartTypeError wrapping io.EOF. The existing transport-error path (else branch) is unchanged, and the event: error path keeps surfacing through Stream.Err(). Wrapping with %w preserves the underlying io.EOF so downstream classifiers (e.g. Coder's chaterror.Classify already matches eof in its timeoutPatterns) treat it as retryable without any extra plumbing.

OpenAI Responses streaming

For Responses API streaming, this PR tracks terminal lifecycle events before yielding Finish from both Stream and JSON-mode StreamObject:

  • response.completed marks a normal terminal response and can yield Finish.
  • response.incomplete marks a terminal response with an incomplete finish reason (e.g. max output tokens) and can yield Finish with the mapped finish reason.
  • response.failed now yields an error immediately instead of falling through to a synthetic Finish.
  • Clean EOF before any terminal lifecycle event yields a StreamPartTypeError / ObjectStreamPartTypeError wrapping io.EOF with an "openai responses stream closed before terminal event" message.

This deliberately does not change the legacy OpenAI Chat Completions adapter. That API has a different streaming shape ([DONE], chunk finish reasons, and SDK-level behavior), and the codex prior art is specifically for Responses-style terminal lifecycle events.

Coverage

Anthropic coverage is table-driven across complete stream, EOF before message_stop, empty stream, and malformed stream (existing error path preserved).

OpenAI Responses coverage is table-driven across response.completed, response.incomplete, EOF before terminal event, empty stream, response.failed, malformed stream, and JSON-mode StreamObject truncation.

Prior art

This is the same defense Anthropic itself shipped in their newest official SDK: MessageAccumulator in anthropic-sdk-java (PR anthropics/anthropic-sdk-java#178, merged 2025-03-21) raises IllegalStateException("'message_stop' event not yet received.") on MessageAccumulator.message() and has dedicated unit tests for both messageNotStarted and messageNotStopped.

OpenAI's openai/codex implements the analogous gate for the OpenAI Responses API: in codex-rs/codex-api/src/sse/responses.rs it emits ApiError::Stream("stream closed before response.completed") on early EOF, with a regression test (stream_no_completed.rs::retries_on_early_close) that feeds an incomplete_sse.json fixture and asserts the retry path fires. This is the direct precedent for the Responses API portion of this PR.

A community contributor independently arrived at the same conclusion for LiteLLM in BerriAI/litellm#20361 (open, "changes requested" as of Feb 2026), filed against issue BerriAI/litellm#20347 ("Anthropic streaming silently completes with empty content"). The proposed AnthropicStreamValidator synthesizes an incomplete_stream_error event on missing message_stop.

The bug class has multiple open user-facing reports:

  • anthropic-sdk-typescript#842 — "Streaming responses consistently interrupted mid-transmission - connection closes without message_stop event"
  • anthropic-sdk-python#1470 — "Streaming /v1/messages drops mid-stream with RemoteProtocolError on long code_execution + skills runs"
  • Roo-Code#12079 — "write_to_file called without required content parameter" (textbook truncated-tool-call signature)

Adjacent (not overlapping) prior work in fantasy

No other upstream fantasy PR or issue addresses the Anthropic message_stop gate (verified: 17 issue searches + 12 PR searches + reading every open PR and every Anthropic-titled PR in the repo + 6 months of providers/anthropic/anthropic.go commits + code search across charmbracelet/* for message_stop / sawMessageStop / "stream closed before").

Rollout

Once merged on coder_2_33, the consuming change in coder/coder is a pseudo-version bump in go.mod. No code changes needed in coder/coder for the Anthropic path: the wrapped io.EOF already classifies as retryable via the existing chaterror.Classify timeoutPatterns, so chatretry.Retry's 25-attempt budget engages automatically. The OpenAI Responses path now exposes the same class of retryable transport-shaped error for consumers that classify EOF.

Drive-by

The diff includes two small pre-existing cleanup items in tests:

  • A one-line gofmt fix in TestComputerUseToolJSON (a malformed require.Contains(...)}) line). The Anthropic test file was not gofmt-clean at coder_2_33 HEAD without it.
  • A one-line compile fix in an OpenAI test where toResponsesPrompt had been updated to return three values, but the test still expected two.

Drafting because we'd like to land this on the existing coder_2_33 line and then bump the pin in coder/coder once the SHA is settled.

The Anthropic Messages streaming protocol guarantees message_stop as the
final SSE event of every successful stream. Today the adapter treats any
clean EOF (Stream.Err() == nil or io.EOF) as a successful Finish, even
when the upstream body was cut off mid-response. This silently truncates
the assistant's reply and commits the partial text as if it were the
model's complete answer.

Track whether message_stop was observed during the SSE loop. On clean
EOF without it, yield StreamPartTypeError wrapping io.EOF so the failure
surfaces as a retryable transport error rather than a phantom success.
Existing transport errors continue to flow through the unchanged else
branch; the event: error path keeps yielding via Stream.Err().

Tests cover happy path, EOF before message_stop, empty stream, and
malformed stream (existing error path preserved).

Also picks up a one-line gofmt fix in TestComputerUseToolJSON; the test
file was not gofmt-clean at HEAD without it.

The OpenAI Responses API emits terminal lifecycle events when a streamed
response reaches its final state. The adapter currently yields Finish on
any clean EOF, even if the stream ended before response.completed or
response.incomplete. That has the same silent-truncation shape as the
Anthropic message_stop bug in this PR.

Track response.completed and response.incomplete before yielding Finish
from both Stream and StreamObject. If the transport closes cleanly first,
yield a StreamPartTypeError/ObjectStreamPartTypeError wrapping io.EOF so
callers can retry instead of committing partial output. Also surface
response.failed as an error event instead of falling through to Finish.

Tests cover completed and incomplete terminal events, EOF before terminal
event, empty streams, response.failed, malformed streams, and JSON-mode
StreamObject truncation.

Also fixes a pre-existing OpenAI test compile issue where one
toResponsesPrompt call still expected two return values.
@ethanndickson ethanndickson changed the title from "fix(providers/anthropic): require message_stop before yielding Finish" to "fix(providers): detect truncated Anthropic and OpenAI Responses streams" May 7, 2026
@ethanndickson ethanndickson marked this pull request as ready for review May 7, 2026 09:11
@ibetitsmike

@codex review

@ethanndickson

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Already looking forward to the next diff.


@ethanndickson ethanndickson merged commit 246c4ae into coder_2_33 May 7, 2026
ethanndickson added a commit to coder/coder that referenced this pull request May 8, 2026
coder/fantasy now fails closed when Anthropic or OpenAI Responses
streams close before their provider terminal events instead of yielding
a successful finish.

This bumps the fantasy replacement to coder/fantasy#33 and teaches chat
error classification to treat those failures as retryable timeout errors
with explicit stream-closed messages.
