Streaming responses consistently interrupted mid-transmission - connection closes without message_stop event #842

@gustavosizilio

Description

Issue Description:

I'm experiencing a consistent issue where streaming responses from the Anthropic API are being prematurely terminated. The stream ends abruptly without sending a message_stop event, leaving the response incomplete. This happens specifically when using tool_use with large JSON payloads.

Environment:

  • SDK: @anthropic-ai/sdk v0.68.0
  • Runtime: Node.js (NestJS application)
  • Model: claude-3-5-haiku-latest (also tested with other models)
  • API: anthropicClient.beta.messages.create with stream: true

Observed Behavior:

The stream consistently ends mid-transmission after receiving multiple content_block_delta events:

[ClaudeService] Stream END event. Total bytes: 277346
[ClaudeService] Last chunk (last 500 chars): event: content_block_delta
data: {"type":"content_block_delta","index":2,"delta":{"type":"input_json_delta","partial_json":"fa"} }

Failed parsing Anthropic stream Error: Stream ended without message_stop event

The connection closes without:

  • A content_block_stop event for the current block
  • A message_delta event with stop reason
  • A message_stop event

The incomplete JSON in the last chunk ("partial_json":"fa") indicates the stream is being cut off mid-transmission rather than completing gracefully.
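The truncation can be detected mechanically from the captured SSE text. A minimal sketch (`checkStreamComplete` is an illustrative name of mine, not an SDK function):

```typescript
// Scan raw SSE text for the terminal event; a stream that closed gracefully
// should end with a message_stop event.
function checkStreamComplete(raw: string): { complete: boolean; lastEvent?: string } {
  const events = raw
    .split("\n")
    .filter((line) => line.startsWith("event: "))
    .map((line) => line.slice("event: ".length).trim());
  const lastEvent = events.length > 0 ? events[events.length - 1] : undefined;
  return { complete: lastEvent === "message_stop", lastEvent };
}

// The truncated tail from my logs above: the stream ends on a delta.
const truncated = [
  "event: content_block_delta",
  'data: {"type":"content_block_delta","index":2,"delta":{"type":"input_json_delta","partial_json":"fa"}}',
  "",
].join("\n");

console.log(checkStreamComplete(truncated)); // { complete: false, lastEvent: 'content_block_delta' }
```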

What I've tried:

I've spent significant time trying different approaches to isolate the issue:

  1. Multiple custom fetch implementations:
    - Native Node.js fetch with custom HTTP agents
    - Axios with streaming response handling
    - Custom fetch wrapper using axios as transport
    - Different combinations of HTTP/HTTPS agent configurations
  2. Various timeout and connection configurations:
    - Disabled all timeouts (timeout: 0)
    - Configured HTTP agents with persistent keep-alive
    - Set maxContentLength: Infinity and maxBodyLength: Infinity
    - Implemented aggressive keep-alive settings (1-second intervals)
    - Added socket-level timeout prevention
  3. Different maxTokens values:
    - Tested with 2048, 4096, 8192, and higher values
    - The interruption occurs regardless of the token limit setting
  4. Stream handling variations:
    - Direct stream passthrough
    - ReadableStream conversion with proper backpressure handling
    - Added extensive error handling and logging
    - Monitored socket events (timeout, close, end, abort)
    - Pause/resume patterns on the underlying Node.js stream

What I cannot determine:

Despite all these attempts, I cannot determine if this is:

  • A backpressure/flow control issue between the client and server
  • A timing issue (some timeout I haven't been able to configure)
  • A size limit (the stream consistently ends around 270-280KB)
  • A network intermediary issue (a Cloudflare proxy limitation)
  • Something else entirely

In my testing, artificially slowing down stream consumption avoids the interruption, which suggests the SDK or the underlying connection isn't properly handling backpressure signals.
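The slow-consumption workaround looks roughly like this, sketched here against a stand-in async generator rather than the real SDK stream (`consumeSlowly` and `fakeStream` are illustrative names of mine):

```typescript
// Stand-in for the SDK's event stream; the real source is the async iterable
// returned by anthropicClient.beta.messages.create({ ..., stream: true }).
async function* fakeStream(): AsyncGenerator<{ type: string }> {
  yield { type: "content_block_delta" };
  yield { type: "content_block_stop" };
  yield { type: "message_stop" };
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// The workaround: insert a small delay between events so the consumer never
// outruns the socket, yielding back to the event loop between chunks.
async function consumeSlowly(
  stream: AsyncIterable<{ type: string }>,
  delayMs = 5,
): Promise<string[]> {
  const seen: string[] = [];
  for await (const event of stream) {
    seen.push(event.type);
    await sleep(delayMs);
  }
  return seen;
}

consumeSlowly(fakeStream()).then((seen) => console.log(seen));
```

That this pacing changes whether `message_stop` ever arrives is exactly the behavior I'd like explained.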

Use Case:

My tool_use calls need the model to generate large JSON structures (arrays with many objects) for data organization tasks. This is a legitimate use case where the response needs to be comprehensive.

Expected Behavior:

The stream should continue until:

  1. All tool_use input JSON is fully transmitted
  2. A content_block_stop event is sent
  3. A message_delta event with stop reason is sent
  4. A message_stop event is sent

The speed at which the consumer reads the stream should not affect whether the complete response is delivered.
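A checker for that terminal sequence might look like this (an illustrative sketch, not SDK code):

```typescript
// The final three events of a graceful ending should be, in order:
// content_block_stop, message_delta, message_stop.
function endsGracefully(eventTypes: string[]): boolean {
  const tail = eventTypes.slice(-3);
  return (
    tail[0] === "content_block_stop" &&
    tail[1] === "message_delta" &&
    tail[2] === "message_stop"
  );
}

console.log(
  endsGracefully(["content_block_delta", "content_block_stop", "message_delta", "message_stop"]),
); // → true
console.log(endsGracefully(["content_block_delta"])); // → false (what I actually observe)
```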

Questions:

  1. Is there a known backpressure or flow control issue with the SDK's streaming implementation?
  2. Are there any undocumented limits (size, time, or otherwise) on streaming responses?
  3. Is there a recommended maximum size for tool_use JSON payloads?
  4. Could this be related to Cloudflare (I see cf-ray headers) buffering or timeout settings?
  5. Are there SDK configuration options for controlling stream consumption rate or buffering?

Request:

Could you please help identify:

  • Why rapid stream consumption causes premature termination
  • Whether this is a known issue with the SDK's async iterator implementation
  • How to properly handle backpressure for large tool_use responses
  • Any configuration or approach I should try

Thank you!
