@notactuallytreyanastasio commented Dec 19, 2025

This is an example PR for the problem raised in #867.

I tried to ensure good test coverage.

Summary

This PR adds an idleTimeout option to detect and abort streams that stop sending SSE events without closing the connection. This addresses a critical failure mode where streaming responses hang indefinitely when:

  • Network connectivity becomes unstable mid-stream
  • The API server experiences transient issues
  • Load balancers or proxies close connections without proper cleanup

The Problem

Currently, the SDK provides:

  • timeout: Overall request timeout (covers connection + response time)
  • AbortController: Manual cancellation via signals

Neither detects stalled streams where the connection is alive but no data is flowing. When this happens, for await (const event of stream) blocks forever with no error and no indication anything is wrong.

Real-World Evidence

From production Claude Code CLI sessions, we observed:

  • Sessions hanging for 15+ minutes with 0% CPU (waiting for data)
  • "stop_reason": null in logs - response never completed
  • Only solution was to kill the process, losing all work in progress

The Solution

New idleTimeout option that:

  1. Tracks time since the last SSE event was received
  2. Uses Promise.race to race each chunk read against a timeout
  3. Throws StreamIdleTimeoutError with diagnostic information if the stream stalls
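
The three steps above can be sketched as a standalone wrapper around any async iterable. This is a minimal illustration, not the code in `_iterSSEMessages`: `withIdleTimeout` and `IdleTimeoutError` are hypothetical names, and storing `lastEventTime` as epoch milliseconds is an assumption.

```typescript
// Hypothetical error type for this sketch; the PR's real class is StreamIdleTimeoutError.
class IdleTimeoutError extends Error {
  constructor(
    readonly idleTimeoutMs: number,
    readonly eventCount: number,
    readonly lastEventTime: number, // assumption: epoch milliseconds
  ) {
    super(`Stream idle for more than ${idleTimeoutMs}ms after ${eventCount} events`);
    // Keep instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = "IdleTimeoutError";
  }
}

// Wraps a source so that every read is raced against a fresh idle timer.
async function* withIdleTimeout<T>(
  source: AsyncIterable<T>,
  idleTimeoutMs: number,
): AsyncGenerator<T, void, undefined> {
  const iterator = source[Symbol.asyncIterator]();
  let eventCount = 0;
  let lastEventTime = Date.now();
  try {
    while (true) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(
          () => reject(new IdleTimeoutError(idleTimeoutMs, eventCount, lastEventTime)),
          idleTimeoutMs,
        );
      });
      let result: IteratorResult<T>;
      try {
        // Whichever settles first wins: the next event or the idle timer.
        result = await Promise.race([iterator.next(), timeout]);
      } finally {
        // Always clear the timer so it cannot leak or fire later.
        clearTimeout(timer);
      }
      if (result.done) return;
      eventCount += 1;
      lastEventTime = Date.now();
      yield result.value;
    }
  } finally {
    // Best-effort cleanup. Not awaited, because a stalled source may never settle.
    iterator.return?.().catch(() => {});
  }
}
```

Clearing the timer in a `finally` block is what covers the cleanup cases listed under Test Coverage (normal completion, user break, processing error): the timer is released on every path out of the read.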

Changes

| File | Change |
| --- | --- |
| src/client.ts | Add idleTimeout to ClientOptions |
| src/internal/request-options.ts | Add idleTimeout to RequestOptions |
| src/core/error.ts | Add StreamIdleTimeoutError class |
| src/core/streaming.ts | Implement timeout in _iterSSEMessages |
| src/internal/parse.ts | Pass options through to Stream.fromSSEResponse |
| src/index.ts | Export StreamIdleTimeoutError |
| tests/streaming.test.ts | Add 13 idle timeout unit tests |
| tests/api-resources/MessageStream.test.ts | Add 4 client-level integration tests |

API

// Set default idle timeout for all streams (90 seconds)
const client = new Anthropic({
  idleTimeout: 90_000,
});

// Override per-request
const stream = await client.messages.stream(params, {
  idleTimeout: 120_000, // 2 minutes for this specific request
});

// Catch the specific error type
try {
  for await (const event of stream) {
    // process events
  }
} catch (err) {
  if (err instanceof Anthropic.StreamIdleTimeoutError) {
    console.log(`Stream stalled after ${err.eventCount} events`);
    console.log(`Last event at: ${err.lastEventTime}`);
    console.log(`Configured timeout: ${err.idleTimeoutMs}ms`);
    // Retry or fail gracefully
  }
}
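
For the "Retry or fail gracefully" branch, one option on the caller side is to reopen the stream a bounded number of times. The sketch below is SDK-independent: `collectWithRetry`, `openStream`, and `isIdleTimeout` are hypothetical names. With this PR applied, the predicate would presumably be `(err) => err instanceof Anthropic.StreamIdleTimeoutError`.

```typescript
// Collects all events from a stream, reopening it if it stalls.
// Assumption: the request is safe to repeat and partial output can be discarded.
async function collectWithRetry<T>(
  openStream: () => AsyncIterable<T>,
  isIdleTimeout: (err: unknown) => boolean,
  maxAttempts = 3,
): Promise<T[]> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const events: T[] = []; // start fresh on every attempt
    try {
      for await (const event of openStream()) events.push(event);
      return events;
    } catch (err) {
      // Only stalls are retried; every other error propagates unchanged.
      if (!isIdleTimeout(err)) throw err;
      lastError = err;
    }
  }
  // All attempts stalled: surface the last idle timeout error.
  throw lastError;
}
```

Restarting throws away events received before the stall, so this only fits callers that can tolerate re-running the whole request.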

Type Safety

  • All new properties are optional (backward compatible)
  • No any types used
  • StreamIdleTimeoutError extends APIConnectionError (existing error hierarchy)
  • idleTimeout?: number | undefined to satisfy exactOptionalPropertyTypes
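
A small sketch of what the last two bullets mean in practice. The class and interface names here are hypothetical stand-ins, not the SDK's definitions, and the `number` type for `lastEventTime` is an assumption.

```typescript
// Hypothetical stand-in for the SDK's existing connection error class.
class ConnectionErrorSketch extends Error {
  constructor(message: string) {
    super(message);
    // Keep instanceof working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Subclassing means existing `instanceof ConnectionErrorSketch` handlers
// also catch idle timeouts, without any change to caller code.
class IdleTimeoutErrorSketch extends ConnectionErrorSketch {
  constructor(
    readonly idleTimeoutMs: number,
    readonly eventCount: number,
    readonly lastEventTime: number, // assumption: epoch milliseconds
  ) {
    super(`No SSE event received for ${idleTimeoutMs}ms`);
  }
}

interface StreamOptionsSketch {
  // With exactOptionalPropertyTypes enabled, `idleTimeout?: number` alone
  // would reject an explicit `undefined`; `| undefined` permits it.
  idleTimeout?: number | undefined;
}

const omitted: StreamOptionsSketch = {};
const explicitUndefined: StreamOptionsSketch = { idleTimeout: undefined };
```

Allowing an explicit `undefined` matters when options objects are built by spreading or forwarding values that may be unset.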

Test Plan

  • npm run lint passes
  • npm test passes (427 tests total, including 17 new idle timeout tests)
  • Types compile correctly

Test Coverage

Low-Level Unit Tests (streaming.test.ts) - 13 tests

| Test | Purpose |
| --- | --- |
| Basic timeout fires | Core functionality works |
| Timeout resets on data | Slow but consistent streams complete |
| Error has diagnostic info | idleTimeoutMs, eventCount, lastEventTime |
| No timeout without option | Backward compatibility |
| Options pass through Stream | Factory pattern works |
| Cleanup on normal completion | No timer memory leaks |
| Cleanup on user break | No leaks when breaking from loop |
| Cleanup on manual abort | No leaks on controller.abort() |
| Very short timeout (1ms) | Edge case behavior |
| Multiple sequential streams | No cross-contamination |
| Accurate event count | Diagnostic accuracy |
| No timer leak on processing error | Exception safety |

Client Integration Tests (MessageStream.test.ts) - 4 tests

| Test | Purpose |
| --- | --- |
| Timeout via request options | client.messages.stream(params, { idleTimeout }) |
| Timeout via client default | new Anthropic({ idleTimeout }) |
| Request overrides client | Option precedence works |
| Normal completion with timeout | No false positives |
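
The precedence rule exercised by "Request overrides client" can be sketched as a single fallback. This is an illustration under assumptions, not the PR's actual option-merging code; `resolveIdleTimeout` and `IdleTimeoutOptionsSketch` are hypothetical names.

```typescript
interface IdleTimeoutOptionsSketch {
  idleTimeout?: number | undefined;
}

// The per-request value wins; otherwise fall back to the client default.
// A result of undefined means idle detection stays off (opt-in behavior).
function resolveIdleTimeout(
  clientOptions: IdleTimeoutOptionsSketch,
  requestOptions: IdleTimeoutOptionsSketch,
): number | undefined {
  return requestOptions.idleTimeout ?? clientOptions.idleTimeout;
}
```

One consequence of using `??` in this sketch: an explicit `idleTimeout: undefined` on a request falls back to the client default, so it cannot be used to switch the timeout off for a single call.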

Questions for Maintainers

  1. Default value: Should idleTimeout have a default (e.g., 2 minutes) or remain opt-in?
  2. Ping events: Should ping SSE events reset the idle timer, or only "meaningful" events?
  3. Retry integration: Should idle timeout trigger retry logic if maxRetries > 0?

Adds a new `idleTimeout` option to both `ClientOptions` and `RequestOptions`
that detects and aborts streams that stop sending SSE events.

When a stream stalls (server stops sending data without closing the connection),
the SDK will now throw a `StreamIdleTimeoutError` after the configured timeout,
allowing applications to handle the failure gracefully instead of hanging
indefinitely.

Changes:
- Add `idleTimeout` option to `ClientOptions` (default for all requests)
- Add `idleTimeout` option to `RequestOptions` (per-request override)
- Add `StreamIdleTimeoutError` class with diagnostic information
- Implement idle timeout in `_iterSSEMessages` using Promise.race
- Pass options through `Stream.fromSSEResponse` and response parsing
- Add comprehensive unit tests for idle timeout behavior

Usage:
```typescript
// Set default for all streams
const client = new Anthropic({ idleTimeout: 90_000 });

// Override per-request
const stream = await client.messages.stream(params, {
  idleTimeout: 120_000,
});
```
- Timer cleanup on normal completion
- Timer cleanup on user break/return
- Timer cleanup on manual abort
- Timer cleanup on processing error
- Very short timeout behavior
- Multiple sequential streams isolation
- Accurate event count tracking
- Memory leak prevention verification
- Test idleTimeout via request options
- Test idleTimeout via client default
- Test request options override client default
- Test normal completion with timeout configured

Verifies the full integration path from client.messages.stream()
through to StreamIdleTimeoutError.
@notactuallytreyanastasio changed the title from "feat: add streaming idle timeout to detect stalled SSE connections" to "bug/proposal: infinitely hanging clients breaking bigger/complex sessions. Proposal: add Streaming Idle Timeout to Prevent Indefinite Hangs #867" on Dec 19, 2025