## Problem
ReqLLM streaming operations using `Finch.stream/5` fail when connections are closed by the server during idle periods. This is particularly problematic for LLM workflows that have long thinking times between streaming chunks.
## Current Behavior
When a streaming connection is closed (e.g., due to server-side idle timeout), the error is immediately propagated to the caller without retry:
```elixir
{:error, %Mint.TransportError{reason: :closed}}
{:error, %Mint.TransportError{reason: :timeout}}
{:error, %Mint.TransportError{reason: :econnrefused}}
```
This occurs in `ReqLLM.Streaming.FinchClient.start_streaming_task/6`, where `Finch.stream/5` is called directly without retry logic.
## Expected Behavior
Streaming operations should automatically retry on transient connection errors, similar to:

- How Req handles retries with the `retry: :transient` option (a minimal example follows this list)
- How `ReqLLM.Step.Retry` already handles non-streaming requests
- The pattern used in LangChain: "Add `retry: transient` to Req for Anthropic models in stream mode" (brainlid/langchain#329)
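
For reference, this is roughly how Req's transient retry is enabled on a non-streaming request (the URL is a placeholder, not a real ReqLLM endpoint):

```elixir
# With retry: :transient, Req retries transient transport errors such as
# :closed, :timeout, and :econnrefused, plus retryable HTTP statuses.
Req.get!("https://api.example.com/v1/models",
  retry: :transient,
  max_retries: 3
)
```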
## Proposed Solution
Implement automatic retry logic at the Finch streaming layer:

- **Create a retry module** (`ReqLLM.Streaming.Retry`) that wraps `Finch.stream/5` calls
- **Detect retryable errors:** `:closed`, `:timeout`, `:econnrefused`
- **Configurable retries:** default of 3 max attempts (consistent with `ReqLLM.Step.Retry`)
- **Immediate retry:** 0 ms delay (backoff can be added later)
- **Configuration options:** `streaming_max_retries` (default: 3) and `streaming_retry_delay` (default: 0)

A sketch of the retry module follows.
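
A minimal sketch of the proposed module: the name `ReqLLM.Streaming.Retry`, the function `stream_with_retry`, and the retryable error list come from this issue, while the closure-based signature and the rest of the implementation are assumptions:

```elixir
defmodule ReqLLM.Streaming.Retry do
  @moduledoc "Retries transient transport errors around Finch.stream/5 calls."

  # Transport error reasons this issue proposes treating as retryable.
  @retryable_reasons [:closed, :timeout, :econnrefused]

  @doc """
  Invokes `stream_fun` (a zero-arity closure around `Finch.stream/5`) and
  retries on transient transport errors, up to `max_retries` extra attempts
  with `delay_ms` milliseconds between them.
  """
  def stream_with_retry(stream_fun, max_retries \\ 3, delay_ms \\ 0, attempt \\ 1) do
    case stream_fun.() do
      {:error, %Mint.TransportError{reason: reason}} = error
      when reason in @retryable_reasons ->
        if attempt <= max_retries do
          Process.sleep(delay_ms)
          stream_with_retry(stream_fun, max_retries, delay_ms, attempt + 1)
        else
          error
        end

      result ->
        # Success, or a non-retryable error: pass through unchanged.
        result
    end
  end
end
```

Note that a retry restarts the stream from the beginning; deduplicating chunks already delivered before the disconnect would be the caller's concern and is out of scope for this sketch.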
## Implementation Details
### Files to Modify
**New file: `lib/req_llm/streaming/retry.ex`**

- Implement a `stream_with_retry/4` function
- Detect retryable vs. non-retryable errors
- Handle the retry loop with a configurable maximum number of attempts
**Modify: `lib/req_llm/streaming/finch_client.ex`**

- Wrap the `Finch.stream/5` call with retry logic (lines 152-226); a sketch of the call site follows this list
- Pass retry configuration from options or application config
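
As a hedged illustration of the call site (the variable names `request`, `finch_name`, `acc`, `handle_chunk`, and `opts` are placeholders, since this issue does not show the body of `start_streaming_task/6`):

```elixir
# Hypothetical wiring inside start_streaming_task/6.
max_retries = Keyword.get(opts, :streaming_max_retries, 3)
delay_ms = Keyword.get(opts, :streaming_retry_delay, 0)

ReqLLM.Streaming.Retry.stream_with_retry(
  fn -> Finch.stream(request, finch_name, acc, handle_chunk) end,
  max_retries,
  delay_ms
)
```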
**Modify: `lib/req_llm/provider/options.ex`**

- Add `streaming_max_retries` and `streaming_retry_delay` to the schema
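
Assuming the schema is a NimbleOptions-style keyword schema (the existing file is not shown in this issue), the new entries might look like:

```elixir
# Hypothetical schema entries; the keys and defaults come from this issue,
# the :type and :doc values are assumptions.
streaming_max_retries: [
  type: :non_neg_integer,
  default: 3,
  doc: "Maximum retry attempts for transient streaming connection errors."
],
streaming_retry_delay: [
  type: :non_neg_integer,
  default: 0,
  doc: "Delay in milliseconds between streaming retry attempts."
]
```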
## Example Usage
```elixir
# Application config
config :req_llm,
  streaming_max_retries: 3,
  streaming_retry_delay: 0
```

```elixir
# Per-request config
{:ok, response} =
  ReqLLM.stream_text(
    model,
    messages,
    streaming_max_retries: 5,
    streaming_retry_delay: 100
  )
```

## References
- LangChain PR with a similar fix: "Add `retry: transient` to Req for Anthropic models in stream mode" (brainlid/langchain#329)
- Existing non-streaming retry: the `ReqLLM.Step.Retry` module
- Req retry documentation: https://hexdocs.pm/req/Req.Steps.html#retry/1
## Environment
- ReqLLM version: 1.2.0
- Issue occurs specifically during streaming operations (not non-streaming)
- Affects all providers using Finch for streaming (Anthropic, OpenAI, etc.)
## Additional Context
This issue becomes more critical with extended thinking models (e.g., Claude Sonnet 4.5 with thinking enabled), where the model may spend 30+ seconds in the thinking phase without sending chunks, making idle connection timeouts more likely.