
Add retry logic for streaming connection failures #314

@jonator

Problem

ReqLLM streaming operations using Finch.stream/5 fail when the server closes the connection during an idle period. This is particularly problematic for LLM workflows with long thinking times between streaming chunks.

Current Behavior

When a streaming connection is closed (e.g., due to server-side idle timeout), the error is immediately propagated to the caller without retry:

  • {:error, %Mint.TransportError{reason: :closed}}
  • {:error, %Mint.TransportError{reason: :timeout}}
  • {:error, %Mint.TransportError{reason: :econnrefused}}

This occurs in ReqLLM.Streaming.FinchClient.start_streaming_task/6 where Finch.stream/5 is called directly without retry logic.
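
For illustration, the failing call currently has roughly this shape (simplified, not the actual FinchClient source):

```elixir
# Simplified illustration (not the actual FinchClient source):
# the Finch.stream/5 result is returned as-is, so a
# %Mint.TransportError{} surfaces directly to the caller.
defp do_stream(request, finch_name, acc, handle_chunk) do
  Finch.stream(request, finch_name, acc, handle_chunk, [])
  # {:ok, acc} on success, or e.g.
  # {:error, %Mint.TransportError{reason: :closed}} on a dropped
  # connection — returned to the caller with no retry.
end
```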

Expected Behavior

Streaming operations should automatically retry on transient connection errors, similar to:

  1. How Req handles retries with the retry: :transient option (shown below)
  2. How ReqLLM.Step.Retry already handles non-streaming requests
  3. The pattern adopted in langchain (brainlid/langchain#329): adding retry: :transient to Req for Anthropic models in stream mode
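
For reference, item 1 above is a one-line opt-in in Req (the :retry option is part of Req's documented API):

```elixir
# Req retries transient transport errors (timeouts, closed
# connections, etc.) automatically when retry: :transient is set.
req = Req.new(base_url: "https://api.anthropic.com", retry: :transient)
```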

Proposed Solution

Implement automatic retry logic at the Finch streaming layer:

  1. Create a retry module (ReqLLM.Streaming.Retry) that wraps Finch.stream/5 calls
  2. Detect retryable errors: :closed, :timeout, :econnrefused (see the sketch below)
  3. Make retries configurable: default of 3 max attempts (consistent with ReqLLM.Step.Retry)
  4. Retry immediately: 0 ms delay (backoff can be added later)
  5. Expose configuration options:
    • streaming_max_retries (default: 3)
    • streaming_retry_delay (default: 0)
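
A minimal sketch of the error classification in step 2 (module and function names here are assumptions, not existing ReqLLM code):

```elixir
defmodule ReqLLM.Streaming.Retry do
  @moduledoc false

  # Transport error reasons considered transient and safe to retry.
  @retryable_reasons [:closed, :timeout, :econnrefused]

  @doc "Returns true if the error is a transient transport error."
  def retryable_error?(%Mint.TransportError{reason: reason}),
    do: reason in @retryable_reasons

  def retryable_error?(_other), do: false
end
```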

Implementation Details

Files to Modify

New file: lib/req_llm/streaming/retry.ex

  • Implement a stream_with_retry/4 function (sketched below)
  • Distinguish retryable from non-retryable errors
  • Run the retry loop with a configurable maximum number of attempts
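
Continuing the sketch above, stream_with_retry/4 could take a zero-arity closure around Finch.stream/5 and loop on retryable errors (the exact signature and return shapes are assumptions):

```elixir
# Would live in ReqLLM.Streaming.Retry next to retryable_error?/1 above.
# `stream_fun` is a zero-arity closure wrapping Finch.stream/5;
# `retries_made` counts retries already performed (start it at 0).
def stream_with_retry(stream_fun, max_retries, retry_delay, retries_made) do
  case stream_fun.() do
    {:error, error} = result ->
      if retryable_error?(error) and retries_made < max_retries do
        Process.sleep(retry_delay)
        stream_with_retry(stream_fun, max_retries, retry_delay, retries_made + 1)
      else
        result
      end

    other ->
      # {:ok, acc} and non-retryable errors pass through unchanged.
      other
  end
end
```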

Modify: lib/req_llm/streaming/finch_client.ex

  • Wrap the Finch.stream/5 call with retry logic (lines 152-226)
  • Pass retry configuration from options or application config
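
Inside start_streaming_task/6, the wrapped call could then look like this (illustrative only; option names follow the proposal above):

```elixir
# Illustrative call-site change (not actual FinchClient source).
max_retries = Keyword.get(opts, :streaming_max_retries, 3)
retry_delay = Keyword.get(opts, :streaming_retry_delay, 0)

ReqLLM.Streaming.Retry.stream_with_retry(
  fn -> Finch.stream(request, finch_name, acc, handle_chunk, []) end,
  max_retries,
  retry_delay,
  0
)
```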

Modify: lib/req_llm/provider/options.ex

  • Add streaming_max_retries and streaming_retry_delay to schema
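
Hypothetical schema entries, assuming the options module uses a NimbleOptions-style keyword schema (the actual format in ReqLLM.Provider.Options may differ):

```elixir
# Hypothetical additions; adjust to the real schema format.
streaming_max_retries: [
  type: :non_neg_integer,
  default: 3,
  doc: "Max retries for transient streaming connection errors."
],
streaming_retry_delay: [
  type: :non_neg_integer,
  default: 0,
  doc: "Delay in milliseconds between streaming retries."
]
```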

Example Usage

```elixir
# Application config
config :req_llm,
  streaming_max_retries: 3,
  streaming_retry_delay: 0
```

```elixir
# Per-request config
{:ok, response} = ReqLLM.stream_text(
  model,
  messages,
  streaming_max_retries: 5,
  streaming_retry_delay: 100
)
```

Environment

  • ReqLLM version: 1.2.0
  • Occurs during streaming operations only; non-streaming requests are unaffected
  • Affects all providers using Finch for streaming (Anthropic, OpenAI, etc.)

Additional Context

This issue becomes more critical with extended thinking models (Claude Sonnet 4.5 with thinking) where the model may spend 30+ seconds in the thinking phase without sending chunks, making idle connection timeouts more likely.
