More resilient LLM retries (auto-retry on transient failures) #284

@0xallam

Description

During long runs, Strix sometimes fails to connect to the LLM backend (e.g. Ollama hosted on another server). When this happens, the whole run can stall until someone intervenes manually.

We already have retry logic in llm.py (exponential backoff), but it needs to be more robust for real-world flaky connections: longer retry windows and smarter retry conditions (e.g. retrying on network timeouts and connection-refused errors).
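A minimal sketch of the kind of wrapper this issue asks for: capped exponential backoff with jitter for a longer retry window, retrying only on transient network errors. The names (`retry_with_backoff`, the parameters) are illustrative assumptions, not the actual llm.py implementation:

```python
import random
import time

# Errors treated as transient; anything else fails immediately.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)

def retry_with_backoff(fn, max_retries=8, base_delay=1.0, max_delay=60.0,
                       sleep=time.sleep):
    """Call fn(), retrying on transient network errors with exponential
    backoff plus jitter. Re-raises the last error once retries run out."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TRANSIENT_ERRORS:
            if attempt == max_retries:
                raise
            # Exponential backoff capped at max_delay; full jitter avoids
            # synchronized retry storms against a recovering backend.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

With the defaults above the total retry window is several minutes rather than seconds, and non-network exceptions (e.g. a malformed response) still surface immediately instead of being retried.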

Labels: bug (Something isn't working)