Async Executor/Runner slows to a halt with jobs that auto-retry with default (high) max_wait #642

Closed

Description

@joy13975

Describe the bug
Anything that a) has a throttle limit and b) uses Executor (and in turn, Runner), such as OpenAI's API, will slow to a halt if too many jobs are requested.

Specifically, in my case I was attempting to generate a testset for ~1k documents.

```python
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=N,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
```

This refuses to even start if there are too many documents (roughly >300) or test_size is too high. For example, a run with 500 documents was still stuck at 0% after 20 minutes:

[Screenshot: progress bar stuck at 0%]

Reducing test_size didn't help with high document counts because the run got stuck at docstore.add_documents. However, reducing both the document count and test_size to under 100 did get it running at reasonable speed.
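As a rough illustration of the suspected mechanism (a standalone simulation, not ragas code; MAX_WAIT, the fake 429, and the concurrency limit are made-up stand-ins): once many concurrent jobs hit a throttled API, each failure triggers an exponential backoff capped at a high max_wait, so most jobs end up sleeping near the cap between attempts and aggregate progress looks frozen.

```python
import asyncio
import random

MAX_WAIT = 60           # high backoff cap, analogous to a large default max_wait
CONCURRENCY_LIMIT = 10  # how many in-flight calls the fake API tolerates

sem = asyncio.Semaphore(CONCURRENCY_LIMIT)

async def fake_api_call() -> None:
    # Reject immediately when the throttle window is saturated,
    # the way a rate-limited API returns 429.
    if sem.locked():
        raise RuntimeError("429 Too Many Requests")
    async with sem:
        await asyncio.sleep(0.1)  # simulated request latency

async def job(job_id: int) -> None:
    attempt = 0
    while True:
        try:
            await fake_api_call()
            return
        except RuntimeError:
            attempt += 1
            # Exponential backoff capped at MAX_WAIT. With ~1,000 jobs
            # retrying together, most of them are soon sleeping close to
            # MAX_WAIT between attempts, so the run looks frozen at 0%.
            await asyncio.sleep(min(2 ** attempt + random.random(), MAX_WAIT))

async def main(n_jobs: int = 1000) -> None:
    await asyncio.gather(*(job(i) for i in range(n_jobs)))

if __name__ == "__main__":
    asyncio.run(main())
```

In this toy model, lowering the cap or the number of concurrent jobs drains the queue far faster, which matches the observation that runs under ~100 documents proceed at reasonable speed.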

Ragas version: 0.1.2.dev8+gc18c7f4
Python version: 3.9.13

Code to Reproduce
Below is one example that drives Executor & Runner with a high job count (~1,000). Document contents averaged ~800 characters, and the OpenAI API is used as the LLM.

```python
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=N,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
```
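For completeness, a fuller version of the snippet with the imports the ragas 0.1.x quickstart uses; the loader path and the with_openai() constructor are illustrative assumptions to verify against the installed version:

```python
# Hedged reproduction sketch; loader path and generator construction are
# assumptions, not verbatim from the original report.
from langchain_community.document_loaders import DirectoryLoader
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

documents = DirectoryLoader("docs/").load()  # ~1,000 docs, ~800 chars each
generator = TestsetGenerator.with_openai()   # OpenAI API as the LLM

testset = generator.generate_with_langchain_docs(
    documents,
    test_size=1000,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
```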

Error trace
No error, just gets stuck.

Expected behavior
Runs with high job counts should not stall.

Additional context
Will add my findings in comments below. Related: #394
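If RunConfig is what governs these retries in this version (the max_wait in the title suggests so), tightening it could be a stopgap. This is a sketch, not a verified fix: whether ragas 0.1.2 exposes ragas.run_config.RunConfig with these fields, and whether generate_with_langchain_docs forwards a run_config argument, both need checking against the installed source.

```python
# Hedged workaround sketch: lower the retry ceiling and concurrency.
# Both the RunConfig fields and the run_config parameter below are
# assumptions about this version's API.
from ragas.run_config import RunConfig

run_config = RunConfig(
    max_retries=3,   # fewer automatic retries per failing call
    max_wait=10,     # cap each backoff sleep at 10s instead of the default
    max_workers=8,   # fewer concurrent jobs hammering the throttled API
)

testset = generator.generate_with_langchain_docs(
    documents,
    test_size=N,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
    run_config=run_config,
)
```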
