
Async Executor/Runner slows to a halt with jobs that auto-retry with default (high) max_wait #642


Closed
joy13975 opened this issue Feb 20, 2024 · 6 comments
Labels
bug Something isn't working

Comments


joy13975 commented Feb 20, 2024

Describe the bug
Anything that a) calls an API with a throttle limit (such as OpenAI's) and b) goes through Executor (and, in turn, Runner) will slow to a halt if too many jobs are requested.

Specifically, in my case I was attempting to generate a test set for ~1k documents.

testset = generator.generate_with_langchain_docs(documents, test_size=N, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})

This refuses to even start if there are too many documents (roughly >300) or test_size is too high. Example with 500 documents, stuck at 0% after 20 minutes:
[screenshot: progress bar stuck at 0%]

Reducing test_size didn't help with a high document count, because the hang occurred in docstore.add_documents. However, reducing both the document count and test_size to < 100 did get it going at a reasonable speed.

Ragas version: 0.1.2.dev8+gc18c7f4
Python version: 3.9.13

Code to Reproduce
Below is just one example that uses Executor & Runner with a high job count (i.e. 1,000). Document contents averaged ~800 characters. The OpenAI API is used as the LLM.

testset = generator.generate_with_langchain_docs(documents, test_size=N, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})
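
For completeness, a fuller reproduction sketch. The loader, model names, and LLM/embedding wiring below are assumptions filled in for illustration (standard ragas 0.1.x testset API), not my exact script:

```python
# Reproduction sketch -- loader, model names and LLM/embedding choices are
# illustrative assumptions, not copied from the actual run.
from langchain_community.document_loaders import DirectoryLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

from ragas.testset.evolutions import multi_context, reasoning, simple
from ragas.testset.generator import TestsetGenerator

documents = DirectoryLoader("docs/").load()  # ~1k documents, ~800 chars each

generator = TestsetGenerator.from_langchain(
    generator_llm=ChatOpenAI(model="gpt-3.5-turbo"),
    critic_llm=ChatOpenAI(model="gpt-4"),
    embeddings=OpenAIEmbeddings(),
)

# Hangs at 0% once the document count and/or test_size gets large.
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=1000,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
```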

Error trace
No error, just gets stuck.

Expected behavior
Don't get stuck with a high job count.

Additional context
Will add my findings in comments below. Related: #394

@joy13975 joy13975 added the bug Something isn't working label Feb 20, 2024

joy13975 commented Feb 20, 2024

Problem Investigation
The Executor async job code was initially introduced in bad3a8e and later separated into Runner. The problem is that Runner starts all jobs at once (here) and then uses asyncio.as_completed() (here) to wait for all jobs to complete, but doesn't impose any maximum concurrency.

When too many jobs (i.e. hundreds or more) call the same API, a throughput limit is hit and many of the jobs need to wait & retry. With the default RunConfig, the wait time can be up to 90s. I suspect this is the culprit behind the apparent hang.
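
To illustrate the pattern, here is a standalone toy (not the actual Runner code; `call_api` and the numbers are made up):

```python
import asyncio
import random

async def call_api(i: int) -> int:
    # Stand-in for an OpenAI request: once the rate limit is hit, the
    # retry/backoff can sleep for up to the configured max_wait.
    if random.random() < 0.8:      # pretend most calls get rate-limited
        await asyncio.sleep(90)    # roughly the default max_wait backoff
    return i

async def run_all(n: int) -> list[int]:
    # Same shape as Runner: schedule every job immediately, then drain
    # with as_completed -- nothing caps how many requests are in flight.
    tasks = [asyncio.create_task(call_api(i)) for i in range(n)]
    return [await fut for fut in asyncio.as_completed(tasks)]

# With n in the hundreds, nearly every task lands in the backoff path at
# the same time and the progress bar looks frozen.
results = asyncio.run(run_all(1000))
```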

Proposal
The Executor/Runner should support a maximum concurrency limit so that, once the throughput limit is hit, all of the remaining (and possibly in-flight?) requests aren't kicked into the wait & retry loop. Even if that's not the actual root cause (admittedly I haven't traced down exactly why it gets stuck), it's still a useful knob for users who don't want to be constantly hitting throughput limits.
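
A sketch of the kind of knob I have in mind, assuming a `max_concurrency` parameter (names are illustrative, not the exact API in the PR):

```python
import asyncio
from typing import Awaitable, Callable

async def run_all_bounded(
    jobs: list[Callable[[], Awaitable]],
    max_concurrency: int = 16,
) -> list:
    # A non-positive value means "no limit", i.e. the current behaviour.
    semaphore = asyncio.Semaphore(max_concurrency) if max_concurrency > 0 else None

    async def bounded(job: Callable[[], Awaitable]):
        if semaphore is None:
            return await job()
        async with semaphore:  # at most max_concurrency jobs in flight
            return await job()

    tasks = [asyncio.create_task(bounded(job)) for job in jobs]
    return [await fut for fut in asyncio.as_completed(tasks)]
```

With this, only max_concurrency jobs issue requests at a time; the rest wait on the semaphore instead of burning through the retry/backoff budget.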

PR created: #643 please review.

@joy13975
Contributor Author

In case this doesn't get attention, excuse my mention @jjmachan

@wangyiran33

Same problem here: when the number of document chunks is > 500, the program will likely get stuck. Appreciate your PR.

@joy13975 joy13975 changed the title Async Executor/Runner slows to a halt with jobs that auto-retry config with default (high) max_wait Async Executor/Runner slows to a halt with jobs that auto-retry with default (high) max_wait Feb 21, 2024
@joy13975
Contributor Author

#486 seems to have added max_workers to the evaluate function but never actually references it. @jjmachan, correct me if I'm wrong.

@jjmachan
Member

hey @joy13975, really sorry I have not been able to get to this - but for the last comment you are right. max_workers is an old arg we don't use

as for your PR - really, really appreciate it ❤️, will get both reviewed and respond tonight

thanks again for your help bringing this up 🙌🏽

jjmachan added a commit that referenced this issue Feb 28, 2024

**Added optional Semaphore-based concurrency control for #642**
As for the default value for `max_concurrency`, I don't know the ratio
of API users vs. local LLM users, so the proposed default is an
opinionated value of `16`:
* I *think* more people currently use the OpenAI API than local LLMs, thus
the default is not `-1` (no limit)
* `16` seems to be reasonably fast and doesn't hit the throughput
limit in my experience

**Tests**
Embedding for 1k documents finished in <2 min, and the subsequent testset
generation for `test_size=1000` proceeded without getting stuck:
![progress screenshot](https://github.com/explodinggradients/ragas/assets/6729737/d83fecc8-a815-43ee-a3b0-3395d7a9d244)

another 30s later:
![progress screenshot](https://github.com/explodinggradients/ragas/assets/6729737/d4ab08ba-5a79-45f6-84b1-e563f107d682)
---------

Co-authored-by: Jithin James <[email protected]>
@jjmachan
Member

thanks to @joy13975 we have fixed this guys 🙂

joy13975 added a commit to joy13975/ragas that referenced this issue Feb 28, 2024