
Subsequent requests cannot be sent until 'num_concurrent_requests' requests have all finished #56

Open
llsj14 opened this issue Jun 18, 2024 · 5 comments


llsj14 commented Jun 18, 2024

Hello,

I've encountered an issue where the request launcher does not allow subsequent requests to be sent until all of the requests specified by num_concurrent_requests have finished.

This behavior makes it hard to benchmark TTFT (time to first token) and throughput accurately on continuous batching systems, as it can block subsequent requests even when the serving system is capable of handling them.

To address this, I believe the get_next_ready function should be modified as follows, enabling it to return results as soon as each individual request is completed:

--- a/src/llmperf/requests_launcher.py
+++ b/src/llmperf/requests_launcher.py
@@ -40,6 +40,7 @@ class RequestsLauncher:
         if not block:
             while self._llm_client_pool.has_next():
                 results.append(self._llm_client_pool.get_next_unordered())
+                return results
         else:
             while not self._llm_client_pool.has_next():
                 pass
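For context, here is roughly what the patched function would look like; everything outside the hunk above is reconstructed from the diff context and may not match the llmperf source verbatim:

    def get_next_ready(self, block: bool = False):
        results = []
        if not block:
            while self._llm_client_pool.has_next():
                results.append(self._llm_client_pool.get_next_unordered())
                # Proposed change: return the first finished result right away
                # instead of draining every completed request first.
                return results
        else:
            # Busy-wait until at least one result is available
            # (the rest of this branch is omitted here).
            while not self._llm_client_pool.has_next():
                pass
        return results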

I am prepared to submit a pull request with this change and would appreciate your feedback.

Thank you.

llsj14 commented Jun 22, 2024

I found that the previous revision isn't sufficient to solve the problem. To send a new request asynchronously right after a previous one finishes, many parts need fixing. I attempted to make the get_next_ready function asynchronous, but it depends on ray.util.ActorPool.get_next_unordered(), and converting that is challenging because of its blocking implementation.

Here is the link to the relevant code: ray/util/actor_pool.py lines 311-326.

I think there are two potential approaches for change:

  • Convert the get_next_ready function into an asynchronous (non-blocking) function; a sketch of this idea follows the list.
  • Enable clients to operate independently using multi-threading or multi-processing, while keeping the current get_next_ready implementation.
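For the first approach, a minimal sketch of a non-blocking poll built on ray.wait with timeout=0. Note this assumes direct access to the pool's pending ObjectRefs (ActorPool keeps them in internal bookkeeping); poll_ready is a hypothetical helper, not an existing llmperf or Ray API:

    import ray

    def poll_ready(pending_refs):
        # Return (finished results, still-pending refs) without blocking.
        # timeout=0 makes ray.wait return immediately, unlike the blocking
        # ActorPool.get_next_unordered().
        if not pending_refs:
            return [], []
        ready, still_pending = ray.wait(
            pending_refs, num_returns=len(pending_refs), timeout=0
        )
        return [ray.get(ref) for ref in ready], still_pending

A launcher could call this in its dispatch loop and issue a new request whenever the ready list is non-empty, instead of parking inside get_next_unordered().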

llsj14 closed this as completed Jun 22, 2024
llsj14 reopened this Jun 22, 2024

ashutoshsaboo commented Jun 27, 2024

Hey @llsj14, I'm facing the same issue. Without issuing concurrent requests at a set rate, it's no longer a proper load-testing framework. Do you have plans to fix this?


llsj14 commented Jul 1, 2024

Hello @ashutoshsaboo,

I've made some changes to the load testing code to continuously send requests without waiting for all "num_concurrent_requests" to finish. Since modifying the core part related to Ray was challenging, I used multiple threads and request launchers, each holding a single client.

Code branch:
https://github.com/llsj14/llmperf/tree/fix/get-next-ready

Commit:
llsj14@d75db69
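For reference, this is roughly the pattern described above: one launcher per thread, each wrapping a single client, so a slot frees up as soon as its own request completes. A minimal sketch; the method names follow the upstream RequestsLauncher interface, but the queue-based scheduling is illustrative, not the exact code in the linked commit:

    import queue
    from concurrent.futures import ThreadPoolExecutor

    def run_load(launchers, request_configs):
        # Pool of single-client launchers; a launcher returns to the queue
        # the moment its request finishes, so the next request starts
        # without waiting on the other in-flight requests.
        free = queue.Queue()
        for launcher in launchers:
            free.put(launcher)

        def send_one(config):
            launcher = free.get()  # block until some client is free
            try:
                launcher.launch_requests(config)
                return launcher.get_next_ready(block=True)
            finally:
                free.put(launcher)

        results = []
        with ThreadPoolExecutor(max_workers=len(launchers)) as pool:
            for batch in pool.map(send_one, request_configs):
                results.extend(batch)
        return results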


ashutoshsaboo commented Jul 1, 2024

@llsj14 I saw your branch code and I'll test it. Could you please rebase your branch onto the latest commit in this repo? Your branch seems to be one commit behind the main branch of this project.

I think the latest commit missing from your branch (llsj14/llmperf@main...ray-project:llmperf:main) is useful for the dynamic prompts it generates.


llsj14 commented Jul 3, 2024

@ashutoshsaboo
Thank you for letting me know. I have rebased my code.
