
Subsequent requests cannot be sent until 'num_concurrent_requests' requests have all finished #56

Open
llsj14 opened this issue Jun 18, 2024 · 5 comments


llsj14 commented Jun 18, 2024

Hello,

I've encountered an issue where the request launcher does not allow subsequent requests to be sent until all of the requests specified by num_concurrent_requests have finished.

This behavior makes it hard to benchmark TTFT (time to first token) and throughput accurately on continuous batching systems, as it can block subsequent requests even when the serving system is capable of handling them.

To address this, I believe the get_next_ready function should be modified as follows, enabling it to return results as soon as each individual request is completed:

--- a/src/llmperf/requests_launcher.py
+++ b/src/llmperf/requests_launcher.py
@@ -40,6 +40,7 @@ class RequestsLauncher:
         if not block:
             while self._llm_client_pool.has_next():
                 results.append(self._llm_client_pool.get_next_unordered())
+                return results
         else:
             while not self._llm_client_pool.has_next():
                 pass
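For context, here is roughly what the patched function would look like; everything outside the hunk above is reconstructed from the diff context and may not match the llmperf source verbatim:

    def get_next_ready(self, block: bool = False):
        results = []
        if not block:
            while self._llm_client_pool.has_next():
                results.append(self._llm_client_pool.get_next_unordered())
                # Proposed change: return the first finished result right away
                # instead of draining every completed request first.
                return results
        else:
            # Busy-wait until at least one result is available
            # (the rest of this branch is omitted here).
            while not self._llm_client_pool.has_next():
                pass
        return results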

I am prepared to submit a pull request with this change and would appreciate your feedback.

Thank you.

llsj14 commented Jun 22, 2024

I found that the previous revision isn't sufficient to solve the problem. To send a new request asynchronously right after a previous one finishes, many parts need fixing. I attempted to make the get_next_ready function asynchronous, but it depends on ray.util.ActorPool.get_next_unordered(), and converting that is challenging because of its blocking implementation.

Here is the link to the relevant code: ray/util/actor_pool.py lines 311-326.

I think there are two potential approaches for change:

  • Convert the get_next_ready function into an asynchronous (non-blocking) function; a sketch of this idea follows the list.
  • Enable clients to operate independently using multi-threading or multi-processing, while keeping the current get_next_ready implementation.
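For the first approach, a minimal sketch of a non-blocking poll built on ray.wait with timeout=0. Note this assumes direct access to the pool's pending ObjectRefs (ActorPool keeps them in internal bookkeeping); poll_ready is a hypothetical helper, not an existing llmperf or Ray API:

    import ray

    def poll_ready(pending_refs):
        # Return (finished results, still-pending refs) without blocking.
        # timeout=0 makes ray.wait return immediately, unlike the blocking
        # ActorPool.get_next_unordered().
        if not pending_refs:
            return [], []
        ready, still_pending = ray.wait(
            pending_refs, num_returns=len(pending_refs), timeout=0
        )
        return [ray.get(ref) for ref in ready], still_pending

A launcher could call this in its dispatch loop and issue a new request whenever the ready list is non-empty, instead of parking inside get_next_unordered().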

llsj14 closed this as completed Jun 22, 2024
llsj14 reopened this Jun 22, 2024

ashutoshsaboo commented Jun 27, 2024

Hey @llsj14, I'm facing the same issue. Without issuing concurrent requests at a set rate, it's no longer a proper load-testing framework. Do you have plans to fix this?


llsj14 commented Jul 1, 2024

Hello @ashutoshsaboo,

I've made some changes to the load testing code to continuously send requests without waiting for all "num_concurrent_requests" to finish. Since modifying the core part related to Ray was challenging, I used multiple threads and request launchers, each holding a single client.

Code branch:
https://github.com/llsj14/llmperf/tree/fix/get-next-ready

Commit:
llsj14@d75db69
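For reference, this is roughly the pattern described above: one launcher per thread, each wrapping a single client, so a slot frees up as soon as its own request completes. A minimal sketch; the method names follow the upstream RequestsLauncher interface, but the queue-based scheduling is illustrative, not the exact code in the linked commit:

    import queue
    from concurrent.futures import ThreadPoolExecutor

    def run_load(launchers, request_configs):
        # Pool of single-client launchers; a launcher returns to the queue
        # the moment its request finishes, so the next request starts
        # without waiting on the other in-flight requests.
        free = queue.Queue()
        for launcher in launchers:
            free.put(launcher)

        def send_one(config):
            launcher = free.get()  # block until some client is free
            try:
                launcher.launch_requests(config)
                return launcher.get_next_ready(block=True)
            finally:
                free.put(launcher)

        results = []
        with ThreadPoolExecutor(max_workers=len(launchers)) as pool:
            for batch in pool.map(send_one, request_configs):
                results.extend(batch)
        return results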


ashutoshsaboo commented Jul 1, 2024

@llsj14 I saw your branch code and I'll test it. Could you please rebase your branch onto the latest commit in this repo? Your branch seems to be one commit behind the main branch of this project.

I think the latest commit missing from your branch (llsj14/llmperf@main...ray-project:llmperf:main) is useful for the dynamic prompts it generates.


llsj14 commented Jul 3, 2024

@ashutoshsaboo
Thank you for letting me know. I have rebased my code.
