Skip to content

Feat/max error rate #171

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 26 commits into
base: main
Choose a base branch
from
Open

Feat/max error rate #171

wants to merge 26 commits into from

Conversation

markVaykhansky
Copy link
Collaborator

@markVaykhansky markVaykhansky commented May 21, 2025

Addressing this issue:
#105

--max-error-rate flag was added.
It is only applicable in either or the the following scenarios:

  1. Fixed length dataset
  2. max_requests is set
  3. rate_type is constant and max_duration is set
  • If error rate is not applicable (i.e non of the above scenarios) we just ignore it (throwing an error would break sweep for example)
  • Implemented using multiprocessing.Event which is used to trigger a shutdown once we hit max_error_rate.
  • max_error_rate - added to report
  • error_rate in the final report is calculated as total errored / total requests finalized i.e we exclude incomplete requests

@markVaykhansky markVaykhansky self-assigned this May 21, 2025

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15167220831/artifacts/3170182036.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15167280131/artifacts/3170205369.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15180648310/artifacts/3176050662.
They will be retained for up to 30 days.

@markVaykhansky markVaykhansky marked this pull request as ready for review May 22, 2025 10:58
@markVaykhansky markVaykhansky requested a review from markurtz May 22, 2025 11:02

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15186822925/artifacts/3176853207.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15186908161/artifacts/3176882600.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15187752185/artifacts/3177193458.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15278204420/artifacts/3204756876.
They will be retained for up to 30 days.

Copy link
Member

@markurtz markurtz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few minor comments for the less intrusive changes. Also, please be sure to run tox -e style and tox -e types locally and check those are passing -- there are a bunch of styling pieces, such as where parentheses are placed, how new lines are escaped, etc that are inconsistent with the ruff styling setup. Running tox -e style locally applied several fixes and raised some errors that couldn't be resolved without either modifying the code or ignoring them where appropriate.

For implementation, I think defaulting always to use the shutdown event is a good pattern rather than relying on two signals: either None being placed on the queue or the shutdown event being set if the maximum error rate is reached. I'd like to incorporate that into this to reduce complexity.

Also, I think a single parameter, max_errors, would be a great extension where it can be either max_error_rate or an absolute number of errors that can occur, allowing us to support all benchmarking pathways. Along these lines, for ones where we don't know how many requests are going to run, or maybe in general, using a simple sample of the last X requests based on the precision needed could work here to ensure rate can be set for those.

Additionally, with the above, as well as some combination logic in asyncio, we should be able to simplify the changes, so we don't need to make significant changes to the resolve requests pathways and can cancel for any type. For example, we should be able to replace the root process runner to call into async loops and handle the shutdown event globally in a central place with something like the following, where ideally shutdown_poll_interval would grab from the global settings so users can adjust it:

    def run_process(
        self,
        type_: Literal["synchronous", "asynchronous"],
        requests_queue: multiprocessing.Queue,
        results_queue: multiprocessing.Queue,
        shutdown_event: multiprocessing.Event,
        shutdown_poll_interval: float,
        process_id: int,
        max_concurrency: int,
    ):
        async def _process_runner():
            if type_ == "synchronous":
                loop_task = asyncio.create_task(self.synchronous_loop(...))
            elif type_ == "asynchronous":
                loop_task = asyncio.create_task(self.asyncronous_loop(...))
            else:
                raise ValueError(f"Invalid process type: {type_}")
            
            shutdown_task = asyncio.create_task(
                self.wait_for_shutdown(shutdown_event, shutdown_poll_interval)
            )
            
            done, pending = await asyncio.wait(
                [
                    loop_task,
                    shutdown_task,
                ],
                return_when=asyncio.FIRST_EXCEPTION,
            )

            for task in pending:
                task.cancel()
                try:
                    await task
                except asyncio.CancelledError:
                    pass

            for task in done:
                if task.exception():
                    raise task.exception()  # need to check and ignore cancelations
        
        try:
            asyncio.run(_process_runner())
        except Exception as exc:  # noqa: BLE001
            logger.error(
                f"Error in worker process {process_id}: {exc}",
                exc_info=True,
                stack_info=True,
            )
        finally:
            shutdown_event.set()  # ensure shutdown event is set to stop other processes

    async def wait_for_shutdown(
        self, shutdown_event: multiprocessing.Event, shutdown_poll_interval: float
    ):
        while not shutdown_event.is_set():
            await asyncio.sleep(shutdown_poll_interval)

        raise asyncio.CancelledError("Shutdown event set, cancelling process loop.")
    
    async def synchronous_loop(self, ...):
        while True:
            process_request = await self.get_request(requests_queue)
            ...

    async def asyncronous_loop(self, ...):
        while True:
            process_request = await self.get_request(requests_queue)
            ...

@@ -625,12 +627,19 @@ def compile(self) -> GenerativeBenchmark:
request_start_time_targeted_delay_avg=self.requests_stats.request_start_time_targeted_delay.mean,
request_time_delay_avg=self.requests_stats.request_time_delay.mean,
request_time_avg=self.requests_stats.request_time.mean,
error_rate=error_rate,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that implementing this as a calculated field under the StatusBreakdown would make sense, but that would require some work in that class to start generating computed values based on the types (numerics, collections, and generics). Putting this note here in case you're interested in generalizing the solution a bit more, but not a blocker from my side

error_rate: float = Field(
description=(
"The number of errored requests divided by the number "
"of errored requests. This can be higher than max_error_rate "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"of errored requests. This can be higher than max_error_rate "
"of successful and errored requests. This can be higher than max_error_rate "

"The number of errored requests divided by the number "
"of errored requests. This can be higher than max_error_rate "
"(if applicable) cause it does not take into "
"account incomplete requests."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not following this point, the error rate we calculate and compare to should never include incomplete, right?

@@ -159,6 +166,17 @@ async def run(
run_info,
)
if iter_result is not None:
if iter_result.request_info.errored \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: should use paranetheses rather than line escapes for if statements across multiple lines, tox -e style should fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants