Add new benchmark mode to search for peak goodput under an SLO #197

Open
@dagrayvid

When we benchmark a new model or new hardware, the goal is often to determine the maximum RPS or tokens per second that the server can sustain while meeting a given SLO. We should add a new benchmark mode similar to the "sweep". Instead of running linearly spaced constant-RPS benchmarks, it would do something like a binary search to find the peak load the server can handle while still meeting a defined latency SLO.
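To make the search idea concrete, here is a minimal sketch of a binary search over request rate. It is not taken from the PoC branch or from guidellm's API: `find_peak_rate`, `meets_slo`, and `fake_benchmark` are hypothetical names, and the sketch assumes SLO compliance is monotonic in load (if the server fails the SLO at some rate, it also fails at every higher rate).

```python
from typing import Callable


def find_peak_rate(
    meets_slo: Callable[[float], bool],  # hypothetical: runs one constant-rate benchmark
    low: float,
    high: float,
    tolerance: float = 0.1,
) -> float:
    """Return the highest rate in [low, high] found to meet the SLO.

    Returns 0.0 if even the lowest rate violates the SLO.
    Assumes SLO compliance is monotonic in the request rate.
    """
    if not meets_slo(low):
        return 0.0
    if meets_slo(high):
        return high  # the whole range passes; the true peak may be higher

    # Invariant: `low` meets the SLO, `high` does not.
    while high - low > tolerance:
        mid = (low + high) / 2
        if meets_slo(mid):
            low = mid
        else:
            high = mid
    return low


# Stand-in for a real benchmark run: pretend the server copes up to 37.5 RPS.
def fake_benchmark(rate: float) -> bool:
    return rate <= 37.5


peak = find_peak_rate(fake_benchmark, low=1.0, high=100.0, tolerance=0.5)
print(f"peak rate within tolerance: {peak:.2f} RPS")
```

In practice each call to `meets_slo` would be a full benchmark run, so the tolerance directly controls how many runs the search costs. Real measurements are also noisy, so a production version would likely need repeated runs or a safety margin near the boundary.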

We would need config options for defining the SLO, for example a p95 or p99 threshold on inter-token latency (ITL) and time to first token (TTFT).
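As a rough illustration of what such SLO options could look like, the sketch below defines a small config object and a check against measured latencies. Everything here is an assumption for discussion: `LatencySLO`, `percentile`, `meets_slos`, and the field names are hypothetical and not part of guidellm, and the percentile uses the simple nearest-rank method.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencySLO:
    """Hypothetical SLO definition: a percentile bound on one latency metric."""
    metric: str          # "ttft" or "itl"
    percentile: float    # e.g. 95 or 99
    threshold_ms: float  # maximum allowed value at that percentile


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_slos(samples: dict[str, list[float]], slos: list[LatencySLO]) -> bool:
    """True only if every SLO holds for the measured samples (in ms)."""
    return all(
        percentile(samples[slo.metric], slo.percentile) <= slo.threshold_ms
        for slo in slos
    )


# Made-up measurements for one benchmark run, in milliseconds.
samples = {
    "ttft": [float(i) for i in range(1, 101)],  # 1.0 .. 100.0
    "itl": [10.0] * 100,
}
slos = [
    LatencySLO(metric="ttft", percentile=99, threshold_ms=99.0),
    LatencySLO(metric="itl", percentile=95, threshold_ms=20.0),
]
print(meets_slos(samples, slos))
```

A check like `meets_slos` could serve as the pass/fail signal for each step of the rate search, with the SLO list coming from CLI flags or a config file.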

I have a rough PoC of this in progress on this branch: https://github.com/dagrayvid/guidellm/tree/goodput. I wanted to open this issue to discuss the idea further and track progress.

Labels: enhancement (New feature or request), feature (Feature addition with a set PRD for the release)
