Add new benchmark mode to search for peak goodput under an SLO

Often when we benchmark a new model or hardware, the goal is to determine the maximum RPS (requests per second) or tokens per second that the server can sustain under a given SLO. We should add a new feature similar to the "sweep", but instead of running linearly spaced constant-RPS benchmarks, it should do something like a binary search to find the peak load that the server can handle while still meeting a defined latency SLO.
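To make the search idea concrete, here is a minimal sketch of a bisection over the request rate. It is not taken from the PoC branch: `find_peak_rate` and the `meets_slo` callback are hypothetical names, and the sketch assumes that SLO compliance is monotonic in the rate (if a rate fails, every higher rate fails too), which real measurements may only approximate.

```python
from typing import Callable


def find_peak_rate(
    meets_slo: Callable[[float], bool],
    low: float,
    high: float,
    tolerance: float = 0.1,
) -> float:
    """Bisect [low, high] for the highest rate that still meets the SLO.

    `meets_slo(rate)` is a hypothetical callback that would run one
    constant-rate benchmark and report whether the SLO held.
    """
    if not meets_slo(low):
        raise ValueError("SLO is violated even at the lowest rate")
    if meets_slo(high):
        # The server never saturates within the search range.
        return high

    # Invariant: `low` meets the SLO, `high` violates it.
    while high - low > tolerance:
        mid = (low + high) / 2
        if meets_slo(mid):
            low = mid
        else:
            high = mid
    return low


# Stand-in for a real benchmark run: pretend the server copes up to 7.3 RPS.
peak = find_peak_rate(lambda rate: rate <= 7.3, low=1.0, high=16.0)
print(f"peak rate within tolerance: {peak:.2f} RPS")
```

Each probe is a full benchmark run, so the number of runs grows only logarithmically with `(high - low) / tolerance`, compared with the fixed number of evenly spaced runs a sweep needs for the same resolution.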

We would need to support some config options for the SLO, e.g. a p99 or p95 threshold on ITL (inter-token latency) and TTFT (time to first token).
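As a starting point for discussion, the SLO options could look roughly like the sketch below. All names (`LatencySLO`, `pct`, `max_ttft_ms`, `max_itl_ms`, `nearest_rank_percentile`) are hypothetical and not part of guidellm's existing config; the percentile uses the simple nearest-rank definition, which is only one of several possible choices.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


def nearest_rank_percentile(values: Sequence[float], pct: float) -> float:
    """Return the nearest-rank percentile of a non-empty sample."""
    if not values:
        raise ValueError("cannot take a percentile of an empty sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


@dataclass(frozen=True)
class LatencySLO:
    """Hypothetical SLO config: thresholds left as None are not enforced."""

    pct: float = 99.0                    # e.g. 99.0 for p99, 95.0 for p95
    max_ttft_ms: Optional[float] = None  # limit on time to first token
    max_itl_ms: Optional[float] = None   # limit on inter-token latency

    def is_met(self, ttft_ms: Sequence[float], itl_ms: Sequence[float]) -> bool:
        """Check the measured latencies of one run against the thresholds."""
        if self.max_ttft_ms is not None:
            if nearest_rank_percentile(ttft_ms, self.pct) > self.max_ttft_ms:
                return False
        if self.max_itl_ms is not None:
            if nearest_rank_percentile(itl_ms, self.pct) > self.max_itl_ms:
                return False
        return True


# Made-up measurements from one benchmark run, in milliseconds.
slo = LatencySLO(pct=95.0, max_ttft_ms=200.0, max_itl_ms=50.0)
print(slo.is_met(ttft_ms=[120.0, 150.0, 180.0], itl_ms=[20.0, 25.0, 60.0]))
```

Keeping the percentile and the two thresholds separate lets a user constrain only TTFT, only ITL, or both, without adding a new option for every combination.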

I have a rough PoC of this in progress on this branch: https://github.com/dagrayvid/guidellm/tree/goodput, but wanted to open this issue to discuss the idea further and track progress.

Add new benchmark mode to search for peak goodput under an SLO #197