Add support for "benchmarking scenarios" #99
Conversation
Branch updated from 81c35db to 91651b0.
@sjmonson I like the direction of this. A few quick thoughts from my side:
Not sure if there's something out there already to automatically handle the last two points, but if so, that would be a great inclusion.
My plan was to rely on the base class JSON and YAML loaders since they are cleaner for nested structures, such as synthetic dataset args. I can definitely try pydantic-settings since that does have the advantage of allowing us to unify all GuideLLM options under one file.
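For illustration, a minimal sketch of what such a base-class loader could look like; the class name, the `overrides` parameter, and the PyYAML dependency are assumptions rather than what the patch necessarily implements (the `from_file` name matches the method listed in the file summary further down, and pydantic v2 is assumed):

```python
# Rough sketch of a JSON/YAML loader on a shared pydantic base class.
# Class name, method signature, and the PyYAML dependency are assumptions.
import json
from pathlib import Path
from typing import Any, Optional, TypeVar

import yaml  # PyYAML
from pydantic import BaseModel

T = TypeVar("T", bound="ScenarioBaseModel")


class ScenarioBaseModel(BaseModel):
    @classmethod
    def from_file(cls: type[T], path: Path, overrides: Optional[dict[str, Any]] = None) -> T:
        """Build a model instance from a JSON or YAML file, applying optional overrides."""
        raw = path.read_text()
        if path.suffix.lower() in {".yaml", ".yml"}:
            data = yaml.safe_load(raw)
        else:
            data = json.loads(raw)
        data.update(overrides or {})
        return cls.model_validate(data)
```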
By default pydantic will attempt type coercion during validation, so it's probably as simple as disabling the type handling in click and raising a
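A minimal sketch of that idea, assuming the intent is to pass raw CLI values through click untouched and surface pydantic's `ValidationError` as a click usage error; the helper name and the exact exception choice are illustrative:

```python
# Illustrative only: let pydantic perform the type coercion and convert its
# validation errors into click errors. The helper name is hypothetical.
import click
from pydantic import ValidationError


def build_scenario(scenario_cls, **cli_kwargs):
    # Drop options the user did not pass so the scenario defaults still apply.
    supplied = {key: value for key, value in cli_kwargs.items() if value is not None}
    try:
        return scenario_cls(**supplied)  # pydantic coerces "128" -> 128, etc.
    except ValidationError as err:
        raise click.UsageError(str(err)) from err
```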
They already do in this patch, as I have set every click default to pull from the scenario model. Unless you mean something else?
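Roughly, that wiring might look like the sketch below; the exact `get_default` signature and the option shown are assumptions based on the file summary further down:

```python
# Illustrative wiring of a click default to the scenario model.
# Assumes a get_default(field_name) classmethod exists, per the file summary.
import click

from guidellm.benchmark.scenario import GenerativeTextScenario


@click.command()
@click.option(
    "--rate-type",
    default=GenerativeTextScenario.get_default("rate_type"),
    help="Rate type for the benchmark (default comes from the scenario model).",
)
def benchmark(rate_type):
    click.echo(f"rate_type={rate_type}")
```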
@sjmonson, for the last one, yes, something else. Specifically, the entrypoint for the benchmarking Python API here: https://github.com/neuralmagic/guidellm/blob/main/src/guidellm/benchmark/entrypoints.py#L22. This is towards @anmarques's previous issues with needing to set all argument values when not using the CLI. It would also help towards supporting both the CLI and the Python API for scenarios.
Ah, I see. The workflow I imagined for @anmarques's use case is to call the higher-level entrypoint:

```python
result = await benchmark_with_scenario(
    GenerativeTextScenario(
        target="http://localhost:8000",
        data={
            "prompt_tokens": 128,
            "output_tokens": 128,
        },
        rate_type="sweep",
    ),
    output_path="output.json",
)
```
Pull Request Overview
This pull request adds support for benchmarking scenarios by introducing a unified way to specify benchmark arguments via Pydantic objects and JSON/YAML scenario files, ensuring that CLI and code-based defaults remain consistent.
- Introduces utility functions for JSON parsing and default option handling in CLI
- Adds Pydantic helper methods to load configuration from files
- Implements scenario-based configuration with new benchmark scenario files and updated entrypoints
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.
| File | Description |
| --- | --- |
| src/guidellm/utils/cli.py | Added a JSON parsing helper, a default-setting function, and a custom union parameter type |
| src/guidellm/objects/pydantic.py | Added `get_default` and `from_file` methods to streamline model instantiation |
| src/guidellm/benchmark/scenarios/rag.json | New benchmark scenario configuration in JSON |
| src/guidellm/benchmark/scenarios/chat.json | New benchmark scenario configuration in JSON |
| src/guidellm/benchmark/scenario.py | Introduced `Scenario` and `GenerativeTextScenario` classes along with a float-list parser |
| src/guidellm/benchmark/entrypoints.py | Added `benchmark_with_scenario` to enable scenario-driven execution |
| src/guidellm/main.py | Updated the CLI to support scenario files and default values from scenario models |
Comments suppressed due to low confidence (2)
src/guidellm/utils/cli.py:29
- [nitpick] The class name `Union` may be confused with the built-in union type used in type hints (`typing.Union`). Consider renaming it to a more explicit name (e.g. `UnionParamType`) for clarity.
```python
class Union(click.ParamType):
```
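For reference, a union-style parameter type along the lines suggested (using the proposed `UnionParamType` name) could look roughly like this; it is a sketch, not the PR's actual code:

```python
# Sketch of a union-style click parameter type, using the suggested rename;
# the implementation in the PR may differ.
import click


class UnionParamType(click.ParamType):
    name = "union"

    def __init__(self, *types: click.ParamType):
        self.types = types

    def convert(self, value, param, ctx):
        # Try each underlying type in order and return the first successful conversion.
        for candidate in self.types:
            try:
                return candidate.convert(value, param, ctx)
            except click.UsageError:
                continue
        self.fail(f"{value!r} did not match any of the allowed types", param, ctx)
```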
src/guidellm/benchmark/scenario.py:28
- [nitpick] The docstring for `parse_float_list` could be improved by mentioning that any whitespace around comma-separated numbers will not be trimmed automatically. Consider adding a note or trimming whitespace before conversion.
```python
def parse_float_list(value: Union[str, float, list[float]]) -> list[float]:
```
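A version of the parser that addresses the whitespace nitpick might look like this sketch (the PR's actual implementation may differ):

```python
# Sketch of parse_float_list that trims whitespace around comma-separated
# values, per the suggestion above.
from typing import Union


def parse_float_list(value: Union[str, float, list[float]]) -> list[float]:
    """Parse a comma-separated string, a single number, or a list into a list of floats."""
    if isinstance(value, (int, float)):
        return [float(value)]
    if isinstance(value, str):
        return [float(part.strip()) for part in value.split(",") if part.strip()]
    return [float(item) for item in value]
```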
This PR adds support for "scenarios" that allow specifying benchmark arguments in a file or as a single Pydantic object. CLI argument defaults are loaded from the scenario object defaults so that benchmark-as-code users get the same defaults as the CLI. Argument values in the CLI take the following precedence:
Scenario (class defaults) < Scenario (CLI provided Scenario) < CLI Arguments
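As a rough illustration of that precedence, later layers override earlier ones; the option names and values below are only examples, not the model's real defaults:

```python
# Illustrative only: three layers of argument values and their precedence.
class_defaults = {"rate_type": "synchronous", "max_seconds": None}  # Scenario class defaults
file_scenario = {"rate_type": "sweep"}                              # CLI-provided scenario file
cli_arguments = {"max_seconds": 60}                                 # explicit CLI flags

# Later layers win: class defaults < scenario file < CLI arguments.
resolved = {**class_defaults, **file_scenario, **cli_arguments}
print(resolved)  # {'rate_type': 'sweep', 'max_seconds': 60}
```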
Closes: