@diamondburned diamondburned commented Nov 26, 2025

tl;dr: run the added tests within Docker using pdm run e2e:test:docker to get started.


This PR adds additional end-to-end testing of inference-perf using llm-d-inference-sim. The following commits make up this addition:

  • convert e2e run_benchmark_minimal to async, which makes the existing inference-perf tests asynchronous. This will make lifecycle management of llm-d-inference-sim easier later on.
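As a sketch of the idea (not the repository's actual code; the helper name and arguments here are illustrative), an async subprocess wrapper lets a test start, await, and time out a benchmark process on the same event loop that will later manage the simulator:

```python
import asyncio

# Hypothetical helper in the spirit of an async run_benchmark_minimal:
# run a command, capture its combined output, and enforce a timeout.
async def run_subprocess(argv: list[str], timeout: float = 300.0) -> tuple[int, bytes]:
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()  # don't leave the benchmark running past the deadline
        await proc.wait()
        raise
    return proc.returncode, stdout
```

Because the call is awaitable, the test can run the benchmark while a separately managed llm-d-inference-sim process stays under the same event loop's control.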

  • add Nix flake, which adds inference-perf and llm-d-inference-sim packaging for Nix and NixOS, used in opted-in local environments, the CI/CD environment, and the local Docker testing environment (added below). The flake declares the package formulas for compiling both libtokenizers, an llm-d-inference-sim dependency, and llm-d-inference-sim itself from source.

    Some irregular maintenance on the Nix side is introduced as a result of this addition; that maintenance will be easier with the new Docker image introduced below.

  • add test_llm_d_inference_sim end-to-end testing, which actually adds end-to-end testing infrastructure for llm-d-inference-sim inside e2e/tests/test_llm_d_inference_sim.py.

    The test works by introducing a new LLMDInferenceSimRunner class, which is responsible for running the llm-d-inference-sim executable and managing its lifecycle, similar to the existing run_benchmark_minimal code, although it won't manage the installation/compiling part.
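A minimal sketch of such a runner (the class shape, port, and readiness logic here are assumptions, not the actual LLMDInferenceSimRunner implementation): start the server executable, wait until its TCP port accepts connections, and terminate it on exit.

```python
import asyncio

# Hypothetical runner: starts a server process, polls its port until it
# is ready, and terminates the process when the async context exits.
class SimRunner:
    def __init__(self, argv: list[str], port: int):
        self.argv = argv
        self.port = port
        self.proc: asyncio.subprocess.Process | None = None

    async def __aenter__(self) -> "SimRunner":
        self.proc = await asyncio.create_subprocess_exec(*self.argv)
        await asyncio.wait_for(self._wait_ready(), timeout=30.0)
        return self

    async def _wait_ready(self) -> None:
        while True:
            try:
                _, writer = await asyncio.open_connection("127.0.0.1", self.port)
                writer.close()
                return
            except OSError:
                await asyncio.sleep(0.2)  # not listening yet; retry

    async def __aexit__(self, *exc) -> None:
        self.proc.terminate()
        await self.proc.wait()
```

With this shape, a test simply wraps its benchmark run in `async with SimRunner(...)` and the simulator is guaranteed to be reachable before any requests are sent.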

    In order for inference-perf to work without a Hugging Face token, the test also bundles the google/gemma-3-270m tokenizer model as a .tar.gz file. This way, the CI won't need a token to download a model to run on. In the future, we can decide to give it a token, but for now, this adds a 5MB archive file to the repository.

    For now, this commit comes with a very basic set of inference-perf configurations and simple assertions inside a single test to make sure that at least some requests were actually made. The test is automatically skipped if LLMDInferenceSimRunner detects no llm-d-inference-sim in the local environment, so environments without the simulator report a skip instead of a spurious failure.
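The skip can be expressed with a standard pytest guard (a sketch; the real test module may structure this differently):

```python
import shutil

import pytest

# Skip tests when the simulator binary is not on PATH, so environments
# without llm-d-inference-sim report a skip rather than a failure.
requires_sim = pytest.mark.skipif(
    shutil.which("llm-d-inference-sim") is None,
    reason="llm-d-inference-sim not found in PATH",
)

@requires_sim
def test_some_requests_were_made():
    ...  # run the benchmark against the simulator and assert on results
```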

    This commit also changes the current end-to-end GitHub workflow to use the Nix environment directly, which contains pdm, llm-d-inference-sim, and other needed tools, rather than manually installing pdm through the setup-pdm action. This ensures that the package versions inside the CI environment match those in the Nix (and therefore Docker) environment used locally. The workflow also uploads a test_e2e.out file at the end of its testing as an artifact for manual inspection of test outputs as needed.

  • add pdm run e2e:test:docker, which adds a Dockerfile.e2e-test that builds the local Nix environment and runs the end-to-end test suite (by default).

    pdm scripts are added to help use this Docker image:

    • e2e:test:docker itself is an alias for docker:e2e-test:run.
    • docker:e2e-test:build, which builds the Docker image locally using BuildKit.
    • docker:e2e-test:run, which builds and runs the Docker image locally in one step.

    To make maintenance easier, docker:e2e-test:run accepts trailing arguments to override the command that it will run. For example:

    • pdm run docker:e2e-test:run -- pdm run test:e2e -o log_cli_level=DEBUG will run all tests with debug logging enabled.
    • pdm run docker:e2e-test:run -- nix flake info path:///workspace will show a summary of the local Nix environment within Docker.
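For illustration only (the exact entries live in pyproject.toml and may differ; the image tag and flags here are assumptions), pdm script wiring along these lines can express the build/run aliasing, with pdm's `{args}` placeholder forwarding the trailing arguments:

```toml
# Hypothetical sketch of the pdm script table; names mirror the PR,
# but the docker flags and image tag are illustrative.
[tool.pdm.scripts]
"docker:e2e-test:build" = "docker buildx build -f Dockerfile.e2e-test -t inference-perf-e2e ."
"docker:e2e-test:run" = { shell = "pdm run docker:e2e-test:build && docker run --rm inference-perf-e2e {args}" }
"e2e:test:docker" = { composite = ["docker:e2e-test:run {args}"] }
```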

    For now, building the Docker image locally is easy enough that there isn't a GitHub workflow to upload the Docker image onto an upstream Docker registry. That will be nice to have in the future, but it's not necessary for now. The first build will take a couple of minutes. After that, changing either the inference-perf code or the end-to-end test code will not cause a slow image rebuild, but changing pyproject.toml will.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 26, 2025
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: diamondburned
Once this PR has been reviewed and has the lgtm label, please assign sergeykanzhelev for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 26, 2025
@diamondburned diamondburned force-pushed the llm-d-inference-sim-tests branch 11 times, most recently from 506fce3 to 1dbb21d Compare November 26, 2025 08:57

@diamondburned diamondburned force-pushed the llm-d-inference-sim-tests branch 2 times, most recently from 702d94f to 122d40f Compare November 26, 2025 22:54
also made run_benchmark_minimal more resilient by using process groups
for cleanup, which ensures that all forked multiprocessing workers are
also killed off when done.
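The process-group cleanup idea can be sketched like this (a synchronous variant for brevity; the actual code is async): start the child in its own session so the whole group, including any multiprocessing workers it forks, can be signalled at once.

```python
import os
import signal
import subprocess

# start_new_session=True puts the child in a fresh session (and thus a
# fresh process group); killpg then reaches every worker it forked.
def run_in_group(argv: list[str], timeout: float) -> int:
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    finally:
        try:
            os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
        except ProcessLookupError:
            pass  # the group already exited cleanly
```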
@diamondburned diamondburned force-pushed the llm-d-inference-sim-tests branch from 122d40f to e378e20 Compare December 1, 2025 19:01
this commit adds a Nix flake [1] into the repository. it contains
several notable things:

- llm-d-inference-sim as a Nix package
- inference-perf as a Nix package
- a venv-enabled dev shell containing python, pdm, pyright and
  llm-d-inference-sim

if Nix is present in your environment, you may enter the dev shell using
`nix develop` (requires flakes to be enabled).

the GitHub Actions workflow uses this file to install the dependencies
needed to run the end-to-end tests.

[1]: https://wiki.nixos.org/wiki/Flakes
run with the same `pdm run test:e2e`.

this requires `llm-d-inference-sim` to be present in the local
environment. see the module docstring for more information.

the `e2e_test-on-change.yml` workflow has been updated to run on all
`push` and `pull_request` events, not just this one.
this commit adds a new `Dockerfile.e2e-test` just for running the
end-to-end tests inside a Docker container. This is useful if you don't
have Nix in your local environment.
@diamondburned diamondburned force-pushed the llm-d-inference-sim-tests branch from e378e20 to 70bb77a Compare December 1, 2025 22:41
@diamondburned diamondburned marked this pull request as ready for review December 1, 2025 22:50
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 1, 2025