@diamondburned diamondburned commented Nov 26, 2025

tl;dr: run the added tests within Docker using pdm run e2e:test:docker to get started.


This PR adds additional end-to-end testing of inference-perf using llm-d-inference-sim. The following commits make up this addition:

  • convert e2e run_benchmark_minimal to async, which makes the existing inference-perf tests asynchronous. This will make lifecycle management of llm-d-inference-sim easier later on.
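As a sketch of the idea (not the repository's actual code; the helper name and arguments here are illustrative), an async subprocess wrapper lets a test start, await, and time out a benchmark process on the same event loop that will later manage the simulator:

```python
import asyncio

# Hypothetical helper in the spirit of an async run_benchmark_minimal:
# run a command, capture its combined output, and enforce a timeout.
async def run_subprocess(argv: list[str], timeout: float = 300.0) -> tuple[int, bytes]:
    proc = await asyncio.create_subprocess_exec(
        *argv,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()  # don't leave the benchmark running past the deadline
        await proc.wait()
        raise
    return proc.returncode, stdout
```

Because the call is awaitable, the test can run the benchmark while a separately managed llm-d-inference-sim process stays under the same event loop's control.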

  • add Nix flake, which adds inference-perf and llm-d-inference-sim packaging for Nix and NixOS, used in opted-in local environments, the CI/CD environment, and the local Docker testing environment (added below). The flake declares the package formulas for compiling both libtokenizers, an llm-d-inference-sim dependency, and llm-d-inference-sim itself from source.

    Some irregular maintenance on the Nix side is introduced as a result of this addition; that maintenance will be easier with the new Docker image introduced below.

  • add test_llm_d_inference_sim end-to-end testing, which actually adds end-to-end testing infrastructure for llm-d-inference-sim inside e2e/tests/test_llm_d_inference_sim.py.

    The test works by introducing a new LLMDInferenceSimRunner class, which is responsible for running the llm-d-inference-sim executable and managing its lifecycle, similar to the existing run_benchmark_minimal code, although it won't manage the installation/compiling part.
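A minimal sketch of such a runner (the class shape, port, and readiness logic here are assumptions, not the actual LLMDInferenceSimRunner implementation): start the server executable, wait until its TCP port accepts connections, and terminate it on exit.

```python
import asyncio

# Hypothetical runner: starts a server process, polls its port until it
# is ready, and terminates the process when the async context exits.
class SimRunner:
    def __init__(self, argv: list[str], port: int):
        self.argv = argv
        self.port = port
        self.proc: asyncio.subprocess.Process | None = None

    async def __aenter__(self) -> "SimRunner":
        self.proc = await asyncio.create_subprocess_exec(*self.argv)
        await asyncio.wait_for(self._wait_ready(), timeout=30.0)
        return self

    async def _wait_ready(self) -> None:
        while True:
            try:
                _, writer = await asyncio.open_connection("127.0.0.1", self.port)
                writer.close()
                return
            except OSError:
                await asyncio.sleep(0.2)  # not listening yet; retry

    async def __aexit__(self, *exc) -> None:
        self.proc.terminate()
        await self.proc.wait()
```

With this shape, a test simply wraps its benchmark run in `async with SimRunner(...)` and the simulator is guaranteed to be reachable before any requests are sent.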

    In order for inference-perf to work without a Hugging Face token, the test also bundles the google/gemma-3-270m tokenizer model as a .tar.gz file. This way, the CI won't need a token to download a model to run on. In the future, we can decide to give it a token, but for now, this adds a 5MB archive file to the repository.

    For now, this commit comes with a very basic set of inference-perf configurations and simple assertions inside a single test to make sure that at least some requests were actually made. The test is automatically skipped if LLMDInferenceSimRunner detects no llm-d-inference-sim in the local environment, so environments without the simulator report a skip instead of a spurious failure.
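The skip can be expressed with a standard pytest guard (a sketch; the real test module may structure this differently):

```python
import shutil

import pytest

# Skip tests when the simulator binary is not on PATH, so environments
# without llm-d-inference-sim report a skip rather than a failure.
requires_sim = pytest.mark.skipif(
    shutil.which("llm-d-inference-sim") is None,
    reason="llm-d-inference-sim not found in PATH",
)

@requires_sim
def test_some_requests_were_made():
    ...  # run the benchmark against the simulator and assert on results
```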

    This commit also changes the current end-to-end GitHub workflow to use the Nix environment directly, which contains pdm, llm-d-inference-sim, and other needed tools, rather than manually installing pdm through the setup-pdm action. This ensures that the package versions inside the CI environment match those in the Nix (and therefore Docker) environment used locally. The workflow also uploads a test_e2e.out file at the end of its testing as an artifact for manual inspection of test outputs as needed.

  • add pdm run e2e:test:docker, which adds a Dockerfile.e2e-test that builds the local Nix environment and runs the end-to-end test suite (by default).

    pdm scripts are added to help use this Docker image:

    • e2e:test:docker itself is an alias for docker:e2e-test:run.
    • docker:e2e-test:build, which builds the Docker image locally using BuildKit.
    • docker:e2e-test:run, which builds and runs the Docker image locally in one step.

    To make maintenance easier, docker:e2e-test:run accepts trailing arguments to override the command that it will run. For example:

    • pdm run docker:e2e-test:run -- pdm run test:e2e -o log_cli_level=DEBUG will run all tests with debug logging enabled.
    • pdm run docker:e2e-test:run -- nix flake info path:///workspace will show a summary of the local Nix environment within Docker.
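For illustration only (the exact entries live in pyproject.toml and may differ; the image tag and flags here are assumptions), pdm script wiring along these lines can express the build/run aliasing, with pdm's `{args}` placeholder forwarding the trailing arguments:

```toml
# Hypothetical sketch of the pdm script table; names mirror the PR,
# but the docker flags and image tag are illustrative.
[tool.pdm.scripts]
"docker:e2e-test:build" = "docker buildx build -f Dockerfile.e2e-test -t inference-perf-e2e ."
"docker:e2e-test:run" = { shell = "pdm run docker:e2e-test:build && docker run --rm inference-perf-e2e {args}" }
"e2e:test:docker" = { composite = ["docker:e2e-test:run {args}"] }
```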

    For now, building the Docker image locally is easy enough that there isn't a GitHub workflow to upload the Docker image onto an upstream Docker registry. That will be nice to have in the future, but it's not necessary for now. The first build will take a couple of minutes. After that, changing either the inference-perf code or the end-to-end test code will not cause a slow image rebuild, but changing pyproject.toml will.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 26, 2025
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: diamondburned
Once this PR has been reviewed and has the lgtm label, please assign sergeykanzhelev for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Nov 26, 2025
@diamondburned diamondburned force-pushed the llm-d-inference-sim-tests branch 11 times, most recently from 506fce3 to 1dbb21d Compare November 26, 2025 08:57

@diamondburned diamondburned force-pushed the llm-d-inference-sim-tests branch 2 times, most recently from 702d94f to 122d40f Compare November 26, 2025 22:54
also made run_benchmark_minimal more resilient by using process groups
for cleanup, which ensures that all forked multiprocessing workers are
also killed off when done.
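The process-group cleanup idea can be sketched like this (a synchronous variant for brevity; the actual code is async): start the child in its own session so the whole group, including any multiprocessing workers it forks, can be signalled at once.

```python
import os
import signal
import subprocess

# start_new_session=True puts the child in a fresh session (and thus a
# fresh process group); killpg then reaches every worker it forked.
def run_in_group(argv: list[str], timeout: float) -> int:
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    finally:
        try:
            os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
        except ProcessLookupError:
            pass  # the group already exited cleanly
```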
@diamondburned diamondburned force-pushed the llm-d-inference-sim-tests branch from 122d40f to e378e20 Compare December 1, 2025 19:01
this commit adds a Nix flake [1] into the repository. it contains
several notable things:

- llm-d-inference-sim as a Nix package
- inference-perf as a Nix package
- a venv-enabled dev shell containing python, pdm, pyright and
  llm-d-inference-sim

if Nix is present in your environment, you may enter the dev shell using
`nix develop` (requires flakes to be enabled).

the GitHub Actions workflow uses this file to install the dependencies
needed to run the end-to-end tests.

[1]: https://wiki.nixos.org/wiki/Flakes
run with the same `pdm run test:e2e`.

this requires `llm-d-inference-sim` to be present in the local
environment. see the module docstring for more information.

the `e2e_test-on-change.yml` workflow has been updated to run on all
`push` and `pull_request` events, not just this one.
this commit adds a new `Dockerfile.e2e-test` just for running the
end-to-end tests inside a Docker container. This is useful if you don't
have Nix in your local environment.
@diamondburned diamondburned force-pushed the llm-d-inference-sim-tests branch from e378e20 to 70bb77a Compare December 1, 2025 22:41
@diamondburned diamondburned marked this pull request as ready for review December 1, 2025 22:50
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 1, 2025