Add end-to-end testing using llm-d-inference-sim #294
Commit diamondburned@1dbb21d successful run: https://github.com/diamondburned/inference-perf/actions/runs/19698009393
also made `run_benchmark_minimal` more resilient by using process groups for cleanup, which ensures that all forked multiprocessing workers are also killed off when done.
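Process-group based cleanup along these lines can be sketched as follows (hypothetical helper names, assuming a POSIX environment; not the PR's exact code):

```python
import os
import signal
import subprocess

def run_in_process_group(argv):
    # start_new_session puts the child in its own session (and therefore
    # its own process group), so any workers it forks share that group.
    return subprocess.Popen(argv, start_new_session=True)

def kill_process_group(proc):
    # Signal the whole group (via killpg) so forked multiprocessing
    # workers are terminated too, not just the group leader.
    try:
        os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
    except ProcessLookupError:
        pass  # group already gone
    proc.wait()
```

Signaling the group rather than the single child PID is what guarantees no orphaned workers survive a test run.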
this commit adds a Nix flake [1] into the repository. it contains several notable things:

- llm-d-inference-sim as a Nix package
- inference-perf as a Nix package
- a venv-enabled dev shell containing python, pdm, pyright and llm-d-inference-sim

if Nix is present in your environment, you may enter the dev shell using `nix develop` (requires flakes to be enabled). the GitHub Actions workflow uses this file to install the necessary dependencies to run end-to-end tests.

[1]: https://wiki.nixos.org/wiki/Flakes
run with the same `pdm run test:e2e` command. this requires `llm-d-inference-sim` to be present in the local environment; see the module docstring for more information. the `e2e_test-on-change.yml` workflow has been updated to run on all `push` and `pull_request` events, not just this one.
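The "requires `llm-d-inference-sim` in the local environment" requirement could be satisfied with a simple PATH lookup (a sketch; the test's actual detection logic may differ):

```python
import shutil

def sim_available() -> bool:
    # True when the llm-d-inference-sim binary is on PATH; the e2e
    # tests can skip themselves when this returns False instead of
    # failing in environments without the simulator.
    return shutil.which("llm-d-inference-sim") is not None
```

In pytest, a check like this could back a `pytest.mark.skipif(not sim_available(), reason=...)` marker so the suite degrades to a skip rather than a failure.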
this commit adds a new `Dockerfile.e2e-test` just for running the end-to-end tests inside a Docker container. This is useful if you don't have Nix in your local environment.
tl;dr: run the added tests within Docker using `pdm run e2e:test:docker` to get started.

This PR adds additional end-to-end testing of inference-perf using llm-d-inference-sim. The following commits make up this addition:
`convert e2e run_benchmark_minimal to async`, which converts the existing inference-perf tests to be asynchronous. This will make lifecycle management of llm-d-inference-sim easier later on.
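A minimal sketch of what an async subprocess lifecycle looks like (a hypothetical helper, not the PR's actual `run_benchmark_minimal`):

```python
import asyncio

async def run_benchmark(argv):
    # Launch the benchmark as an async subprocess; awaiting its exit
    # lets the test manage other lifecycles (e.g. the simulator)
    # concurrently on the same event loop.
    proc = await asyncio.create_subprocess_exec(*argv)
    try:
        return await proc.wait()
    finally:
        # If the coroutine is cancelled before exit, tear the child down.
        if proc.returncode is None:
            proc.terminate()
            await proc.wait()
```

Running the benchmark and the simulator as coroutines on one loop is what makes the later lifecycle management straightforward.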
`add Nix flake`, which adds inference-perf and llm-d-inference-sim packaging for Nix and NixOS. This packaging will be used in opted-in local environments, the CI/CD environment, and the local Docker testing environment (added below). The flake declares the package formulas for compiling both libtokenizers, an llm-d-inference-sim dependency, and llm-d-inference-sim itself from source.

Some irregular maintenance on the Nix side is introduced as a result of this addition. This will be easier with the new Docker image introduced below.
`add test_llm_d_inference_sim end-to-end testing`, which actually adds end-to-end testing infrastructure for llm-d-inference-sim inside `e2e/tests/test_llm_d_inference_sim.py`. The test works by introducing a new `LLMDInferenceSimRunner` class, which is responsible for running the llm-d-inference-sim executable and managing its lifecycle, similar to the existing `run_benchmark_minimal` code, although it won't manage the installation/compilation part.

In order for inference-perf to work without a Hugging Face token, the test also bundles the `google/gemma-3-270m` tokenizer model as a `.tar.gz` file. This way, the CI won't need the token to download a model to run on. In the future, we can decide to give it a token, but for now, this adds a 5 MB archive file to the repository.

For now, this commit comes with a very basic set of inference-perf configurations and simple assertions inside a single test to make sure that at least some requests were actually made. This test is automatically skipped if `LLMDInferenceSimRunner` detects no llm-d-inference-sim in the local environment, to prevent test result regressions.

This commit also changes the current end-to-end GitHub workflow to use the Nix environment directly, which contains pdm, llm-d-inference-sim, and other needed tools, rather than manually installing pdm through the `setup-pdm` action. This ensures that the package versions inside the CI environment match those in the Nix (and therefore Docker) environment used locally. The workflow will also upload a `test_e2e.out` file at the end of its testing as an artifact for manual inspection of test outputs as needed.

`add pdm run e2e:test:docker`, which adds a `Dockerfile.e2e-test` that builds the local Nix environment and runs the end-to-end test suite (by default). `pdm` scripts are added to help use this Docker image:

- `e2e:test:docker` itself is an alias to `docker:e2e-test:run`.
- `docker:e2e-test:build`, which builds the Docker image locally using BuildKit.
- `docker:e2e-test:run`, which builds and runs the Docker image locally in one step.

To make maintenance easier, `docker:e2e-test:run` accepts trailing arguments to override the command that it will run. For example:

- `pdm run docker:e2e-test:run -- pdm run test:e2e -o log_cli_level=DEBUG` will run all tests with debug logging enabled.
- `pdm run docker:e2e-test:run -- nix flake info path:///workspace` will show a summary of the local Nix environment within Docker.

For now, building the Docker image locally is easy enough that there isn't a GitHub workflow to upload the Docker image to an upstream Docker registry. This will be nice in the future, but it's not necessary for now. The first build will take a couple of minutes. After that, changing either the inference-perf code or the end-to-end test code will not cause a slow image rebuild, but changing `pyproject.toml` will.
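The `LLMDInferenceSimRunner` lifecycle described above (start the simulator, wait for it to come up, tear it down afterwards) can be sketched roughly as follows; the class name, flags, and API here are assumptions for illustration, not the PR's actual code:

```python
import os
import signal
import socket
import subprocess
import time
from typing import List, Optional

def wait_for_port(port: int, timeout: float = 30.0) -> None:
    # Poll until something accepts connections on localhost:port.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection(("127.0.0.1", port), timeout=1):
                return
        except OSError:
            time.sleep(0.1)
    raise TimeoutError(f"nothing listening on port {port}")

class SimRunner:
    """Start a server process, wait until it listens, then tear it down.

    Hypothetical sketch of the lifecycle management described above;
    the real LLMDInferenceSimRunner's API may differ.
    """

    def __init__(self, argv: List[str], port: int):
        self.argv = argv
        self.port = port
        self.proc: Optional[subprocess.Popen] = None

    def __enter__(self) -> "SimRunner":
        # New session => own process group, so cleanup reaps workers too.
        self.proc = subprocess.Popen(self.argv, start_new_session=True)
        wait_for_port(self.port)
        return self

    def __exit__(self, *exc) -> None:
        if self.proc is not None:
            os.killpg(os.getpgid(self.proc.pid), signal.SIGTERM)
            self.proc.wait()
```

A test could then wrap its body in `with SimRunner([...], port):` and rely on `__exit__` for cleanup even when assertions fail mid-test.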