GuideLLM v0.3.1

Overview

A minor release focused on container build and tagging stability, UI polish and terminology alignment, improved OpenAI backend robustness and configurability, clearer JSON output, and new documentation (a llama.cpp usage guide and a vLLM simulator walkthrough). Workflows now produce versioned artifacts and maintain the latest and stable tags automatically.

To get started, install with:

pip install guidellm[recommended]==0.3.1

Or from source with:

pip install 'guidellm[recommended] @ git+https://github.com/vllm-project/guidellm.git@v0.3.1'

What's New

  • Recommended Extras Group: Install OpenAI tokenizer dependencies via guidellm[recommended] (tiktoken, blobfile)
  • llama.cpp Guide: New docs covering llama-server, model aliasing, and metadata handling
  • vLLM Simulator Example: Step-by-step “first benchmark” walkthrough with sample output images (see the example commands after this list)
  • Container Maintenance Workflow: Scheduled cleanup of old PR images; auto-retag latest and stable
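
As a rough illustration of pairing the two new guides, the commands below start llama.cpp's server with a model alias and then point a short GuideLLM run at it. The flags, data spec, and port are illustrative assumptions; follow the linked guides for exact usage:

# serve a GGUF model locally under an alias (llama-server flags assumed; see the llama.cpp guide)
llama-server -m ./model.gguf --alias my-model --port 8080

# run a short benchmark against the local OpenAI-compatible endpoint (options illustrative; see the walkthrough)
guidellm benchmark --target http://localhost:8080 --rate-type sweep --max-seconds 30 --data "prompt_tokens=256,output_tokens=128"

The same guidellm benchmark invocation works against the vLLM simulator from the new walkthrough; only the --target URL changes.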

What's Changed

  • UI Polish: Clearer labels (e.g., “Time Per Request”, “Measured RPS (Mean)”) and slider text
  • Versioned Reports: PROD/STAGING report URLs pinned to versioned UI builds
  • Container Build System: New top-level Containerfile using a Fedora minimal Python base image and PDM; build type selected via GUIDELLM_BUILD_TYPE (example below)
  • Metrics JSON Output: UTF-8 encoding with pretty-printed, indented JSON
  • Endpoint Max Tokens Keys: Output-token limit now governed per-endpoint via GUIDELLM__OPENAI__MAX_OUTPUT_KEY

What's Fixed

  • Streaming Robustness: Safely handle missing delta.content for chat streams
  • Endpoint Token Keys: Configurable max output key per endpoint (max_tokens vs max_completion_tokens)
  • CI Stability: Fixes to RC tagging, GH Pages publish paths, and workflow typos; disable dry-run for image cleanup

Compatibility Notes

  • Python: 3.9–3.13
  • OS: Linux and macOS
  • Dependencies: Optional extras via guidellm[recommended]; currently includes packages for OpenAI's tokenizer but may expand in the future
  • Breaking: Previously, all endpoints used both max_tokens and max_completion_tokens to bound output, which caused errors with some servers
    • The key is now controlled per endpoint (defaulting to max_tokens for legacy completions and max_completion_tokens for chat/completions); see the override example after these notes
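
If a particular server rejects one of these keys, the per-endpoint default can be overridden through GuideLLM's settings environment variables. The JSON shape and endpoint name below are assumptions for illustration; consult the backend configuration docs for the exact format:

export GUIDELLM__OPENAI__MAX_OUTPUT_KEY='{"chat_completions": "max_tokens"}'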

Changelog

  • UI & Presentation
    • #386: Update TPOT to ITL across labels and code
    • #298: Update RPS slider label
    • #301: Fix GH Pages UI publish path (src/ui/out)
    • #317: Correct type hint to fix Pydantic serialization warning
  • Backend
    • #399: Make max_tokens/max_completion_tokens key configurable per endpoint
    • #316: Handle missing content in streaming delta
  • Containers & CI
    • #254: Overhaul container image and CI (new top-level Containerfile, PDM build)
    • #379: Container CI bugfix and disable dry-run on image cleaner
    • #310: Use versioned builds (and version-pinned report links)
    • #389: Fix container RC tag
    • #398: Fix container RC tag (Attempt 2)
    • #400: Fix failing CI
    • #401: Fix typo in CI
    • #301: Correct UI src path in workflows (publish_dir)
  • Output & Tooling
    • #372: Pretty-print and UTF-8 encode metrics JSON files
  • Documentation
    • #318: Add documentation on how to use with llama.cpp
    • #328: Add “first benchmark testing example” (vLLM simulator)
  • Packaging
    • #313: Add recommended extras group

Changelog link: v0.3.0...v0.3.1