GuideLLM v0.3.1
Overview
Minor release focused on container build/tagging stability, UI polish and terminology alignment, improved OpenAI backend robustness/configurability, clearer JSON output, and new documentation (llama.cpp usage and a vLLM simulator walkthrough). Workflows now produce versioned artifacts and maintain latest/stable tags automatically.
To get started, install with:
pip install guidellm[recommended]==0.3.1
Or from source with:
pip install 'guidellm[recommended] @ git+https://github.com/vllm-project/guidellm.git@v0.3.1'
What's New
- Recommended Extras Group: Install OpenAI tokenizer dependencies (tiktoken, blobfile) via `guidellm[recommended]`
- llama.cpp Guide: New docs covering llama-server, model aliasing, and metadata handling
- vLLM Simulator Example: Step-by-step “first benchmark” walkthrough with sample output images; a command sketch follows this list
- Container Maintenance Workflow: Scheduled cleanup of old PR images; auto-retag latest and stable
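As a taste of the new walkthrough, here is a minimal first-benchmark sketch against a local OpenAI-compatible server (such as the vLLM simulator). The target URL, data spec, and flag values are assumptions for a typical local setup; follow the walkthrough itself for the exact, verified steps.

```bash
# Sketch of a first benchmark run; the target URL and data spec are placeholder
# assumptions for a local OpenAI-compatible server (e.g., the vLLM simulator).
guidellm benchmark \
  --target "http://localhost:8000" \
  --rate-type synchronous \
  --max-seconds 30 \
  --data "prompt_tokens=256,output_tokens=128"
```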
What's Changed
- UI Polish: Clearer labels (e.g., “Time Per Request”, “Measured RPS (Mean)”) and slider text
- Versioned Reports: PROD/STAGING report URLs pinned to versioned UI builds
- Container Build System: New top-level Containerfile using Fedora Python minimal + PDM; build type selected via `GUIDELLM_BUILD_TYPE` (see the build sketch after this list)
- Metrics JSON Output: UTF-8 encoding with pretty-printed, indented JSON
- Endpoint Max Tokens Keys: Output-token limit now governed per endpoint via `GUIDELLM__OPENAI__MAX_OUTPUT_KEY`
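A hypothetical local build using the new top-level Containerfile. The `release` value for `GUIDELLM_BUILD_TYPE` is an assumption; check the Containerfile for the build types it actually supports.

```bash
# Hypothetical invocation: GUIDELLM_BUILD_TYPE=release is an assumed value;
# the supported build types are defined in the top-level Containerfile.
podman build -f Containerfile \
  --build-arg GUIDELLM_BUILD_TYPE=release \
  -t guidellm:0.3.1 .
```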
What's Fixed
- Streaming Robustness: Safely handle missing `delta.content` for chat streams
- Endpoint Token Keys: Configurable max output key per endpoint (`max_tokens` vs `max_completion_tokens`)
- CI Stability: Fixes to RC tagging, GH Pages publish paths, and workflow typos; disable dry-run for image cleanup
Compatibility Notes
- Python: 3.9–3.13
- OS: Linux and macOS
- Dependencies: Optional extras via `guidellm[recommended]`; currently includes packages for OpenAI's tokenizer but may expand in the future
- Breaking: Previously all endpoints used both `max_tokens` and `max_completion_tokens` to bound output; this caused issues with some servers
  - The key is now controlled per endpoint (defaults to `max_tokens` for legacy `completions` and `max_completion_tokens` for `chat/completions`); see the example below
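A hedged sketch of overriding the per-endpoint output-token key via the environment variable named above. The JSON-mapping value format and the route names (`text_completions`, `chat_completions`) are assumptions inferred from the defaults described in this release; verify them against the settings reference before relying on them.

```bash
# Sketch only: the JSON-mapping format and the route names are assumptions based
# on the per-endpoint defaults described above; confirm against the settings docs.
export GUIDELLM__OPENAI__MAX_OUTPUT_KEY='{"text_completions": "max_tokens", "chat_completions": "max_completion_tokens"}'
guidellm benchmark --target "http://localhost:8000" --data "prompt_tokens=256,output_tokens=128"
```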
New Contributors
- @rgerganov: made their first contribution in PR #318
- @git-jxj: made their first contribution in PR #316
- @psydok: made their first contribution in PR #372
Changelog
- UI & Presentation
- Backend
- Containers & CI
- #254: Overhaul container image and CI (new top-level Containerfile, PDM build)
- #379: Container CI bugfix and disable dry-run on image cleaner
- #310: Use versioned builds (and version-pinned report links)
- #389: Fix container RC tag
- #398: Fix container RC tag (Attempt 2)
- #400: Fix failing CI
- #401: Fix typo in CI
- #301: Correct UI src path in workflows (publish_dir)
- Output & Tooling
- #372: Pretty-print and UTF-8 encode metrics JSON files
- Documentation
- Packaging
- #313: Add recommended extras group
Changelog link: v0.3.0...v0.3.1