feat(amd): integrate AMD Lemonade as inference backend #579
Merged
Lightheartdevs merged 3 commits into main on Mar 23, 2026
Replace llama-server with AMD Lemonade Server when AMD hardware is detected. Lemonade provides native NPU + Vulkan + ROCm acceleration, enabling hybrid NPU+GPU execution on Strix Halo and optimized inference on all AMD silicon.

- Windows: silent MSI install with user prompt, llama-server Vulkan fallback
- Linux: Lemonade Docker image with ROCm passthrough replaces toolbox image
- NPU detection added for Ryzen AI (Win32_PnPEntity + sysfs)

New files:
- config/litellm/lemonade.yaml (LiteLLM routing for Lemonade backend)

Modified:
- constants.ps1: Lemonade MSI URL, paths, health endpoint
- detection.ps1/sh: NPU detection for Ryzen AI
- env-generator.ps1: LLM_BACKEND and LLM_API_BASE_PATH variables
- install-windows.ps1: Lemonade install + fallback flow in Phase 8
- dream.ps1: Lemonade-aware process management and health checks
- docker-compose.windows-amd.yml: API path support for Lemonade
- docker-compose.amd.yml: Lemonade Docker image with ROCm
- amd.json: backend contract updated for Lemonade
- .env.example: document LLM_BACKEND variable

NVIDIA and CPU-only paths are completely untouched.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
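The health checks mentioned for dream.ps1 amount to polling the backend until it answers. The sketch below is written in Python for illustration only and is not the PowerShell code in this PR. The `/api/v1/health` path comes from the PR summary; the function names, the host and port in the usage comment, and the retry counts are assumptions.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, malformed reply.
        return False


def wait_for_backend(
    url: str,
    probe: Callable[[str], bool] = http_ok,
    attempts: int = 30,   # hypothetical retry budget
    delay: float = 2.0,   # hypothetical pause between probes, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until probe succeeds or the attempts run out."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example call (host and port are assumptions, not taken from the PR):
# wait_for_backend("http://localhost:8000/api/v1/health")
```

Passing `probe` and `sleep` as parameters keeps the loop testable without a running server.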
P0 fixes:
- MSI install path: use "Lemonade Server" (with space) under Program Files, add ALLUSERS=1 for admin install, exe is in bin/ subdirectory
- Remove broken /api/v1/load call: use --extra-models-dir flag instead, Lemonade auto-discovers GGUFs and loads on first request
- Patch .env when user declines Lemonade: LLM_BACKEND and LLM_API_BASE_PATH are corrected to llama-server values using the bootstrap patching pattern

P1 fixes:
- Docker compose: add --no-tray (headless), --extra-models-dir /models, persistent volumes (cache, llama binaries, recipes), healthcheck override
- OpenClaw provider URL: add /api prefix for Lemonade's /api/v1 endpoint
- dream.ps1: add --no-tray, --llamacpp vulkan, --extra-models-dir to Lemonade startup args
- strix-halo-config.yaml: update to /api/v1 endpoint with lemonade api_key

P2 fixes:
- .env.example: document LLM_API_BASE_PATH variable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
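To illustrate the .env correction described in the third P0 fix, here is a minimal Python sketch of in-place key patching. It is not the installer's PowerShell "bootstrap patching pattern", whose details are not shown in this PR. The function name `patch_env` is hypothetical, and the literal value `llama-server` for LLM_BACKEND is an assumption; `/v1` is the base path the base compose file is described as using.

```python
def patch_env(text: str, overrides: dict[str, str]) -> str:
    """Rewrite KEY=value lines for the given keys; append keys that are absent."""
    seen = set()
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        is_assignment = bool(stripped) and not stripped.startswith("#") and "=" in stripped
        if is_assignment and key in overrides:
            out.append(f"{key}={overrides[key]}")
            seen.add(key)
        else:
            out.append(line)  # comments, blanks, and unrelated keys pass through
    for key, value in overrides.items():
        if key not in seen:
            out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Values to restore when the user declines Lemonade (literals are assumed).
LLAMA_SERVER_VALUES = {"LLM_BACKEND": "llama-server", "LLM_API_BASE_PATH": "/v1"}
```

Patching line by line, rather than regenerating the file, leaves comments and any user-edited keys untouched.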
- docker-compose.amd.yml: override Open WebUI OPENAI_API_BASE_URL to /api/v1 for Lemonade (base compose hardcodes /v1)
- dashboard-api setup.py: read LLM_API_BASE_PATH env var instead of hardcoding /v1/chat/completions
- Both AMD overlays: pass LLM_API_BASE_PATH to dashboard-api container

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
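The setup.py change can be pictured roughly as follows. This is a sketch of the idea, not the actual dashboard-api code: the function name `chat_completions_url`, the `env` parameter, and the example host are hypothetical. Only the LLM_API_BASE_PATH variable and the `/v1` and `/api/v1` paths come from this PR.

```python
import os
from typing import Mapping, Optional


def chat_completions_url(base_url: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Build the chat completions URL from LLM_API_BASE_PATH, defaulting to /v1."""
    env = os.environ if env is None else env
    base_path = env.get("LLM_API_BASE_PATH", "/v1")
    # Normalize slashes so "/api/v1", "api/v1/" and "/api/v1/" all behave the same.
    return f"{base_url.rstrip('/')}/{base_path.strip('/')}/chat/completions"


# With LLM_API_BASE_PATH=/api/v1 (Lemonade) and a hypothetical host:
#   chat_completions_url("http://llm:8080") -> "http://llm:8080/api/v1/chat/completions"
```

Defaulting to `/v1` keeps the NVIDIA and CPU-only paths working when the variable is unset.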
Lightheartdevs added a commit that referenced this pull request Mar 23, 2026
The Linux installer's image pre-pull list (08-images.sh) still referenced the old kyuz0/amd-strix-halo-toolboxes:rocm-7.2 image, while docker-compose.amd.yml was updated to ghcr.io/lemonade-sdk/lemonade-server in PR #579. This caused the installer to download ~8GB of the wrong image, then docker compose up had to separately pull Lemonade at startup.

Confirmed on Strix Halo (Ubuntu, 124GB RAM, Ryzen AI MAX+ 395).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
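A drift like this between the pre-pull list and the compose file could be caught with a small consistency check. The sketch below is not part of this PR. It assumes the compose file declares images on plain `image:` lines and that the pre-pull list is available as a list of strings; how 08-images.sh actually stores its list is not shown here, and all names are hypothetical.

```python
import re

# Matches "image: repo/name:tag", with optional quotes around the reference.
IMAGE_LINE = re.compile(r"^\s*image:\s*[\"']?([^\s\"'#]+)", re.MULTILINE)


def compose_images(compose_text: str) -> set[str]:
    """Collect every image reference declared in a compose file's text."""
    return set(IMAGE_LINE.findall(compose_text))


def missing_from_prepull(compose_text: str, prepull: list[str]) -> list[str]:
    """Return compose images that the installer would not pre-pull."""
    return sorted(compose_images(compose_text) - set(prepull))
```

References that use variable interpolation such as `${TAG}` would need to be expanded before comparison; this sketch compares the literal strings only.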
Summary

Key changes

New files:
- config/litellm/lemonade.yaml: LiteLLM routing config for Lemonade

Windows installer (install-windows.ps1):
- lemonade-server serve --extra-models-dir → health check

Docker compose (docker-compose.amd.yml):
- ghcr.io/lemonade-sdk/lemonade-server:latest with ROCm, --no-tray, --extra-models-dir
- /api/v1/health
- /api/v1 base path

CLI (dream.ps1):

Detection:
- HasNpu flag for Strix Halo hybrid mode

Context

AMD is offering DreamServer hardware and runs the Lemonade Developer Challenge. Integrating Lemonade gives native AMD optimization, contest eligibility, and a partnership story.

Test plan
- docker compose -f docker-compose.base.yml -f docker-compose.amd.yml config validates
- "C:\Program Files\Lemonade Server\bin\lemonade-server.exe"

🤖 Generated with Claude Code