feat(amd): integrate AMD Lemonade as inference backend #579

Merged
Lightheartdevs merged 3 commits into main from feat/lemonade-amd-backend on Mar 23, 2026
Conversation

@Lightheartdevs (Collaborator)

Summary

  • Adds AMD Lemonade Server as the preferred inference backend when AMD hardware is detected
  • Lemonade provides native NPU + Vulkan + ROCm acceleration, enabling hybrid NPU+GPU execution on Strix Halo
  • Windows: silent MSI install with user prompt, llama-server Vulkan fallback if declined
  • Linux: Lemonade Docker image with ROCm passthrough replaces toolbox image
  • NPU detection added for Ryzen AI (Win32_PnPEntity + sysfs)
  • NVIDIA and CPU-only paths completely untouched

Key changes

New files:

  • config/litellm/lemonade.yaml — LiteLLM routing config for Lemonade
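A minimal shape for such a routing config might look like the following sketch; the model name, port, and API key value are illustrative assumptions — only the /api/v1 base path and the placeholder `lemonade` api_key are mentioned elsewhere in this PR:

```yaml
model_list:
  - model_name: local-llm                 # name clients request via LiteLLM (assumed)
    litellm_params:
      model: openai/default               # Lemonade exposes an OpenAI-compatible API
      api_base: http://lemonade:8000/api/v1   # Lemonade's /api/v1 base path (host/port assumed)
      api_key: lemonade                   # placeholder key, as referenced in strix-halo-config.yaml
```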

Windows installer (install-windows.ps1):

  • Phase 8: prompts user → MSI install → lemonade-server serve --extra-models-dir → health check
  • Falls back to Vulkan llama-server if user declines or install fails
  • Patches .env to correct backend/path on fallback

Docker compose (docker-compose.amd.yml):

  • ghcr.io/lemonade-sdk/lemonade-server:latest with ROCm, --no-tray, --extra-models-dir
  • 3 persistent volumes (model cache, llama binaries, recipes)
  • Healthcheck override for /api/v1/health
  • Open WebUI override for /api/v1 base path
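Taken together, the overrides could look roughly like this in docker-compose.amd.yml; the service names, volume names, container paths, and device mappings are illustrative assumptions — only the image, the `--no-tray` / `--extra-models-dir` flags, the /api/v1/health path, and the Open WebUI base-URL override come from this PR:

```yaml
services:
  llm:
    image: ghcr.io/lemonade-sdk/lemonade-server:latest
    command: ["lemonade-server", "serve", "--no-tray", "--extra-models-dir", "/models"]
    devices:
      - /dev/kfd        # ROCm compute interface
      - /dev/dri        # GPU render nodes
    volumes:
      - lemonade-cache:/root/.cache/lemonade   # model cache (path assumed)
      - lemonade-llama:/opt/llama              # llama binaries (path assumed)
      - lemonade-recipes:/opt/recipes          # recipes (path assumed)
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/api/v1/health"]
  open-webui:
    environment:
      OPENAI_API_BASE_URL: http://llm:8000/api/v1   # base compose hardcodes /v1

volumes:
  lemonade-cache:
  lemonade-llama:
  lemonade-recipes:
```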

CLI (dream.ps1):

  • Backend-aware process management (Lemonade vs llama-server)
  • Correct health check endpoints, startup args, chat API paths
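The backend-aware switching largely reduces to selecting per-backend endpoints. A sketch in shell (the real logic lives in dream.ps1; the variable names and default are illustrative):

```shell
# Pick health-check and chat endpoints based on the configured backend.
backend="${LLM_BACKEND:-lemonade}"
case "$backend" in
  lemonade)
    health_path="/api/v1/health"            # Lemonade health endpoint (from this PR)
    chat_path="/api/v1/chat/completions"    # Lemonade chat API lives under /api/v1
    ;;
  llama-server)
    health_path="/health"                   # llama-server health endpoint
    chat_path="/v1/chat/completions"        # llama-server OpenAI-compatible path
    ;;
  *)
    echo "unknown backend: $backend" >&2
    exit 1
    ;;
esac
echo "$backend -> health=$health_path chat=$chat_path"
```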

Detection:

  • NPU detection on Windows (Win32_PnPEntity) and Linux (sysfs/lspci)
  • HasNpu flag for Strix Halo hybrid mode
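On the Linux side, the check amounts to scanning PCI device listings for the Ryzen AI NPU (typically surfaced as a signal-processing/IPU device). A hedged sketch — the `has_npu` helper and the exact device strings are assumptions, not the code in detection.sh, which may also consult sysfs driver entries:

```shell
# Hypothetical helper: decide the HasNpu flag from lspci-style output.
has_npu() {
  printf '%s\n' "$1" | grep -qiE 'amd.*(npu|ipu|signal processing)'
}

# Example device line (format assumed from typical lspci output):
sample="c7:00.1 Signal processing controller: Advanced Micro Devices [AMD] NPU"
if has_npu "$sample"; then echo "HasNpu=true"; else echo "HasNpu=false"; fi
```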

Context

AMD is offering DreamServer hardware and is running the Lemonade Developer Challenge. Integrating Lemonade provides native AMD optimization, contest eligibility, and a partnership story.

Test plan

  • Windows AMD: install with Lemonade accepted → verify health, chat UI, dashboard
  • Windows AMD: install with Lemonade declined → verify llama-server fallback works
  • Windows NVIDIA: verify zero behavioral change
  • Windows no-GPU: verify zero behavioral change
  • Linux AMD: docker compose -f docker-compose.base.yml -f docker-compose.amd.yml config validates
  • Verify MSI install path matches "C:\Program Files\Lemonade Server\bin\lemonade-server.exe"

🤖 Generated with Claude Code

Lightheartdevs and others added 3 commits March 23, 2026 16:10
Replace llama-server with AMD Lemonade Server when AMD hardware is detected.
Lemonade provides native NPU + Vulkan + ROCm acceleration, enabling hybrid
NPU+GPU execution on Strix Halo and optimized inference on all AMD silicon.

Windows: silent MSI install with user prompt, llama-server Vulkan fallback
Linux: Lemonade Docker image with ROCm passthrough replaces toolbox image
NPU detection added for Ryzen AI (Win32_PnPEntity + sysfs)

New files:
- config/litellm/lemonade.yaml (LiteLLM routing for Lemonade backend)

Modified:
- constants.ps1: Lemonade MSI URL, paths, health endpoint
- detection.ps1/sh: NPU detection for Ryzen AI
- env-generator.ps1: LLM_BACKEND and LLM_API_BASE_PATH variables
- install-windows.ps1: Lemonade install + fallback flow in Phase 8
- dream.ps1: Lemonade-aware process management and health checks
- docker-compose.windows-amd.yml: API path support for Lemonade
- docker-compose.amd.yml: Lemonade Docker image with ROCm
- amd.json: backend contract updated for Lemonade
- .env.example: document LLM_BACKEND variable

NVIDIA and CPU-only paths are completely untouched.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
P0 fixes:
- MSI install path: use "Lemonade Server" (with space) under Program Files,
  add ALLUSERS=1 for admin install, exe is in bin/ subdirectory
- Remove broken /api/v1/load call: use --extra-models-dir flag instead,
  Lemonade auto-discovers GGUFs and loads on first request
- Patch .env when user declines Lemonade: LLM_BACKEND and LLM_API_BASE_PATH
  are corrected to llama-server values using the bootstrap patching pattern

P1 fixes:
- Docker compose: add --no-tray (headless), --extra-models-dir /models,
  persistent volumes (cache, llama binaries, recipes), healthcheck override
- OpenClaw provider URL: add /api prefix for Lemonade's /api/v1 endpoint
- dream.ps1: add --no-tray, --llamacpp vulkan, --extra-models-dir to
  Lemonade startup args
- strix-halo-config.yaml: update to /api/v1 endpoint with lemonade api_key

P2 fixes:
- .env.example: document LLM_API_BASE_PATH variable

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- docker-compose.amd.yml: override Open WebUI OPENAI_API_BASE_URL to
  /api/v1 for Lemonade (base compose hardcodes /v1)
- dashboard-api setup.py: read LLM_API_BASE_PATH env var instead of
  hardcoding /v1/chat/completions
- Both AMD overlays: pass LLM_API_BASE_PATH to dashboard-api container
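The dashboard-api change is a one-line pattern: read the base path from the environment, with the old hardcoded value as the fallback. A sketch in Python — the host/port and the `/v1` default are assumptions; only the LLM_API_BASE_PATH variable name comes from the commit:

```python
import os

# Build the chat-completions URL from LLM_API_BASE_PATH instead of
# hardcoding /v1/chat/completions; fall back to the old /v1 default.
base_path = os.environ.get("LLM_API_BASE_PATH", "/v1").rstrip("/")
chat_url = f"http://llm:8000{base_path}/chat/completions"
print(chat_url)
```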

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lightheartdevs merged commit 45512c7 into main on Mar 23, 2026
15 of 22 checks passed
Lightheartdevs added a commit that referenced this pull request Mar 23, 2026
The Linux installer's image pre-pull list (08-images.sh) still referenced
the old kyuz0/amd-strix-halo-toolboxes:rocm-7.2 image, while
docker-compose.amd.yml was updated to ghcr.io/lemonade-sdk/lemonade-server
in PR #579. This caused the installer to download ~8GB of the wrong image,
then docker compose up had to separately pull Lemonade at startup.

Confirmed on Strix Halo (Ubuntu, 124GB RAM, Ryzen AI MAX+ 395).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>