feat(amd): first-class R9700 (gfx1201/RDNA4) support for Qwen3.6-27B by DeanoC · Pull Request #435 · Luce-Org/lucebox-hub

DeanoC · 2026-06-22T16:51:59Z

Summary

Makes the AMD Radeon AI PRO R9700 (gfx1201, RDNA4 / Navi 48) a first-class AMD target for the DFlash Qwen3.6-27B stack, and fixes a CMake-version build break that affects the cuda12 / rocm Docker images.

1. R9700 / `gfx1201` support

The HIP build pipeline compiled RDNA4 as gfx1200 (Navi 44, RX 9060-class) and mislabeled it "RX 9070". The R9700 is gfx1201, which is not code-object compatible with gfx1200 — so the published :rocm image and the documented build produced no native R9700 kernels (ggml-hip fails to load / falls back badly on the R9700).

- Add gfx1201 to the default fat-binary HIP arch list in docker-bake.hcl, Dockerfile.rocm, and the CI main/release matrix (.github/workflows/docker.yml). PR builds stay gfx1151-only for fast CI. Correct the RDNA4 labels (gfx1200 = RX 9060, gfx1201 = RX 9070 / R9700).
- Build the rocWMMA flashprefill numerics test (test_flashprefill_kernels) under HIP too. It was CUDA-only, so there was previously no way to validate the rocWMMA path on AMD. The CUDA-spelled test compiles via the existing hip_compat/ shim.
- Document the R9700 in the README "Tested Machines" table and the server/README.md AMD HIP section (gfx1201 build, --ddtree-budget=22, multi-GPU / Fedora-PIE build notes).

No kernel changes: ROCm 7.1's rocWMMA handles the gfx12 WMMA operand-format change internally.

2. Build fix: `json` FetchContent on CMake < 3.24

#433 added `DOWNLOAD_EXTRACT_TIMESTAMP TRUE` to the json `FetchContent_Declare`. That keyword (and policy CMP0135) only exist on CMake ≥ 3.24, but the cuda12 / rocm Docker base images ship CMake 3.22. There the unknown tokens are parsed as extra URL list entries, and configure fails with "At least one entry of URL is a path (invalid in a list)". The option is now applied only on CMake ≥ 3.24 (where it still silences the dev warning), so older CMake parses the declare correctly again. Reproduced and verified on CMake 3.22.6 and 4.3.
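The version guard described above can be sketched roughly as follows. This is an illustration of the technique, not the exact diff in this PR: the variable name `json_timestamp_args` and the placeholder URL are assumptions made for the example.

```cmake
include(FetchContent)

# DOWNLOAD_EXTRACT_TIMESTAMP is only understood by CMake >= 3.24.
# On older versions the keyword would be treated as more URL entries,
# so it is only added to the argument list when it is supported.
set(json_timestamp_args "")  # hypothetical helper variable
if(CMAKE_VERSION VERSION_GREATER_EQUAL "3.24")
  set(json_timestamp_args DOWNLOAD_EXTRACT_TIMESTAMP TRUE)
endif()

FetchContent_Declare(
  json
  URL "https://example.invalid/json.tar.xz"  # placeholder, not the real URL
  ${json_timestamp_args}                     # expands to nothing on CMake < 3.24
)
```

Because the variable is expanded unquoted, an empty value contributes no arguments at all, which is what lets CMake 3.22 see a plain `URL` declaration.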

Validation — real R9700 (`gfx1201`, ROCm 7.1.1)

Native gfx1201 build (Phase 1 + Phase 2 rocWMMA); the built libggml-hip.so contains native hipv4-amdgcn-amd-amdhsa--gfx1201 code objects.

| Check | Result |
| --- | --- |
| `test_server_unit` | 1984 assertions, 0 failures |
| `test_flashprefill_kernels` (rocWMMA) | PASS, max diff 5e-4; e2e `flash_prefill_forward_bf16` @ S=8192 = 10.7 ms/iter |
| Qwen3.6-27B Q4_K_M + DFlash, `--ddtree-budget=22` | 54.65 tok/s mean decode (`bench_he.py --n-gen 256`, AL 7.14, range 36.9–93.0 tok/s) |
| 16K-context generation | coherent |

Notes

Fedora's system ROCm lives under /usr and links PIE by default; the local build used -DCMAKE_HIP_COMPILER_ROCM_ROOT=/usr -DROCM_PATH=/usr -DCMAKE_EXE_LINKER_FLAGS=-no-pie (documented in server/README.md). The Ubuntu-based Dockerfile.rocm is unaffected.
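For repeated local builds on Fedora, the three flags above could also be kept in a CMake initial-cache script instead of being typed on the command line. The sketch below is only a convenience, assuming a fresh build directory; the file name `fedora-rocm.cmake` is hypothetical and is not part of this PR.

```cmake
# fedora-rocm.cmake (hypothetical file name): initial cache for Fedora's system ROCm.
# Mirrors the -D flags quoted in the note above; nothing else is set here.
set(CMAKE_HIP_COMPILER_ROCM_ROOT "/usr" CACHE PATH "ROCm root used by the HIP compiler")
set(ROCM_PATH "/usr" CACHE PATH "System ROCm install prefix")
set(CMAKE_EXE_LINKER_FLAGS "-no-pie" CACHE STRING "Disable PIE linking for executables")
```

Such a file would be loaded with CMake's `-C <initial-cache>` option at configure time. Entries set this way do not override values already present in an existing cache, so it is best used when configuring a new build directory.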

The HIP build pipeline compiled RDNA4 as gfx1200 (Navi 44, RX 9060) and mislabeled it 'RX 9070'. The Radeon AI PRO R9700 is gfx1201 (Navi 48), which is NOT code-object compatible with gfx1200, so the published :rocm image and the documented build shipped no native R9700 kernels.

- Add gfx1201 to the default fat-binary HIP arch list (docker-bake.hcl, Dockerfile.rocm, CI main/release matrix) and correct the RDNA4 labels.
- Build the rocWMMA flashprefill numerics test (test_flashprefill_kernels) under HIP too, not just CUDA, so the Phase 2 path can be validated on AMD.
- Document the R9700: gfx1201 build, --ddtree-budget=22, the multi-GPU / Fedora-PIE build notes, and benchmark numbers.

Validated on a real R9700 (gfx1201, ROCm 7.1.1):

- test_server_unit: 1984 assertions, 0 failures
- test_flashprefill_kernels (rocWMMA): PASS, max diff 5e-4
- Qwen3.6-27B Q4_K_M + DFlash, budget=22: 54.65 tok/s mean decode (AL 7.14)
- coherent 16K-context generation

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The json FetchContent_Declare gained DOWNLOAD_EXTRACT_TIMESTAMP TRUE in Luce-Org#433 to silence the CMP0135 dev warning. That keyword (and CMP0135) only exist on CMake >= 3.24; on the 3.22 base used by the cuda12/rocm Docker builds the unknown token is parsed as extra URL list entries, failing configure with 'At least one entry of URL is a path (invalid in a list)'. This broke the Docker prebuilds on main, independent of the R9700 work.

Apply the option only on CMake >= 3.24 (where it still silences the warning); older CMake parses the declare correctly again. Reproduced and verified on CMake 3.22.6 and 4.3.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

cubic-dev-ai

No issues found across 6 files


DeanoC and others added 2 commits June 22, 2026 18:55

cubic-dev-ai Bot reviewed Jun 22, 2026


DeanoC wants to merge 2 commits into Luce-Org:main from GeometricAGI:feat/r9700-gfx1201
