Commit 3ffbbd5

HIP: rocWMMA documentation and enabling in workflow builds (ggml-org#12179)
* Enable rocWMMA for Windows CI build
* Enable for Ubuntu
* GGML_HIP_ROCWMMA_FATTN documentation work
1 parent 4299404 commit 3ffbbd5

2 files changed

+22
-0
lines changed

.github/workflows/build.yml

+16
@@ -467,6 +467,7 @@ jobs:
    run: |
      cmake -B build -S . \
          -DCMAKE_HIP_COMPILER="$(hipconfig -l)/clang" \
+         -DGGML_HIP_ROCWMMA_FATTN=ON \
          -DGGML_HIP=ON
      cmake --build build --config Release -j $(nproc)

@@ -476,6 +477,7 @@ jobs:
      cmake -B build2 -S . \
          -DCMAKE_C_COMPILER=hipcc \
          -DCMAKE_CXX_COMPILER=hipcc \
+         -DGGML_HIP_ROCWMMA_FATTN=ON \
          -DGGML_HIP=ON
      cmake --build build2 --config Release -j $(nproc)

@@ -1202,6 +1204,11 @@ jobs:
    id: checkout
    uses: actions/checkout@v4

+ - name: Clone rocWMMA repository
+   id: clone_rocwmma
+   run: |
+     git clone https://github.com/rocm/rocwmma --branch rocm-6.2.4 --depth 1
+
  - name: Install
    id: depends
    run: |
@@ -1231,8 +1238,10 @@ jobs:
      cmake -G "Unix Makefiles" -B build -S . `
          -DCMAKE_C_COMPILER="${env:HIP_PATH}\bin\clang.exe" `
          -DCMAKE_CXX_COMPILER="${env:HIP_PATH}\bin\clang++.exe" `
+         -DCMAKE_CXX_FLAGS="-Irocwmma/library/include/" `
          -DCMAKE_BUILD_TYPE=Release `
          -DGGML_HIP=ON `
+         -DGGML_HIP_ROCWMMA_FATTN=ON `
          -DGGML_RPC=ON
      cmake --build build -j ${env:NUMBER_OF_PROCESSORS}

@@ -1251,6 +1260,11 @@ jobs:
    with:
      fetch-depth: 0

+ - name: Clone rocWMMA repository
+   id: clone_rocwmma
+   run: |
+     git clone https://github.com/rocm/rocwmma --branch rocm-6.2.4 --depth 1
+
  - name: ccache
    uses: hendrikmuhs/[email protected]
    with:
@@ -1280,8 +1294,10 @@ jobs:
      cmake -G "Unix Makefiles" -B build -S . `
          -DCMAKE_C_COMPILER="${env:HIP_PATH}\bin\clang.exe" `
          -DCMAKE_CXX_COMPILER="${env:HIP_PATH}\bin\clang++.exe" `
+         -DCMAKE_CXX_FLAGS="-Irocwmma/library/include/" `
          -DCMAKE_BUILD_TYPE=Release `
          -DAMDGPU_TARGETS=${{ matrix.gpu_target }} `
+         -DGGML_HIP_ROCWMMA_FATTN=ON `
          -DGGML_HIP=ON `
          -DGGML_RPC=ON
      cmake --build build -j ${env:NUMBER_OF_PROCESSORS}
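
The Windows jobs above clone rocWMMA at the `rocm-6.2.4` tag and point the compiler at its `library/include/` directory through `CMAKE_CXX_FLAGS`. As a rough sketch of the same sequence for a local Linux shell, the script below wraps those steps in a few helper functions. The names `ROCWMMA_DIR`, `rocwmma_include_flag`, `has_rocwmma_headers`, and `build_with_rocwmma` are hypothetical and exist only for this example. The header check assumes the main header lives at `rocwmma/rocwmma.hpp` inside the include directory. The script has not been validated on real ROCm hardware and only mirrors the options shown in this commit.

```shell
#!/usr/bin/env bash
# Sketch only: ROCWMMA_DIR and the helper function names are hypothetical,
# introduced for this example; they are not part of the commit.

ROCWMMA_TAG="rocm-6.2.4"                      # tag cloned by the workflow
ROCWMMA_DIR="${ROCWMMA_DIR:-$PWD/rocwmma}"    # absolute path to the checkout

# Print the include flag for a rocWMMA checkout rooted at $1.
rocwmma_include_flag() {
    printf -- '-I%s/library/include/\n' "$1"
}

# Succeed only if the checkout at $1 contains the main rocWMMA header
# (assumed location: library/include/rocwmma/rocwmma.hpp).
has_rocwmma_headers() {
    [ -f "$1/library/include/rocwmma/rocwmma.hpp" ]
}

# Clone (if needed), verify the headers, then configure and build.
build_with_rocwmma() {
    if [ ! -d "$ROCWMMA_DIR" ]; then
        git clone https://github.com/rocm/rocwmma \
            --branch "$ROCWMMA_TAG" --depth 1 "$ROCWMMA_DIR" || return 1
    fi
    if ! has_rocwmma_headers "$ROCWMMA_DIR"; then
        echo "rocWMMA headers not found under $ROCWMMA_DIR" >&2
        return 1
    fi
    cmake -B build -S . \
        -DCMAKE_CXX_FLAGS="$(rocwmma_include_flag "$ROCWMMA_DIR")" \
        -DGGML_HIP=ON \
        -DGGML_HIP_ROCWMMA_FATTN=ON &&
    cmake --build build --config Release -j "$(nproc)"
}

# Run the build only when explicitly requested, so sourcing has no side effects.
if [ "${1:-}" = "--build" ]; then
    build_with_rocwmma
fi
```

Running the script with `--build` from the llama.cpp source root would perform the clone and build; without the argument it only defines the helpers. The sketch uses an absolute include path so that the flag does not depend on the directory the compiler is invoked from, whereas the workflow passes the relative path `rocwmma/library/include/`.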

docs/build.md

+6
@@ -235,6 +235,12 @@ You can download it from your Linux distro's package manager or from here: [ROCm
  On Linux it is also possible to use unified memory architecture (UMA) to share main memory between the CPU and integrated GPU by setting `-DGGML_HIP_UMA=ON`.
  However, this hurts performance for non-integrated GPUs (but enables working with integrated GPUs).

+ To enhance flash attention performance on RDNA3+ or CDNA architectures, you can utilize the rocWMMA library by enabling the `-DGGML_HIP_ROCWMMA_FATTN=ON` option. This requires rocWMMA headers to be installed on the build system.
+
+ The rocWMMA library is included by default when installing the ROCm SDK using the `rocm` meta package provided by AMD. Alternatively, if you are not using the meta package, you can install the library using the `rocwmma-dev` or `rocwmma-devel` package, depending on your system's package manager.
+
+ As an alternative, you can manually install the library by cloning it from the official [GitHub repository](https://github.com/ROCm/rocWMMA), checking out the corresponding version tag (e.g. `rocm-6.2.4`), and setting `-DCMAKE_CXX_FLAGS="-I<path/to/rocwmma>/library/include/"` in CMake. This also works under Windows despite not being officially supported by AMD.
+
  Note that if you get the following error:
  ```
  clang: error: cannot find ROCm device library; provide its path via '--rocm-path' or '--rocm-device-lib-path', or pass '-nogpulib' to build without ROCm device library
