Commit 31481ce

mmoadeli and yubingex007-a11y authored
[SYCL][HIP] Support of AMD matrix core instructions (#11485)
* Support one block AMD matrix core instructions for `__gfx90a__` architecture.
* Supports `__builtin_amdgcn_mfma_i32_32x32x8i8`, `__builtin_amdgcn_mfma_i32_16x16x16i8`, `__builtin_amdgcn_mfma_f64_16x16x4f64`, `__builtin_amdgcn_mfma_f32_32x32x8bf16_1k`, `__builtin_amdgcn_mfma_f32_16x16x16bf16_1k`, `__builtin_amdgcn_mfma_f32_32x32x8f16` and `__builtin_amdgcn_mfma_f32_16x16x16f16` instructions.
* Add HIP matrix core support into joint_matrix documentation.

Should be merged after - #11215

---------

Co-authored-by: Bing1 Yu <[email protected]>
Co-authored-by: mmoadeli <[email protected]>
1 parent 9c07b46 commit 31481ce
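
The builtin names listed in the commit message appear to encode the result type, the M x N x K shape and the input type (for example `f32_32x32x8f16`). The following is a minimal sketch of decoding that pattern, assuming the layout `<result>_<M>x<N>x<K><input>`; `MfmaShape` and `parse_mfma_name` are hypothetical helpers written for illustration and are not part of LLVM or DPC++.

```cpp
#include <cstdio>
#include <optional>
#include <string>

// Hypothetical helper type (not from the commit): the pieces that an MFMA
// builtin name appears to encode.
struct MfmaShape {
  std::string result_type; // e.g. "f32"
  int m = 0, n = 0, k = 0; // tile dimensions
  std::string input_type;  // e.g. "f16" or "bf16_1k"
};

// Assumed name layout: __builtin_amdgcn_mfma_<result>_<M>x<N>x<K><input>
std::optional<MfmaShape> parse_mfma_name(const std::string &name) {
  const std::string prefix = "__builtin_amdgcn_mfma_";
  if (name.compare(0, prefix.size(), prefix) != 0)
    return std::nullopt; // not an MFMA builtin name

  char result[16] = {0};
  char input[16] = {0};
  MfmaShape shape;
  // %15[^_] reads the result type up to the first underscore; each %d stops
  // at the following 'x' or at the first letter of the input type.
  const int fields =
      std::sscanf(name.c_str() + prefix.size(), "%15[^_]_%dx%dx%d%15s",
                  result, &shape.m, &shape.n, &shape.k, input);
  if (fields != 5)
    return std::nullopt; // name does not follow the assumed layout

  shape.result_type = result;
  shape.input_type = input;
  return shape;
}
```

Calling `parse_mfma_name("__builtin_amdgcn_mfma_i32_16x16x16i8")` under this assumption yields result type `i32`, shape 16 x 16 x 16 and input type `i8`, which lines up with the `sint8`/`sint32` row of the table added to the documentation.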

16 files changed: +1268 -32 lines changed

sycl/doc/extensions/experimental/sycl_ext_matrix/sycl_ext_oneapi_matrix.asciidoc

Lines changed: 34 additions & 4 deletions
@@ -50,7 +50,7 @@ specification.*
 This extension is currently implemented in {dpcpp} only for devices
 that contain a matrix hardware, specifically Intel(R) Advanced Matrix
 Extensions (Intel(R) AMX), Intel(R) Xe Matrix Extensions (Intel(R)
-XMX) and Nvidia(R) Tensor Cores.
+XMX), Nvidia(R) Tensor Cores and AMD Matrix Cores(R).
 
 The `joint_matrix` type and the `joint_matrix_mad` function are
 optional kernel features as defined in section 5.7 of the core SYCL
@@ -67,8 +67,8 @@ implementation throws a synchronous exception with the
 
 == Overview
 Joint matrix is a SYCL extension for matrix hardware programming. It
-unifies targets like Intel AMX in CPUs, Intel XMX in Intel GPUs and
-Nvidia Tensor Cores. This provides a portable and performant API for
+unifies targets like Intel AMX in CPUs, Intel XMX in Intel GPUs,
+Nvidia Tensor Cores and AMD Matrix Cores(R). This provides a portable and performant API for
 users who want to build their own neural networks applications,
 perform custom optimizations, or experiment with new operations in a
 timely and performing manner.
@@ -921,7 +921,8 @@ the type of the A matrix must be the same as the type of the B
 matrix.
 
 IMPORTANT: When compiling for the `ext_oneapi_cuda` backend the target
-arch backend flag, `-Xsycl-target-backend --cuda-gpu-arch=sm_xx`, must
+arch backend flag, `-fsycl-targets=nvidia_gpu_sm_xx`
+(or equivalents, e.g. `-Xsycl-target-backend --cuda-gpu-arch=sm_xx`), must
 be used, where `sm_xx` must be a Compute Capability that is equal to
 or greater than the appropriate Minimum Compute Capability. When an
 executable has been compiled for `sm_xx`, if the executable is run on
@@ -971,6 +972,34 @@ multiple of 4 when `T` is `float`; where `T` is the type of the
 `joint_matrix` elements. When `T` is not `half` or `float` there are
 no restrictions to `stride`.
 
+==== AMD Matrix Cores Supported Combinations
+The complete set of matrix data types and dimensions that are supported by
+the `ext_oneapi_hip` backend are represented in the following
+table. In this architecture's implementation, A and B matrices must have the same type.
+Similarly, C and D matrices must share the same type.
+
+IMPORTANT: The supported instructions may be run on GFX90A (MI200, MI210, MI250 and MI250X GPUs)
+architecture. When compiling for the `ext_oneapi_hip` backend the
+target arch backend flag, `-fsycl-targets=amd_gpu_gfx90a`, must
+be used. An attempt to run the compiled code on an unsupported architecture will throw an error.
+
+
+[frame="none",options="header"]
+|======================
+| A and B type | C and D type | M | N | K
+.2+| `matrix_type::fp16` .2+| `matrix_type::fp32`
+|32 |32 |8
+|16 |16 |16
+.2+| `matrix_type::sint8` .2+| `matrix_type::sint32`
+|32 |32 |8
+|16 |16 |16
+.2+|`matrix_type::bf16` .2+|`matrix_type::fp32`
+|32 |32 |8
+|16 |16 |16
+.1+|`matrix_type::fp64` .1+| `matrix_type::fp64`
+|16 |16 |4
+|======================
+
 === Revision History
 
 [frame="none",options="header"]
[frame="none",options="header"]
@@ -990,4 +1019,5 @@ the Intel-specifics to a separate extension document
 type, runtime query, and supported combinations appendix for Intel AMX
 and Intel XMX
 |7 |2023-04-11 |Jack Kirk |Add Nvidia Tensor Cores supported combinations
+|8 |2023-10-05 |Mahmoud Moadeli |Add AMD Matrix Core supported combinations
 |======================
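
The "AMD Matrix Cores Supported Combinations" table added by this diff can be mirrored as a small compile-time lookup, which may be handy for rejecting an unsupported shape before a kernel is built. This is only a sketch: `ElemType`, `Combination`, `kGfx90aCombinations` and `is_supported_on_gfx90a` are hypothetical names chosen for illustration and are not part of the `sycl_ext_oneapi_matrix` API; the rows are copied from the table.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for the matrix_type values named in the table.
enum class ElemType { fp16, bf16, sint8, sint32, fp32, fp64 };

// One row of the documented table: A/B type, C/D type and the M, N, K sizes.
struct Combination {
  ElemType ab; // type shared by the A and B matrices
  ElemType cd; // type shared by the C and D matrices
  int m, n, k;
};

// Rows copied from the gfx90a table in the documentation diff.
constexpr std::array<Combination, 7> kGfx90aCombinations{{
    {ElemType::fp16, ElemType::fp32, 32, 32, 8},
    {ElemType::fp16, ElemType::fp32, 16, 16, 16},
    {ElemType::sint8, ElemType::sint32, 32, 32, 8},
    {ElemType::sint8, ElemType::sint32, 16, 16, 16},
    {ElemType::bf16, ElemType::fp32, 32, 32, 8},
    {ElemType::bf16, ElemType::fp32, 16, 16, 16},
    {ElemType::fp64, ElemType::fp64, 16, 16, 4},
}};

// Returns true only when the exact (types, M, N, K) tuple is in the table.
constexpr bool is_supported_on_gfx90a(ElemType ab, ElemType cd, int m, int n,
                                      int k) {
  for (std::size_t i = 0; i < kGfx90aCombinations.size(); ++i) {
    const Combination &c = kGfx90aCombinations[i];
    if (c.ab == ab && c.cd == cd && c.m == m && c.n == n && c.k == k)
      return true;
  }
  return false;
}

// The fp64 row exists only for 16x16x4, so a 32x32x8 fp64 request is rejected.
static_assert(is_supported_on_gfx90a(ElemType::fp64, ElemType::fp64, 16, 16, 4),
              "fp64 16x16x4 is listed in the table");
static_assert(!is_supported_on_gfx90a(ElemType::fp64, ElemType::fp64, 32, 32, 8),
              "fp64 32x32x8 is not listed in the table");
```

Keeping the check `constexpr` lets it run inside a `static_assert`, so an unsupported combination can fail at compile time rather than at run time.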

sycl/include/sycl/detail/defines.hpp

Lines changed: 5 additions & 3 deletions
@@ -39,9 +39,11 @@
 #define __SYCL_TYPE(x)
 #endif
 
-// joint matrix should only be included by default for SPIR or NVPTX backends
-#if defined __SPIR__ || defined __NVPTX__ || !defined __SYCL_DEVICE_ONLY__
+// joint matrix should only be included by default for SPIR, NVPTX or HIP(GFX90A
+// only) backends
+#if defined __SPIR__ || defined __NVPTX__ || !defined __SYCL_DEVICE_ONLY__ || \
+    defined __gfx90a__
 #ifndef SYCL_EXT_ONEAPI_MATRIX_VERSION
 #define SYCL_EXT_ONEAPI_MATRIX_VERSION 4
 #endif // SYCL_EXT_ONEAPI_MATRIX_VERSION
-#endif // __SPIR__ || __NVPTX__ || !__SYCL_DEVICE_ONLY
+#endif // __SPIR__ || __NVPTX__ || !__SYCL_DEVICE_ONLY || __gfx90a__
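
The effect of the widened guard can be reproduced outside the SYCL headers. The sketch below copies the new condition but defines `DEMO_MATRIX_VERSION`, a hypothetical stand-in for `SYCL_EXT_ONEAPI_MATRIX_VERSION`, so it cannot clash with a real header; `demo_matrix_version` is likewise an illustrative helper. With an ordinary host compiler none of the device macros are defined, so the `!defined __SYCL_DEVICE_ONLY__` clause alone enables the macro.

```cpp
// Same condition as the patched guard in defines.hpp, applied to a
// hypothetical macro name so the sketch stays independent of SYCL headers.
#if defined __SPIR__ || defined __NVPTX__ || !defined __SYCL_DEVICE_ONLY__ || \
    defined __gfx90a__
#ifndef DEMO_MATRIX_VERSION
#define DEMO_MATRIX_VERSION 4
#endif // DEMO_MATRIX_VERSION
#endif

// Reports the version the guard exposes, or 0 when the guard leaves the
// macro undefined (a device pass for a target outside the listed ones).
constexpr int demo_matrix_version() {
#ifdef DEMO_MATRIX_VERSION
  return DEMO_MATRIX_VERSION;
#else
  return 0;
#endif
}
```

Under this reading, a device pass for an AMD target other than `gfx90a` would define `__SYCL_DEVICE_ONLY__` without `__gfx90a__`, so the macro would stay undefined there; that is the "HIP(GFX90A only)" restriction stated in the updated comment.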
