[SYCL][Matrix] syntax changes as preparation before moving joint matrix from experimental namespace #11215

yubingex007-a11y · 2023-09-19T03:50:32Z

This is part of the effort to move joint matrix from the experimental namespace to supported. The API is being reviewed as part of #7964, which results in the following syntax changes:
1- Add Td to joint_matrix_mad, as Tc can be different from Td on the GPU. D is now passed to mad as an argument.
2- Change “packed” to ext_intel_packed.
3- Move EWOps (get_wi_data, wi_element, get_coord) to the detail namespace.
4- Add const to joint_matrix in store and mad.
5- Add the joint_matrix_copy/assignment function.
6- Add apply with coordinates (changes existing tests).
7- Change the get_coord vector type from int32_t to size_t.
8- Explicitly delete both operator= and the copy ctor.
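
Items 5 and 8 go together: once implicit copies are deleted, copying has to be an explicit call. The following is a minimal sketch of that pattern in plain C++. `toy_matrix` and `toy_matrix_copy` are hypothetical stand-ins invented for illustration; they are not the SYCL `joint_matrix` type or the real `joint_matrix_copy` signature.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a matrix fragment type; NOT the real joint_matrix.
template <typename T, std::size_t Rows, std::size_t Cols>
struct toy_matrix {
  std::array<T, Rows * Cols> data{};

  toy_matrix() = default;
  // Item 8: implicit copies are rejected at compile time.
  toy_matrix(const toy_matrix &) = delete;
  toy_matrix &operator=(const toy_matrix &) = delete;
};

// Item 5: copying is a named, explicit operation. The element types of the
// source and destination are allowed to differ in this sketch.
template <typename Tsrc, typename Tdst, std::size_t Rows, std::size_t Cols>
void toy_matrix_copy(const toy_matrix<Tsrc, Rows, Cols> &src,
                     toy_matrix<Tdst, Rows, Cols> &dst) {
  for (std::size_t i = 0; i < Rows * Cols; ++i)
    dst.data[i] = static_cast<Tdst>(src.data[i]);
}

// The deleted members are visible to type traits.
static_assert(!std::is_copy_constructible_v<toy_matrix<float, 2, 2>>);
static_assert(!std::is_copy_assignable_v<toy_matrix<float, 2, 2>>);
```

With this shape, `toy_matrix<float, 2, 2> b = a;` fails to compile, while `toy_matrix_copy(a, b);` is the only way to duplicate the contents.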


dkhaldi · 2023-09-19T15:15:44Z

sycl/include/sycl/ext/oneapi/matrix/matrix-unified.hpp

@@ -138,9 +127,9 @@ template <typename Group, typename T, use Use, size_t Rows, size_t Cols,
 __SYCL2020_DEPRECATED("get_wi_data() is deprecated for CUDA backend. Please "


We should remove this. It is not really deprecated: joint_matrix is experimental, so we can just remove APIs. Deprecated means the APIs still exist and implementations maintain them. In the case of get_wi_data, it is replaced by joint_matrix_apply.

Was this addressed?

This will be addressed by @JackAKirk among other CUDA changes in a separate PR.

Yes I will make this change as soon as this PR is merged.

dkhaldi · 2023-09-19T15:16:12Z

sycl/include/sycl/ext/oneapi/matrix/matrix-unified.hpp

@@ -99,7 +88,7 @@ class wi_data {
    return jm.cuda_impl.wi_marray.size();
 #else
    throw runtime_error("get_wi_data is available using: "
-                        "ext::intel::experimental::matrix::get_wi_data.",
+                        "ext::oneapi::detail::get_wi_data.",
                        PI_ERROR_INVALID_DEVICE);


We should not advise users to use get_wi_data. When does this runtime error occur?

OK, I see. The wi_data class here is in the sycl::ext::oneapi::experimental::matrix namespace and is for NV. The error message is for Intel users who use NV's wi_data.

How about:
get_wi_data is available using: ext::oneapi::detail::get_wi_data but intel users are expected to use joint_matrix_copy instead

We should never advise users to use anything from the detail namespace. The detail namespace holds implementation details that can change at any time. It is not part of the documented API.

dkhaldi · 2023-09-19T15:16:38Z

sycl/include/sycl/ext/oneapi/matrix/matrix-unified.hpp

@@ -109,7 +98,7 @@ class wi_data {
    return (jm.cuda_impl.wi_marray[i]);
 #else
    throw runtime_error("get_wi_data is available using: "
-                        "ext::intel::experimental::matrix::get_wi_data.",
+                        "ext::oneapi::detail::get_wi_data.",


same as above


dkhaldi · 2023-09-19T15:18:08Z

sycl/include/sycl/ext/oneapi/matrix/matrix-unified.hpp

@@ -262,7 +251,7 @@ inline __SYCL_ALWAYS_INLINE void joint_matrix_load(
        Ptr, stride, __spv::MatrixLayout::ColumnMajor,
        spv_scope_traits<Group>::value);
    break;
-  case sycl::ext::intel::experimental::matrix::layout::packed:
+  case sycl::ext::oneapi::experimental::matrix::layout::ext_intel_packed:


Minor: you don't need to specify the whole namespace here.
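
To illustrate the point about qualification, here is a small sketch in plain C++. The `toy::...` namespace and the `layout_name` helper are hypothetical; they only mirror the nesting seen in the diff and are not the real SYCL header. Code that already lives in the same namespace as the enum can name the enumerator as `layout::ext_intel_packed`.

```cpp
#include <string_view>

// Hypothetical mirror of the namespace nesting; not the real SYCL header.
namespace toy::ext::oneapi::experimental::matrix {

enum class layout { row_major, col_major, ext_intel_packed, dynamic };

// This function is declared in the same namespace as `layout`, so the
// case labels do not need the full namespace prefix.
constexpr std::string_view layout_name(layout l) {
  switch (l) {
  case layout::row_major:
    return "row_major";
  case layout::col_major:
    return "col_major";
  case layout::ext_intel_packed:
    return "ext_intel_packed";
  case layout::dynamic:
    return "dynamic";
  }
  return "unknown";
}

} // namespace toy::ext::oneapi::experimental::matrix
```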

dkhaldi · 2023-09-19T15:20:26Z

sycl/include/sycl/ext/oneapi/matrix/matrix-unified.hpp

+          std::size_t M, std::size_t K, std::size_t N, layout LayoutA,
+          layout LayoutB>
+inline __SYCL_ALWAYS_INLINE void joint_matrix_mad(
+    Group sg, const joint_matrix<Group, Ta, use::a, M, K, LayoutA> &A,


D (destination) should be first, see #11007.
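
A rough sketch of the shape being asked for, in plain C++: the destination comes first, the inputs are const, and the element type of D may differ from that of C. `mad_tile` and `toy_mad` are hypothetical names invented here, and the group argument is omitted; this is not the real `joint_matrix_mad` signature.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical row-major tile; NOT the real joint_matrix.
template <typename T, std::size_t Rows, std::size_t Cols>
struct mad_tile {
  std::array<T, Rows * Cols> v{};
};

// Computes D = A * B + C. D is the first argument and the only one written.
// Td and Tc are separate template parameters, so they may differ.
template <typename Td, typename Ta, typename Tb, typename Tc, std::size_t M,
          std::size_t K, std::size_t N>
void toy_mad(mad_tile<Td, M, N> &D, const mad_tile<Ta, M, K> &A,
             const mad_tile<Tb, K, N> &B, const mad_tile<Tc, M, N> &C) {
  for (std::size_t i = 0; i < M; ++i) {
    for (std::size_t j = 0; j < N; ++j) {
      Td acc = static_cast<Td>(C.v[i * N + j]);
      for (std::size_t k = 0; k < K; ++k)
        acc += static_cast<Td>(A.v[i * K + k]) * static_cast<Td>(B.v[k * N + j]);
      D.v[i * N + j] = acc;
    }
  }
}
```

Compared with the legacy form that returns its result (`sub_c = joint_matrix_mad(sg, sub_a, sub_b, sub_c);`), passing D explicitly is what lets the accumulator and the result have different types.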

dkhaldi

Main comments:

We should not use get_wi_data or anything in detail in tests or error messages; these should be replaced with joint_matrix_apply.
Remove the namespace when specifying ext_intel_packed so things look shorter.

dkhaldi · 2023-09-19T15:22:54Z

sycl/test-e2e/Matrix/Legacy/joint_matrix_uu_int8_impl.hpp

+                     N * 4, matrix_layout::packed_b);
+                 sub_c = joint_matrix_mad(sg, sub_a, sub_b, sub_c);
+               }
+               joint_matrix_store(


You should be able to avoid changes in Legacy folder. Were they caused by clang-format?

dkhaldi · 2023-09-19T15:23:08Z

sycl/test-e2e/Matrix/element_wise_abc_impl.hpp

-                        ext::intel::experimental::matrix::layout::packed>
+           joint_matrix<
+               sub_group, T2, use::b, TK, TN,
+               ext::oneapi::experimental::matrix::layout::ext_intel_packed>


remove namespace

dkhaldi · 2023-09-19T15:23:39Z

sycl/test-e2e/Matrix/element_wise_abc_impl.hpp

@@ -65,8 +66,7 @@ void matrix_elem_wise_ops(big_matrix<T1, M, N> &C, big_matrix<T2, M, K> &A,
               accA.template get_multi_ptr<access::decorated::no>() +
                   (sg_startx * TM) * K,
               K);
-           auto wi_slice_a =
-               sycl::ext::intel::experimental::matrix::get_wi_data(sg, sub_a);
+           auto wi_slice_a = sycl::ext::oneapi::detail::get_wi_data(sg, sub_a);


We should not use get_wi_data or detail in tests, these should be replaced with joint_matrix_apply

dkhaldi · 2023-09-19T15:23:49Z

sycl/test-e2e/Matrix/element_wise_abc_impl.hpp

@@ -76,8 +76,7 @@ void matrix_elem_wise_ops(big_matrix<T1, M, N> &C, big_matrix<T2, M, K> &A,
               accB.template get_multi_ptr<access::decorated::no>() +
                   sg_starty / SG_SZ * TN * vnniFactor,
               N * vnniFactor);
-           auto wi_slice_b =
-               sycl::ext::intel::experimental::matrix::get_wi_data(sg, sub_b);
+           auto wi_slice_b = sycl::ext::oneapi::detail::get_wi_data(sg, sub_b);


We should not use get_wi_data or detail in tests, these should be replaced with joint_matrix_apply
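
The idea behind replacing a per-work-item slice with an apply call can be sketched in plain C++. `ew_tile` and `toy_apply` are hypothetical names invented for this illustration, and the group argument is omitted; this is not the real `joint_matrix_apply`. The caller supplies only the per-element operation and never indexes into a slice.

```cpp
#include <array>
#include <cstddef>

// Hypothetical tile type; NOT the real joint_matrix.
template <typename T, std::size_t Rows, std::size_t Cols>
struct ew_tile {
  std::array<T, Rows * Cols> v{};
};

// Applies `op` to every element. How elements are stored or distributed
// stays hidden inside this function.
template <typename T, std::size_t Rows, std::size_t Cols, typename F>
void toy_apply(ew_tile<T, Rows, Cols> &m, F &&op) {
  for (T &x : m.v)
    op(x);
}
```

A test that previously looped over `wi_slice_a` by index would instead pass a lambda such as `[](int &x) { x *= 2; }`.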

yubingex007-a11y · 2023-09-21T12:10:11Z

sycl/test/matrix/matrix-bfloat16-test-coord-basicB.cpp previously failed
sycl/test-e2e/Matrix/get_coord_int8_matB.cpp failed previously
sycl/test-e2e/Matrix/element_wise_irreg_sum_rows.cpp can't be modified easily since:

             for (int i = 0; i < data.length() / (TK / 4); i++) { // 4 per row
               // i*SG_SIZE index is found based on the round robin
               // distribution we are using in the implementation
               sum_local_rows[row + global_idx * (TK / 4)] += data[i + row * 4];
             }

yubingex007-a11y · 2023-09-21T12:20:46Z

I am WIP on rebasing

yubingex007-a11y · 2023-09-21T12:31:41Z

will handle cuda testcase later

dkhaldi · 2023-09-21T14:40:50Z

sycl/include/sycl/ext/oneapi/matrix/matrix-unified.hpp

-                        "ext::intel::experimental::matrix::get_wi_data.",
-                        PI_ERROR_INVALID_DEVICE);
+    throw runtime_error(
+        "get_wi_data is available using: ext::oneapi::detail::get_wi_data, but "


Just say: "get_wi_data is unavailable, use joint_matrix_copy instead."

dkhaldi · 2023-09-21T14:55:56Z

sycl/test/matrix/query-use.cpp

@@ -0,0 +1,158 @@
+// RUN: %clangxx -DSYCL_EXT_ONEAPI_MATRIX_VERSION=4 -fsycl -o query-use %s


This is an old version of the test

dkhaldi · 2023-09-21T15:48:15Z

sycl/test/matrix/matrix-bfloat16-test-coord-basicB.cpp previously failed
sycl/test-e2e/Matrix/get_coord_int8_matB.cpp failed previously
sycl/test-e2e/Matrix/element_wise_irreg_sum_rows.cpp can't be modified easily since:

             for (int i = 0; i < data.length() / (TK / 4); i++) { // 4 per row
               // i*SG_SIZE index is found based on the round robin
               // distribution we are using in the implementation
               sum_local_rows[row + global_idx * (TK / 4)] += data[i + row * 4];
             }

We should probably remove this test because it assumes some distribution. Also, it does the same thing as sycl/test-e2e/Matrix/get_coord_int8_matB.cpp

dkhaldi · 2023-09-21T21:00:47Z

I looked more carefully at https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Matrix/element_wise_irreg_sum_rows_impl.hpp
This test should be removed, as it is a duplicate of get_coord_matB and assumes some coordinates.
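
For contrast with the index arithmetic in the removed test, here is a sketch of a row sum driven by coordinates rather than by an assumed element distribution. It is plain C++ with hypothetical names (`coord_tile`, `toy_apply_with_coords`, `row_sums`) and is not the SYCL API; the coordinates are `std::size_t`, in line with item 7 of the description.

```cpp
#include <array>
#include <cstddef>

// Hypothetical tile type; NOT the real joint_matrix.
template <typename T, std::size_t Rows, std::size_t Cols>
struct coord_tile {
  std::array<T, Rows * Cols> v{}; // row-major storage, for this toy only
};

// Calls op(element, row, col) for every element. The callback receives the
// logical coordinates, so it never has to guess the storage order.
template <typename T, std::size_t Rows, std::size_t Cols, typename F>
void toy_apply_with_coords(const coord_tile<T, Rows, Cols> &m, F &&op) {
  for (std::size_t r = 0; r < Rows; ++r)
    for (std::size_t c = 0; c < Cols; ++c)
      op(m.v[r * Cols + c], r, c);
}

// Sums each row using only the row coordinate passed to the callback.
template <typename T, std::size_t Rows, std::size_t Cols>
std::array<T, Rows> row_sums(const coord_tile<T, Rows, Cols> &m) {
  std::array<T, Rows> sums{};
  toy_apply_with_coords(m, [&](const T &x, std::size_t row, std::size_t) {
    sums[row] += x;
  });
  return sums;
}
```

If the storage order inside `toy_apply_with_coords` changed, `row_sums` would still be correct, which is the property the index-based loop lacked.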

dkhaldi

LGTM

yubingex007-a11y · 2023-10-12T05:23:48Z

@intel/llvm-gatekeepers ping?

steffenlarsen · 2023-10-12T08:29:06Z

@YuriPlyakhin has requested changes, so that approval is needed.

yubingex007-a11y · 2023-10-12T09:58:31Z

@YuriPlyakhin could you approve the PR? Dounia has answered your comments above, and if small changes are still needed, we can create a new PR.

dkhaldi · 2023-10-12T13:25:52Z

@intel/llvm-gatekeepers, please help merge

dm-vodopyanov · 2023-10-12T13:43:16Z

@intel/llvm-gatekeepers, please help merge

There is an input from Steffen above.

dkhaldi · 2023-10-12T13:50:44Z

@intel/llvm-gatekeepers, please help merge

There is an input from Steffen above.

Correct, I missed that. Yury is OOO today but this can wait.
We should just make sure this gets merged before #11485 is merged

dm-vodopyanov · 2023-10-12T13:53:13Z

@intel/llvm-gatekeepers, please help merge

There is an input from Steffen above.

Correct, I missed that. Yury is OOO today but this can wait. We should just make sure this gets merged before #11485 is merged

Updated the description of #11485

YuriPlyakhin

LGTM. Important comments were addressed. Test fine tuning can be done later.

YuriPlyakhin · 2023-10-12T14:18:29Z

@intel/llvm-gatekeepers , I approved, please, merge.

As discussed in #11215 this patch:
- removed mutable from `joint_matrix_cuda` (This change requires an upstream llvm patch (https://reviews.llvm.org/rGb781c7ab574f))
- removed `get_wi_data()`

I also added back the cases that the change in the `joint_matrix_mad` interface allows: namely when the type of C/D matrices differ. I correspondingly updated the tests, to test the new cases that are supported. I also updated the support matrix for cuda in the spec doc for the newly supported combinations.

Signed-off-by: JackAKirk <[email protected]>

* Support one block AMD matrix core instructions for `__gfx90a__` architecture.
* Supports `__builtin_amdgcn_mfma_i32_32x32x8i8`, `__builtin_amdgcn_mfma_i32_16x16x16i8`, `__builtin_amdgcn_mfma_f64_16x16x4f64`, `__builtin_amdgcn_mfma_f32_32x32x8bf16_1k`, `__builtin_amdgcn_mfma_f32_16x16x16bf16_1k`, `__builtin_amdgcn_mfma_f32_32x32x8f16` and `__builtin_amdgcn_mfma_f32_16x16x16f16` instructions.
* Add HIP matrix core support into joint_matrix documentation.

Should be merged after
- #11215

Co-authored-by: Bing1 Yu <[email protected]>
Co-authored-by: mmoadeli <[email protected]>

yubingex007-a11y added 9 commits September 19, 2023 11:46

clang-format

5fbb285

fix typo: dest->dst

bf6cd56

fix testcase

b399041

fix mad bug

dae1ec6

fix cuda const joint_matrix_cuda

4ec8360

fix const issue of jm_store_cuda

a461cbb

fix const

5ff715b

lint

8ad7da9

dkhaldi reviewed Sep 19, 2023


dkhaldi reviewed Sep 19, 2023


dkhaldi requested changes Sep 19, 2023


yubingex007-a11y added 5 commits September 21, 2023 14:52

address dounia's comments and roll back all the testcase changes

26ea49d

test changes: mov D in mad

a09a778

testcase changes: ext_intel_layout

821fa89

testcase changes: wi_data=>jm_apply

a3921b5

lint

ef1bc67

Merge remote-tracking branch 'intel_llvm/sycl' into jm_syntax

f395199

yubingex007-a11y had a problem deploying to WindowsCILock September 21, 2023 13:11 — with GitHub Actions Failure

yubingex007-a11y temporarily deployed to WindowsCILock September 21, 2023 14:30 — with GitHub Actions Inactive

dkhaldi requested changes Sep 21, 2023


yubingex007-a11y added 2 commits October 12, 2023 01:31

address comments

a821107

Merge remote-tracking branch 'intel_llvm/sycl' into jm_syntax

3f1b575

yubingex007-a11y had a problem deploying to WindowsCILock October 11, 2023 17:54 — with GitHub Actions Failure

yubingex007-a11y temporarily deployed to WindowsCILock October 11, 2023 18:24 — with GitHub Actions Inactive

yubingex007-a11y added 2 commits October 12, 2023 02:51

rm element_wise_irreg_sum_rows_impl.hpp

1d091de

small fix

1e20968

yubingex007-a11y temporarily deployed to WindowsCILock October 11, 2023 18:56 — with GitHub Actions Inactive

small fix

1fe7fcd

yubingex007-a11y temporarily deployed to WindowsCILock October 11, 2023 19:08 — with GitHub Actions Inactive

dkhaldi approved these changes Oct 11, 2023


yubingex007-a11y temporarily deployed to WindowsCILock October 11, 2023 19:33 — with GitHub Actions Inactive

yubingex007-a11y requested a review from a team October 11, 2023 23:54

steffenlarsen requested a review from YuriPlyakhin October 12, 2023 08:28

JackAKirk self-requested a review October 12, 2023 11:12

JackAKirk approved these changes Oct 12, 2023


dm-vodopyanov changed the title ~~[Matrix] syntax changes as prepraration before moving joint matrix from experimental namespace~~ [SYCL][Matrix] syntax changes as preparation before moving joint matrix from experimental namespace Oct 12, 2023

YuriPlyakhin approved these changes Oct 12, 2023


dm-vodopyanov merged commit 687f579 into intel:sycl Oct 12, 2023

JackAKirk mentioned this pull request Oct 17, 2023

[SYCL][CUDA] joint_matrix required changes following #11215 #11563

Merged
