Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][Matrix] Use KHR cooperative matrix instructions instead of Intel's #13817

Merged
merged 19 commits into from
Aug 15, 2024

Conversation

MrSidims
Copy link
Contributor

The usage is currently guarded by __SPIRV_USE_COOPERATIVE_MATRIX macro.

It's a split from #13316

Copy link
Contributor

@YuriPlyakhin YuriPlyakhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one nit comment from me above. otherwise, LGTM

Copy link
Contributor

@againull againull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add tests for this new functionality (i.e. for the case when __SPIRV_USE_COOPERATIVE_MATRIX is enabled)?

@MrSidims
Copy link
Contributor Author

MrSidims commented May 21, 2024

Is it possible to add tests for this new functionality (i.e. for the case when __SPIRV_USE_COOPERATIVE_MATRIX is enabled)?

@againull It's possible and E2E test will be passing in pre-commit on GPU (as pre-commit right now is using the freshest GPU drivers as possible). But it will be failing in nightly as it uses older driver. Also such test won't be passing any time soon on CPU.

For non E2E tests I'll be modifying check_device_code/matrix/matrix_load_store_as.cpp and some of matrix non-E2E tests once #13645 is merged.

Please tell me which option is preferable to you. On my side I'll will be modifying some internal scripts that @YuriPlyakhin has anyway to ensure __SPIRV_USE_COOPERATIVE_MATRIX case is also being validated.

@dkhaldi
Copy link
Contributor

dkhaldi commented May 21, 2024

@MrSidims

Please tell me which option is preferable to you. On my side I'll will be modifying some internal scripts that @YuriPlyakhin has anyway to ensure __SPIRV_USE_COOPERATIVE_MATRIX case is also being validated.

The current script works only on PVC.
Will you be able to make them also run on DG2 and SPR?

@againull
Copy link
Contributor

Is it possible to add tests for this new functionality (i.e. for the case when __SPIRV_USE_COOPERATIVE_MATRIX is enabled)?

@againull It's possible and E2E test will be passing in pre-commit on GPU (as pre-commit right now is using the freshest GPU drivers as possible). But it will be failing in nightly as it uses older driver. Also such test won't be passing any time soon on CPU.

For non E2E tests I'll be modifying check_device_code/matrix/matrix_load_store_as.cpp and some of matrix non-E2E tests once #13645 is merged.

Please tell me which option is preferable to you. On my side I'll will be modifying some internal scripts that @YuriPlyakhin has anyway to ensure __SPIRV_USE_COOPERATIVE_MATRIX case is also being validated.

Then in my opinion it will be perfect if we have these E2E tests. It is possible to add REQUIRE statement where you can indicate required minimum driver version and that tests require gpu.

@MrSidims
Copy link
Contributor Author

Okay, for E2E testing I need to merge #13923 and an extracted llvm-spirv part of #13316

@MrSidims
Copy link
Contributor Author

MrSidims commented Aug 8, 2024

@againull apologies for a long delay. I've added the tests. I plan to enable all of the tests for cooperative matrices in the next PR, but first need to have some driver version containing the fix I'm committing right now there.

Copy link
Contributor

@YuriPlyakhin YuriPlyakhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MrSidims , If eventually we plan to make SPV Cooperative Matrix a default, I believe, we should enable all Matrix E2E tests in SPVCooperativeMatrix folder, not a subset, to make sure no regressions, when we try to switch the default.

@dkhaldi
Copy link
Contributor

dkhaldi commented Aug 8, 2024

@MrSidims , If eventually we plan to make SPV Cooperative Matrix a default, I believe, we should enable all Matrix E2E tests in SPVCooperativeMatrix folder, not a subset, to make sure no regressions, when we try to switch the default.

Do we really need a new folder?
We can just add a second line of execution with // RUN: %{build} -D__SPIRV_USE_COOPERATIVE_MATRIX -o %t_coop.out
// RUN: %{run} %t_coop.out

To all the existing tests.
I know, to isolate regression, we will need to run both and determine which one is really failing. But since the end goal is to make this the default, it is worthwhile adding the change directly to the tests.

//===----------------------------------------------------------------------===//
// SG size = 32 is not currently supported for SYCL Joint Matrix by IGC on DG2
// UNSUPPORTED: gpu-intel-dg2
// REQUIRES: gpu, aspect-ext_intel_matrix
Copy link
Contributor

@YuriPlyakhin YuriPlyakhin Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, clean-up REQUIRES: gpu that were not present in the original tests here and other tests

Suggested change
// REQUIRES: gpu, aspect-ext_intel_matrix
// REQUIRES: aspect-ext_intel_matrix

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed here an in another cases

// REQUIRES: gpu, aspect-ext_intel_matrix
// REQUIRES-INTEL-DRIVER: lin: 27501, win: 101.4943

// RUN: %{build} -D__SPIRV_USE_COOPERATIVE_MATRIX -o %t.out -ffp-model=precise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it seems you are using outdated set of tests. For example, please compare this file with: https://github.com/intel/llvm/blob/sycl/sycl/test-e2e/Matrix/SG32/joint_matrix_bf16_fill_k_cache.cpp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Signed-off-by: Sidorov, Dmitry <[email protected]>
@MrSidims
Copy link
Contributor Author

@YuriPlyakhin went manually through the tests, should be better now.

I've also removed XFAIL from several cpu runs. Few notes: the result of the testing on SPR inconsistent and it depends on the machine I'm running the tests. Not going to internal details: it might make sense. Eventually I expect more tests to pass in the post-commit, so something should be adjusted afterwards.

Signed-off-by: Sidorov, Dmitry <[email protected]>
Copy link
Contributor

@YuriPlyakhin YuriPlyakhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my side I only found couple more places to fix.

Copy link
Contributor

@dkhaldi dkhaldi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except more xfails need to be removed on both cpu and pvc like.
SG32/joint_matrix_apply_two_matrices.cpp

If these are known bugs with the new coop matrix implementation, please create a jira.

Signed-off-by: Sidorov, Dmitry <[email protected]>
Copy link
Contributor

@YuriPlyakhin YuriPlyakhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MrSidims
Copy link
Contributor Author


Failed Tests (1):
SYCL :: ESIMD/mask_expand_load.cpp

is unrelated.

@intel/llvm-gatekeepers please help with the merge

@steffenlarsen
Copy link
Contributor

Issue reported in #15088.

@steffenlarsen steffenlarsen merged commit 15fbefc into intel:sycl Aug 15, 2024
12 of 13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants