[AMD] Improve shared layout for Wmma's operands #7319

leeliu103 · 2025-06-25T20:18:14Z

Swizzling is currently always disabled for Wmma's B operand; it should be disabled only when the k dimension is not contiguous.

vectorSize, perPhase, and maxPhase are now all determined using a heuristic approach.
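For readers unfamiliar with the three parameters, here is a minimal sketch of how a vec/perPhase/maxPhase swizzle is commonly described for swizzled shared layouts. It is not taken from this PR: the function name `swizzledColumn` and its signature are hypothetical, and the heuristic that picks the parameter values is not shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Triton): maps a logical (row, col) element
// to the column it occupies after swizzling.
// Assumes vec, perPhase and maxPhase are all positive.
//   vec      - number of contiguous elements that move together as one unit
//   perPhase - number of consecutive rows that share the same phase
//   maxPhase - number of distinct phases before the pattern repeats
inline uint32_t swizzledColumn(uint32_t row, uint32_t col, uint32_t vec,
                               uint32_t perPhase, uint32_t maxPhase) {
  uint32_t phase = (row / perPhase) % maxPhase; // phase selected by the row
  uint32_t vecIdx = col / vec;                  // which vec-sized unit
  uint32_t within = col % vec;                  // position inside that unit
  // XOR the unit index with the phase; elements inside a unit stay in order.
  return (vecIdx ^ phase) * vec + within;
}
```

With maxPhase equal to 1 the phase is always 0, so the mapping is the identity, which corresponds to swizzling being disabled.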

New contributor declaration

I am not making a trivial change, such as fixing a typo in a comment.
I have written a PR description following these rules.
I have run pre-commit run --from-ref origin/main --to-ref HEAD.
Select one of the following.
- I have added tests.
  - /test for lit tests
  - /unittest for C++ tests
  - /python/test for end-to-end tests
- This PR does not need a test because this PR only updates the shared layout for Wmma's operand, and it's applicable across various cases.
Select one of the following.
- I have not added any lit tests.
- The lit tests I have added follow these best practices,
  including the "tests should be minimal" section. (Usually running Python code
  and using the instructions it generates is not minimal.)

antiagainst · 2025-06-25T20:21:11Z

include/triton/Dialect/TritonGPU/IR/TritonGPUAttrDefs.td

-          } else {
-            // Do not swizzle in case k dimension is not innermost.
-            // In this case accesses will go in different banks even without swizzling.
+          int kDimIndex = dotOpEnc.getOpIdx() == 0 ? 1 : 0;


While we are here, can we do something like the MFMA layout in the above and use a helper function?

You mean put the logic into some helper function like composeSharedLayoutForOperand?

Refactored; also adding @zhanglx13 and @joviliast to review.
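The `kDimIndex` line in the diff and the rule from the PR description (swizzle unless the k dimension is not contiguous) can be sketched as follows. This is an illustration only: `kDimIndexFor`, `isKContiguous`, and `shouldSwizzle` are hypothetical names, the sketch covers the 2D case without batch dimensions, and it assumes `order` lists dimensions from fastest-varying to slowest.

```cpp
#include <cassert>
#include <vector>

// Mirrors the diff line: operand A (opIdx 0) is M x K, so k is dimension 1;
// operand B (opIdx 1) is K x N, so k is dimension 0.
inline int kDimIndexFor(int opIdx) { return opIdx == 0 ? 1 : 0; }

// Assumption: order[0] is the fastest-varying (contiguous) dimension.
inline bool isKContiguous(int opIdx, const std::vector<int> &order) {
  return !order.empty() && order[0] == kDimIndexFor(opIdx);
}

// Rule stated in the PR description: disable swizzling only when the
// k dimension is not contiguous, regardless of which operand it is.
inline bool shouldSwizzle(int opIdx, const std::vector<int> &order) {
  return isKContiguous(opIdx, order);
}
```

Under these assumptions, a B operand whose order is {0, 1} has k contiguous and would be swizzled, whereas the previous behavior described in the PR disabled swizzling for B unconditionally.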

lib/Dialect/TritonGPU/IR/Dialect.cpp

antiagainst requested changes Jun 25, 2025

[AMD] Improve shared layout for Wmma's operands

df5ad78

Swizzling is always disabled for Wmma's B operand, it should be disabled only when k dimension is not contiguous. Both vectorSize, perPhase and maxPhase are now determined using a heuristic approach.

antiagainst marked this pull request as ready for review June 26, 2025 18:52

antiagainst requested a review from ptillet as a code owner June 26, 2025 18:52

antiagainst approved these changes Jun 26, 2025

zhanglx13 reviewed Jun 26, 2025

lib/Dialect/TritonGPU/IR/Dialect.cpp (outdated, resolved)

leeliu103 and others added 3 commits June 26, 2025 23:23

Move logic into composeSharedLayoutForOperand helper function

eab9c7d

fix misleading simdWidth name

aee34b9

Merge branch 'main' into fixsharedlayoutforWmmaoperand

2e29bb2

joviliast approved these changes Jun 27, 2025

zhanglx13 merged commit 21d2ef2 into triton-lang:main Jun 27, 2025
15 of 18 checks passed
