Intial refactoring for mma layout types. #49
base: amd-integration
Conversation
Pull Request Overview
This PR performs an initial refactoring to improve the documentation and code structure of MMA (Matrix Multiply-Accumulate) operations used in attention kernels. The refactoring extracts MMA layout definitions into dedicated header files and updates the prefill kernel to use the new abstractions.
Key Changes:
- Introduces platform-agnostic MMA layout type definitions with backend-specific implementations for HIP and CUDA
- Replaces hard-coded constants with named constants from layout structs (e.g., `HALF_ELEMS_PER_THREAD` → `FRAG_SIZE`, `VECTOR_BIT_WIDTH` → `FRAG_BITWIDTH`); see the sketch after this list
- Adds comprehensive documentation for fragment layouts and thread-to-element ownership patterns
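To make the renaming concrete, here is a minimal sketch of what a layout struct carrying such named constants could look like. The struct name and specific values are illustrative assumptions; only the constant names `FRAG_SIZE` and `FRAG_BITWIDTH` come from the PR.

```cpp
#include <cstdint>

// Hypothetical fragment-layout struct in the spirit of this PR; the shape
// of the struct and the concrete values are assumptions for illustration.
struct ExampleFragLayout {
  // Number of matrix elements each thread owns in one fragment
  // (replaces a hard-coded HALF_ELEMS_PER_THREAD).
  static constexpr uint32_t FRAG_SIZE = 8;
  // Width of the per-thread fragment in bits, for fp16 elements
  // (replaces a hard-coded VECTOR_BIT_WIDTH).
  static constexpr uint32_t FRAG_BITWIDTH = FRAG_SIZE * 16;
};

static_assert(ExampleFragLayout::FRAG_BITWIDTH == 128,
              "128-bit vectorized fragment access");
```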
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| libflashinfer/include/gpu_iface/mma_layout.hpp | Defines base templates and enums for MMA layout abstractions with a platform-agnostic interface |
| libflashinfer/include/gpu_iface/backend/hip/mma_layout_hip.h | Implements HIP/CDNA3-specific MFMA layout specializations for M16N16K16 operations |
| libflashinfer/include/gpu_iface/backend/cuda/mma_layout_cuda.h | Implements CUDA Tensor Core layout specializations for M16N8K16 and M16N16K16 operations |
| libflashinfer/include/flashinfer/attention/generic/prefill.cuh | Updates the attention kernel to use the new MMA layout constants and adds the mma_layout.hpp include |
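As a rough illustration of how the platform-agnostic header may relate to the two backend headers listed above, consider this hedged sketch; the dispatch macro and the exact include structure inside mma_layout.hpp are assumptions, not details confirmed by this PR.

```cpp
// Hypothetical backend selection in a platform-agnostic umbrella header.
// __HIP_PLATFORM_AMD__ is a standard HIP predicate macro, but whether
// mma_layout.hpp dispatches this way is an assumption here.
#pragma once

#if defined(__HIP_PLATFORM_AMD__)
#include "gpu_iface/backend/hip/mma_layout_hip.h"    // CDNA3 MFMA layouts
#else
#include "gpu_iface/backend/cuda/mma_layout_cuda.h"  // Tensor Core layouts
#endif
```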
```cpp
};

template <>
struct wmma_c_layout<WmmaOp::M16N8K16, __nv_bfloat16> {};
```
Copilot AI · Nov 14, 2025
Empty template specialization for wmma_c_layout<WmmaOp::M16N8K16, __nv_bfloat16> is declared but not implemented. If bfloat16 support is not yet implemented, consider adding a comment explaining this is a placeholder for future implementation, or use static_assert to provide a clear error message if this specialization is used.
```diff
-struct wmma_c_layout<WmmaOp::M16N8K16, __nv_bfloat16> {};
+struct wmma_c_layout<WmmaOp::M16N8K16, __nv_bfloat16> {
+  // Placeholder: bfloat16 support for wmma_c_layout<M16N8K16> is not yet implemented.
+  static_assert(sizeof(__nv_bfloat16) == 0,
+                "wmma_c_layout<WmmaOp::M16N8K16, __nv_bfloat16> is not implemented yet. bfloat16 support is planned for future releases.");
+};
```
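One subtlety with this suggestion pattern: inside a full specialization, `sizeof(__nv_bfloat16) == 0` does not depend on a template parameter, so the static_assert is evaluated (and fails) as soon as the header is compiled, even if the specialization is never used. Below is a minimal sketch of a variant that defers the diagnostic to the point of use, with stub types standing in for the PR's real ones; it assumes nothing beyond standard C++.

```cpp
// Stub stand-ins for illustration; the real enum and __nv_bfloat16 live in
// the PR's headers and CUDA's <cuda_bf16.h>.
enum class WmmaOp { M16N8K16, M16N16K16 };
struct bf16_stub {};  // stand-in for __nv_bfloat16

template <WmmaOp Op, typename T>
struct wmma_c_layout;  // primary template: declared only

// Declared but intentionally left undefined: instantiating this
// specialization fails at the point of use with an incomplete-type error,
// instead of breaking every translation unit that includes the header.
template <>
struct wmma_c_layout<WmmaOp::M16N8K16, bf16_stub>;

// wmma_c_layout<WmmaOp::M16N8K16, bf16_stub> c{};  // error: incomplete type
```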
```cpp
struct wmma_a_layout<WmmaOp::M16N16K16, __nv_bfloat16> {};

/*!
```
Copilot AI · Nov 14, 2025
Empty template specialization for wmma_a_layout<WmmaOp::M16N16K16, __nv_bfloat16> is declared but not implemented. If bfloat16 support is not yet implemented, consider adding a comment explaining this is a placeholder for future implementation, or use static_assert to provide a clear error message if this specialization is used.
```diff
-struct wmma_a_layout<WmmaOp::M16N16K16, __nv_bfloat16> {};
+struct wmma_a_layout<WmmaOp::M16N16K16, __nv_bfloat16> {
+  // bfloat16 support for M16N16K16 is not yet implemented.
+  static_assert(sizeof(__nv_bfloat16) == 0,
+                "wmma_a_layout<WmmaOp::M16N16K16, __nv_bfloat16> is not implemented: bfloat16 support is not yet available.");
+};
 /*!
```
```cpp
};

template <>
struct wmma_b_layout<WmmaOp::M16N16K16, __nv_bfloat16> {};
```
Copilot AI · Nov 14, 2025
Empty template specialization for wmma_b_layout<WmmaOp::M16N16K16, __nv_bfloat16> is declared but not implemented. If bfloat16 support is not yet implemented, consider adding a comment explaining this is a placeholder for future implementation, or use static_assert to provide a clear error message if this specialization is used.
```diff
-struct wmma_b_layout<WmmaOp::M16N16K16, __nv_bfloat16> {};
+struct wmma_b_layout<WmmaOp::M16N16K16, __nv_bfloat16> {
+  // bfloat16 support for wmma_b_layout<WmmaOp::M16N16K16, __nv_bfloat16> is not yet implemented.
+  // This is a placeholder specialization. If you see this error, bfloat16 support must be added.
+  static_assert(sizeof(__nv_bfloat16) == 0,
+                "wmma_b_layout<WmmaOp::M16N16K16, __nv_bfloat16> is not implemented: bfloat16 support is not available yet.");
+};
```
```cpp
template <>
struct wmma_a_layout<WmmaOp::M16N16K16, hip_bfloat16> {};
```
Copilot AI · Nov 14, 2025
Empty template specialization for wmma_a_layout<WmmaOp::M16N16K16, hip_bfloat16> is declared but not implemented. If bfloat16 support is not yet implemented, consider adding a comment explaining this is a placeholder for future implementation, or use static_assert to provide a clear error message if this specialization is used.
```diff
-template <>
-struct wmma_a_layout<WmmaOp::M16N16K16, hip_bfloat16> {};
+// Placeholder: bfloat16 support for wmma_a_layout is not yet implemented.
+template <>
+struct wmma_a_layout<WmmaOp::M16N16K16, hip_bfloat16> {
+  static_assert(sizeof(hip_bfloat16) == 0,
+                "wmma_a_layout<WmmaOp::M16N16K16, hip_bfloat16> is not implemented: bfloat16 support is not yet available for this specialization.");
+};
```
```cpp
template <>
struct wmma_b_layout<WmmaOp::M16N16K16, hip_bfloat16> {};
```
Copilot AI · Nov 14, 2025
Empty template specialization for wmma_b_layout<WmmaOp::M16N16K16, hip_bfloat16> is declared but not implemented. If bfloat16 support is not yet implemented, consider adding a comment explaining this is a placeholder for future implementation, or use static_assert to provide a clear error message if this specialization is used.
```diff
-template <>
-struct wmma_b_layout<WmmaOp::M16N16K16, hip_bfloat16> {};
+// Placeholder: bfloat16 support for wmma_b_layout is not yet implemented.
+template <>
+struct wmma_b_layout<WmmaOp::M16N16K16, hip_bfloat16> {
+  static_assert(sizeof(hip_bfloat16) == 0,
+                "wmma_b_layout<WmmaOp::M16N16K16, hip_bfloat16> is not implemented: bfloat16 support is not yet available.");
+};
```
```cpp
template <>
struct wmma_c_layout<WmmaOp::M16N16K16, __nv_bfloat16> {};
```
Copilot AI · Nov 14, 2025
Empty template specialization for wmma_c_layout<WmmaOp::M16N16K16, __nv_bfloat16> is declared but not implemented. If bfloat16 support is not yet implemented, consider adding a comment explaining this is a placeholder for future implementation, or use static_assert to provide a clear error message if this specialization is used.
```diff
-template <>
-struct wmma_c_layout<WmmaOp::M16N16K16, __nv_bfloat16> {};
+// Placeholder: bfloat16 support for wmma_c_layout<WmmaOp::M16N16K16, __nv_bfloat16> is not yet implemented.
+template <>
+struct wmma_c_layout<WmmaOp::M16N16K16, __nv_bfloat16> {
+  static_assert(sizeof(__nv_bfloat16) == 0,
+                "wmma_c_layout<WmmaOp::M16N16K16, __nv_bfloat16> is not implemented yet. bfloat16 support is planned for a future release.");
+};
```
```cpp
  using IdType = IdType_;
  using AttentionVariant = AttentionVariant_;

  static constexpr uint32_t NUM_THREADS = NUM_WARPS_Q * NUM_WARPS_KV * WARP_SIZE;
```
Copilot AI · Nov 14, 2025
The newly added line uses WARP_SIZE without qualification, but this constant may not be defined in the current scope. The old code used platform-specific values (64 for HIP, 32 for CUDA). Consider using a qualified constant like gpu_iface::WARP_SIZE or defining WARP_SIZE within KernelTraits, or keeping the platform-specific approach that was removed.
```diff
-  static constexpr uint32_t NUM_THREADS = NUM_WARPS_Q * NUM_WARPS_KV * WARP_SIZE;
+  static constexpr uint32_t NUM_THREADS = NUM_WARPS_Q * NUM_WARPS_KV * gpu_iface::kWarpSize;
```
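For reference, a platform-qualified constant along the lines the suggestion assumes could be defined once in the interface layer. This is a hedged sketch; the actual name and location of `kWarpSize` in `gpu_iface` are not confirmed by this diff, only the old platform-specific values (64 for HIP, 32 for CUDA) are.

```cpp
#include <cstdint>

// Hypothetical definition of a platform-qualified warp-size constant; the
// predicate macro and the name kWarpSize are assumptions based on the
// suggestion above.
namespace gpu_iface {
#if defined(__HIP_PLATFORM_AMD__)
inline constexpr uint32_t kWarpSize = 64;  // AMD CDNA wavefront size
#else
inline constexpr uint32_t kWarpSize = 32;  // NVIDIA warp size
#endif
}  // namespace gpu_iface
```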
```diff
@@ -0,0 +1,113 @@
+// SPDX-FileCopyrightText: 2025 Advanced Micro Devices, Inc.
```
Copilot AI · Nov 14, 2025
The title "Intial refactoring for mma layout types." contains a spelling error. "Intial" should be "Initial".
```cpp
  M16N16K16,  // Logical 16×16 output tile, k-dim=16 (may be composite)
};

/*! * \brief Generic WMMA layout providing generic op-level traits
```
Copilot AI · Nov 14, 2025
The comment on line 41 has a formatting issue. There's an extra space and asterisk at the start: /*! * \brief should be /*! \brief (remove the extra space and asterisk after the opening /*!).
```diff
-/*! * \brief Generic WMMA layout providing generic op-level traits
+/*! \brief Generic WMMA layout providing generic op-level traits
```
```cpp
};

template <>
struct wmma_b_layout<WmmaOp::M16N8K16, __nv_bfloat16> {};
```
Copilot AI · Nov 14, 2025
Empty template specialization for wmma_b_layout<WmmaOp::M16N8K16, __nv_bfloat16> is declared but not implemented. If bfloat16 support is not yet implemented, consider adding a comment explaining this is a placeholder for future implementation, or use static_assert to provide a clear error message if this specialization is used.
```diff
-struct wmma_b_layout<WmmaOp::M16N8K16, __nv_bfloat16> {};
+struct wmma_b_layout<WmmaOp::M16N8K16, __nv_bfloat16> {
+  // Placeholder: bfloat16 support for wmma_b_layout<M16N8K16> is not yet implemented.
+  static_assert(
+      sizeof(__nv_bfloat16) == 0,
+      "wmma_b_layout<WmmaOp::M16N8K16, __nv_bfloat16> is not implemented: bfloat16 support is not yet available for this operation.");
+};
```
This PR performs an initial refactoring to improve the documentation and code structure of the MMA ops in prefill.cuh.