Intial refactoring for mma layout types. #49
base: amd-integration
Conversation
Pull Request Overview
This PR performs an initial refactoring to improve the documentation and code structure of MMA (Matrix Multiply-Accumulate) operations used in attention kernels. The refactoring extracts MMA layout definitions into dedicated header files and updates the prefill kernel to use the new abstractions.
Key Changes:
- Introduces platform-agnostic MMA layout type definitions with backend-specific implementations for HIP and CUDA
- Replaces hard-coded constants with named constants from layout structs (e.g., `HALF_ELEMS_PER_THREAD` → `FRAG_SIZE`, `VECTOR_BIT_WIDTH` → `FRAG_BITWIDTH`); see the sketch after this list
- Adds comprehensive documentation for fragment layouts and thread-to-element ownership patterns
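To make the renaming concrete, here is a minimal sketch of what a layout struct carrying such named constants could look like. The struct name and specific values are illustrative assumptions; only the constant names `FRAG_SIZE` and `FRAG_BITWIDTH` come from the PR.

```cpp
#include <cstdint>

// Hypothetical fragment-layout struct in the spirit of this PR; the shape
// of the struct and the concrete values are assumptions for illustration.
struct ExampleFragLayout {
  // Number of matrix elements each thread owns in one fragment
  // (replaces a hard-coded HALF_ELEMS_PER_THREAD).
  static constexpr uint32_t FRAG_SIZE = 8;
  // Width of the per-thread fragment in bits, for fp16 elements
  // (replaces a hard-coded VECTOR_BIT_WIDTH).
  static constexpr uint32_t FRAG_BITWIDTH = FRAG_SIZE * 16;
};

static_assert(ExampleFragLayout::FRAG_BITWIDTH == 128,
              "128-bit vectorized fragment access");
```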
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| libflashinfer/include/gpu_iface/mma_layout.hpp | Defines base templates and enums for MMA layout abstractions with a platform-agnostic interface |
| libflashinfer/include/gpu_iface/backend/hip/mma_layout_hip.h | Implements HIP/CDNA3-specific MFMA layout specializations for M16N16K16 operations |
| libflashinfer/include/gpu_iface/backend/cuda/mma_layout_cuda.h | Implements CUDA Tensor Core layout specializations for M16N8K16 and M16N16K16 operations |
| libflashinfer/include/flashinfer/attention/generic/prefill.cuh | Updates the attention kernel to use the new MMA layout constants and adds the mma_layout.hpp include |
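As a rough illustration of how the platform-agnostic header may relate to the two backend headers listed above, consider this hedged sketch; the dispatch macro and the exact include structure inside mma_layout.hpp are assumptions, not details confirmed by this PR.

```cpp
// Hypothetical backend selection in a platform-agnostic umbrella header.
// __HIP_PLATFORM_AMD__ is a standard HIP predicate macro, but whether
// mma_layout.hpp dispatches this way is an assumption here.
#pragma once

#if defined(__HIP_PLATFORM_AMD__)
#include "gpu_iface/backend/hip/mma_layout_hip.h"    // CDNA3 MFMA layouts
#else
#include "gpu_iface/backend/cuda/mma_layout_cuda.h"  // Tensor Core layouts
#endif
```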
```cpp
};

template <>
struct wmma_c_layout<WmmaOp::M16N8K16, __nv_bfloat16> {};
```
Copilot AI · Nov 14, 2025
Empty template specialization for wmma_c_layout<WmmaOp::M16N8K16, __nv_bfloat16> is declared but not implemented. If bfloat16 support is not yet implemented, consider adding a comment explaining this is a placeholder for future implementation, or use static_assert to provide a clear error message if this specialization is used.
```diff
-struct wmma_c_layout<WmmaOp::M16N8K16, __nv_bfloat16> {};
+struct wmma_c_layout<WmmaOp::M16N8K16, __nv_bfloat16> {
+  // Placeholder: bfloat16 support for wmma_c_layout<M16N8K16> is not yet implemented.
+  static_assert(sizeof(__nv_bfloat16) == 0,
+                "wmma_c_layout<WmmaOp::M16N8K16, __nv_bfloat16> is not implemented yet. bfloat16 support is planned for future releases.");
+};
```
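One subtlety with this suggestion pattern: inside a full specialization, `sizeof(__nv_bfloat16) == 0` does not depend on a template parameter, so the static_assert is evaluated (and fails) as soon as the header is compiled, even if the specialization is never used. Below is a minimal sketch of a variant that defers the diagnostic to the point of use, with stub types standing in for the PR's real ones; it assumes nothing beyond standard C++.

```cpp
// Stub stand-ins for illustration; the real enum and __nv_bfloat16 live in
// the PR's headers and CUDA's <cuda_bf16.h>.
enum class WmmaOp { M16N8K16, M16N16K16 };
struct bf16_stub {};  // stand-in for __nv_bfloat16

template <WmmaOp Op, typename T>
struct wmma_c_layout;  // primary template: declared only

// Declared but intentionally left undefined: instantiating this
// specialization fails at the point of use with an incomplete-type error,
// instead of breaking every translation unit that includes the header.
template <>
struct wmma_c_layout<WmmaOp::M16N8K16, bf16_stub>;

// wmma_c_layout<WmmaOp::M16N8K16, bf16_stub> c{};  // error: incomplete type
```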
```cpp
struct wmma_a_layout<WmmaOp::M16N16K16, __nv_bfloat16> {};

/*!
```
Copilot AI · Nov 14, 2025
Empty template specialization for wmma_a_layout<WmmaOp::M16N16K16, __nv_bfloat16> is declared but not implemented. If bfloat16 support is not yet implemented, consider adding a comment explaining this is a placeholder for future implementation, or use static_assert to provide a clear error message if this specialization is used.
```diff
-struct wmma_a_layout<WmmaOp::M16N16K16, __nv_bfloat16> {};
+struct wmma_a_layout<WmmaOp::M16N16K16, __nv_bfloat16> {
+  // bfloat16 support for M16N16K16 is not yet implemented.
+  static_assert(sizeof(__nv_bfloat16) == 0,
+                "wmma_a_layout<WmmaOp::M16N16K16, __nv_bfloat16> is not implemented: bfloat16 support is not yet available.");
+};
 /*!
```
```cpp
};

template <>
struct wmma_b_layout<WmmaOp::M16N16K16, __nv_bfloat16> {};
```
Copilot AI · Nov 14, 2025
Empty template specialization for wmma_b_layout<WmmaOp::M16N16K16, __nv_bfloat16> is declared but not implemented. If bfloat16 support is not yet implemented, consider adding a comment explaining this is a placeholder for future implementation, or use static_assert to provide a clear error message if this specialization is used.
```diff
-struct wmma_b_layout<WmmaOp::M16N16K16, __nv_bfloat16> {};
+struct wmma_b_layout<WmmaOp::M16N16K16, __nv_bfloat16> {
+  // bfloat16 support for wmma_b_layout<WmmaOp::M16N16K16, __nv_bfloat16> is not yet implemented.
+  // This is a placeholder specialization. If you see this error, bfloat16 support must be added.
+  static_assert(sizeof(__nv_bfloat16) == 0,
+                "wmma_b_layout<WmmaOp::M16N16K16, __nv_bfloat16> is not implemented: bfloat16 support is not available yet.");
+};
```
```cpp
template <>
struct wmma_a_layout<WmmaOp::M16N16K16, hip_bfloat16> {};
```
Copilot AI · Nov 14, 2025
Empty template specialization for wmma_a_layout<WmmaOp::M16N16K16, hip_bfloat16> is declared but not implemented. If bfloat16 support is not yet implemented, consider adding a comment explaining this is a placeholder for future implementation, or use static_assert to provide a clear error message if this specialization is used.
```diff
-template <>
-struct wmma_a_layout<WmmaOp::M16N16K16, hip_bfloat16> {};
+// Placeholder: bfloat16 support for wmma_a_layout is not yet implemented.
+template <>
+struct wmma_a_layout<WmmaOp::M16N16K16, hip_bfloat16> {
+  static_assert(sizeof(hip_bfloat16) == 0,
+                "wmma_a_layout<WmmaOp::M16N16K16, hip_bfloat16> is not implemented: bfloat16 support is not yet available for this specialization.");
+};
```
```cpp
template <>
struct wmma_b_layout<WmmaOp::M16N16K16, hip_bfloat16> {};
```
Copilot AI · Nov 14, 2025
Empty template specialization for wmma_b_layout<WmmaOp::M16N16K16, hip_bfloat16> is declared but not implemented. If bfloat16 support is not yet implemented, consider adding a comment explaining this is a placeholder for future implementation, or use static_assert to provide a clear error message if this specialization is used.
```diff
-template <>
-struct wmma_b_layout<WmmaOp::M16N16K16, hip_bfloat16> {};
+// Placeholder: bfloat16 support for wmma_b_layout is not yet implemented.
+template <>
+struct wmma_b_layout<WmmaOp::M16N16K16, hip_bfloat16> {
+  static_assert(sizeof(hip_bfloat16) == 0,
+                "wmma_b_layout<WmmaOp::M16N16K16, hip_bfloat16> is not implemented: bfloat16 support is not yet available.");
+};
```
```cpp
template <>
struct wmma_c_layout<WmmaOp::M16N16K16, __nv_bfloat16> {};
```
Copilot AI · Nov 14, 2025
Empty template specialization for wmma_c_layout<WmmaOp::M16N16K16, __nv_bfloat16> is declared but not implemented. If bfloat16 support is not yet implemented, consider adding a comment explaining this is a placeholder for future implementation, or use static_assert to provide a clear error message if this specialization is used.
```diff
-template <>
-struct wmma_c_layout<WmmaOp::M16N16K16, __nv_bfloat16> {};
+// Placeholder: bfloat16 support for wmma_c_layout<WmmaOp::M16N16K16, __nv_bfloat16> is not yet implemented.
+template <>
+struct wmma_c_layout<WmmaOp::M16N16K16, __nv_bfloat16> {
+  static_assert(sizeof(__nv_bfloat16) == 0,
+                "wmma_c_layout<WmmaOp::M16N16K16, __nv_bfloat16> is not implemented yet. bfloat16 support is planned for a future release.");
+};
```
```cpp
  using IdType = IdType_;
  using AttentionVariant = AttentionVariant_;

  static constexpr uint32_t NUM_THREADS = NUM_WARPS_Q * NUM_WARPS_KV * WARP_SIZE;
```
Copilot AI · Nov 14, 2025
The newly added line uses WARP_SIZE without qualification, but this constant may not be defined in the current scope. The old code used platform-specific values (64 for HIP, 32 for CUDA). Consider using a qualified constant like gpu_iface::WARP_SIZE or defining WARP_SIZE within KernelTraits, or keeping the platform-specific approach that was removed.
```diff
-  static constexpr uint32_t NUM_THREADS = NUM_WARPS_Q * NUM_WARPS_KV * WARP_SIZE;
+  static constexpr uint32_t NUM_THREADS = NUM_WARPS_Q * NUM_WARPS_KV * gpu_iface::kWarpSize;
```
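For reference, a platform-qualified constant along the lines the suggestion assumes could be defined once in the interface layer. This is a hedged sketch; the actual name and location of `kWarpSize` in `gpu_iface` are not confirmed by this diff, only the old platform-specific values (64 for HIP, 32 for CUDA) are.

```cpp
#include <cstdint>

// Hypothetical definition of a platform-qualified warp-size constant; the
// predicate macro and the name kWarpSize are assumptions based on the
// suggestion above.
namespace gpu_iface {
#if defined(__HIP_PLATFORM_AMD__)
inline constexpr uint32_t kWarpSize = 64;  // AMD CDNA wavefront size
#else
inline constexpr uint32_t kWarpSize = 32;  // NVIDIA warp size
#endif
}  // namespace gpu_iface
```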
```diff
@@ -0,0 +1,113 @@
+// SPDX-FileCopyrightText: 2025 Advanced Micro Devices, Inc.
```
Copilot AI · Nov 14, 2025
The title "Intial refactoring for mma layout types." contains a spelling error. "Intial" should be "Initial".
```cpp
  M16N16K16,  // Logical 16×16 output tile, k-dim=16 (may be composite)
};

/*! * \brief Generic WMMA layout providing generic op-level traits
```
Copilot AI · Nov 14, 2025
The comment on line 41 has a formatting issue. There's an extra space and asterisk at the start: /*! * \brief should be /*! \brief (remove the extra space and asterisk after the opening /*!).
```diff
-/*! * \brief Generic WMMA layout providing generic op-level traits
+/*! \brief Generic WMMA layout providing generic op-level traits
```
```cpp
};

template <>
struct wmma_b_layout<WmmaOp::M16N8K16, __nv_bfloat16> {};
```
Copilot AI · Nov 14, 2025
Empty template specialization for wmma_b_layout<WmmaOp::M16N8K16, __nv_bfloat16> is declared but not implemented. If bfloat16 support is not yet implemented, consider adding a comment explaining this is a placeholder for future implementation, or use static_assert to provide a clear error message if this specialization is used.
```diff
-struct wmma_b_layout<WmmaOp::M16N8K16, __nv_bfloat16> {};
+struct wmma_b_layout<WmmaOp::M16N8K16, __nv_bfloat16> {
+  // Placeholder: bfloat16 support for wmma_b_layout<M16N8K16> is not yet implemented.
+  static_assert(
+      sizeof(__nv_bfloat16) == 0,
+      "wmma_b_layout<WmmaOp::M16N8K16, __nv_bfloat16> is not implemented: bfloat16 support is not yet available for this operation.");
+};
```
This PR performs an initial refactoring to improve the documentation and code structure of the MMA ops in prefill.cuh.