Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ReImplement SMEM to RMEM Based on SwizzledLayout #51

Merged
merged 11 commits into from
Feb 13, 2025

Conversation

KuangjuX
Copy link
Collaborator

@KuangjuX KuangjuX commented Jan 31, 2025

Progress:

  • Implemented loading from SMEM to RMEM and storage from RMEM to SMEM based on Swizzle<3, 3, 3>.
  • Modified the STileIterator to enable traversal based on the Swizzle layout.
  • Support float data type for R2S Storer.
  • Add support for different WarpLayout configurations.
  • Support ColMajor SMEM/RMEM Loader/Storer.
  • Support Single Warp MMA.
  • Support ColMajor GMEM/SMEM Loader/Storer.
  • Support SIngle Warp Whole GEMM.
  • Verify the performance.

@KuangjuX KuangjuX marked this pull request as draft January 31, 2025 14:56
@KuangjuX KuangjuX marked this pull request as ready for review February 13, 2025 05:08
@KuangjuX KuangjuX requested a review from haruhi55 February 13, 2025 07:18
@@ -397,9 +396,12 @@ struct SharedOffsetHelper<WarpLayout_, WarpShape_, Shared_, kMode_,
constexpr static int kTilePerRow = Shared::kRows / WarpShape::kRows;
constexpr static int kTilePerCol = Shared::kCols / WarpShape::kCols;

// constexpr static int kRowStride = kTilePerRow / tl::num_rows<WarpLayout>;
// constexpr static int kColStride =
// kTilePerCol / tl::num_cols<WarpLayout> * kTilePerRow;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are lines 399 ~ 401 no longer useful, thus can be deleted?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted useless codes.

// using Shared = SharedTile<Element, tl::RowMajor<16, 16>>;
using Shared = SharedTile<Element, tl::RowMajor<64, 128>>;
// using Reg = RegTile<BaseTileRowMajor<Element>, tl::RowMajor<1, 1>>;
using Reg = RegTile<BaseTileRowMajor<Element>, tl::RowMajor<4, 8>>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please explain why lines 157, 161, 167, 187, 192, 220, and 222 have been commented out?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I want to use the WarpLayout format which is more suitable for the current implementation.


using SharedLayout = tl::RowMajor<kRows, kCols>;
const bool kUseSwizzledLayout = true;
using Shared = SharedTile<Element, SharedLayout, kUseSwizzledLayout>;
using Reg = RegTile<BaseTileRowMajor<Element>, tl::RowMajor<2, 2>>;
// using Reg = RegTile<BaseTileRowMajor<Element>, tl::RowMajor<2, 2>>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would combining lines 246–247 with line 255 produce a unit test case that's worth adding to the tests, or does it simply duplicate an existing test case?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted useless codes.

const int kWarpPerRow = 2;
const int kWarpPerCol = 2;
// const int kWarpPerRow = 2;
// const int kWarpPerCol = 2;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the test cases on lines 276–277 and 281–282 provide unique coverage worthy of being added to the tests, or do they simply duplicate existing tests?

If they offer unique coverage, please leave them in the comments; otherwise, they should be removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted useless codes.

run_test<96, 48, 80>();
// run_test<16, 16, 16>(); // Test the `BaseTile` 16x16x16
// run_test<16, 32, 16>();
// run_test<96, 48, 80>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment explaining why these test cases are commented out if they're intended to provide unique coverage but aren't supported yet. Otherwise, delete them if they're not needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted useless codes.

@lcy-seso lcy-seso self-requested a review February 13, 2025 08:50
Copy link
Contributor

@lcy-seso lcy-seso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@KuangjuX KuangjuX merged commit c75a343 into microsoft:master Feb 13, 2025
3 checks passed
@KuangjuX KuangjuX deleted the s2r branch February 13, 2025 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants