
Conversation

alexsamardzic

No description provided.

@alexsamardzic (Author)

Currently, `torch_compile_grouped_gemm` and `preprocessed_pt2_triton_grouped_mm` are the same.

I think there is no point in benchmarking `torch_compile_grouped_gemm`, as its timing includes both the auto-tuning overhead and the "preprocessing" of the arguments. On the other hand, with the change in this PR, `preprocessed_pt2_triton_grouped_mm` is on par with `preprocessed_aten_grouped_mm`, which is expected.

(I believe the point about auto-tuning holds for `triton_grouped_gemm` too.)
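
To illustrate the auto-tuning point, here is a minimal sketch (not the tritonbench harness itself): a `torch.compile`'d function is called once as a warm-up so that compilation and auto-tuning happen outside the timed region, and only the steady-state call is measured. The function `f`, the tensor shapes, and the dtype are placeholders for illustration, standing in for the actual grouped GEMM operator:

```python
import torch

# Stand-in for the grouped GEMM being benchmarked (illustrative only).
def f(a, b):
    return a @ b

compiled = torch.compile(f, mode="max-autotune")

a = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
b = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)

# Warm-up: triggers compilation and auto-tuning once, outside the timed region.
compiled(a, b)
torch.cuda.synchronize()

# Steady-state timing: measures only the kernel execution.
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
compiled(a, b)
end.record()
torch.cuda.synchronize()
print(f"steady-state time: {start.elapsed_time(end):.3f} ms")
```

If the first (warm-up) call were included in the measurement, the auto-tuning cost would dominate and the comparison against the `preprocessed_*` variants would be skewed.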

@xuzhao9 (Contributor) commented Oct 3, 2025

cc @NikhilAPatel, can you help take a look?
