Performance optimization for geometric kernel #1562

Merged: 2 commits merged on Apr 14, 2025

@xytintel (Contributor) commented Apr 9, 2025

Reproduction case:

```python
import torch
from torch.profiler import profile, ProfilerActivity

shape_list = [(8192, 8192)]
backward = False  # reported in the header line only; this benchmark runs forward only

if __name__ == "__main__":
    for shape in shape_list:
        for dtype in [torch.bfloat16, torch.float16, torch.float32]:
            # Allocate the tensor with the dtype under test
            x = torch.randn(shape, dtype=dtype, device=torch.device("xpu"))

            # Warm up: keep one-time setup costs out of the profiled region
            x.geometric_(0.5)

            # Measure 20 in-place geometric_ calls
            print(
                "shape:", shape,
                "; datatype:", dtype,
                "; P:", 0.5,
                "; backward:", backward,
            )
            with profile(
                activities=[ProfilerActivity.CPU, ProfilerActivity.XPU],
                record_shapes=True,
            ) as prof:
                for _ in range(20):
                    x.geometric_(0.5)
            print(prof.key_averages().table(sort_by="xpu_time_total"))
```
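For context on what the benchmark fills the tensor with: PyTorch documents `Tensor.geometric_(p)` as drawing from the geometric distribution P(X = k) = (1 - p)^(k - 1) p for k = 1, 2, .... The pure-Python sketch below samples that distribution by inverting its CDF, which is one standard way to do it. It is only an illustration of the distribution being benchmarked; it is not claimed to be the transform used by the XPU kernel or by this PR, and the function name `sample_geometric` is hypothetical.

```python
import math
import random


def sample_geometric(p: float, n: int, rng: random.Random) -> list[int]:
    """Draw n samples from Geometric(p) on {1, 2, ...} by inverse-CDF sampling.

    Illustrative sketch only (hypothetical helper, not the XPU kernel's code).
    Since P(X > k) = (1 - p)**k, setting X = ceil(log(U) / log(1 - p))
    for U ~ Uniform(0, 1] gives P(X > k) = P(U < (1 - p)**k) = (1 - p)**k.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be in (0, 1)")
    log_q = math.log1p(-p)  # log(1 - p), computed accurately for small p
    samples = []
    for _ in range(n):
        u = 1.0 - rng.random()  # rng.random() is in [0, 1); shift to (0, 1] so log(u) is finite
        # ceil(...) is 0 only when u == 1.0; clamp to the support's minimum of 1
        samples.append(max(1, math.ceil(math.log(u) / log_q)))
    return samples


rng = random.Random(0)
xs = sample_geometric(0.5, 100_000, rng)
mean = sum(xs) / len(xs)  # should be close to 1 / p = 2.0
print(f"min={min(xs)}, mean={mean:.3f}")
```

With p = 0.5 (the value used in the reproduction script), roughly half of the samples equal 1 and the sample mean sits near 2.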

@xytintel (Contributor, Author)

Original (before this PR):

```text
shape: (8192, 8192) ; datatype: torch.bfloat16 ; P: 0.5 ; backward: False
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg      Self XPU    Self XPU %     XPU total  XPU time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                       aten::geometric_        55.27%     974.043us       100.00%       1.762ms      88.113us      14.921ms       100.00%      14.921ms     746.064us            20
at::native::xpu::DistributionElementwiseKernelFuncto...         0.00%       0.000us         0.00%       0.000us       0.000us      14.921ms       100.00%      14.921ms     746.064us            20
                                  urEnqueueKernelLaunch        44.73%     788.208us        44.73%     788.208us      39.410us       0.000us         0.00%       0.000us       0.000us            20
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 1.762ms
Self XPU time total: 14.921ms

shape: (8192, 8192) ; datatype: torch.float16 ; P: 0.5 ; backward: False
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg      Self XPU    Self XPU %     XPU total  XPU time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                       aten::geometric_        64.93%     793.297us       100.00%       1.222ms      61.084us      14.586ms       100.00%      14.586ms     729.312us            20
at::native::xpu::DistributionElementwiseKernelFuncto...         0.00%       0.000us         0.00%       0.000us       0.000us      14.586ms       100.00%      14.586ms     729.312us            20
                                  urEnqueueKernelLaunch        35.07%     428.392us        35.07%     428.392us      21.420us       0.000us         0.00%       0.000us       0.000us            20
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 1.222ms
Self XPU time total: 14.586ms

shape: (8192, 8192) ; datatype: torch.float32 ; P: 0.5 ; backward: False
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg      Self XPU    Self XPU %     XPU total  XPU time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                       aten::geometric_        64.42%     815.162us       100.00%       1.265ms      63.270us      15.043ms       100.00%      15.043ms     752.160us            20
at::native::xpu::DistributionElementwiseKernelFuncto...         0.00%       0.000us         0.00%       0.000us       0.000us      15.043ms       100.00%      15.043ms     752.160us            20
                                  urEnqueueKernelLaunch        35.58%     450.228us        35.58%     450.228us      22.511us       0.000us         0.00%       0.000us       0.000us            20
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 1.265ms
Self XPU time total: 15.043ms
```

Optimized (with this PR):

```text
shape: (8192, 8192) ; datatype: torch.bfloat16 ; P: 0.5 ; backward: False
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg      Self XPU    Self XPU %     XPU total  XPU time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                       aten::geometric_        57.44%     859.941us       100.00%       1.497ms      74.860us       9.351ms       100.00%       9.351ms     467.544us            20
at::native::xpu::DistributionElementwiseKernelFuncto...         0.00%       0.000us         0.00%       0.000us       0.000us       9.351ms       100.00%       9.351ms     467.544us            20
                                  urEnqueueKernelLaunch        42.56%     637.265us        42.56%     637.265us      31.863us       0.000us         0.00%       0.000us       0.000us            20
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 1.497ms
Self XPU time total: 9.351ms

shape: (8192, 8192) ; datatype: torch.float16 ; P: 0.5 ; backward: False
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg      Self XPU    Self XPU %     XPU total  XPU time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                       aten::geometric_        66.84%     761.641us       100.00%       1.139ms      56.972us       8.922ms       100.00%       8.922ms     446.104us            20
at::native::xpu::DistributionElementwiseKernelFuncto...         0.00%       0.000us         0.00%       0.000us       0.000us       8.922ms       100.00%       8.922ms     446.104us            20
                                  urEnqueueKernelLaunch        33.16%     377.805us        33.16%     377.805us      18.890us       0.000us         0.00%       0.000us       0.000us            20
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 1.139ms
Self XPU time total: 8.922ms

shape: (8192, 8192) ; datatype: torch.float32 ; P: 0.5 ; backward: False
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg      Self XPU    Self XPU %     XPU total  XPU time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                       aten::geometric_        66.84%     751.307us       100.00%       1.124ms      56.204us       9.132ms       100.00%       9.132ms     456.584us            20
at::native::xpu::DistributionElementwiseKernelFuncto...         0.00%       0.000us         0.00%       0.000us       0.000us       9.132ms       100.00%       9.132ms     456.584us            20
                                  urEnqueueKernelLaunch        33.16%     372.776us        33.16%     372.776us      18.639us       0.000us         0.00%       0.000us       0.000us            20
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 1.124ms
Self XPU time total: 9.132ms
```
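To condense the before/after profiles into one number per run, the sketch below computes the per-call speedup from the `XPU time avg` column of the `aten::geometric_` rows (each run profiles 20 calls, so this is the XPU total divided by 20, e.g. 14.921 ms / 20 ≈ 746 us). It is plain arithmetic on the numbers reported in the tables, not an additional measurement; the dict names and the `speedup` helper are illustrative.

```python
# Per-call XPU time (microseconds) for aten::geometric_, copied from the
# "XPU time avg" column of the profiler tables, keyed by the printed datatype label.
before_us = {"bfloat16": 746.064, "float16": 729.312, "float32": 752.160}
after_us = {"bfloat16": 467.544, "float16": 446.104, "float32": 456.584}


def speedup(before: float, after: float) -> float:
    """Ratio of old to new per-call kernel time; > 1 means the new kernel is faster."""
    return before / after


for label in before_us:
    s = speedup(before_us[label], after_us[label])
    saved = 1.0 - after_us[label] / before_us[label]  # fraction of XPU time removed
    print(f"{label:>8}: {s:.2f}x faster, {saved:.0%} less XPU time per call")
```

With these inputs the ratio comes out at roughly 1.6x for every datatype label, i.e. close to 40% less XPU time per `geometric_` call, while the CPU-side launch overhead stays in the same range.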

@xytintel added this pull request to the merge queue on Apr 14, 2025
Merged via the queue into main with commit 5dac48a on Apr 14, 2025
5 of 7 checks passed
@xytintel deleted the xyt/geometric_kernel branch on April 14, 2025, 08:10