Optimize roi_align on BMG #1698

jianyizh · 2025-05-26T02:33:26Z

For an input of shape [1, 2048, 50, 75] and rois of shape [1000, 5], roi_align takes 4.7 ms on PVC but 75 ms on BMG. For each ROI, 2048 x output_h x output_w work items read the same ROI values from the LLC, which is very slow on BMG. After caching these values in shared local memory (SLM), PVC takes 4.0 ms and BMG drops to 7.5 ms. I also replaced some if/else branching with min/max, and fixed a code style issue.
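To illustrate why one load per work-group can replace one load per work item, here is a minimal host-side C++ sketch (not the PR's SYCL kernel) of an indexing scheme in which every work-group covers outputs of a single ROI, plus a count of ROI-row loads with and without caching. The names `map_work_item`, `groups_per_roi`, `roi_loads_naive` and `roi_loads_cached`, the work-group size, and the 7x7 pooled size used below are assumptions for illustration; the PR does not state them.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mapping, for illustration only: each work-group serves
// exactly one ROI, so a single work item can stage that ROI row in SLM.
struct WorkItemCoord {
  int64_t n;    // ROI index
  int64_t c;    // channel
  int64_t ph;   // pooled row
  int64_t pw;   // pooled column
  bool active;  // false for padding work items past the end of this ROI
};

// Number of work-groups needed to cover one ROI's C * PH * PW outputs.
inline int64_t groups_per_roi(int64_t C, int64_t PH, int64_t PW, int64_t wg_size) {
  const int64_t per_roi = C * PH * PW;
  return (per_roi + wg_size - 1) / wg_size;
}

// Map (group_id, local_id) to output coordinates under the one-ROI-per-group scheme.
inline WorkItemCoord map_work_item(int64_t group_id, int64_t local_id,
                                   int64_t C, int64_t PH, int64_t PW,
                                   int64_t wg_size) {
  assert(local_id >= 0 && local_id < wg_size);
  const int64_t gpr = groups_per_roi(C, PH, PW, wg_size);
  WorkItemCoord w{};
  w.n = group_id / gpr;                             // every item in the group shares this ROI
  int64_t idx = (group_id % gpr) * wg_size + local_id;  // flat index within the ROI
  w.active = idx < C * PH * PW;
  w.pw = idx % PW;
  idx /= PW;
  w.ph = idx % PH;
  w.c = idx / PH;
  return w;
}

// Counts loads of the 5-value ROI row from global memory (served by the LLC).
// Naive: every work item loads the row itself.
inline int64_t roi_loads_naive(int64_t R, int64_t C, int64_t PH, int64_t PW) {
  return R * C * PH * PW;
}

// Cached: one work item per work-group loads the row into SLM.
inline int64_t roi_loads_cached(int64_t R, int64_t C, int64_t PH, int64_t PW,
                                int64_t wg_size) {
  return R * groups_per_roi(C, PH, PW, wg_size);
}
```

With R = 1000, C = 2048, an assumed 7x7 output and an assumed work-group size of 256, the naive scheme issues 100,352,000 ROI-row loads while the cached scheme issues 392,000, a 256x reduction. In a real kernel, the per-group load would be one work item writing the ROI into local memory, followed by a work-group barrier before any work item reads it.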

src/ATen/native/xpu/sycl/RoiAlignKernels.cpp

src/ATen/native/xpu/sycl/UpSampleBilinear2dKernels.cpp

Copilot

Pull Request Overview

This PR aims to improve roi_align performance on BMG by reducing repeated LLC memory accesses and streamlining conditional execution. Key changes include refactoring boundary and conditional checks in the upsample kernels, and adding work-group-based caching and indexing to the roi_align implementation.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
src/ATen/native/xpu/sycl/UpSampleBilinear2dKernels.cpp	Refined boundary condition handling and restructured the can_optimize condition
src/ATen/native/xpu/sycl/RoiAlignKernels.cpp	Updated bilinear interpolation clamping and improved ROI workgroup indexing with shared memory caching
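The "min/max instead of if/else" clamping change can be sketched for one axis of the bilinear interpolation setup. The snippet below assumes the torchvision-style `bilinear_interpolate` logic (clamp negative coordinates to 0, collapse onto the last row or column at the upper edge) and that the caller has already rejected samples outside [-1, size]. `AxisSample`, `axis_branchy` and `axis_minmax` are hypothetical names, and this is not the PR's exact code. Clamping the coordinate to [0, size - 1] first makes the upper-edge branch unnecessary, because `low` then lands on `size - 1` with a zero fractional weight.

```cpp
#include <algorithm>
#include <cassert>

// Index/weight setup along one axis (hypothetical helper type).
struct AxisSample {
  int low;
  int high;
  float frac;  // weight of `high`; weight of `low` is 1 - frac
};

// Branchy form, modeled on the common roi_align bilinear_interpolate setup.
// Assumes the caller has already rejected v < -1 or v > size.
inline AxisSample axis_branchy(float v, int size) {
  if (v <= 0) v = 0;
  int low = static_cast<int>(v);
  int high;
  if (low >= size - 1) {
    high = low = size - 1;
    v = static_cast<float>(low);
  } else {
    high = low + 1;
  }
  return {low, high, v - static_cast<float>(low)};
}

// Branchless form: clamp once with min/max, then derive both indices.
inline AxisSample axis_minmax(float v, int size) {
  assert(size >= 1);
  v = std::min(std::max(v, 0.0f), static_cast<float>(size - 1));
  int low = static_cast<int>(v);         // v >= 0, so truncation == floor
  int high = std::min(low + 1, size - 1);
  return {low, high, v - static_cast<float>(low)};
}
```

Both functions return identical indices and weights for every v in [-1, size], so the interpolated value is unchanged; the min/max form only removes the data-dependent branches from the per-sample path.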

Comments suppressed due to low confidence (1)

src/ATen/native/xpu/sycl/UpSampleBilinear2dKernels.cpp:608

Consider refactoring this compound conditional for 'can_optimize' to improve readability and maintainability, perhaps by extracting it into a helper function if it is reused.

can_optimize = can_optimize && (align_corners || (input_width == (rwidth * output_width) &&

src/ATen/native/xpu/sycl/RoiAlignKernels.cpp

jianyizh added 2 commits May 24, 2025 23:29

save

a56ddb6

style

3866176

jianyizh added kernel_optimization hw: BMG labels May 26, 2025

small dim

57fbe22

xytintel approved these changes May 26, 2025

Merge branch 'main' into jianyi/roi_align

60a466d

xytintel marked this pull request as ready for review May 26, 2025 04:43

EikanWang requested changes May 26, 2025

jianyizh and others added 5 commits May 26, 2025 23:25

update

378a035

style

2b07b17

fix barrier

8ea16f8

fix

834be4c

Merge branch 'main' into jianyi/roi_align

5153f32

jianyizh requested a review from Copilot May 28, 2025 05:49

Copilot AI reviewed May 28, 2025

src/ATen/native/xpu/sycl/RoiAlignKernels.cpp

jianyizh added 2 commits May 28, 2025 13:28

remove some if branch

a764ebe

style

1775046

EikanWang approved these changes Jun 17, 2025

xytintel added this pull request to the merge queue Jun 17, 2025

Merged via the queue into main with commit 337deed Jun 17, 2025
7 checks passed

xytintel deleted the jianyi/roi_align branch June 17, 2025 03:07
