@jataylo jataylo commented Sep 11, 2025

Change list:

  1. Use tl.minimum and tl.maximum instead of the libdevice implementations
  2. Enable pipelining in reduction codegen
  3. Increase the maximum block size used in tuning
  4. Raise the spill threshold so that good configs are not filtered out
  5. Add waves_per_eu tuning support for reduction configs
  6. Add a new pointwise tuning configuration
  7. Add a new reduction tuning configuration
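
To make items 3 to 5 concrete, here is a minimal pure-Python sketch of how a larger block-size cap, a waves_per_eu sweep, and a spill threshold could interact when building a candidate list. It is not the code from this PR: `ReductionConfig`, `candidate_configs`, `filter_by_spills`, the `waves_per_eu` options, and the spill numbers are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ReductionConfig:
    """Hypothetical stand-in for a launch config; not Inductor's real class."""
    xblock: int
    rblock: int
    num_warps: int = 8
    # Assumption for this sketch: 0 means "no explicit occupancy hint".
    waves_per_eu: int = 0


def candidate_configs(max_block, waves_options=(0, 1, 2)):
    """Enumerate power-of-two block shapes with xblock * rblock <= max_block,
    crossed with every waves_per_eu option."""
    sizes = []
    size = 1
    while size <= max_block:
        sizes.append(size)
        size *= 2
    return [
        ReductionConfig(xblock=x, rblock=r, waves_per_eu=w)
        for x, r, w in product(sizes, sizes, waves_options)
        if x * r <= max_block
    ]


def filter_by_spills(configs, spills, threshold):
    """Keep configs whose recorded spill count is at most `threshold`.
    Configs with no recorded count are treated as having zero spills."""
    return [c for c in configs if spills.get(c, 0) <= threshold]


if __name__ == "__main__":
    configs = candidate_configs(max_block=4)
    # Made-up spill counts that grow with the tile size.
    spills = {c: c.xblock * c.rblock * 8 for c in configs}
    strict = filter_by_spills(configs, spills, threshold=16)
    relaxed = filter_by_spills(configs, spills, threshold=32)
    print(len(configs), len(strict), len(relaxed))
```

In this toy setup the stricter threshold discards the largest tiles before they are ever benchmarked, which is the kind of over-filtering item 4 aims to avoid.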

iupaikov-amd and others added 14 commits September 11, 2025 12:37
(cherry picked from commit 5515cea)
(cherry picked from commit 4b62333)
The original PR had incorrect indentation. This update fixes it so that autotune always adds the tiny configs; otherwise only the hinted configs are used.

(cherry picked from commit 8c58805)
(cherry picked from commit c8f6b02)
(cherry picked from commit a6e5fe6)
(cherry picked from commit 7c7ad78)
(cherry picked from commit 4377d1f)
rocm-repo-management-api bot commented Sep 11, 2025

Jenkins build for 5e8eb7823acd695ae930a8bc5174b3892a6a5d11 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

3 participants