[release/2.9] Cherry-picks from release/2.8 10/06/2025 #2701
Conversation
(cherry picked from commit 3d102a0)
(cherry picked from commit cb98724)
…_rcpf(x) instead of 1.f/x (#1800)

Cherry-pick of #1688

Co-authored-by: Michael Halkenhäuser <[email protected]>
Co-authored-by: Hashem Hashemi <[email protected]>
(cherry picked from commit f8544af)
(cherry picked from commit ed48754)
(cherry picked from commit d62a39e)
(cherry picked from commit b26ddb8)
Related to c7a1e32
Fixes https://ontrack-internal.amd.com/browse/SWDEV-537835

Not a Navi-specific failure:
```
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_device_type.py", line 1412, in only_fn
    return fn(slf, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/var/lib/jenkins/pytorch/test/test_binary_ufuncs.py", line 1671, in test_cuda_tensor_pow_scalar_tensor
    self._test_pow(base, exp)
  File "/var/lib/jenkins/pytorch/test/test_binary_ufuncs.py", line 1482, in _test_pow
    self.assertEqual(actual, expected)
  File "/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/testing/_internal/common_utils.py", line 4052, in assertEqual
    raise error_metas.pop()[0].to_error(
AssertionError: The values for attribute 'dtype' do not match: torch.float32 != torch.float64.
```

Using .to(actual) without specifying dtype/device assumes actual is a tensor or tensor-like, which may fail silently or promote. Fixed by explicitly matching dtype and device, following pytorch#107302.

Fix verified:
```
root@ubb4-rack-22:/var/lib/jenkins/pytorch# TEST_CONFIG=default HIP_VISIBLE_DEVICES=0 PYTORCH_TEST_WITH_ROCM=1 python test/test_binary_ufuncs.py TestBinaryUfuncsCUDA.test_cuda_tensor_pow_scalar_tensor_cuda
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/hypothesis/entry_points.py:23: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81.
  import pkg_resources
Running tests...
----------------------------------------------------------------------
.
----------------------------------------------------------------------
Ran 1 test in 0.141s

OK

Generating XML reports...
root@ubb4-rack-22:/var/lib/jenkins/pytorch# pip list | grep numpy
numpy              2.1.2
```

(cherry picked from commit a4d60fa)
(cherry picked from commit 9f11871)
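As an aside, here is a minimal sketch of the conversion pattern the fix describes. The `to_reference` helper is hypothetical (not the actual test code); the point is that the reference value gets an explicit dtype and device taken from the tensor under test, rather than a bare `.to(actual)`:

```python
import torch

# Hypothetical helper mirroring the fix described above: match the reference
# value's dtype and device to the tensor under test explicitly, instead of
# relying on .to(actual), which assumes `actual` is a tensor or tensor-like.
def to_reference(expected, actual):
    return torch.as_tensor(expected).to(dtype=actual.dtype, device=actual.device)

actual = torch.tensor([4.0], dtype=torch.float32)    # result under test
expected = torch.tensor([4.0], dtype=torch.float64)  # reference math often lands in float64
torch.testing.assert_close(actual, to_reference(expected, actual))
```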
This PR fixes the unit test test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction, which failed as follows:
```
FAILED [0.1163s]
Traceback (most recent call last):
  File "/var/lib/jenkins/pytorch/test/test_cuda.py", line 471, in test_set_per_process_memory_fraction
    tmp_tensor = torch.empty(application, dtype=torch.int8, device="cuda")
RuntimeError: Trying to create tensor with negative dimension -5681285432: [-5681285432]
```

This error occurs only on the gfx1101 arch. It is caused by an integer overflow: the earlier unit test test/test_cuda.py::TestCuda::test_randint_generation_for_large_numel creates a tensor with a huge numel, which inflates torch.cuda.max_memory_reserved() when test/test_cuda.py::TestCuda::test_set_per_process_memory_fraction runs afterward. To avoid this we introduced torch.cuda.empty_cache() and torch.cuda.reset_peak_memory_stats() to clean up the CUDA state.

JIRA: https://ontrack-internal.amd.com/browse/SWDEV-535295

(cherry picked from commit f86d184)
(cherry picked from commit 1b44228)
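For reference, a minimal sketch of the cleanup described above, assuming an illustrative memory fraction and the allocation pattern shown in the traceback (the exact values in the real test differ):

```python
import torch

if torch.cuda.is_available():
    # Cleanup added before the fraction test, per the description above:
    # drop cached blocks and reset peak stats so a prior test's large
    # allocation does not inflate max_memory_reserved() here.
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()

    torch.cuda.set_per_process_memory_fraction(0.5)  # illustrative fraction only
    total = torch.cuda.get_device_properties(0).total_memory
    application = int(total * 0.5) - torch.cuda.max_memory_reserved()
    # Without the reset, `application` can go negative and the empty() call
    # below fails with "Trying to create tensor with negative dimension".
    tmp_tensor = torch.empty(application, dtype=torch.int8, device="cuda")
```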
…g torch and numpy tensors (#2362) Cherry-pick of #2340 Co-authored-by: Dmitry Nikolaev <[email protected]> (cherry picked from commit 22c98ea) (cherry picked from commit 2d72fcd)
Adds initial autotuning for foreach support, required for https://ontrack-internal.amd.com/browse/SWDEV-539076. Roughly 4x improvement for some kernels.

Before:
```
triton_for_fused_18.kd 🔍 | 4.986 ms | 4.986 ms | 2.493 ms | 2 |
triton_for_fused_6.kd  🔍 | 0.098 ms | 0.098 ms | 0.049 ms | 2 |
triton_for_fused_7.kd  🔍 | 0.036 ms | 0.036 ms | 0.018 ms | 2 |
```

After:
```
triton_for_fused_18.kd 🔍 | 1.273 ms | 1.273 ms | 0.636 ms | 2 |
triton_for_fused_6.kd  🔍 | 0.044 ms | 0.044 ms | 0.022 ms | 2 |
triton_for_fused_7.kd  🔍 | 0.024 ms | 0.024 ms | 0.012 ms | 2 |
```

(cherry picked from commit f07b7f7)
(cherry picked from commit ed0d0a7)
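For context, a rough sketch (not from this PR) of the kind of foreach-style op that Inductor lowers to triton_for_fused_* kernels like those timed above; torch._foreach_add_ and torch.compile are existing APIs, while the reproducer itself is an assumption:

```python
import torch

# Hypothetical reproducer: a horizontally-fused parameter update over a list
# of tensors. Inductor compiles this into foreach (triton_for_fused_*) kernels,
# which the autotuning change above selects launch configs for.
def foreach_update(params, grads, lr=0.01):
    torch._foreach_add_(params, grads, alpha=-lr)
    return params

if torch.cuda.is_available():
    params = [torch.randn(1024, device="cuda") for _ in range(8)]
    grads = [torch.randn(1024, device="cuda") for _ in range(8)]
    compiled = torch.compile(foreach_update)
    compiled(params, grads)
```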
Relands #2416 with caching fix Upstream equivalent pytorch#159146 --------- Co-authored-by: Jithun Nair <[email protected]> (cherry picked from commit f0aebdc) (cherry picked from commit 9c429dd)
… Fix warps runtime part 2 (#2455) Cherry-pick of #2442 Co-authored-by: Jack Taylor <[email protected]> (cherry picked from commit 77a6760)
…ersistent reduction and no_x_dim removal (#2454) Cherry-pick of #2417 Need to resolve conflicts --------- Co-authored-by: Jack Taylor <[email protected]> (cherry picked from commit eb47158)
Perf improvement for triton tanh (cherry picked from commit 4febbd8)
… rocm version (#2529) Cherry-pick of #2518 Co-authored-by: Ethan Wee <[email protected]> (cherry picked from commit c03be63)
Fixes SWDEV-543698 (https://ontrack-internal.amd.com/browse/SWDEV-543698). Cherry-picked from #2502.

This PR fixes errors like the one below:
```
[rank3]: RuntimeError: The following operation failed in the TorchScript interpreter.
[rank3]: Traceback of TorchScript (most recent call last):
[rank3]: RuntimeError: /tmp/comgr-28f951/input/CompileSourceACC062:67:7: error: unknown type name 'uint32_t'; did you mean '__hip_internal::uint32_t'?
[rank3]:    67 |       uint32_t int32;
[rank3]:       |       ^~~~~~~~
[rank3]:       |       __hip_internal::uint32_t
```

Earlier, uint32_t was defined by the HIP headers in the std namespace. In ROCm 7.0 it was moved to the __hip_internal namespace in the HIP headers, which triggers the error above.

(cherry picked from commit b2fb688)
…2598) Cherry-pick of #2597 Co-authored-by: Jerry Mannil <[email protected]> (cherry picked from commit 9ea02c4)
The original PR (#2417) had incorrect indentation. This PR updates it so that autotune always adds the tiny configs; otherwise only the hinted configs are used (see the sketch below).

Tested locally on test_torchinductor:
```
Ran 894 tests in 952.242s
FAILED (failures=1, skipped=28)
```

Also completed autotune runs for the microbench models:
```
Microbenchmark for network : resnet152
Num devices: 1
Dtype: FP32
Mini batch size [img] : 64
Time per mini-batch : 0.09107530117034912
Throughput [img/sec] : 702.7152167226226
```

(cherry picked from commit db3ba66)
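A rough sketch of the config-selection rule as described above; the function and variable names are hypothetical, and the real logic lives in Inductor's Triton autotune heuristics:

```python
# Hypothetical illustration of the config-selection rule described above.
def select_configs(hinted_configs, tiny_configs, autotune_enabled):
    if autotune_enabled:
        # Autotune always adds the tiny configs alongside the hinted ones.
        return hinted_configs + tiny_configs
    # Otherwise, only the hinted configs are used.
    return hinted_configs
```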
* cherry-pick of pytorch@2aadcea (cherry picked from commit bd74018)
cherry-pick of pytorch#163869 (cherry picked from commit dfd386f)
Jenkins build for 2fe5c2e6145fe3efa37d93597fcdc8d53fed41f2 commit finished as FAILURE Detected error during Pytorch building:
Force-pushed from 2fe5c2e to 20ac4a0
Jenkins build for 7f74e862eb6dd84c9e6cbd5b551370b983378666 commit finished as ABORTED
Force-pushed from 20ac4a0 to 7f74e86
Jenkins build for 7f74e862eb6dd84c9e6cbd5b551370b983378666 commit finished as NOT_BUILT
Jenkins build for 7f74e862eb6dd84c9e6cbd5b551370b983378666 commit finished as FAILURE Detected error during Pytorch building:
@pragupta Just confirming: all the SKIPPED commits I see regarding unit test failures are skipped because we want to see if those are still needed, and not necessarily because they are upstreamed, right?
Steps followed:
Full list of commits considered for cherry-picking:
commits_2_9.txt
Tested on gfx942 using the following build:
registry-sc-harbor.amd.com/framework/compute-rocm-rel-7.0:38_ubuntu22.04_py3.10_pytorch_rocm7.1_internal_testing_28f820ab