
Conversation

jithunnair-amd
Collaborator

rocm_base: cd3f5d1

ezyang and others added 30 commits September 29, 2025 20:47
Summary:
Previously, many arvr targets transitively depended on c10, not c10_ovrsource,
because they either explicitly depended on c10 (because they didn't know
better) or they depended on legacy Caffe2, which never got the ovrsource
treatment.  So we found all these spots (driven by D82283623) and forced them
to query arvr mode to figure out which one they should use. The goal is that you
NEVER have both targets in the same build rule at the same time.

This diff could be reverted if D82224960 works out but I haven't gotten it to work yet.

Test Plan: sandcastle

Reviewed By: EscapeZero

Differential Revision: D82390436

Pull Request resolved: pytorch#164128
Approved by: https://github.com/albanD, https://github.com/malfet
…d some other changes (pytorch#164016)

* Changes some internal logic for grouping so hopefully it's slightly less annoying to write code for
* Changes the invoking file summary to just use file, which I think is correct most of the time
* Adds some fields to the file summary, like skips, errors, etc., so I can reuse it for file report regression things

Output should be the same, maybe with slightly more fields since I got rid of some of the pops

Pull Request resolved: pytorch#164016
Approved by: https://github.com/huydhn
…6712)

Previously, we already replaced most uses of `python setup.py develop/install`.

This PR also replaces the use of `setup.py bdist_wheel` with the modern `python -m build --wheel` alternative.

Pull Request resolved: pytorch#156712
Approved by: https://github.com/atalman
ghstack dependencies: pytorch#156711
* Bump protobuf from 5.29.4 to 5.29.5 in /.ci/docker

Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 5.29.4 to 5.29.5.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/protobuf_release.bzl)
- [Commits](protocolbuffers/protobuf@v5.29.4...v5.29.5)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-version: 5.29.5
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

* Update .ci/docker/requirements-ci.txt

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Nikita Shulga <[email protected]>
Changing PP submodules' name from `submod_i` to `submod_pp_i` to distinguish from the submodule created by HOP.

Pull Request resolved: pytorch#164037
Approved by: https://github.com/H-Huang
ghstack dependencies: pytorch#164045, pytorch#164035
…torch#164187)

I believe this image is not used anywhere anymore.

Test:
```
git grep manylinuxcxx11-abi-builder
git grep manylinuxcxx11
```
Both commands return no results.

Pull Request resolved: pytorch#164187
Approved by: https://github.com/izaitsevfb, https://github.com/malfet, https://github.com/seemethere
…rch#164104)

This is the result of applying the ruff `UP035` check.
`Callable` is imported from `collections.abc` instead of `typing`.
This PR is the follow-up of pytorch#164054.
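
For illustration, a minimal before/after of the kind of change `UP035` drives (the helper function below is hypothetical, not code from this PR):

```python
# Hypothetical before/after for ruff UP035.
# Before (deprecated import location):
#   from typing import Callable
# After:
from collections.abc import Callable


def apply_once(fn: Callable[[int], int], x: int) -> int:
    # A throwaway function just to exercise the annotation.
    return fn(x)


print(apply_once(lambda v: v + 1, 41))  # prints 42
```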

Pull Request resolved: pytorch#164104
Approved by: https://github.com/Skylion007
`fmtlib` version was updated to 12.0.0 in pytorch#163441.

In this new version, due to fmtlib/fmt#4536, PyTorch stopped installing the `fmtlib` headers. Because of that, the PyTorch/XLA build CI started to fail (pytorch/xla#9653). While we did fix it internally in pytorch/xla#9650, I believe that PyTorch should continue installing the `fmtlib` headers, since it is a dependency of its C API [`python_arg_parser.h`][1].

PyTorch/XLA CI was moved to `unstable.yml` in pytorch#159272, and later removed in pytorch#163564. This PyTorch/XLA build failure went under the radar, since the `fmtlib` update only landed on September 22.

[1]: https://github.com/pytorch/pytorch/blob/84d673ef577d42d6ec20c6c9f09863583c3111f5/torch/csrc/utils/python_arg_parser.h#L42
Pull Request resolved: pytorch#164139
Approved by: https://github.com/Skylion007, https://github.com/malfet
Summary:
Generates new unbacked symbols for slice output size & storage offset when the appropriate semantics are unclear. Teaches inductor to codegen the slice with flexible semantics.

Test Plan:
contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/56218d85e2da09d9ede3809718ec989c2151632c

Rollback Plan:

Differential Revision: D80948073

Pull Request resolved: pytorch#161414
Approved by: https://github.com/laithsakka
…#164001)

The CUDACachingAllocator already does this, so there is precedent.
Pull Request resolved: pytorch#164001
Approved by: https://github.com/eqy
…torch#163988)

See also pytorch#163972, which was intended to be this PR.

Triton (release/3.5.x) ships a CUDA 12.8 ptxas by default.
This PR bundles a ptxas version for CUDA 13 as well, so that it can help pytorch#163801 when users run on new devices like THOR and Spark.

Fixes pytorch#163801

Test Plan:

Check the binary size increase against nightly or the v2.9 RC.
Reproduce the original issue on a THOR machine, then install the binary built from this PR; we expect the issue to be gone without any additional user settings. Also install on a working GB200/GH100 machine to ensure no regression.
Reference: pytorch#119750 and pytorch/builder@5c814e2

Note: with this PR, the PyTorch world's torch.compile is supposed to find ptxas via "torch/_inductor/runtime/compile_tasks.py" and "_set_triton_ptxas_path". Use cases that do not go through "_set_triton_ptxas_path" may not be able to use the CUDA 13 ptxas binary.
However, as is, the Triton world does not know about the existence of this new CUDA 13 ptxas. So if a user assumes pytorch/bin/ptxas is enough and deletes the ptxas shipped with Triton, then https://github.com/triton-lang/triton/blob/c6ad34f7eb42630533412d93ca2cc00a4b4f8f3c/python/triton/knobs.py#L216 would still complain that ptxas was not found, since it does not know about this new one.
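
A rough sketch of the redirection described above, assuming a bundled `torch/bin/ptxas` exists and that Triton honors the `TRITON_PTXAS_PATH` environment variable; this is not the actual `_set_triton_ptxas_path` implementation:

```python
import os
from pathlib import Path

import torch


def _maybe_use_bundled_ptxas() -> None:
    # Hypothetical helper: if PyTorch ships its own ptxas, point Triton at it
    # by exporting TRITON_PTXAS_PATH (only if the user has not set it already).
    bundled = Path(torch.__file__).resolve().parent / "bin" / "ptxas"
    if bundled.is_file() and "TRITON_PTXAS_PATH" not in os.environ:
        os.environ["TRITON_PTXAS_PATH"] = str(bundled)
```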

Pull Request resolved: pytorch#163988
Approved by: https://github.com/atalman
Upgrade all the ROCm docker images to the ROCm 7.0 release version.

Pull Request resolved: pytorch#163140
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <[email protected]>
----

- `cmake_dependent_option` condition should be `USE_ROCM OR (USE_CUDA AND NOT MSVC)` (similar to the one for flash attention)
- Default settings should be user overridable, i.e. even if one builds for SM_10, they should be able to pass `USE_FBGEMM_GENAI=0` and skip the build

Pull Request resolved: pytorch#164165
Approved by: https://github.com/Skylion007
…orch#163794)

Summary: Add an OSS user manual for the AOTI intermediate debug printer so we can link it in the PyTorch conference poster.

Test Plan: N/A

Differential Revision: D83171374

Pull Request resolved: pytorch#163794
Approved by: https://github.com/yushangdi
Every time viable/strict is updated
Pull Request resolved: pytorch#164183
Approved by: https://github.com/seemethere
Fixes invalid f-strings detected by `ruff`.
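
Hypothetical examples of the kind of issue such a pass flags (e.g. ruff's F541 for an f-string with no placeholders):

```python
name = "world"

# Before: an f-prefix with nothing to interpolate, and a plain string that
# was clearly meant to be an f-string.
bad_prefix = f"hello"        # f-string without any placeholders
bad_plain = "hello {name}"   # braces are never interpolated here

# After: drop the useless prefix, or actually interpolate.
fixed_prefix = "hello"
fixed_plain = f"hello {name}"
print(fixed_prefix, fixed_plain)
```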

Pull Request resolved: pytorch#164112
Approved by: https://github.com/Skylion007, https://github.com/mlazos
This is first part of the stack that does comm/compute reordering, and then uses the exposure analysis to do bucketing.

Subsequent prs will handle:
- use of exposure analysis to do bucketing
- make sure inductor respects comm/compute overlapping done at fx level
- non-profiling mm estimation/rank broadcasting of profile results

Other misc:
- Validate accuracy of nccl estimations (use ruisi's profiling instead?)

For a llama 2d parallelism test, on forward, we overlap all but 2 of potentially hidden collectives. For backward, we overlap 217/269 of potentially hidden collectives. If you increase `compute_overlap_multipler` (for fudge factor of inaccurate comms estimation), that goes down to all but 16 of potentially hidden collectives.

fwd example: https://gist.github.com/eellison/76209c49d8829c5f1e323d34a3f040c3

bwd example: https://gist.github.com/eellison/6cfc2285df53a94cfa4012f5fdae5c51

Pull Request resolved: pytorch#163215
Approved by: https://github.com/IvanKobzarev
Preparatory refactor

Pull Request resolved: pytorch#163754
Approved by: https://github.com/IvanKobzarev
ghstack dependencies: pytorch#163215
In comm-compute overlap we will have a graph with:

```
def foo(...):
     ag = all_gather(...)
     hiding_compute = mm(...)
     wait(ag)
```

There is no explicit dependency between the hiding compute and the collectives, but we want to add implicit dependencies from wait->hiding_compute, and from hiding_compute->all_gather to preserve overlap.

Additionally, while bucketing, we will merge collective starts and collective waits together. In this case, we will want to treat the two nodes as a single subgraph - each node in the merged set will have the union of all deps in the set.

This PR adds `AugmentedGraphHelper`, which adds the APIs and allows querying dependencies on this augmented graph.
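
A minimal sketch of the idea, with hypothetical names rather than the actual `AugmentedGraphHelper` API:

```python
from collections import defaultdict


class AugmentedDeps:
    """Track extra dependencies on top of a graph's real data deps, and treat
    merged nodes as one unit whose deps are the union over the merged set."""

    def __init__(self):
        self.extra_deps = defaultdict(set)  # node -> nodes it must come after
        self.merged = {}                    # node -> frozenset of merged nodes

    def add_extra_dep(self, node, dep):
        # e.g. wait -> hiding_compute, hiding_compute -> all_gather
        self.extra_deps[node].add(dep)

    def merge(self, *nodes):
        group = frozenset(nodes)
        for n in nodes:
            self.merged[n] = group

    def deps_of(self, node):
        group = self.merged.get(node, frozenset((node,)))
        deps = set()
        for n in group:
            deps |= self.extra_deps[n]
        return deps - group


# Usage mirroring the example above.
aug = AugmentedDeps()
aug.add_extra_dep("wait", "hiding_compute")
aug.add_extra_dep("hiding_compute", "all_gather")
print(aug.deps_of("wait"))  # {'hiding_compute'}
```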

Pull Request resolved: pytorch#163959
Approved by: https://github.com/v0i0, https://github.com/IvanKobzarev
ghstack dependencies: pytorch#163215, pytorch#163754
tl;dr performs bucketing while preserving comm-compute overlap.

In comm-compute overlap we will have a graph with:

```
def foo(...):
     ag = all_gather(...)
     hiding_compute = mm(...)
     wait(ag)
```

There is no explicit dependency between the hiding compute and the collectives, but we want to add implicit dependencies from wait->hiding_compute, and from hiding_compute->all_gather to preserve overlap.

Additionally, while bucketing, we will merge collective starts and collective waits together. In this case, we will want to treat the two nodes as a single subgraph - each node in the merged set will have the union of all deps in the set.

We perform bucketing while augmenting the graph with these relationships. This can be done separately from comm-compute overlap, so long as the hiding compute relationships are passed in.

TODO:
- need to instrument fx graph so inductor respects these relationships.
- the compile time of the bucketing search can be sped up significantly by limiting what portion of the graph we traverse through
- more memory aware handling

Pull Request resolved: pytorch#163960
Approved by: https://github.com/ruisizhang123, https://github.com/v0i0, https://github.com/IvanKobzarev
ghstack dependencies: pytorch#163215, pytorch#163754, pytorch#163959
Summary:
Original commit changeset: 06888d7ebff0

Original Phabricator Diff: D82932788

Restricted the test to SM90 for scaled_grouped_mm

Test Plan: TBD (will share the linux CI results)

Differential Revision: D83283991

Pull Request resolved: pytorch#163905
Approved by: https://github.com/angelayi
maggiemoss and others added 20 commits October 3, 2025 02:46
Adds suppressions so pyrefly will typecheck clean: pytorch#163283

Test plan:
dmypy restart && python3 scripts/lintrunner.py -a
pyrefly check

---
step 1: uncomment lines in the `pyrefly.toml` file
before: https://gist.github.com/maggiemoss/911b4d0bc88bf8cf3ab91f67184e9d46

after:
```
 INFO Checking project configured at `/Users/maggiemoss/python_projects/pytorch/pyrefly.toml`
 INFO 0 errors (1,152 ignored)
 ```

Pull Request resolved: pytorch#164513
Approved by: https://github.com/oulgen
Test Plan: Sandcastle

Differential Revision: D83492704

Pull Request resolved: pytorch#164159
Approved by: https://github.com/Skylion007, https://github.com/mlazos
…ch#163213)

We want to refactor the internal bookkeeping of DeviceMesh to:
Simplify the bookkeeping logic and make it generic enough that it is easy to support new transformations like flattening noncontiguous dims, reshape, and unflatten (we leveraged the CuTe layout). This new layout also makes non-contiguous slicing, flatten, and transpose possible.

Concretely, in this PR, we do the following:
1. Use the `_MeshLayout` to handle all index operations rather than using a map to record mesh dims.
2. Removed `flatten_name_to_root_dims`, because now we can directly get layout from a flattened device mesh.
3. Replaced `_get_slice_mesh_dims` with `_get_slice_mesh_layout`.
4. Use the newly added function `check_overlap` to check layout overlap.
5. Use a new function `to_remapping_tensor` to use layout ranks as indices when the mesh tensor is not representable as CuTe. The reason is that the layout acts as the backend of mesh tensor bookkeeping (indexing), so it needs to be used as indices that remap back to the mesh tensor for new DeviceMesh generation and backend init. For example, in the case of ranks 2K to 4K, the underlying layout is (2K, 1) but the actual values of the mesh tensor are [2K, 2K+1, ...]. While flattening or slicing, we need to remap the layout back to the new mesh tensor so it maps to the actual device allocation. For example, in the 2K-to-4K case, if the shape is (1K, 1K) with dim_names ("dp", "tp"), then when slicing "tp" the mesh tensor should be (2K, 2K+1, ..., 3K-1) or (3K, 3K+1, ..., 4K-1), not the global ranks generated from the layout (1K, 1).

Verified that the loss curve is very close for DeepSeekV3 on torchtitan; note that an exact match is challenging because even if we run the baseline twice, the loss curves do not exactly match.

<img width="1113" height="490" alt="image" src="https://github.com/user-attachments/assets/7877b5a4-337e-4ad8-b878-2378f4f0f38d" />

The PR looks big indeed but we don't change any existing behavior of DeviceMesh, so it is a pure refactor.

With this refactoring, we also enabled slicing and flattening of non-contiguous dims of a device mesh, which is hard to implement without the CuTe layout.
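
As a usage illustration of the public slicing behavior whose bookkeeping this refactor changes (assuming an 8-rank job; this is not new API from this PR):

```python
from torch.distributed.device_mesh import init_device_mesh

# Requires torch.distributed to be initialized with 8 ranks (e.g. via torchrun).
mesh = init_device_mesh("cuda", (2, 4), mesh_dim_names=("dp", "tp"))
tp_mesh = mesh["tp"]  # this rank's "tp" sub-mesh
dp_mesh = mesh["dp"]  # this rank's "dp" sub-mesh
```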

This is a continuation of pytorch#161106 (the original one got messed up by EasyCLA).

Pull Request resolved: pytorch#163213
Approved by: https://github.com/lw, https://github.com/fegin
This pull request adds support for running operator microbenchmarks on ROCm (AMD GPU) environments in the CI workflow. The main changes involve introducing new build and test jobs for ROCm in the `.github/workflows/operator_microbenchmark.yml` file.

Pull Request resolved: pytorch#164173
Approved by: https://github.com/huydhn
This PR moves the call to copy the generated code from `/tmp/...` so that it is still called if attempting to compile the generated code fails. In both cases now, the generated code will be copied across to `torch_compile_debug/run_.../torchinductor/output_code.py` which makes debugging bad generated code easier.
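
For reference, one way to produce that debug directory is the `TORCH_COMPILE_DEBUG` environment variable (a standard inductor debug switch, not something added by this PR):

```python
import os

# Must be set before inductor is imported/configured.
os.environ["TORCH_COMPILE_DEBUG"] = "1"

import torch


@torch.compile
def f(x):
    return torch.sin(x) + 1


f(torch.randn(8))
# Generated code lands under torch_compile_debug/run_.../torchinductor/output_code.py
```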

Pull Request resolved: pytorch#161615
Approved by: https://github.com/eellison
Test Plan:
```
buck test fbcode//mode/opt caffe2/test/inductor:caching
```

Reviewed By: aorenste

Differential Revision: D83714687

Pull Request resolved: pytorch#164512
Approved by: https://github.com/jananisriram
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/nightly.yml).
Update the pinned vllm hash.
Pull Request resolved: pytorch#164319
Approved by: https://github.com/pytorchbot

Co-authored-by: Huy Do <[email protected]>
…ytorch#164539)

Because torch.testing.test_allclose is deprecated.
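
The supported replacement is `torch.testing.assert_close`, e.g.:

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0])
b = a + 1e-7  # within default tolerances

# Replacement for the deprecated allclose-style test helpers.
torch.testing.assert_close(a, b)
```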

Pull Request resolved: pytorch#164539
Approved by: https://github.com/mlazos
Mitigates pytorch#164574
Remove the unused CUDA_CHANNEL var - this was used before, when we installed PyTorch via conda.

Please note: CUDA 13.0 failures are expected since the CI tries to build against prod and CUDA 13.0 is not available in prod yet.
Pull Request resolved: pytorch#164575
Approved by: https://github.com/malfet, https://github.com/Camyll
PR pytorch#164481 added unit test test_scaled_mm_preserves_strides in test/inductor/test_fp8.py. It was missing the adjustment for ROCm's F8 types on MI300.
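
A hedged sketch of the kind of adjustment involved: on MI300, ROCm uses the fnuz float8 variants, so tests typically pick the dtype per platform (the helper name below is hypothetical):

```python
import torch


def e4m3_dtype() -> torch.dtype:
    # MI300 (ROCm) uses the fnuz encoding; CUDA uses the OCP e4m3fn type.
    return torch.float8_e4m3fnuz if torch.version.hip else torch.float8_e4m3fn
```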

Pull Request resolved: pytorch#164578
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <[email protected]>
So this fixes at least two issues:
1) When we invoke the inductor backend, we apply pre-grad passes which try to find the correct fake mode to use. In the nested case, we run into a clash when there is a closure variable in the inductor region, because non-strict would have fakified this variable beforehand while the inner torch.compile would have created a fresh fake mode. This is not a problem in regular torch.compile because the inner torch.compile gets ignored. I don't know if we are supposed to inherit the fake mode from the parent context in this case, but we can avoid the problem by defaulting to the eager backend, which is fine here because the point of export is to capture aten operators. Going to inductor would mean we lose the inner torch.compile ops.
2) There are custom torch function modes in export that track the number of torch fns executed, and the inner compile itself doesn't work because of a guard failure as this mode state gets changed. I noticed torch.cond fixes this problem by carefully stashing the torch function mode and deferring it to the backend. So the correct thing to do here is just re-use the torch.cond implementation unconditionally.

So the things I did to fix the above were:
1) Always default to the eager backend when compile is invoked inside export. I needed to turn how torch.cond sets up the fresh tracing env into a util that can be shared.
2) The previous eager backend for torch.cond was wrong because the context managers didn't actually persist until the backend was invoked.
3) torch.cond used to only disable the TorchFunctionMetadata torch function mode and stash it for later, but in fact we should do this for both TorchFunctionMetadata and PreDispatchTorchFunctionMode.

With the above fixes, we are able to export flex attention.
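
A minimal sketch of the shape of program this enables (a hypothetical toy module, not the flex attention test itself):

```python
import torch


class Outer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # A nested torch.compile region inside an exported module.
        self.inner = torch.compile(torch.nn.Linear(4, 4))

    def forward(self, x):
        return torch.relu(self.inner(x))


ep = torch.export.export(Outer(), (torch.randn(2, 4),))
print(ep.graph)
```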

Pull Request resolved: pytorch#164171
Approved by: https://github.com/ydwu4
skips DTensorSpec.sizes/strides in metadata guard checks

Pull Request resolved: pytorch#163820
Approved by: https://github.com/azahed98
…ent to avoid slow paths (pytorch#164501)

Summary:
This diff adds the feature of allocating a large pinned memory segment upfront based on the provided config. This large segment is then used to serve all the small pinned memory requests to avoid expensive device level APIs (slow paths).

Example:

PYTORCH_CUDA_ALLOC_CONF=pinned_reserve_segment_size_mb:2048

This reserves a 2GB pinned memory segment for the process, and then all incoming small requests are served from this segment; no cudaHostAlloc/cudaHostRegister APIs are called.
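
Illustrative usage, assuming the config above is set in the environment when the process starts:

```python
# Run with: PYTORCH_CUDA_ALLOC_CONF=pinned_reserve_segment_size_mb:2048 python script.py
import torch

# A small pinned-memory request; with the reserved segment configured it is
# expected to be carved out of the pre-reserved block rather than triggering
# cudaHostAlloc/cudaHostRegister.
buf = torch.empty(1024, pin_memory=True)
print(buf.is_pinned())
```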

Differential Revision: D83779074

Pull Request resolved: pytorch#164501
Approved by: https://github.com/yangw-dev
…sting_IFU_2025-10-03

# Conflicts:
#	.ci/docker/ci_commit_pins/triton.txt
#	.ci/docker/libtorch/build.sh
#	CMakeLists.txt
#	requirements-build.txt
#	test/test_matmul_cuda.py
#	torch/_inductor/runtime/triton_heuristics.py
#	torch/testing/_internal/common_utils.py
@rocm-repo-management-api

rocm-repo-management-api bot commented Oct 3, 2025

Jenkins build for 67823272940c3318a025c9f95c752d56fdd1ea77 commit finished as NOT_BUILT
Links: Blue Ocean view / Build artifacts

@rocm-repo-management-api

rocm-repo-management-api bot commented Oct 3, 2025

Jenkins build for 67823272940c3318a025c9f95c752d56fdd1ea77 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

Detected error during Pytorch building:

```
HIP VERSION: 7.0.51831-a3e329ad8
CMake Warning (dev) at /opt/rocm/lib/cmake/hip/hip-config-amd.cmake:98 (message):
   GPU_TARGETS was not set, and system GPU detection was unsuccsessful.

   The amdgpu-arch tool failed:
   Error: 'Failed to get device count'
   Output: ''

   As a result, --offload-arch will not be set for subsequent
   compilations, and the default architecture
   (gfx906 for dynamic build / gfx942 for static build) will be used
```

@jagadish-amd

The diffs in test_matmul_cuda.py are due to the refactoring of the fp8/fp4 tests (scaled_mm ops); all tests related to fp8 and fp4 have been moved to a new file, test_scaled_matmul_cuda.py.

@rocm-repo-management-api

rocm-repo-management-api bot commented Oct 3, 2025

Jenkins build for 67823272940c3318a025c9f95c752d56fdd1ea77 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@rocm-repo-management-api

rocm-repo-management-api bot commented Oct 8, 2025

Jenkins build for 67823272940c3318a025c9f95c752d56fdd1ea77 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@pruthvistony
Collaborator

pruthvistony commented Oct 19, 2025

Should this PR bump to a newer upstream commit?

cc @pragupta @jithunnair-amd
