@akashveramd akashveramd commented Jul 31, 2025

This PR fixes P1 Jira https://ontrack-internal.amd.com/browse/SWDEV-542659.
The Jira covers 3 test files with failing tests:

  1. distributed.test_distributed_spawn
  2. test_binary_ufuncs
  3. test_nn

The test files distributed.test_distributed_spawn and test_binary_ufuncs pass with the latest mainline build:
registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16426_ubuntu22.04_py3.10_pytorch_lw_release-2.7_fe3d37a9.

The test file test_nn has 2 failing tests: test_batchnorm_3D_train_NCHW_vs_native_mixed_float16 and test_RNN_dropout_state.
The test_batchnorm_3D_train_NCHW_vs_native_mixed_float16 test is skipped by PR #2370.
The test_RNN_dropout_state test is fixed by cherry-picking upstream commit 1aa971a.

Tested on MI200 with the docker image:
registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16426_ubuntu22.04_py3.10_pytorch_lw_release-2.7_fe3d37a9.

Cherry-picked to release/2.5 branch via #2506

Cherry-picked to release/2.6 branch via #2507

Cherry-picked to release/2.8 branch via #2509

Cherry-picked to rocm7.0_internal_testing branch via #2510

iupaikov-amd and others added 2 commits July 31, 2025 17:56
…#144572)

This PR fixes pytorch#107183 for ROCm.

Implemented use of the new RNN descriptor for the MIOpen backend, which takes the dropout rate into account through a dropout descriptor. This fixes the associated test_RNN_dropout_state test.

Pull Request resolved: pytorch#144572
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>

rocm-repo-management-api bot commented Jul 31, 2025

Jenkins build for 9bd1a832b0b2a9eef7d5d5bd9c9ed425e95ee85e commit finished as NOT_BUILT
Links: Blue Ocean view / Build artifacts

@akashveramd akashveramd requested a review from iupaikov-amd July 31, 2025 21:13

rocm-repo-management-api bot commented Jul 31, 2025

Jenkins build for 9bd1a832b0b2a9eef7d5d5bd9c9ed425e95ee85e commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@akashveramd akashveramd changed the title from "[release/2.7] Fix test_RNN_dropout_state and test_rnn_check_device tests for P1 Jira SWDEV-542659" to "[release/2.7] Fix test_rnn_check_device tests for P1 Jira SWDEV-542659" Jul 31, 2025

@iupaikov-amd iupaikov-amd left a comment

lgtm on the dropout side of things

@akashveramd akashveramd requested a review from pragupta August 1, 2025 16:24
@akashveramd akashveramd dismissed pragupta’s stale review August 1, 2025 16:28

@pragupta: As discussed, I removed the changes made to test_nn.py, hence dismissing the review. The PR no longer changes the test_nn.py file.

@akashveramd akashveramd self-assigned this Aug 1, 2025
@pruthvistony
Collaborator

@akashveramd , @iupaikov-amd ,
Is the dropout change upstreamed?

@pruthvistony pruthvistony merged commit 699f463 into release/2.7 Aug 2, 2025
1 of 6 checks passed
@pruthvistony pruthvistony deleted the av_jira_542659_rel2.7 branch August 2, 2025 06:01
@akashveramd
Author

@akashveramd , @iupaikov-amd , Is the dropout change upstreamed?

@pruthvistony : This was cherry-picked from upstream.

@akashveramd
Author

! cherry-pick --onto release/2.5 release/2.6 release/2.7 release/2.8 rocm7.0_internal_testing

dhonnappa-amd pushed a commit that referenced this pull request Aug 13, 2025
#2440)

This PR fixes P1 Jira
https://ontrack-internal.amd.com/browse/SWDEV-542659.
The Jira covers 3 test files with failing tests:
1) distributed.test_distributed_spawn
2) test_binary_ufuncs
3) test_nn

The test files **distributed.test_distributed_spawn** and
**test_binary_ufuncs** pass with the latest mainline build:

**registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16426_ubuntu22.04_py3.10_pytorch_lw_release-2.7_fe3d37a9**.

The test file **test_nn** has 2 failing tests:
**test_batchnorm_3D_train_NCHW_vs_native_mixed_float16** and
**test_RNN_dropout_state**.
The **test_batchnorm_3D_train_NCHW_vs_native_mixed_float16** test is
skipped by PR #2370.
The **test_RNN_dropout_state** test is fixed by cherry-picking upstream
commit 1aa971a.

Tested on MI200 with the docker image:

**registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16426_ubuntu22.04_py3.10_pytorch_lw_release-2.7_fe3d37a9**.

---------

Co-authored-by: Iurii Paikov <[email protected]>
Co-authored-by: Jeff Daily <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
dhonnappa-amd pushed a commit that referenced this pull request Aug 13, 2025
#2440)
dhonnappa-amd pushed a commit that referenced this pull request Aug 13, 2025
#2440)
@dhonnappa-amd

Created branch autogenerated/release/2.5_cherry-pick_pr-2440 and #2506. It contains a merge conflict. Please resolve it

Created branch autogenerated/release/2.6_cherry-pick_pr-2440 and #2507

Nothing to cherry-pick onto the release/2.7 branch

Created branch autogenerated/release/2.8_cherry-pick_pr-2440 and #2509

Created branch autogenerated/rocm7.0_internal_testing_cherry-pick_pr-2440 and #2510
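
For readers unfamiliar with the cherry-pick command used in this thread, here is a minimal sketch of how the `! cherry-pick --onto ...` comment maps to the branch names in the reply above. It is not the bot's actual implementation. The helper names `parse_cherry_pick` and `autogenerated_branch` are hypothetical, and skipping the branch the PR was merged into is an assumption inferred from the "Nothing to cherry-pick onto the release/2.7 branch" message.

```python
from typing import List


def parse_cherry_pick(comment: str) -> List[str]:
    """Return the target branches listed after --onto in a '! cherry-pick' comment.

    Hypothetical helper: returns an empty list for any other comment.
    """
    tokens = comment.split()
    if tokens[:2] != ["!", "cherry-pick"] or "--onto" not in tokens:
        return []
    return tokens[tokens.index("--onto") + 1:]


def autogenerated_branch(target: str, pr_number: int) -> str:
    """Build a branch name following the pattern seen in the bot's reply."""
    return f"autogenerated/{target}_cherry-pick_pr-{pr_number}"


command = (
    "! cherry-pick --onto release/2.5 release/2.6 release/2.7 "
    "release/2.8 rocm7.0_internal_testing"
)
merged_into = "release/2.7"  # branch this PR was merged into

for target in parse_cherry_pick(command):
    if target == merged_into:
        # Assumption: the change is already on this branch, so it is skipped.
        print(f"Nothing to cherry-pick onto the {target} branch")
    else:
        print(f"Created branch {autogenerated_branch(target, 2440)}")
```

The sketch covers only naming. Conflict detection (as reported for release/2.5) and opening the follow-up PRs are outside its scope.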

jeffdaily added a commit that referenced this pull request Aug 15, 2025
#2507)

Cherry-pick of #2440

Co-authored-by: akashveramd <[email protected]>
Co-authored-by: Iurii Paikov <[email protected]>
Co-authored-by: Jeff Daily <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
jeffdaily added a commit that referenced this pull request Aug 15, 2025
…a SWDEV-542659 (#2510)

Cherry-pick of #2440

Co-authored-by: akashveramd <[email protected]>
Co-authored-by: Iurii Paikov <[email protected]>
Co-authored-by: Jeff Daily <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>