[release/2.7] Fix test_rnn_check_device tests for P1 Jira SWDEV-542659 #2440
…#144572) This PR fixes pytorch#107183 for ROCm. It uses the new RNN descriptor for the MIOpen backend, which takes the dropout rate into account through a dropout descriptor. This fixes the associated test_RNN_dropout_state test. Pull Request resolved: pytorch#144572 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <[email protected]> Co-authored-by: Nikita Shulga <[email protected]>
Jenkins build for 9bd1a832b0b2a9eef7d5d5bd9c9ed425e95ee85e commit finished as NOT_BUILT
Jenkins build for 9bd1a832b0b2a9eef7d5d5bd9c9ed425e95ee85e commit finished as FAILURE
lgtm on the dropout side of things
@pragupta: As discussed, I removed the changes made to test_nn.py, so I am dismissing the review. The PR no longer changes test_nn.py.
@akashveramd, @iupaikov-amd,
@pruthvistony: This was cherry-picked from upstream.
! cherry-pick --onto release/2.5 release/2.6 release/2.7 release/2.8 rocm7.0_internal_testing
#2440)

This PR has fixes for P1 Jira https://ontrack-internal.amd.com/browse/SWDEV-542659. In this Jira, there are 3 test files with failing tests:

1) distributed.test_distributed_spawn
2) test_binary_ufuncs
3) test_nn

The test files **distributed.test_distributed_spawn** and **test_binary_ufuncs** pass with the latest mainline build, **registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16426_ubuntu22.04_py3.10_pytorch_lw_release-2.7_fe3d37a9**.

The test file **test_nn** has 2 failing tests: **test_batchnorm_3D_train_NCHW_vs_native_mixed_float16** and **test_RNN_dropout_state**. The **test_batchnorm_3D_train_NCHW_vs_native_mixed_float16** test is skipped via PR #2370. The **test_RNN_dropout_state** test is fixed by cherry-picking upstream commit 1aa971a.

Tested on MI200 with docker image **registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16426_ubuntu22.04_py3.10_pytorch_lw_release-2.7_fe3d37a9**.

---------

Co-authored-by: Iurii Paikov <[email protected]>
Co-authored-by: Jeff Daily <[email protected]>
Co-authored-by: Nikita Shulga <[email protected]>
Created branch autogenerated/release/2.5_cherry-pick_pr-2440 and #2506. It contains a merge conflict. Please resolve it.
Created branch autogenerated/release/2.6_cherry-pick_pr-2440 and #2507.
Nothing to cherry-pick onto the release/2.7 branch.
Created branch autogenerated/release/2.8_cherry-pick_pr-2440 and #2509.
Created branch autogenerated/rocm7.0_internal_testing_cherry-pick_pr-2440 and #2510.
#2507) Cherry-pick of #2440 Co-authored-by: akashveramd <[email protected]> Co-authored-by: Iurii Paikov <[email protected]> Co-authored-by: Jeff Daily <[email protected]> Co-authored-by: Nikita Shulga <[email protected]>
…a SWDEV-542659 (#2510) Cherry-pick of #2440 Co-authored-by: akashveramd <[email protected]> Co-authored-by: Iurii Paikov <[email protected]> Co-authored-by: Jeff Daily <[email protected]> Co-authored-by: Nikita Shulga <[email protected]>
This PR has fixes for P1 Jira https://ontrack-internal.amd.com/browse/SWDEV-542659.
In this Jira, there are 3 test files with failing tests: distributed.test_distributed_spawn, test_binary_ufuncs, and test_nn.
The test files distributed.test_distributed_spawn and test_binary_ufuncs pass with the latest mainline build:
registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16426_ubuntu22.04_py3.10_pytorch_lw_release-2.7_fe3d37a9.
The test file test_nn has 2 failing tests: test_batchnorm_3D_train_NCHW_vs_native_mixed_float16 and test_RNN_dropout_state.
The test_batchnorm_3D_train_NCHW_vs_native_mixed_float16 test is skipped via PR #2370.
The test_RNN_dropout_state test is fixed by cherry-picking upstream commit 1aa971a.
Tested on MI200 with docker image:
registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16426_ubuntu22.04_py3.10_pytorch_lw_release-2.7_fe3d37a9.
Cherry-picked to release/2.5 branch via #2506
Cherry-picked to release/2.6 branch via #2507
Cherry-picked to release/2.8 branch via #2509
Cherry-picked to rocm7.0_internal_testing branch via #2510