enable device index check for all device types #126767


Closed

Conversation

garfield1997
Contributor

enable device index check for all device types for grad setter.
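To make the change concrete, here is a minimal pure-Python sketch of the logic this PR generalizes. The names `Device` and `check_grad_device` are illustrative only, not the actual C++ implementation: previously the grad setter compared device indices only for CUDA tensors, and with this change the comparison applies to every device type.

```python
# Illustrative sketch only -- not the actual PyTorch C++ implementation.
# Old behavior: the grad setter checked the device index only for CUDA.
# This PR: the index check applies to all device types.

from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    type: str        # e.g. "cuda", "cpu", or a custom backend name
    index: int = 0   # backends with a single device report index 0


def check_grad_device(param: Device, grad: Device, cuda_only: bool = False) -> None:
    """Raise RuntimeError if grad's device does not match param's device.

    cuda_only=True models the old behavior (index compared only for CUDA);
    cuda_only=False models this PR (index compared for every device type).
    """
    if param.type != grad.type:
        raise RuntimeError(f"grad device type {grad.type} != {param.type}")
    if cuda_only and param.type != "cuda":
        return  # old behavior: index check skipped for non-CUDA devices
    if param.index != grad.index:
        raise RuntimeError(f"grad device index {grad.index} != {param.index}")
```

Under the old behavior a custom backend could silently accept a grad on the wrong device index; the generalized check makes that raise, matching the existing CUDA behavior.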


pytorch-bot bot commented May 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126767

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 7e8d1d8 with merge base 00f675b:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Contributor

@janeyx99 janeyx99 left a comment


Looks fine to me, needs test though

@janeyx99 janeyx99 added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) May 21, 2024
@albanD
Collaborator

albanD commented May 21, 2024

The blast radius is pretty large, but that sounds OK to me, since any device with a single piece of hardware should just return index 0 all the time. I'll let @soulitzer give the final stamp if he's happy with it.

Also agreed that we need a test to make sure it works fine.
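The "single hardware device returns 0" point can be made concrete with a small sketch (pure Python, hypothetical helper names): a device constructed without an explicit index, such as `torch.device("cpu")`, carries index `None`, and normalizing an unset index to 0 means a single-device backend can never spuriously trip the index comparison.

```python
# Hypothetical sketch, not PyTorch internals: why a broader index check is
# safe for single-device backends. An unset device index is treated as 0,
# so a backend with one piece of hardware always compares equal to itself.

from typing import Optional


def normalize_index(index: Optional[int]) -> int:
    # Treat an unset index (None) as device 0 for comparison purposes.
    return 0 if index is None else index


def indices_match(a: Optional[int], b: Optional[int]) -> bool:
    return normalize_index(a) == normalize_index(b)
```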

@garfield1997
Contributor Author

I believe there are already some tests in place. https://github.com/pytorch/pytorch/blob/main/test/test_autograd.py#L11299

```python
# Tests cross-device assignment raises
if len(devices) > 1:
    x = torch.randn(5, 5, device=devices[0])
    with self.assertRaises(RuntimeError):
        x.grad = torch.randn(5, 5, device=devices[1])
```

Do we need additional testing? @albanD @janeyx99

@albanD
Collaborator

albanD commented May 22, 2024

Do we run this test on non-CUDA devices? I would expect that it was failing before and is fixed by this PR for non-CUDA devices.

@garfield1997
Contributor Author

I am adding a custom backend for PyTorch. This test case used to fail and will be fixed by this PR. @albanD

@garfield1997
Contributor Author

If you have the time, could you give me some feedback on this PR? @albanD @janeyx99

@garfield1997 garfield1997 force-pushed the remove_grad_cuda_check branch from 2fb3012 to 7d3bfe4 Compare May 30, 2024 01:18
Contributor

@soulitzer soulitzer left a comment


Sounds reasonable to me

Though also on testing: @albanD, do you know what our general testing strategy is for non-CUDA devices, for things like this?

@albanD
Collaborator

albanD commented May 30, 2024

The best Plan On Record for testing these is what is described in this RFC: pytorch/rfcs#64, but it is not done yet, I'm afraid.

Collaborator

@albanD albanD left a comment


Sounds good then!

@albanD
Collaborator

albanD commented May 30, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 30, 2024
@pytorchmergebot
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team (raised by workflow job)

@albanD albanD added the topic: not user facing topic category label May 30, 2024
@albanD
Collaborator

albanD commented May 30, 2024

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-focal-cuda12.1-py3.10-gcc9 / test (nogpu_NO_AVX2, 1, 1, linux.2xlarge)

Details for Dev Infra team (raised by workflow job)

@garfield1997
Contributor Author

The CI failure seems unrelated to the changes in the PR, and I am unable to reproduce it locally.

@garfield1997 garfield1997 force-pushed the remove_grad_cuda_check branch from 7d3bfe4 to ba8175f Compare June 3, 2024 01:36
@FFFrog
Collaborator

FFFrog commented Jun 3, 2024

> The best Plan On Record for testing these is what is described in this RFC: pytorch/rfcs#64, but it is not done yet, I'm afraid.

cc @albanD @soulitzer @garfield1997

pytorch/rfcs#64 is currently in the design and development stage. We will do our best to speed up development and open-source it as soon as possible.

As stated in the RFC, we plan to abstract and standardize the third-party device integration mechanism, which in theory can provide the following benefits:

  1. Reusable code abstractions to accelerate the integration of new backends
  2. Standardized integration to improve the quality of third-party device support
  3. A streamlined backend and unified test coverage to ensure the availability of third-party device integration mechanisms
  4. End-to-end documentation of the full process, plus an official demo

@garfield1997
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot
Collaborator

Merge failed

Reason: 3 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team (raised by workflow job)

Failing merge rule: Core Maintainers

@garfield1997
Contributor Author

Could you please help trigger the merge again? @albanD

@garfield1997
Contributor Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased remove_grad_cuda_check onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout remove_grad_cuda_check && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the remove_grad_cuda_check branch from 4ce3403 to 7e8d1d8 Compare June 26, 2024 07:49
@garfield1997
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


Labels: ciflow/trunk (Trigger trunk jobs on your pull request), Merged, open source, topic: not user facing (topic category), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

7 participants