Implicit distributed backend selection #516

Open
wants to merge 2 commits into
base: main
from

Contributor

@booxter booxter commented Apr 30, 2025

  • chore: bump pytorch to 2.6.0+
  • feat: Rely on implicit detection of distributed backend

@mergify mergify bot added dependencies Pull requests that update a dependency file ci-failure labels Apr 30, 2025

github-actions bot commented May 1, 2025

E2E (NVIDIA L40S x4) workflow launched on this PR: View run

@mergify mergify bot removed the ci-failure label May 1, 2025

github-actions bot commented May 1, 2025

e2e workflow succeeded on this PR: View run, congrats!

Contributor Author

booxter commented May 1, 2025

@tiran any particular concerns with bumping the minimum pytorch version to 2.6.0+ for the training library? (It's already 2.6.0+ in ilab, so I don't expect any, but better to double-check...)

Contributor Author

booxter commented May 6, 2025

As confirmed by Doug H, this won't change versions used downstream. ilab already pulls 2.6.0+ for all flavors.
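
For illustration only, a minimum-version requirement like the 2.6.0+ floor discussed here could also be checked at runtime. The sketch below is not part of this PR. The names `MIN_TORCH`, `parse_release`, and `meets_minimum` are hypothetical, and the parsing is deliberately simple (stdlib only), so pre-release tags such as `rc1` are ignored.

```python
import re
from typing import Tuple

# Hypothetical constant mirroring the 2.6.0+ floor discussed in this PR.
MIN_TORCH: Tuple[int, ...] = (2, 6, 0)


def parse_release(version: str) -> Tuple[int, ...]:
    """Extract the numeric release part of a version string.

    Local build tags such as '+cu124' and suffixes such as '.dev2025'
    or 'rc1' are ignored, which is a simplification.
    """
    public = version.split("+", 1)[0]
    match = re.match(r"\d+(?:\.\d+)*", public)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: Tuple[int, ...] = MIN_TORCH) -> bool:
    """Return True if `version` is at least `minimum`."""
    release = parse_release(version)
    # Pad short versions so that '2.6' compares equal to (2, 6, 0).
    padded = release + (0,) * (len(minimum) - len(release))
    return padded >= minimum
```

In a real environment the string would typically come from `torch.__version__`, for example `meets_minimum(torch.__version__)`.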

@booxter booxter marked this pull request as ready for review May 6, 2025 20:09
@booxter booxter requested review from JamesKunstle and RobotSail May 20, 2025 22:34
@mergify mergify bot added the one-approval label May 20, 2025
Contributor

mergify bot commented May 20, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. @booxter please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 20, 2025
booxter added 2 commits May 21, 2025 08:58

chore: bump pytorch to 2.6.0+

This is in line with the ilab repo. There are some features in later pytorch
releases that we may want to have access to.

Signed-off-by: Ihar Hrachyshka <[email protected]>

feat: Rely on implicit detection of distributed backend

From the official docs,

```
Since 2.6, if backend is not provided, c10d will use a backend
registered for the device type indicated by the device_id kwarg (if
provided).
```

and:

```
If neither backend nor device_id is provided, c10d will detect the
accelerator on the run-time machine and use a backend registered for
that detected accelerator (or cpu).
```

While the library is still CUDA-centric, this is one small step towards
a more device-agnostic implementation.

Signed-off-by: Ihar Hrachyshka <[email protected]>
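
As a rough sketch of what relying on implicit detection can look like in calling code (not necessarily the exact change in this PR), the helper below omits the `backend` argument unless one is explicitly requested. The names `build_init_kwargs` and `init_distributed` are hypothetical. `torch.distributed.init_process_group` and its `backend` and `device_id` keyword arguments are the real PyTorch API that the quoted docs describe.

```python
from typing import Any, Dict, Optional


def build_init_kwargs(
    explicit_backend: Optional[str] = None,
    device_id: Optional[Any] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for torch.distributed.init_process_group.

    When no backend is requested, the 'backend' key is left out entirely so
    that PyTorch 2.6+ can choose one based on device_id or on the accelerator
    it detects at run time (falling back to CPU).
    """
    kwargs: Dict[str, Any] = {}
    if explicit_backend is not None:
        kwargs["backend"] = explicit_backend
    if device_id is not None:
        # Expected to be a torch.device, passed through unchanged.
        kwargs["device_id"] = device_id
    return kwargs


def init_distributed(
    explicit_backend: Optional[str] = None,
    device_id: Optional[Any] = None,
) -> None:
    """Initialize the default process group with implicit backend selection."""
    # Imported lazily so build_init_kwargs can be used without torch installed.
    import torch.distributed as dist

    # Assumes the usual rendezvous environment variables are already set,
    # for example by launching the script with torchrun.
    dist.init_process_group(**build_init_kwargs(explicit_backend, device_id))
```

Calling `init_distributed()` with no arguments leaves backend selection to c10d, while `init_distributed("nccl")` keeps the older explicit behaviour available as an override.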
Labels
dependencies Pull requests that update a dependency file one-approval
2 participants