Example Tensor Parallelism Optimizer Bug #1325

Open
nrothGIT opened this issue Apr 6, 2025 · 0 comments
Comments

Contributor

nrothGIT commented Apr 6, 2025

📚 Documentation

I believe the optimizer in the tensor parallelism example should be created after the `parallelize_module` call, as is done in the sequence parallelism example. Otherwise, with the latest torch, the example does not seem to update the weights, so it never actually trains. Please let me know if I'm missing anything, and thanks so much for all your work!

Tiny fix PR below:
#1324
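
A minimal sketch of the suspected mechanism, in plain Python so it runs without torch or multiple processes. Everything here is a hypothetical stand-in rather than a PyTorch API: `Param`, `Model`, `SGD`, and `parallelize` are toy names, and the sketch assumes that `parallelize_module` swaps the module's parameters for new objects, which would leave an optimizer created earlier holding references to the old ones.

```python
# Toy stand-ins (hypothetical, NOT PyTorch APIs) illustrating the ordering problem.

class Param:
    """A single trainable scalar with a gradient slot."""
    def __init__(self, value):
        self.value = value
        self.grad = 0.0


class Model:
    def __init__(self):
        self.weight = Param(1.0)

    def parameters(self):
        return [self.weight]


class SGD:
    def __init__(self, params, lr):
        # The optimizer captures the parameter objects that exist right now.
        self.params = list(params)
        self.lr = lr

    def step(self):
        for p in self.params:
            p.value -= self.lr * p.grad


def parallelize(model):
    # Assumption: like parallelize_module, this installs a NEW parameter object
    # on the module instead of modifying the existing one in place.
    model.weight = Param(model.weight.value)
    return model


def train_one_step(create_optimizer_first):
    model = Model()
    if create_optimizer_first:
        opt = SGD(model.parameters(), lr=0.1)   # holds the pre-parallelize parameter
        model = parallelize(model)
    else:
        model = parallelize(model)
        opt = SGD(model.parameters(), lr=0.1)   # holds the live parameter
    model.weight.grad = 1.0   # pretend backward() filled the live parameter's grad
    opt.step()
    return model.weight.value


stale = train_one_step(create_optimizer_first=True)
fresh = train_one_step(create_optimizer_first=False)
print("optimizer before parallelize:", stale)
print("optimizer after parallelize: ", fresh)
```

In this toy, the weight only changes when the optimizer is built after `parallelize`, which matches the behavior described above. Whether the real example fails for exactly this reason is an assumption of the sketch.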
