Add multigpu pt #288

Draft: 10 commits to be merged into base branch main
gkumbhat (Collaborator) commented Dec 1, 2023

Changes

  • Add FSDP configuration for PT trainer
  • Add torch elastic launch
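
For context on the torch elastic launch item: each worker process started by the elastic launcher receives its coordinates through environment variables (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`, `MASTER_ADDR`, `MASTER_PORT`). The sketch below only shows how a trainer might read them; it is not the code from this PR. `ElasticEnv`, `read_elastic_env`, and the single-process fallback values are hypothetical names and assumptions made for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ElasticEnv:
    """Hypothetical container for the per-worker launch settings."""
    rank: int          # global rank of this worker
    local_rank: int    # rank of this worker on its own node (often the GPU index)
    world_size: int    # total number of workers
    master_addr: str   # rendezvous host
    master_port: int   # rendezvous port


def read_elastic_env(environ: Optional[Mapping[str, str]] = None) -> ElasticEnv:
    """Read the launcher-provided variables, falling back to single-process values.

    The fallbacks (rank 0, world size 1, 127.0.0.1:29500) are assumptions chosen
    so the same code path also works when no launcher is involved.
    """
    env = os.environ if environ is None else environ
    return ElasticEnv(
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        world_size=int(env.get("WORLD_SIZE", "1")),
        master_addr=env.get("MASTER_ADDR", "127.0.0.1"),
        master_port=int(env.get("MASTER_PORT", "29500")),
    )


if __name__ == "__main__":
    # Prints the settings for the current process.
    print(read_elastic_env())
```

Passing a plain dictionary instead of `os.environ` keeps the helper easy to exercise without actually launching multiple processes.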

NOTE: This PR should be merged only after #287 has been merged and this branch has been rebased on top of it.

This currently fails with the following error:

ValueError: expected to be in states [<TrainingState.IDLE: 1>] but current state is TrainingState.FORWARD_BACKWARD
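
The message reads like a state guard: some operation requires the wrapped module to be idle, but it was invoked while a forward/backward pass was still in progress. The toy model below illustrates that pattern only; it is not FSDP's implementation, and the root cause in this PR has not been confirmed. `ToyShardedModule` and `gather_full_params` are hypothetical names, and the two-member enum is a simplified stand-in.

```python
from enum import Enum, auto


class TrainingState(Enum):
    """Simplified stand-in for a sharded module's training states."""
    IDLE = auto()
    FORWARD_BACKWARD = auto()


class ToyShardedModule:
    """Hypothetical module that guards operations with a state check."""

    def __init__(self):
        self.training_state = TrainingState.IDLE

    def _assert_in_states(self, allowed):
        # Raise if the module is not in one of the allowed states.
        if self.training_state not in allowed:
            raise ValueError(
                f"expected to be in states {allowed} "
                f"but current state is {self.training_state}"
            )

    def gather_full_params(self):
        # Assumed to be legal only while no forward/backward pass is running.
        self._assert_in_states([TrainingState.IDLE])

    def forward(self, inside=None):
        # Enter the forward/backward state, optionally run a callback,
        # and always return to idle afterwards.
        self._assert_in_states([TrainingState.IDLE])
        self.training_state = TrainingState.FORWARD_BACKWARD
        try:
            if inside is not None:
                inside(self)
        finally:
            self.training_state = TrainingState.IDLE


if __name__ == "__main__":
    module = ToyShardedModule()
    module.gather_full_params()  # fine: the module is idle
    try:
        # Calling the guarded operation from inside forward trips the check.
        module.forward(inside=lambda m: m.gather_full_params())
    except ValueError as err:
        print(err)
```

If the real failure follows this pattern, the thing to look for is a call made during the training step (for example from a callback or a nested forward) that expects the model to be idle.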

gkumbhat and others added 10 commits November 26, 2023 14:49
Co-authored-by: Alex-Brooks <[email protected]>
Signed-off-by: gkumbhat <[email protected]>
Co-authored-by: Alex-Brooks <[email protected]>
Signed-off-by: gkumbhat <[email protected]>
Co-authored-by: Alex-Brooks <[email protected]>
Signed-off-by: gkumbhat <[email protected]>
Co-authored-by: Alex-Brooks <[email protected]>
Signed-off-by: gkumbhat <[email protected]>
Co-authored-by: Alex-Brooks <[email protected]>
Signed-off-by: gkumbhat <[email protected]>
Signed-off-by: gkumbhat <[email protected]>
Signed-off-by: gkumbhat <[email protected]>
Co-authored-by: Alex-Brooks <[email protected]>
Signed-off-by: gkumbhat <[email protected]>
1 participant