
@WoosungMyung (Contributor) commented Aug 9, 2025

This PR adds a getter method to DeepSpeedEngine and PipelineEngine:

  • get_parallel_world_sizes(): returns the world size of each parallelism dimension as a dictionary, e.g. {"tp": 4, "dp": 8}
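
For illustration, here is a minimal sketch of such a getter, written against a hypothetical stub class (ParallelEngineStub) rather than the real engines. It assumes the engine already knows its tensor-, pipeline-, and data-parallel sizes and simply returns them as a dict. The "pp" key and the input validation are assumptions; the PR description only shows "tp" and "dp".

```python
from typing import Dict


class ParallelEngineStub:
    """Hypothetical stand-in for DeepSpeedEngine / PipelineEngine.

    The real engines derive these sizes from their process groups; here they
    are plain constructor arguments so the sketch runs without DeepSpeed.
    """

    def __init__(self, tp: int = 1, pp: int = 1, dp: int = 1) -> None:
        # Reject non-positive sizes early; a world size is at least 1.
        for name, size in (("tp", tp), ("pp", pp), ("dp", dp)):
            if size < 1:
                raise ValueError(f"{name} world size must be >= 1, got {size}")
        self._sizes = {"tp": tp, "pp": pp, "dp": dp}

    def get_parallel_world_sizes(self) -> Dict[str, int]:
        # Return a copy so callers (e.g. a logger) cannot mutate engine state.
        return dict(self._sizes)


engine = ParallelEngineStub(tp=4, dp=8)
print(engine.get_parallel_world_sizes())  # {'tp': 4, 'pp': 1, 'dp': 8}
```

Returning a fresh dict on each call keeps the getter free of side effects. Under the usual 3D-parallel layout, where the tp, pp, and dp groups are orthogonal, the product of the values (here 4 × 1 × 8 = 32) equals the total number of ranks.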

Why

When integrating DeepSpeed with logging and experiment-tracking tools such as Weights & Biases (W&B), it is often useful to record the parallelism configuration (the world size of each parallelism dimension). Currently, there is no convenient getter that returns all of these world sizes at once.
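
A minimal sketch of the logging use case, assuming the getter returns a plain dict like the {"tp": 4, "dp": 8} example in the description. The hypothetical helper to_tracking_config flattens it into namespaced keys; the "parallel/<dim>" key scheme is an illustrative choice, not a DeepSpeed or W&B convention. The sketch does not import wandb; in a real run the result could be passed to wandb.config.update().

```python
from typing import Dict, Mapping


def to_tracking_config(sizes: Mapping[str, int], prefix: str = "parallel") -> Dict[str, int]:
    """Flatten per-dimension world sizes into "<prefix>/<dim>" config keys.

    Keys are sorted so the output order is deterministic.
    """
    return {f"{prefix}/{dim}": int(size) for dim, size in sorted(sizes.items())}


# Example input shaped like the PR's example return value.
sizes = {"tp": 4, "dp": 8}
print(to_tracking_config(sizes))  # {'parallel/dp': 8, 'parallel/tp': 4}
```

Since every rank sees the same sizes, logging from a single rank (e.g. rank 0) avoids each process writing an identical config.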

Benefits

  • Simplifies logging the parallelism configuration of distributed DeepSpeed training runs to W&B.
  • I plan to open a follow-up PR that uses this getter to log parallelism info automatically in DeepSpeed.

Thank you for taking the time to review this PR.

@sfc-gh-truwase (Collaborator)

Thanks for this PR. Can you please add some unit tests (UTs)?

@WoosungMyung (Contributor, Author)

@sfc-gh-truwase
Sorry for the late reply.
Would it be all right if I put the unit tests in tests/unit/runtime/test_parallel_info.py? That seems like a suitable location for this functionality, but I'm happy to use a different path if you prefer.

@sfc-gh-truwase (Collaborator)

@WoosungMyung that works. Thanks so much!

@WoosungMyung force-pushed the feature/world_size_getter branch from be8f7df to 3f5165d on August 16, 2025 01:02
@WoosungMyung force-pushed the feature/world_size_getter branch from 3f5165d to 59eb3cb on August 16, 2025 01:09
@WoosungMyung (Contributor, Author)

@sfc-gh-truwase
I’ve just added unit tests for the new get_parallel_world_sizes() method.
Please let me know if there’s anything I should refine further. Thanks for your review!

@sfc-gh-truwase (Collaborator)

@WoosungMyung thanks for creating the unit tests. However, after reviewing them, I realized that I gave you misleading information about where the tests should go, and I apologize for that. I think the tests belong in this folder with the existing TP and PP tests.

  1. We need to add appropriate asserts for the new get_parallel_world_sizes() alongside this one.
  2. We need to increase the coverage of test_parallel_info.py, similar to this.
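
As a rough illustration of the kind of asserts requested, here is a hypothetical helper (check_parallel_world_sizes is not an existing DeepSpeed test utility) that a TP or PP test could call on the dict returned after building its engine. The product check assumes the reported dimensions are orthogonal (as with tp × pp × dp); dimensions nested inside another group would need to skip it, which is why world_size is optional.

```python
from math import prod
from typing import Mapping, Optional


def check_parallel_world_sizes(
    sizes: Mapping[str, int],
    expected: Mapping[str, int],
    world_size: Optional[int] = None,
) -> None:
    """Assert that a get_parallel_world_sizes()-style dict matches a test's config."""
    # Each configured dimension must be reported with the configured size.
    for dim, size in expected.items():
        assert sizes.get(dim) == size, f"{dim}: expected {size}, got {sizes.get(dim)}"
    # Every reported size must be a positive int.
    for dim, size in sizes.items():
        assert isinstance(size, int) and size >= 1, f"{dim}: invalid size {size!r}"
    # Assumption: reported dims are orthogonal, so their product is the rank count.
    if world_size is not None:
        total = prod(sizes.values())
        assert total == world_size, f"product of sizes {total} != world size {world_size}"


# Passes: a 4-way TP x 8-way DP layout on 32 ranks.
check_parallel_world_sizes({"tp": 4, "dp": 8}, expected={"tp": 4, "dp": 8}, world_size=32)

# Fails: the reported TP size disagrees with the test's config.
try:
    check_parallel_world_sizes({"tp": 2, "dp": 8}, expected={"tp": 4})
except AssertionError as err:
    print(err)  # tp: expected 4, got 2
```

Keeping the checks in one helper lets the existing TP and PP tests share the same asserts instead of duplicating them per test.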
