Add Autoconfig and Coordinated_Optimizer implementations for Tensor Parallel Autosharding #21703
Conversation
Summary of Changes (Gemini Code Assist)

Hello @buildwithsuhana, I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances Keras's distributed training capabilities by introducing automatic tensor parallelism. It provides the infrastructure needed to shard large model layers across multiple devices, enabling the training of models that would otherwise exceed the memory capacity of a single accelerator. The changes include a system for automatically determining sharding strategies for model layers and a specialized optimizer that coordinates state and gradient updates across the sharded components.
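To make the sharding idea concrete, here is a minimal, hypothetical sketch of what "determining a sharding strategy" could look like; the `naive_sharding_plan` helper, its dict format, and the column-split heuristic are illustrative assumptions, not the actual `autoconfig.py` API from this PR:

```python
# Hypothetical sketch only: walk a Keras model and record which Dense/EinsumDense
# kernels could be split along their output dimension across `world_size` devices.
import keras
from keras import layers

def naive_sharding_plan(model, world_size):
    """Map shardable kernels to the axis they could be split on (illustrative)."""
    plan = {}
    for layer in model.layers:
        if isinstance(layer, (layers.Dense, layers.EinsumDense)):
            for weight in layer.weights:
                # Only consider kernels whose output dim divides evenly.
                if "kernel" in weight.name and weight.shape[-1] % world_size == 0:
                    plan[f"{layer.name}/{weight.name}"] = {
                        "split_axis": len(weight.shape) - 1,
                        "num_shards": world_size,
                    }
    return plan

model = keras.Sequential([
    keras.Input(shape=(16,)),
    layers.Dense(64, name="up_proj"),
    layers.Dense(16, name="down_proj"),
])
print(naive_sharding_plan(model, world_size=4))
# {'up_proj/kernel': {'split_axis': 1, 'num_shards': 4},
#  'down_proj/kernel': {'split_axis': 1, 'num_shards': 4}}
```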
Code Review

This pull request introduces significant new functionality for tensor parallelism autosharding. The implementation is split across autoconfig.py for sharding plan generation and coordinated_optimizer.py for synchronized training. My review identified a few issues ranging from critical to medium severity. Notably, there is a critical bug in TensorParallelOptimizer that prevents the use of learning rate schedules. I also found some high-severity maintainability concerns in CoordinatedOptimizer due to its reliance on parsing internal variable names, which is fragile. Additionally, there are potential correctness issues in autoconfig.py and some violations of the Keras API design style guide. Addressing these points will improve the robustness and maintainability of this new feature.
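For reference on the learning-rate-schedule point, here is a small sketch of the standard Keras schedule mechanics that any wrapping optimizer needs to preserve; it does not show the actual TensorParallelOptimizer code:

```python
# Keras learning rate schedules are callable objects evaluated per optimizer
# step, so a wrapper must forward the schedule object itself rather than
# coercing it to a float at construction time.
import keras

schedule = keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.9
)
print(float(schedule(0)))     # 0.001 at step 0
print(float(schedule(1000)))  # ~0.0009 after one decay period

# Passing the schedule object keeps per-step decay working:
optimizer = keras.optimizers.Adam(learning_rate=schedule)

# A wrapper that coerced the configured learning rate to a plain float at
# construction time would either raise or silently freeze the rate at its
# initial value, which is the class of bug flagged above.
```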
Review comments (resolved, now outdated) were left on:
keras/src/distribution/tensor_parallel/coordinated_optimizer.py (4 threads)
keras/src/distribution/tensor_parallel/coordinated_optimizer_test.py (1 thread)
Codecov Report

Coverage diff against master (#21703):

| | master | #21703 | +/- |
| --- | --- | --- | --- |
| Coverage | 82.59% | 82.22% | -0.38% |
| Files | 572 | 580 | +8 |
| Lines | 58327 | 58906 | +579 |
| Branches | 9131 | 9232 | +101 |
| Hits | 48177 | 48437 | +260 |
| Misses | 7818 | 8112 | +294 |
| Partials | 2332 | 2357 | +25 |
This PR introduces support for tensor parallelism autosharding in Keras, enabling users to shard large model layers across multiple devices. This is a crucial feature for training models that are too large to fit into the memory of a single accelerator.
The implementation is centered around two new components:
autoconfig.py: This module contains the logic to analyze a Keras model, identify sharding candidates (e.g., Dense, EinsumDense layers), and generate a sharding plan.
coordinated_optimizer.py: This is an optimizer wrapper that consumes the sharding plan. During training, it intercepts gradients for sharded variables and performs a collective AllReduce to ensure weight updates are correctly synchronized across all devices.
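As a rough illustration of that coordination step (names such as `shard_gradients` and the NumPy-based averaging are stand-ins for a real AllReduce, not the PR's CoordinatedOptimizer API):

```python
# Minimal sketch: gradients computed independently on each device for a
# replicated variable are averaged (an AllReduce-mean stand-in) before a single
# base optimizer applies the update, so every replica sees the same weights.
import numpy as np
import keras

var = keras.Variable(np.zeros((4,), dtype="float32"), name="replicated_weight")
base_optimizer = keras.optimizers.SGD(learning_rate=0.1)

# Pretend these gradients came back from two devices holding replicas of `var`.
shard_gradients = [
    np.array([1.0, 2.0, 3.0, 4.0], dtype="float32"),
    np.array([3.0, 2.0, 1.0, 0.0], dtype="float32"),
]

# Average the per-shard gradients (the AllReduce-mean step).
synced = keras.ops.convert_to_tensor(np.mean(np.stack(shard_gradients), axis=0))

# Apply the synchronized gradient once via the wrapped base optimizer.
base_optimizer.apply_gradients([(synced, var)])
print(keras.ops.convert_to_numpy(var))  # [-0.2 -0.2 -0.2 -0.2]
```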