Feature/load balance add expert replacement feature for MoE model(mixtral) #187
Open
uygnef wants to merge 5 commits into alibaba:main
Conversation
fengyu05 seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Already signed the CLA but the status is still pending? Let us recheck it.
revert change: router weight use default fp32 weight
I have added a new feature to the Megatron LM repository that introduces a load balance interval for expert replacement in Mixture of Experts (MoE) models. This feature allows for the redistribution of experts across GPUs at user-specified intervals, with the aim of achieving a balanced computational load across the GPUs by maintaining a similar number of tokens processed on each card.
Implementation Details
The load balance interval for expert replacement is controlled by a new command-line argument --load-balance-interval. Users can specify the number of steps after which the redistribution of experts should take place. The system then automatically adjusts the placement of experts to ensure an even workload distribution, improving the overall efficiency of the MoE model training.
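The redistribution step described above can be sketched as a greedy assignment over per-expert token counts. This is an illustrative sketch, not the PR's actual implementation: the function name, the assumption that token counts are gathered at each interval boundary, and the fixed experts-per-GPU layout are all hypothetical.

```python
def rebalance_experts(token_counts, num_gpus):
    """Greedily assign experts to GPUs so per-GPU token load is even.

    token_counts: tokens routed to each expert since the last rebalance.
    Returns a list mapping expert index -> GPU index.
    """
    experts_per_gpu = len(token_counts) // num_gpus
    # Consider experts in order of load, heaviest first.
    order = sorted(range(len(token_counts)), key=lambda e: -token_counts[e])
    loads = [0] * num_gpus                     # running token load per GPU
    slots = [experts_per_gpu] * num_gpus       # remaining expert slots per GPU
    placement = [None] * len(token_counts)
    for e in order:
        # Place each expert on the least-loaded GPU that still has a free slot.
        g = min((g for g in range(num_gpus) if slots[g] > 0),
                key=lambda g: loads[g])
        placement[e] = g
        loads[g] += token_counts[e]
        slots[g] -= 1
    return placement
```

For example, with counts [100, 10, 90, 20] across 2 GPUs, the heavy experts 0 and 2 land on different GPUs and the light ones fill in, giving both GPUs a load of 110 tokens. In the real system the chosen placement would also require migrating the expert weights between ranks, which this sketch omits.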
Benefits
Parallel strategy: tp4pp2ep2 (TP=4, PP=2, EP=2), 16 GPUs, trained from scratch without aux loss
How to Use
To enable the load balance interval for expert replacement, users should pass the --load-balance-interval argument.
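A launch line might look like the following. This is a hypothetical sketch: the script name and the surrounding flags follow the usual Megatron-LM MoE setup and are illustrative, not taken from this PR.

```shell
torchrun --nproc_per_node=8 pretrain_gpt.py \
    --num-experts 8 \
    --expert-model-parallel-size 2 \
    --load-balance-interval 100   # redistribute experts every 100 steps
```

Setting the interval too low would add frequent weight-migration overhead; too high and the load stays skewed between rebalances, so the value is a trade-off against step time.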