Bugfix: Fix accuracy degradation caused by EPLB #4491
base: v0.11.0-dev
Conversation
Force-pushed: 53671ce to 5317e61
Code Review
This pull request introduces several bug fixes related to the Expert Parallelism Load Balancing (EPLB) mechanism for Mixture-of-Experts (MoE) models, aimed at resolving accuracy degradation. The key changes include: using in-place copy_ to correctly update expert weights, padding expert maps before updating them to handle dynamic shape changes, and ensuring that up-to-date weight scales are used during computation. These changes appear to correctly address the root causes of the accuracy issues. However, I've identified a potential issue in how the CPU-side expert map is updated, which could lead to runtime errors. My detailed feedback is in the comment below.
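The first fix in the summary, switching expert-weight updates to in-place copy_, can be illustrated without torch. A stdlib-only sketch of the aliasing issue, with hypothetical names standing in for the loader-side and layer-side references: the MoE layer keeps a reference to the original weight buffer, so rebinding the loader's name to a new object leaves the layer reading stale weights, while an in-place copy updates the memory the layer actually sees.

```python
# Hypothetical stand-ins: the layer holds an alias to the weight buffer.
weights = [1.0, 2.0]      # buffer the MoE layer references
layer_view = weights      # the layer's alias to that buffer

new_weights = [3.0, 4.0]  # updated weights produced by EPLB rebalancing

# Broken: rebinding the loader-side name creates a new object;
# the layer's alias still points at the old values.
weights = new_weights
assert layer_view == [1.0, 2.0]

# Fixed: copy in place into the original buffer, analogous to
# torch.Tensor.copy_(), so every alias observes the update.
layer_view[:] = new_weights
assert layer_view == [3.0, 4.0]
```

The same reasoning applies to CUDA/NPU tensors: copy_ writes into the storage the model's modules already hold, whereas assignment only rebinds a Python name.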
```python
    value=-1
)
self.expert_map_per_layer[layer_id].copy_(updated_expert_map_padded)
self.expert_map_per_layer_cpu[layer_id].copy_(updated_expert_map)
```
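The shape constraint behind the padding step can be sketched with stdlib Python. The helper names below are hypothetical, but the semantics mirror the PR: an in-place copy (like torch.Tensor.copy_ into a same-shape buffer) requires matching sizes, so a shorter updated map is right-padded with -1 (conventionally "expert not placed on this rank") before being copied into the preallocated per-layer map.

```python
def copy_(dst: list, src: list) -> None:
    """In-place element copy; raises on size mismatch, echoing the
    shape requirement of an in-place tensor copy into a fixed buffer."""
    if len(dst) != len(src):
        raise ValueError(f"size mismatch: {len(dst)} vs {len(src)}")
    dst[:] = src

def pad_to(expert_map: list, length: int, value: int = -1) -> list:
    """Right-pad the map with `value` so it fits the preallocated buffer."""
    return expert_map + [value] * (length - len(expert_map))

# Preallocated per-layer map of fixed size 4 (illustrative numbers).
expert_map_per_layer = [0, 1, 2, -1]

updated_expert_map = [2, 0, 1]  # rebalancing can produce a shorter map
padded = pad_to(updated_expert_map, len(expert_map_per_layer))
copy_(expert_map_per_layer, padded)  # safe: sizes now match
assert expert_map_per_layer == [2, 0, 1, -1]
```

Without the padding, the copy_ call would raise on any rebalance that shrinks the map, which is exactly the hazard the reviewer flags for the CPU-side tensor below.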
The update to self.expert_map_per_layer_cpu using copy_ might lead to a runtime error if updated_expert_map has a different shape than the existing tensor in self.expert_map_per_layer_cpu[layer_id]. The logic to calculate pad_len just before this suggests that the shape of updated_expert_map can indeed be smaller. While the device tensor self.expert_map_per_layer is correctly updated with a padded version, this CPU copy is not, which is inconsistent and unsafe.
To prevent potential crashes, I recommend reverting this line to use clone(), which was the original implementation and is safer as it replaces the tensor entirely, handling any shape changes gracefully.
```diff
- self.expert_map_per_layer_cpu[layer_id].copy_(updated_expert_map)
+ self.expert_map_per_layer_cpu[layer_id] = updated_expert_map.clone()
```
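The reviewer's point, sketched with stdlib stand-ins (the dict and clone helper below are illustrative, not the PR's actual types): rebinding the slot to a fresh copy tolerates any shape, while an in-place copy into the stored object would fail whenever the updated map is smaller.

```python
def clone(t: list) -> list:
    """Stand-in for tensor.clone(): an independent copy of the data."""
    return list(t)

# layer_id -> CPU-side expert map (illustrative contents).
expert_map_per_layer_cpu = {0: [0, 1, 2, -1]}

updated_expert_map = [1, 0]  # shorter than the stored map

# An in-place copy would hit a size mismatch here (2 vs 4).
# Rebinding the slot to a clone is shape-agnostic:
expert_map_per_layer_cpu[0] = clone(updated_expert_map)
assert expert_map_per_layer_cpu[0] == [1, 0]
```

The trade-off is that rebinding breaks any outstanding aliases to the old CPU tensor, which is acceptable here because, unlike the device-side map, nothing in the model holds a live reference to it.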
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to the Contributing and Testing guides.
Signed-off-by: Che Ruan <[email protected]>
Signed-off-by: Che Ruan <[email protected]>
Force-pushed: 6dc597b to 59b5536
This pull request has conflicts; please resolve them before we can evaluate the pull request.
No description provided.