Thank you for your work! I've recently been reading the source code of SmartMoE and noticed that there is no implementation for transferring the optimizer state of experts in the `update_expert_mapping` function in `layer.py`. Could this potentially cause issues with gradient updates?
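For concreteness, here is a minimal sketch of the concern (this is not SmartMoE code, and `transfer_expert_optimizer_state` is a hypothetical helper). With Adam, the per-parameter moments are keyed by the parameter tensors themselves, so when an expert's weights are remapped, the corresponding state entry would also need to be moved; otherwise the next `step()` starts the migrated expert from empty (or stale) moments:

```python
import torch

def transfer_expert_optimizer_state(optimizer, old_param, new_param):
    """Re-key the optimizer state entry for `old_param` onto `new_param`.

    Hypothetical helper: torch.optim.Adam keeps exp_avg / exp_avg_sq / step
    in a dict keyed by the parameter tensor itself, so migrated expert
    weights need their state entry moved along with them.
    """
    if old_param in optimizer.state:
        optimizer.state[new_param] = optimizer.state.pop(old_param)

# Two tensors standing in for one expert's weights before and after remapping.
old_expert = torch.nn.Parameter(torch.randn(4, 4))
new_expert = torch.nn.Parameter(old_expert.detach().clone())

opt = torch.optim.Adam([old_expert], lr=1e-3)
old_expert.grad = torch.randn_like(old_expert)
opt.step()  # populates exp_avg / exp_avg_sq for old_expert

# Simulate the remapping: the optimizer now owns `new_expert` instead.
opt.param_groups[0]["params"] = [new_expert]
transfer_expert_optimizer_state(opt, old_expert, new_expert)

# Without the transfer, `new_expert` would start from empty Adam state.
assert new_expert in opt.state and old_expert not in opt.state
```

In the actual distributed setting, I assume the state tensors would additionally have to be communicated between ranks (e.g., via `torch.distributed` primitives) together with the expert weights, rather than just re-keyed locally as above.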