Thank you for your work! I've recently been reading the source code of SmartMoE and noticed that there is no implementation for transferring the optimizer state of experts in the `update_expert_mapping` function in `layer.py`. Could this potentially cause issues with gradient updates?
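For concreteness, here is a minimal sketch of the concern (this is not SmartMoE code, and `transfer_expert_optimizer_state` is a hypothetical helper). With Adam, the per-parameter moments are keyed by the parameter tensors themselves, so when an expert's weights are remapped, the corresponding state entry would also need to be moved; otherwise the next `step()` starts the migrated expert from empty (or stale) moments:

```python
import torch

def transfer_expert_optimizer_state(optimizer, old_param, new_param):
    """Re-key the optimizer state entry for `old_param` onto `new_param`.

    Hypothetical helper: torch.optim.Adam keeps exp_avg / exp_avg_sq / step
    in a dict keyed by the parameter tensor itself, so migrated expert
    weights need their state entry moved along with them.
    """
    if old_param in optimizer.state:
        optimizer.state[new_param] = optimizer.state.pop(old_param)

# Two tensors standing in for one expert's weights before and after remapping.
old_expert = torch.nn.Parameter(torch.randn(4, 4))
new_expert = torch.nn.Parameter(old_expert.detach().clone())

opt = torch.optim.Adam([old_expert], lr=1e-3)
old_expert.grad = torch.randn_like(old_expert)
opt.step()  # populates exp_avg / exp_avg_sq for old_expert

# Simulate the remapping: the optimizer now owns `new_expert` instead.
opt.param_groups[0]["params"] = [new_expert]
transfer_expert_optimizer_state(opt, old_expert, new_expert)

# Without the transfer, `new_expert` would start from empty Adam state.
assert new_expert in opt.state and old_expert not in opt.state
```

In the actual distributed setting, I assume the state tensors would additionally have to be communicated between ranks (e.g., via `torch.distributed` primitives) together with the expert weights, rather than just re-keyed locally as above.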