Megatron-DeepSpeed supports MoE, as seen in examples_deepspeed/MoE, and some support for PP and TP was introduced recently in deepspeedai#373. However, I could not get this running easily; maybe I was missing some recent DeepSpeed updates that it requires.
Megatron-LM also has some MoE support, but the older version that can easily be ported to any accelerator lacks drop-token support. Maybe that can still be ported easily?
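For context, here is a minimal sketch of what drop-token routing does in a top-1 MoE router: each expert gets a fixed capacity derived from a capacity factor, and tokens that overflow an expert's capacity are simply dropped rather than routed. This is illustrative only, with assumed names (`top1_route_with_token_drop`, `capacity_factor`) and a simplified capacity formula, not the exact Megatron/DeepSpeed implementation.

```python
import torch

def top1_route_with_token_drop(logits, capacity_factor=1.25):
    """Illustrative top-1 MoE routing with token dropping (not the real impl)."""
    num_tokens, num_experts = logits.shape
    # Assumed capacity rule: even spread of tokens, scaled by capacity_factor.
    capacity = int(capacity_factor * num_tokens / num_experts)

    # Each token picks its top-1 expert.
    expert_idx = logits.argmax(dim=-1)                               # [num_tokens]
    one_hot = torch.nn.functional.one_hot(expert_idx, num_experts)   # [num_tokens, num_experts]

    # Zero-based position of each token within its chosen expert's queue.
    position_in_expert = (one_hot.cumsum(dim=0) - 1) * one_hot
    position = position_in_expert.sum(dim=-1)                        # [num_tokens]

    # Keep only tokens that fit within the expert's capacity; drop the rest.
    keep_mask = position < capacity
    return expert_idx, keep_mask

# Example: 8 tokens routed across 4 experts; with capacity_factor=1.25,
# each expert keeps at most 2 tokens and overflow tokens get kept == False.
logits = torch.randn(8, 4)
experts, kept = top1_route_with_token_drop(logits)
print(experts, kept)
```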