Megatron-DeepSpeed supports MoE, as seen in examples_deepspeed/MoE, and some support for PP and TP was introduced recently in deepspeedai#373. However, I could not get this running easily; maybe I was missing some recent DeepSpeed updates that it requires.
Megatron-LM also has some MoE support, but the older version that can easily be ported to any accelerator lacks drop-token support. Maybe that can still be ported easily?
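For context, here is a minimal sketch of what drop-token routing does in a top-1 MoE router: each expert gets a fixed capacity derived from a capacity factor, and tokens that overflow an expert's capacity are simply dropped rather than routed. This is illustrative only, with assumed names (`top1_route_with_token_drop`, `capacity_factor`) and a simplified capacity formula, not the exact Megatron/DeepSpeed implementation.

```python
import torch

def top1_route_with_token_drop(logits, capacity_factor=1.25):
    """Illustrative top-1 MoE routing with token dropping (not the real impl)."""
    num_tokens, num_experts = logits.shape
    # Assumed capacity rule: even spread of tokens, scaled by capacity_factor.
    capacity = int(capacity_factor * num_tokens / num_experts)

    # Each token picks its top-1 expert.
    expert_idx = logits.argmax(dim=-1)                               # [num_tokens]
    one_hot = torch.nn.functional.one_hot(expert_idx, num_experts)   # [num_tokens, num_experts]

    # Zero-based position of each token within its chosen expert's queue.
    position_in_expert = (one_hot.cumsum(dim=0) - 1) * one_hot
    position = position_in_expert.sum(dim=-1)                        # [num_tokens]

    # Keep only tokens that fit within the expert's capacity; drop the rest.
    keep_mask = position < capacity
    return expert_idx, keep_mask

# Example: 8 tokens routed across 4 experts; with capacity_factor=1.25,
# each expert keeps at most 2 tokens and overflow tokens get kept == False.
logits = torch.randn(8, 4)
experts, kept = top1_route_with_token_drop(logits)
print(experts, kept)
```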