
Feature: Mixture of experts with token dropping #44

@hatanp

Description


Megatron-DeepSpeed supports MoE, as seen in examples_deepspeed/MoE, and some support for PP and TP was introduced recently as well: deepspeedai#373. However, I could not get this running easily; maybe I was missing some recent DeepSpeed updates that it requires.

Megatron-LM has some MoE support, but the older version that can easily be ported to any accelerator lacks token-dropping support. Maybe that can still be ported easily?
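For context, here is a minimal sketch of what capacity-based token dropping typically looks like in a top-1 MoE router. All names, shapes, and the `capacity_factor` default are illustrative assumptions, not taken from either codebase:

```python
# Illustrative sketch of capacity-based token dropping in a top-1 MoE router.
# Names and shapes are hypothetical; no Megatron-LM or DeepSpeed API is assumed.
import torch
import torch.nn.functional as F


def top1_route_with_token_drop(logits: torch.Tensor, capacity_factor: float = 1.25):
    """logits: [num_tokens, num_experts] raw router scores.

    Returns the chosen expert per token and a boolean keep-mask; tokens that
    overflow an expert's capacity are dropped (their mask entry is False).
    """
    num_tokens, num_experts = logits.shape
    # Each expert accepts at most `capacity` tokens per batch.
    capacity = int(capacity_factor * num_tokens / num_experts)

    probs = F.softmax(logits, dim=-1)
    expert_idx = probs.argmax(dim=-1)                      # [num_tokens]

    # Position of each token within its chosen expert's queue (arrival order).
    one_hot = F.one_hot(expert_idx, num_experts)           # [num_tokens, num_experts]
    position_in_expert = (one_hot.cumsum(dim=0) - 1) * one_hot
    position = position_in_expert.sum(dim=-1)              # [num_tokens]

    # Drop tokens whose arrival position exceeds the expert's capacity.
    keep_mask = position < capacity
    return expert_idx, keep_mask


# Example: 8 tokens routed over 2 experts with a tight capacity (4 per expert).
logits = torch.randn(8, 2)
expert_idx, keep_mask = top1_route_with_token_drop(logits, capacity_factor=1.0)
print(expert_idx, keep_mask)
```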
