Hello, I am using DeepSpeed together with the Tutel framework: 4 experts on 4 GPUs, with both data parallelism and expert parallelism.
To keep the expert parameters from being managed by DeepSpeed's allreduce, my code is:

deepspeed.initialize(
    args=self.args,
    model=model,
    model_parameters=[param for name, param in model.named_parameters() if not hasattr(param, "skip_allruduce")],
    config=ds_config,
)

But it doesn't work: the expert parameters are not updated correctly. What should I do? Thanks!
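For context, here is a minimal, self-contained sketch of what I mean (toy module and config, not my real code; it assumes Tutel marks each expert parameter with a skip_allreduce attribute, and the toy module tags its "expert" weights the same way, to be run under a normal deepspeed launcher):

import torch
import deepspeed

class ToyMoEBlock(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.dense = torch.nn.Linear(16, 16)    # replicated part, should be allreduced
        self.expert = torch.nn.Linear(16, 16)   # local expert, should NOT be allreduced
        for p in self.expert.parameters():
            p.skip_allreduce = True             # mimic what Tutel does to expert params

    def forward(self, x):
        return self.expert(self.dense(x))

model = ToyMoEBlock()
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# Only the non-expert parameters are handed to DeepSpeed, so its data-parallel
# allreduce never touches the expert weights -- that is the intent of the call above.
dense_params = [p for _, p in model.named_parameters() if not hasattr(p, "skip_allreduce")]

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=dense_params,
    config=ds_config,
)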
If I use DDP together with the Tutel framework instead, it works fine, as shown here:
microsoft/Tutel#204 (comment)
How do I set up DeepSpeed correctly so that it behaves like DDP here?
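For reference, this is roughly the pattern I use on the DDP side (a minimal sketch with illustrative names, again assuming Tutel tags expert parameters with a skip_allreduce attribute): the expert parameters are put on DDP's ignore list before wrapping, so DDP only allreduces the shared parameters, while a single local optimizer over all parameters still updates the expert weights on each rank.

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_moe_model(model, local_rank):
    # Collect the fully qualified names of parameters Tutel marked as expert-local.
    expert_param_names = [name for name, p in model.named_parameters() if hasattr(p, "skip_allreduce")]
    # DDP reads this attribute from the wrapped module and skips those parameters
    # during gradient allreduce (as I understand it, this is what the Tutel DDP example relies on).
    model._ddp_params_and_buffers_to_ignore = expert_param_names
    return DDP(model.cuda(local_rank), device_ids=[local_rank])

# The optimizer is built over ALL parameters, so expert weights are still updated locally:
# optimizer = torch.optim.Adam(ddp_model.parameters(), lr=1e-4)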