Hello, I am using DeepSpeed together with the Tutel framework: 4 experts on 4 GPUs, with both data parallelism and expert parallelism.
To keep the expert parameters from being managed by DeepSpeed's allreduce, my code is:

deepspeed.initialize(
    args=self.args,
    model=model,
    model_parameters=[param for name, param in model.named_parameters() if not hasattr(param, "skip_allruduce")],
    config=ds_config,
)

But it doesn't work: the expert parameters are not updated correctly. What should I do? Thanks!
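For context, here is a minimal, self-contained sketch of what I mean (toy module and config, not my real code; it assumes Tutel marks each expert parameter with a skip_allreduce attribute, and the toy module tags its "expert" weights the same way, to be run under a normal deepspeed launcher):

import torch
import deepspeed

class ToyMoEBlock(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.dense = torch.nn.Linear(16, 16)    # replicated part, should be allreduced
        self.expert = torch.nn.Linear(16, 16)   # local expert, should NOT be allreduced
        for p in self.expert.parameters():
            p.skip_allreduce = True             # mimic what Tutel does to expert params

    def forward(self, x):
        return self.expert(self.dense(x))

model = ToyMoEBlock()
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

# Only the non-expert parameters are handed to DeepSpeed, so its data-parallel
# allreduce never touches the expert weights -- that is the intent of the call above.
dense_params = [p for _, p in model.named_parameters() if not hasattr(p, "skip_allreduce")]

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=dense_params,
    config=ds_config,
)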
If I use DDP together with the Tutel framework instead, it works fine, as shown here:
microsoft/Tutel#204 (comment)
How do I set up DeepSpeed correctly so that it behaves like DDP here?
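For reference, this is roughly the pattern I use on the DDP side (a minimal sketch with illustrative names, again assuming Tutel tags expert parameters with a skip_allreduce attribute): the expert parameters are put on DDP's ignore list before wrapping, so DDP only allreduces the shared parameters, while a single local optimizer over all parameters still updates the expert weights on each rank.

import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_moe_model(model, local_rank):
    # Collect the fully qualified names of parameters Tutel marked as expert-local.
    expert_param_names = [name for name, p in model.named_parameters() if hasattr(p, "skip_allreduce")]
    # DDP reads this attribute from the wrapped module and skips those parameters
    # during gradient allreduce (as I understand it, this is what the Tutel DDP example relies on).
    model._ddp_params_and_buffers_to_ignore = expert_param_names
    return DDP(model.cuda(local_rank), device_ids=[local_rank])

# The optimizer is built over ALL parameters, so expert weights are still updated locally:
# optimizer = torch.optim.Adam(ddp_model.parameters(), lr=1e-4)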