-
I understand that DeepSpeed supports tensor parallelism (TP) for inference, but it does not support TP for training.
Replies: 2 comments 2 replies
-
You are correct: for training, DeepSpeed relies on client-side tensor parallelism, such as Megatron. The following post provides some details on combining DeepSpeed and Megatron: https://huggingface.co/blog/bloom-megatron-deepspeed
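To make "client-side" a bit more concrete: the client framework decides which ranks form a tensor-parallel group and which form a data-parallel group, and hands that layout to DeepSpeed (`deepspeed.initialize` accepts an `mpu` argument for this purpose). The sketch below is only an illustration of one common layout, assuming no pipeline parallelism and that TP groups are made of consecutive ranks; `build_groups` is a hypothetical helper, not a DeepSpeed or Megatron function.

```python
from typing import List, Tuple


def build_groups(world_size: int, tp_size: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Split ranks 0..world_size-1 into tensor-parallel and data-parallel groups.

    Hypothetical helper for illustration only. Assumes no pipeline
    parallelism and that each TP group consists of consecutive ranks.
    """
    if tp_size <= 0 or world_size % tp_size != 0:
        raise ValueError("world_size must be a positive multiple of tp_size")

    dp_size = world_size // tp_size

    # Each TP group holds the ranks that share one sharded copy of the model.
    tp_groups = [list(range(i * tp_size, (i + 1) * tp_size)) for i in range(dp_size)]

    # Each DP group holds the ranks that own the same shard in different replicas.
    dp_groups = [list(range(j, world_size, tp_size)) for j in range(tp_size)]

    return tp_groups, dp_groups


if __name__ == "__main__":
    tp_groups, dp_groups = build_groups(world_size=8, tp_size=2)
    print("TP groups:", tp_groups)
    print("DP groups:", dp_groups)
```

With 8 GPUs and `tp_size=2`, this gives four TP groups of two ranks each and two DP groups of four ranks each. Gradients are averaged within a DP group, while ranks in a TP group cooperate on each layer.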
-
I have the same problem, but I didn't find a way to use only DP and TP automatically.