
Bug in the DeepSeek V3 weight conversion script #494

@keaidechen

Description


DeepSeek V3 was trained with the 20250217 Megatron version, in which TP and EP were decoupled: MoE layers are no longer forced to use the same TP as the attention layers. However, the triple nested loop in the conversion script's save path still forces TP onto the expert weights, so each expert shard is saved redundantly (a multiple of ep_size/tp_size extra copies of those parameters), and the resulting checkpoint takes far more storage than it should.

# Save-path loop: every expert shard is visited once per tp_rank,
# even though after TP/EP decoupling experts are sharded along EP only.
for tp_rank in range(args.tensor_model_parallel_size):
    for ep_rank in range(args.expert_model_parallel_size):
        for pp_rank in range(args.pipeline_model_parallel_size):
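A minimal sketch of one possible fix, assuming the save tasks can be planned per weight kind: iterate over tp_rank only for dense (TP-sharded) weights and over ep_rank only for expert weights, so each expert shard is written exactly once. The function and task-tuple shape below are illustrative, not the real script's API.

```python
def plan_save_ranks(tp_size: int, ep_size: int, pp_size: int):
    """Enumerate save tasks with the TP and EP loops decoupled.

    Returns (kind, rank, pp_rank) tuples: dense weights get one task
    per (tp_rank, pp_rank), expert weights one per (ep_rank, pp_rank),
    instead of the buggy tp * ep * pp triple loop for everything.
    """
    tasks = []
    for pp_rank in range(pp_size):
        # Dense/attention weights: sharded along TP.
        for tp_rank in range(tp_size):
            tasks.append(("dense", tp_rank, pp_rank))
        # Expert weights: sharded along EP only, so no tp_rank loop.
        for ep_rank in range(ep_size):
            tasks.append(("expert", ep_rank, pp_rank))
    return tasks


# With tp=2, ep=4, pp=1 the original triple loop would visit the expert
# weights 2 * 4 = 8 times; the decoupled plan emits only 4 expert tasks.
print(plan_save_ranks(2, 4, 1))
```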
