DeepSpeed pipeline parallelization #55

@mjyb16

DeepSpeed is a library for parallelizing the training of large neural networks across multiple GPUs. One of its features is pipeline parallelism, which takes a PyTorch `nn.Sequential` model and splits its layers across different GPUs. Since Caskade is built around nested, sequentially evaluated modules, I suspect we could modify it to support this kind of staged evaluation, which would open the door to extremely large models and complex nested module structures.
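
To make the idea concrete, here is a rough, dependency-free sketch of what "splitting sequential layers into stages" means. It is not DeepSpeed's or Caskade's actual API: `partition_layers` and `run_pipeline` are hypothetical names, the layers are plain Python callables standing in for modules, and the equal-count split is just one possible partitioning assumption. In a real setup each stage would live on its own GPU and pass activations to the next one.

```python
from typing import Callable, List, Sequence

# Stand-in for a module: any callable mapping one value to the next.
Layer = Callable[[float], float]


def partition_layers(layers: Sequence[Layer], num_stages: int) -> List[List[Layer]]:
    """Split layers into num_stages contiguous chunks of near-equal length.

    Hypothetical helper: earlier stages get one extra layer when the
    count does not divide evenly.
    """
    if not 1 <= num_stages <= len(layers):
        raise ValueError("num_stages must be between 1 and len(layers)")
    base, extra = divmod(len(layers), num_stages)
    stages: List[List[Layer]] = []
    start = 0
    for i in range(num_stages):
        size = base + (1 if i < extra else 0)
        stages.append(list(layers[start:start + size]))
        start += size
    return stages


def run_pipeline(stages: Sequence[Sequence[Layer]], x: float) -> float:
    """Evaluate the stages in order, feeding each output to the next stage.

    In a real pipeline each stage would run on a different device; here
    everything runs in one process to show only the data flow.
    """
    for stage in stages:
        for layer in stage:
            x = layer(x)
    return x


# Five toy "layers" evaluated strictly in sequence.
layers: List[Layer] = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x - 3,
    lambda x: x * x,
    lambda x: x + 10,
]

stages = partition_layers(layers, num_stages=2)
result = run_pipeline(stages, 1.0)
```

The point of the sketch is that the staged evaluation gives the same result as running all layers in order, so a nested-but-sequential evaluation order is, in principle, compatible with this kind of split.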
