Distributed group information for MOE layer #20632

Open
preddy5 opened this issue Mar 10, 2025 · 0 comments
Labels
docs (Documentation related), needs triage (Waiting to be triaged by maintainers)

preddy5 commented Mar 10, 2025

📚 Documentation

Thank you for maintaining this amazing repository.

I am integrating MoE layers into my model architecture, which I am training with Lightning.
I am using the megablocks implementation because of its wider adoption. One of the variables required to enable moe_expert_model_parallelism is the distributed group information (https://github.com/databricks/megablocks/blob/main/megablocks/layers/memory_test.py#L97C5-L97C10). Is there a way to access this information in a LightningModule before model initialization?

I would appreciate any guidance you can provide on how to access the group variable, even if it is not straightforward with the current lightning API. Thank you very much for your time and help!

Regards,
Pradyumna.

cc @lantiga @Borda

preddy5 added the docs and needs triage labels on Mar 10, 2025