[docs] CP #12331
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks! I think we should merge this PR in #11941.
I will try to gather some benchmarks to include in the docs.
```py
pipeline.transformer.parallelize(config=ContextParallelConfig(ring_degree=2))
pipeline.transformer.set_attention_backend("flash")
```
Is it better to call `parallelize()` after loading the model, or is it better to pass a parallel config when initializing the model? Or are both approaches the same?
Oh, I think it's just `enable_parallelism()` now.
[`ContextParallelConfig`] also supports Ulysses Attention through the `ulysses_degree` argument. This determines the number of devices to use for Ulysses Attention.

```py
pipeline.transformer.parallelize(config=ContextParallelConfig(ulysses_degree=2))
```
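As a rough illustration of how the two degrees relate (a sketch, not diffusers code): in combined ring + Ulysses setups the total number of devices participating in context parallelism is typically `ring_degree * ulysses_degree`, with consecutive ranks grouped into Ulysses groups. The group layout below is a hypothetical example, not the library's actual mesh construction.

```python
# Hypothetical device-mesh sketch for ring_degree=2, ulysses_degree=2.
# Not the diffusers implementation; it only illustrates how the two
# degrees might compose to cover the context-parallel workers.
ring_degree, ulysses_degree = 2, 2
world_size = ring_degree * ulysses_degree  # total context-parallel devices

# Group consecutive ranks into Ulysses groups; each ring step then
# rotates data across these groups.
ulysses_groups = [
    list(range(r, r + ulysses_degree))
    for r in range(0, world_size, ulysses_degree)
]
print(world_size)      # 4
print(ulysses_groups)  # [[0, 1], [2, 3]]
```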
Where is `ParallelConfig` used?
I didn't include `ParallelConfig` because it seems like you just pass `ContextParallelConfig` to it, so I opted to use `ContextParallelConfig` directly. Is the `ParallelConfig` class meant to support other parallelism strategies not yet implemented?
@sayakpaul So, the intention with `ParallelConfig` is to support different kinds of parallelism easily. If you pass just a `ContextParallelConfig`, it creates a `ParallelConfig` from it automatically.
I think the current example is sufficient, but we can of course revise it once more parallelisms are supported natively.
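The wrapping behavior described above can be sketched in a few lines. These are simplified, hypothetical stand-ins for the config classes (`normalize_config` is an invented helper name), not the actual diffusers implementation:

```python
from dataclasses import dataclass
from typing import Optional, Union

# Simplified stand-ins for the diffusers config classes, illustrating the
# "pass a ContextParallelConfig, get a ParallelConfig" convenience.
@dataclass
class ContextParallelConfig:
    ring_degree: int = 1
    ulysses_degree: int = 1

@dataclass
class ParallelConfig:
    context_parallel_config: Optional[ContextParallelConfig] = None

def normalize_config(
    config: Union[ParallelConfig, ContextParallelConfig],
) -> ParallelConfig:
    # If the caller passes a bare ContextParallelConfig, wrap it in a
    # ParallelConfig automatically so downstream code sees one type.
    if isinstance(config, ContextParallelConfig):
        return ParallelConfig(context_parallel_config=config)
    return config

cfg = normalize_config(ContextParallelConfig(ring_degree=2))
print(type(cfg).__name__)                       # ParallelConfig
print(cfg.context_parallel_config.ring_degree)  # 2
```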
Sure, thanks! Can we add a `ParallelConfig` example as well? 👀
Also, I don't see any `parallelize()` method in #11941.
Sorry for the delay! Please LMK if I can help with anything :) The CP PR is currently blocked because I can't make updates to it (the branch is in the diffusers repo and not a personal fork, so I can't push changes). Hopefully someone can address the tests there and we can proceed here too.
Thanks @stevhliu! LGTM in general, but the examples are a bit outdated. The latest inference snippet removes `enable_parallelism` and handles that internally.
The final code looks like this: #11941 (comment)
Sorry for the inconvenience! I forgot to update the description of that PR.
Ah my bad, I missed that! The code snippet should be updated now. Let me know if there are any more changes :)
[`ContextParallelConfig`] supports Ulysses Attention through the `ulysses_degree` argument. This determines how many devices to use for Ulysses Attention.

```py
pipeline.transformer.parallelize(config=ContextParallelConfig(ulysses_degree=2))
```
```diff
- pipeline.transformer.parallelize(config=ContextParallelConfig(ulysses_degree=2))
+ pipeline.transformer.enable_parallelism(config=ContextParallelConfig(ulysses_degree=2))
```
Just one last change and this should be good I think. Off to you @sayakpaul
Looking good. I would also link the `distributed_inference` doc from `parallel.md`.
Thanks for iterating!
Companion docs for context parallelism with Ring/Ulysses attention (see #11941).
cc @a-r-r-o-w