Conversation

@stevhliu stevhliu commented Sep 15, 2025

companion docs for context parallelism with Ring/Ulysses attention (see #11941)

cc @a-r-r-o-w

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@stevhliu stevhliu requested review from sayakpaul and DN6 September 15, 2025 20:01
@sayakpaul sayakpaul changed the base branch from main to attn-dispatcher-cp-and-training September 22, 2025 08:47
@sayakpaul sayakpaul changed the base branch from attn-dispatcher-cp-and-training to main September 22, 2025 08:48
@sayakpaul sayakpaul (Member) left a comment

Thanks! I think we should merge this PR in #11941.

I will try to gather some benchmarks to include in the docs.

Comment on lines 269 to 272
```py
pipeline.transformer.parallelize(config=ContextParallelConfig(ring_degree=2))
pipeline.transformer.set_attention_backend("flash")
```
Member

Is it better to call parallelize() after loading the model, or is it better to pass a parallel config when initializing the model? Or are both approaches the same?
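
For concreteness, a minimal sketch of the two approaches being compared; the `parallel_config` keyword to `from_pretrained` in the second form is a hypothetical illustration, not a confirmed API:

```py
from diffusers import AutoModel, ContextParallelConfig

model_id = "black-forest-labs/FLUX.1-dev"
cp_config = ContextParallelConfig(ring_degree=2)

# Approach 1: load the model first, then parallelize it.
transformer = AutoModel.from_pretrained(model_id, subfolder="transformer")
transformer.parallelize(config=cp_config)

# Approach 2 (hypothetical kwarg): pass the parallel config at initialization.
transformer = AutoModel.from_pretrained(
    model_id, subfolder="transformer", parallel_config=cp_config
)
```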

Member

Oh, I think it's just enable_parallelism() now.

[`ContextParallelConfig`] also supports Ulysses Attention through the `ulysses_degree` argument. This determines the number of devices to use for Ulysses Attention.

```py
pipeline.transformer.parallelize(config=ContextParallelConfig(ulysses_degree=2))
```
Member

Where is ParallelConfig used?

Member Author

I didn't include ParallelConfig because it seems like you just pass ContextParallelConfig to it. So I opted to use ContextParallelConfig directly.

Is the ParallelConfig class meant to support other parallelism strategies not yet implemented?

Contributor

@sayakpaul So, the intention with ParallelConfig is to support different kinds of parallelism easily. If you pass just a ContextParallelConfig, it creates a ParallelConfig from it automatically.

I think the current example is sufficient, but we can of course revise it once more parallelisms are supported natively.
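
A minimal sketch of the wrapping described above; `pipeline` is a loaded pipeline as in the other snippets in this thread, and the `context_parallel_config` field name on `ParallelConfig` is assumed for illustration:

```py
from diffusers import ContextParallelConfig, ParallelConfig

cp = ContextParallelConfig(ring_degree=2)

# Passing the ContextParallelConfig directly; it is wrapped in a
# ParallelConfig internally.
pipeline.transformer.enable_parallelism(config=cp)

# Equivalent explicit form (field name assumed for illustration).
pipeline.transformer.enable_parallelism(
    config=ParallelConfig(context_parallel_config=cp)
)
```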

Member

Sure, thanks! Can we supplement it with a ParallelConfig example as well? 👀

Member

Also, I don't see any parallelize() method in #11941

@a-r-r-o-w (Contributor)

Sorry for the delay! Please LMK if I can help with anything :) The CP PR is currently blocked because I can't make updates to it (the branch is in the diffusers repo and not a personal fork, so I can't push changes). Hopefully someone can address the tests there and we can proceed here too

@a-r-r-o-w a-r-r-o-w (Contributor) left a comment

Thanks @stevhliu! LGTM in general, but the examples are a bit outdated. The latest inference snippet removes enable_parallelism and handles that internally.

The final code looks like this: #11941 (comment)

Sorry for the inconvenience! I forgot to update the description of that PR

@stevhliu (Member Author)

Ah my bad, I missed that! Code snippet should be updated now. Let me know if there are any more changes :)

@stevhliu stevhliu marked this pull request as ready for review September 25, 2025 16:06
[`ContextParallelConfig`] supports Ulysses Attention through the `ulysses_degree` argument. This determines how many devices to use for Ulysses Attention.

```py
pipeline.transformer.parallelize(config=ContextParallelConfig(ulysses_degree=2))
```
Contributor

Suggested change:

```diff
-pipeline.transformer.parallelize(config=ContextParallelConfig(ulysses_degree=2))
+pipeline.transformer.enable_parallelism(config=ContextParallelConfig(ulysses_degree=2))
```

Just one last change and this should be good I think. Off to you @sayakpaul

@stevhliu stevhliu requested a review from sayakpaul September 26, 2025 15:52
@sayakpaul sayakpaul (Member) left a comment

Looking good. I would also link the distributed_inference doc from parallel.md.

yh8899 commented Sep 30, 2025

I used this demo code to run Flux, but the results are slightly different between CP and the original. Is there numerical instability with CP? This is my code:

```py
import torch
from diffusers import AutoModel, FluxPipeline, ContextParallelConfig

try:
    torch.distributed.init_process_group("nccl")
    rank = torch.distributed.get_rank()
    device = torch.device("cuda", rank % torch.cuda.device_count())
    torch.cuda.set_device(device)

    model_id = "black-forest-labs/FLUX.1-dev"
    transformer = AutoModel.from_pretrained(model_id, subfolder="transformer", torch_dtype=torch.bfloat16)
    pipeline = FluxPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.bfloat16).to(device)
    pipeline.transformer.set_attention_backend("_native_cudnn")

    pipeline.transformer.enable_parallelism(config=ContextParallelConfig(ring_degree=2))

    prompt = """
    cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
    highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
    """

    # Must specify generator so all ranks start with same latents (or pass your own)
    generator = torch.Generator(device="cpu").manual_seed(42)
    image = pipeline(prompt, num_inference_steps=50, generator=generator).images[0]

    if rank == 0:
        image.save("output_cp.png")

except Exception as e:
    print(f"An error occurred: {e}")
    torch.distributed.breakpoint()
    raise

finally:
    if torch.distributed.is_initialized():
        torch.distributed.destroy_process_group()
```
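
(A hedged usage note, not part of the original comment: with ring_degree=2 the script above expects exactly two ranks, launched e.g. via `torchrun --nproc_per_node=2 demo_cp.py`, where the filename is assumed. A small guard makes a mismatched launch explicit, assuming the world size must match the CP degree:)

```py
# Sketch: fail fast if the launcher did not supply the two ranks that
# ring_degree=2 requires (assumes world size must equal the CP degree).
import torch.distributed as dist

if dist.get_world_size() != 2:
    raise RuntimeError(f"ring_degree=2 needs 2 ranks, got {dist.get_world_size()}")
```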

Without CP:

```py
import torch
from diffusers import AutoModel, FluxPipeline

model_id = "black-forest-labs/FLUX.1-dev"
transformer = AutoModel.from_pretrained(model_id, subfolder="transformer", torch_dtype=torch.bfloat16)
pipeline = FluxPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.bfloat16).to("cuda")
pipeline.transformer.set_attention_backend("_native_cudnn")

prompt = """
cinematic film still of a cat sipping a margarita in a pool in Palm Springs, California
highly detailed, high budget hollywood movie, cinemascope, moody, epic, gorgeous, film grain
"""

# Same fixed seed as the CP run for a like-for-like comparison
generator = torch.Generator(device="cpu").manual_seed(42)
image = pipeline(prompt, num_inference_steps=50, generator=generator).images[0]

image.save("output.png")
```
[Images: output_cp vs. output]

@sayakpaul sayakpaul (Member) left a comment

Thanks for iterating!

@stevhliu stevhliu merged commit d7a1a03 into huggingface:main Sep 30, 2025
1 check passed
@stevhliu stevhliu deleted the cp branch September 30, 2025 16:33