[Code documentation] FluxPipeline & FluxControlNetModel with apply_group_offloading #10840
-
FluxControlNetModel - offload_type="leaf_level"

```python
import torch
from diffusers import FluxTransformer2DModel, FluxControlNetModel, FluxControlNetPipeline
from diffusers.hooks import apply_group_offloading
from diffusers.utils import load_image
from transformers import T5EncoderModel

# Load the large components explicitly so they can be passed into the pipeline.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
text_encoder_2 = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    torch_dtype=torch.bfloat16,
)
canny_controlnet = FluxControlNetModel.from_pretrained(
    "Xlabs-AI/flux-controlnet-canny-diffusers",
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
)
xlabs_canny_pipe = FluxControlNetPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    controlnet=canny_controlnet,
    transformer=transformer,
    text_encoder_2=text_encoder_2,
    torch_dtype=torch.bfloat16,
)

# Leaf-level group offloading: weights stay on the CPU and are moved to the
# GPU one leaf module at a time during the forward pass. Apply it to every
# model component of the pipeline.
apply_group_offloading(
    xlabs_canny_pipe.transformer,
    offload_type="leaf_level",
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
)
apply_group_offloading(
    xlabs_canny_pipe.text_encoder,
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
)
apply_group_offloading(
    xlabs_canny_pipe.text_encoder_2,
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
)
apply_group_offloading(
    xlabs_canny_pipe.vae,
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
)
apply_group_offloading(
    xlabs_canny_pipe.controlnet,
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
)

control_image = load_image("https://huggingface.co/XLabs-AI/flux-controlnet-canny-diffusers/resolve/main/canny_example.png")
image = xlabs_canny_pipe(
    "A bear with scarf",
    control_image=control_image,
    controlnet_conditioning_scale=0.8,
).images[0]
image.save("flux_controlnet_apply_group_offloading.png")
```
-
FluxPipeline - offload_type="leaf_level" and "block_level" [NOT WORKING: requires a lot of RAM; CUDA OOM with block_level (tested with num_blocks_per_group=1, 2, and 4)]

```python
import torch
from diffusers import FluxPipeline
from diffusers.hooks import apply_group_offloading

model_id = "black-forest-labs/FLUX.1-dev"
dtype = torch.bfloat16
pipe = FluxPipeline.from_pretrained(
    model_id,
    torch_dtype=dtype,
)

# Block-level group offloading: groups of `num_blocks_per_group` blocks are
# moved between CPU and GPU; non_blocking=True requests asynchronous copies.
apply_group_offloading(
    pipe.transformer,
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="block_level",
    num_blocks_per_group=4,
    non_blocking=True,
)
apply_group_offloading(
    pipe.text_encoder,
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="block_level",
    num_blocks_per_group=4,
    non_blocking=True,
)
apply_group_offloading(
    pipe.text_encoder_2,
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="block_level",
    num_blocks_per_group=4,
    non_blocking=True,
)
# The VAE is offloaded per leaf module instead of per block group.
apply_group_offloading(
    pipe.vae,
    offload_device=torch.device("cpu"),
    onload_device=torch.device("cuda"),
    offload_type="leaf_level",
)

prompt = "A cat wearing sunglasses and working as a lifeguard at pool."
generator = torch.Generator().manual_seed(181201)
image = pipe(
    prompt,
    width=576,
    height=1024,
    num_inference_steps=30,
    generator=generator,
).images[0]
print("----Inference complete..")
image.save("flux_apply_group_offloading.png")
```
-
How much RAM do you have? I got it to run, but it needs around 50 GB of RAM or you will get an OOM error, and that was just for the transformer model; I encoded the prompt before the generation (a sketch of that flow follows).
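A minimal sketch of that pre-encoding approach, reusing the `pipe` from the example above and the public `FluxPipeline.encode_prompt` helper (illustrative flow, not the commenter's exact code): encode once, drop the text encoders, then denoise with the cached embeddings.

```python
# Sketch: encode the prompt up front so only the transformer and VAE need to
# stay resident during denoising, lowering peak memory.
with torch.no_grad():
    prompt_embeds, pooled_prompt_embeds, _ = pipe.encode_prompt(
        prompt="A cat wearing sunglasses and working as a lifeguard at pool.",
        prompt_2=None,
    )

pipe.text_encoder = None    # free CLIP
pipe.text_encoder_2 = None  # free T5, the large one
torch.cuda.empty_cache()

image = pipe(
    prompt_embeds=prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    width=576,
    height=1024,
    num_inference_steps=30,
).images[0]
```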
-
Since the code has been added to the documentation, closing.
-
Note: the output was not verified because generation is very slow on my end, but inference does start running.
FluxPipeline - offload_type="leaf_level"
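The code block for this comment did not survive extraction; based on the examples above, the leaf-level FluxPipeline variant presumably looks like this (a sketch, not the commenter's exact code):

```python
import torch
from diffusers import FluxPipeline
from diffusers.hooks import apply_group_offloading

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)

# Leaf-level offloading for every component, mirroring the ControlNet example.
for module in (pipe.transformer, pipe.text_encoder, pipe.text_encoder_2, pipe.vae):
    apply_group_offloading(
        module,
        offload_device=torch.device("cpu"),
        onload_device=torch.device("cuda"),
        offload_type="leaf_level",
    )

image = pipe(
    "A cat wearing sunglasses and working as a lifeguard at pool.",
    num_inference_steps=30,
).images[0]
image.save("flux_leaf_level.png")
```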