
init #2852


Open · wants to merge 5 commits into main

Conversation

DerekLiu35

Preparing the Article

  • Add an entry to _blog.yml.
  • Add a thumbnail. There are no requirements here, but there is a template if it's helpful.
  • Check you use a short title and blog path.
  • Upload any additional assets (such as images) to the Documentation Images repo. This is to reduce bloat in the GitHub base repo when cloning and pulling. Try to have small images to avoid a slow or expensive user experience.
  • Add metadata (such as authors) to your md file. You can also specify guest or org for the authors.
  • Ensure the publication date is correct.
  • Preview the content. A quick way is to paste the markdown content into https://huggingface.co/new-blog. Do not click publish; this is just an early check.

DerekLiu35 marked this pull request as ready for review on May 13, 2025.
**DerekLiu35** (Author) commented:

@SunMarc

SunMarc requested a review from sayakpaul on May 14, 2025.
**sayakpaul** (Member) left a comment:

Thanks for working on this.


**BF16:**

![Baroque, Futurist, and Noir style images generated with BF16 precision](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/quantization-backends-diffusers/combined_flux-dev_bf16_combined.png)
**sayakpaul** (Member):

Let's also provide an actual caption for the figure.

Comment on lines 37 to 42
**BnB 4-bit:**

![Baroque, Futurist, and Noir style images generated with BnB 4-bit](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/quantization-backends-diffusers/combined_flux-dev_bnb_4bit_combined.png)

**BnB 8-bit:**
![Baroque, Futurist, and Noir style images generated with BnB 8-bit](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/quantization-backends-diffusers/combined_flux-dev_bnb_8bit_combined.png)
**sayakpaul** (Member):

Can we combine the three images here?

  1. BF16
  2. 4bit
  3. 8bit

Along with the caption?


| BnB Precision | Memory after loading | Peak memory | Inference time |
**sayakpaul** (Member):

Let's include the BF16 numbers too.

| Q8_0 | 21.502 GB | 25.973 GB | 15 seconds |
| Q2_K | 13.264 GB | 17.752 GB | 26 seconds |

**Example (Flux-dev with GGUF Q4_1)**
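The snippet itself is elided in this excerpt. As a rough sketch of what Q4_1 loading looks like (assuming a recent diffusers release with GGUF support; the `city96` community checkpoint URL is an assumption used as a stand-in):

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# GGUF checkpoints are loaded from a single file rather than a repo snapshot.
# Hypothetical URL: a community Q4_1 quant of Flux-dev.
ckpt_url = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q4_1.gguf"

transformer = FluxTransformer2DModel.from_single_file(
    ckpt_url,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

image = pipe("A baroque portrait of an astronaut", num_inference_steps=28).images[0]
image.save("flux_q4_1.png")
```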
**sayakpaul** (Member):

I don't think we have to be exhaustive about showing snippets for every configuration unless they vary significantly from one another.


For more information, check out the [GGUF docs](https://huggingface.co/docs/diffusers/quantization/gguf).

### FP8 Layerwise Casting (`enable_layerwise_casting`)
**sayakpaul** (Member):

Could also mention that this can be combined with group offloading:
https://huggingface.co/docs/diffusers/en/optimization/memory#group-offloading
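For illustration, a minimal sketch combining the two. `enable_layerwise_casting` is the documented API; `enable_group_offload` and its exact arguments assume a recent diffusers version, and the prompt is a placeholder:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

# Store transformer weights in FP8; upcast each layer to BF16 just-in-time for compute.
pipe.transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn,
    compute_dtype=torch.bfloat16,
)

# Group offloading keeps only the currently executing group of layers on the GPU.
pipe.transformer.enable_group_offload(
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="leaf_level",
    use_stream=True,
)

# The remaining components still need a device (offload them similarly, or move them to the GPU).
pipe.text_encoder.to("cuda")
pipe.text_encoder_2.to("cuda")
pipe.vae.to("cuda")

image = pipe("A noir city street at night", num_inference_steps=28).images[0]
```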


We created a setup where you can provide a prompt, and we generate results using the original high-precision model (e.g., Flux-dev in BF16) as well as several quantized versions (BnB 4-bit, BnB 8-bit). The generated images are then presented to you, and your challenge is to identify which ones came from the quantized models.

Try it out [here](https://huggingface.co/spaces/derekl35/flux-quant)!
**sayakpaul** (Member):

Let's embed the space inside the blog post.

Here's a quick guide to choosing a quantization backend:

* **Easiest Memory Savings (NVIDIA):** Start with `bitsandbytes` 4/8-bit.
* **Prioritize Inference Speed:** `torchao` + `torch.compile` offers the best performance potential (see the sketch after this list).
**sayakpaul** (Member):

GGUF also supports torch.compile(). So does bitsandbytes. I think we should mention that.

* **Simplicity (Hopper/Ada):** Explore FP8 Layerwise Casting (`enable_layerwise_casting`).
* **For Using Existing GGUF Models:** Use GGUF loading (`from_single_file`).
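To make the speed point concrete, a rough sketch of the torchao + `torch.compile` route, using diffusers' `TorchAoConfig` with int8 weight-only quantization (the quant type is an assumption; per the review comment above, GGUF- and BnB-quantized transformers can also be wrapped in `torch.compile` the same way):

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig

# Quantize only the transformer; "int8wo" = int8 weight-only.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=TorchAoConfig("int8wo"),
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Compiling the quantized transformer is where most of the speedup comes from.
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune", fullgraph=True)

image = pipe("A futurist skyline", num_inference_steps=28).images[0]
```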

Quantization significantly lowers the barrier to entry for using large diffusion models. Experiment with these backends to find the best balance of memory, speed, and quality for your needs.
**sayakpaul** (Member):

Should we hint to readers that they can expect a follow-up blog post on training with quantization?

### bitsandbytes (BnB)

[`bitsandbytes`](https://github.com/bitsandbytes-foundation/bitsandbytes) is a popular and user-friendly library for 8-bit and 4-bit quantization, widely used for LLMs and QLoRA fine-tuning. We can use it for transformer-based diffusion and flow models, too.
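As a quick illustration, a minimal 4-bit (NF4) sketch with diffusers' `BitsAndBytesConfig`; the model ID and prompt are placeholders:

```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

# NF4 4-bit quantization with BF16 compute, as popularized by QLoRA-style setups.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

image = pipe("A noir detective in the rain", num_inference_steps=28).images[0]
```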

**DerekLiu35** (Author):

Not sure if this is what you meant by combining the images.

**ChunTeLee** commented on May 15, 2025:

*[Thumbnail image: "Exploring Quantization Backends in Diffusers"]*

Hey there, here is the thumbnail suggestion! cc @sayakpaul

**sayakpaul** (Member) commented:

@ChunTeLee would it be possible to reduce the size of the middle object a bit so that the words "exploring" and "quantization" are clear?
