init #2852
base: main
Conversation
Thanks for working on this.
diffusers-quantization.md (Outdated)
**BF16:**

![](bf16_combined.png)
Let's also provide an actual caption for the figure.
diffusers-quantization.md (Outdated)
**BnB 4-bit:**

![](bnb_4bit_combined.png)

**BnB 8-bit:**

![](bnb_8bit_combined.png)
Can we combine the three images here?
- BF16
- 4-bit
- 8-bit

Along with the caption?
diffusers-quantization.md (Outdated)
**BnB 8-bit:**

![](bnb_8bit_combined.png)

| BnB Precision | Memory after loading | Peak memory | Inference time |
Let's include the BF16 numbers too.
| Q8_0 | 21.502 GB | 25.973 GB | 15 seconds |
| Q2_k | 13.264 GB | 17.752 GB | 26 seconds |

**Example (Flux-dev with GGUF Q4_1)**
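For reference, a minimal sketch of what the GGUF Q4_1 loading example could look like. The `city96/FLUX.1-dev-gguf` checkpoint URL, prompt, and step count are assumptions, not taken from the article:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, GGUFQuantizationConfig

# Assumed community GGUF checkpoint; swap in whichever Q4_1 file the article uses
ckpt_url = "https://huggingface.co/city96/FLUX.1-dev-gguf/blob/main/flux1-dev-Q4_1.gguf"

# Load only the transformer from the GGUF file, dequantizing to BF16 for compute
transformer = FluxTransformer2DModel.from_single_file(
    ckpt_url,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

image = pipe("A serene lake at sunrise", num_inference_steps=28).images[0]
image.save("flux_gguf_q4_1.png")
```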
I don't think we have to be exhaustive about showing snippets for every configuration unless they vary significantly from one another.
For more information check out the [GGUF docs](https://huggingface.co/docs/diffusers/quantization/gguf).

### FP8 Layerwise Casting (`enable_layerwise_casting`)
Could also write that it can be combined with group offloading:
https://huggingface.co/docs/diffusers/en/optimization/memory#group-offloading
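For what it's worth, a minimal sketch of combining the two. Argument names follow the current diffusers memory docs; treat the model ID and offload settings as assumptions:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

# Store transformer weights in FP8 and upcast to BF16 only for computation
pipe.transformer.enable_layerwise_casting(
    storage_dtype=torch.float8_e4m3fn,
    compute_dtype=torch.bfloat16,
)

# Group offloading keeps idle weight groups on the CPU and streams them in when needed
pipe.transformer.enable_group_offload(
    onload_device=torch.device("cuda"),
    offload_device=torch.device("cpu"),
    offload_type="leaf_level",
    use_stream=True,
)
```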
diffusers-quantization.md (Outdated)
We created a setup where you can provide a prompt, and we generate results using both the original, high-precision model (e.g., Flux-dev in BF16) and several quantized versions (BnB 4-bit, BnB 8-bit). The generated images are then presented to you and your challenge is to identify which ones came from the quantized models.

Try it out [here](https://huggingface.co/spaces/derekl35/flux-quant)!
Let's embed the space inside the blog post.
diffusers-quantization.md (Outdated)
Here's a quick guide to choosing a quantization backend:

* **Easiest Memory Savings (NVIDIA):** Start with `bitsandbytes` 4/8-bit.
* **Prioritize Inference Speed:** `torchao` + `torch.compile` offers the best performance potential.
GGUF also supports `torch.compile()`. So does bitsandbytes. I think we should mention that.
* **Simplicity (Hopper/Ada):** Explore FP8 Layerwise Casting (`enable_layerwise_casting`).
* **For Using Existing GGUF Models:** Use GGUF loading (`from_single_file`).

Quantization significantly lowers the barrier to entry for using large diffusion models. Experiment with these backends to find the best balance of memory, speed, and quality for your needs.
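To make the speed bullet concrete, a hedged sketch of `torchao` int8 weight-only quantization plus `torch.compile`. The quant type string, model ID, and compile flags are assumptions; per the comment above, bitsandbytes- and GGUF-quantized transformers can be compiled the same way:

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig

# Quantize the transformer weights to int8 (weight-only) via torchao
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=TorchAoConfig("int8wo"),
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")

# Compile the quantized transformer for faster repeated inference
pipe.transformer = torch.compile(pipe.transformer, mode="max-autotune", fullgraph=True)
```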
Should we hint to readers that they can expect a follow-up blog around training with quantization?
Co-authored-by: Sayak Paul <[email protected]>
### bitsandbytes (BnB)

[`bitsandbytes`](https://github.com/bitsandbytes-foundation/bitsandbytes) is a popular and user-friendly library for 8-bit and 4-bit quantization, widely used for LLMs and QLoRA fine-tuning. We can use it for transformer-based diffusion and flow models, too.
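As a reference point for this section, a minimal 4-bit NF4 sketch using the diffusers `BitsAndBytesConfig`. The model ID and exact settings are placeholders; adjust them to match the article:

```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

# 4-bit NF4 quantization with BF16 compute
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Quantize only the transformer; the text encoders and VAE stay in BF16
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
```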
Not sure if this is what you meant by combining the images
@ChunTeLee possible to reduce the size of the middle object a bit so that the "exploring" and "quantization" words are clear?
Preparing the Article

`md` file. You can also specify `guest` or `org` for the authors.