Full SDXL Model #67
Conversation
This is great work!! It seems really difficult to manage the SDXL-specific features and the SD2 features in one model, but you did the best organization possible. The few possible refactors may not be that much better. I proposed a few suggestions, but it's up to your discretion whether they are good / worth the effort.
The only possible bug I noticed was in `log_diffusion_images.py`.
One general proposal: there are a few if-statements due to slight differences in the SDXL and HF tokenizers. If you're up for it, I think it would clean up a bit of code if you could make them more similar. I think this requires two things:

- Have a `max_length` argument in the SDXL Tokenizers, and in the calling code set the max length by `max_length = None if self.sdxl else self.tokenizer.model_max_length`.
- Have the SDXL tokenizer return a dictionary with the key `input_ids`.
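A minimal sketch of that suggestion. All names here (`SDXLTokenizerSketch`, `DummyTokenizer`) are hypothetical stand-ins for the two real CLIP tokenizers, not the repo's actual classes; the point is the unified interface: a `max_length` keyword and a dict result keyed by `input_ids`.

```python
class DummyTokenizer:
    """Stand-in for a HF tokenizer: has model_max_length, returns {'input_ids': ...}."""
    model_max_length = 77

    def __call__(self, text, max_length=None, **kwargs):
        ids = [ord(c) for c in text]
        if max_length is not None:  # None means "do not truncate", matching the suggestion
            ids = ids[:max_length]
        return {'input_ids': ids}


class SDXLTokenizerSketch:
    """Wraps two tokenizers but exposes one HF-like call signature."""

    def __init__(self, tokenizer_one, tokenizer_two):
        self.tokenizers = [tokenizer_one, tokenizer_two]
        self.model_max_length = min(t.model_max_length for t in self.tokenizers)

    def __call__(self, text, max_length=None, **kwargs):
        # Return a dict with 'input_ids' holding one sequence per sub-tokenizer,
        # so callers never branch on SDXL vs. HF.
        return {'input_ids': [t(text, max_length=max_length)['input_ids']
                              for t in self.tokenizers]}


tokenizer = SDXLTokenizerSketch(DummyTokenizer(), DummyTokenizer())
# The caller can now pick max_length uniformly, as the review suggests:
sdxl = True
max_length = None if sdxl else tokenizer.model_max_length
out = tokenizer('a photo of a cat', max_length=max_length)
```

With this shape, the same `batch['input_ids']` access works for both tokenizer families.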
Co-authored-by: Landan Seguin <[email protected]>
Amazing!! Thanks for all the fixes!
```python
torch_dtype = torch.float16 if encode_latents_in_fp16 else None
try:
    vae = AutoencoderKL.from_pretrained(vae_model_name, subfolder='vae', torch_dtype=torch_dtype)
except:  # for handling SDXL vae fp16 fixed checkpoint
```
The blanket `except` isn't great here; we should probably qualify the exception type.
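For example, one way to narrow it. The exception types and the loader callables below are assumptions for illustration, not the repo's actual API; the idea is that only the errors expected for the SDXL vae-fp16-fix checkpoint layout trigger the fallback, while anything else still propagates.

```python
def load_vae(load_with_subfolder, load_without_subfolder):
    """Try the primary loader; fall back only on expected, named errors.

    `load_with_subfolder` / `load_without_subfolder` are hypothetical callables
    standing in for the two AutoencoderKL.from_pretrained variants.
    """
    try:
        return load_with_subfolder()
    except (OSError, ValueError):  # assumed errors for a checkpoint without a 'vae' subfolder
        return load_without_subfolder()


def _primary():
    # Simulate the SDXL vae-fp16-fix checkpoint, which lacks a 'vae' subfolder.
    raise OSError('no vae subfolder')


vae = load_vae(_primary, lambda: 'vae-fp16-fix')
```

Unlike a bare `except:`, this won't swallow `KeyboardInterrupt` or genuine bugs in the loading code.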
```python
    attn_processor = ClippedXFormersAttnProcessor(clip_val=clip_qkv)
else:
    attn_processor = ClippedAttnProcessor2_0(clip_val=clip_qkv)
log.info('Using %s with clip_val %.1f' % (attn_processor.__class__, clip_qkv))
```
Remove the '%' operator; the logger can do the string formatting itself if you pass the arguments directly.
Great PR, just some minor nits
```python
hidden_states = attn.to_out[1](hidden_states)

if input_ndim == 4:
    assert channel
```
Print the value in the assert, so we know why it's `False` (`None`, `0`, etc.)
```diff
- assert channel
+ assert channel, f"{channel}"
```
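A quick illustration of what the suggested message buys us: on failure, the `AssertionError` now carries the offending value.

```python
channel = None  # illustrative failing value

try:
    assert channel, f"{channel}"
except AssertionError as err:
    # The error message tells us *why* the assert failed (None vs. 0, etc.).
    failure_reason = str(err)
```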
why we need to clip qkv
This PR contains the full implementation of Stable Diffusion XL (SDXL). SDXL uses two text encoders/tokenizers and also takes crop & size parameters from the dataloader as conditioning - a majority of the changes here are for supporting that.
A high-level description of the changes for each file:
`diffusion/datasets/image_caption.py`

- Added a `rand_crop` flag to choose between `LargestCenterSquare` & `RandomCropSquare` - previously only center cropping was supported. This is relevant to SD2 if one might want to train with random cropping, but doesn't apply to SDXL
- Based on `tokenizer_name_or_path`, use `RandomCropSquareReturnTransform` for SDXL, which returns the cropping parameters used as well as the original image size (for training SDXL with micro-conditioning), and return the micro-conditioning as part of the training batch (dropped via the `microcond_drop_prob` flag). This is not discussed in the SDXL paper but is reflected in Stability AI's implementation
- Use `SDXLTokenizer` for tokenization
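As a rough sketch of the crop micro-conditioning idea: the helper below is hypothetical and works on image sizes rather than PIL images (unlike the real `RandomCropSquareReturnTransform`), but it shows what gets returned alongside the crop itself - the crop's top-left corner and the original image size, which SDXL consumes as conditioning.

```python
import random

def random_crop_square_params(width, height, rng=random):
    """Hypothetical sketch: pick the largest square crop at a random position
    and return the parameters SDXL micro-conditioning needs."""
    side = min(width, height)
    left = rng.randint(0, width - side)   # random horizontal offset
    top = rng.randint(0, height - side)   # random vertical offset
    crop_params = (top, left)             # top-left corner of the square crop
    size_params = (width, height)         # original image size before cropping
    return side, crop_params, size_params

side, crop_params, size_params = random_crop_square_params(1024, 768)
```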
`diffusion/datasets/laion/transforms.py`

- Added `RandomCropSquare` (does random crop only) and `RandomCropSquareReturnTransform` (does random crop and returns crop params)

`diffusion/models/layers.py`

- Added the `zero_module` function used in SDXL init

`diffusion/models/models.py`

- Added QKV clipping (via the `clip_qkv` argument)
- Added `SDXLTokenizer` and `SDXLTextEncoder`, which contain the two tokenizers/text encoders but mostly can be used as if they are one tokenizer/text encoder

`diffusion/models/stable_diffusion.py`
- Added an `sdxl` flag to StableDiffusion to indicate if we are training an SDXL model
- Set the appropriate `latent_scale` for SD2 vs. SDXL
- Use `pooled_conditioning` from the SDXL text encoder, which is used in micro-conditioning
- Use `crop_params` and `size_params` for SDXL micro-conditioning, otherwise set them to reasonable default values
- Added a `zero_out_negative_prompt` flag that zeroes out the negative prompt if it is empty (rather than tokenizing and encoding the empty string). This was added to match the behavior of the diffusers StableDiffusionXLPipeline, and in general it just seems like a good thing to do. Note: I set the default value to `True`, so previously made generations (e.g. with SD2) will look different despite using the same prompt/seed. It can be set to `False` to match previous results.

`diffusion/callbacks/log_diffusion_images.py`
`setup.py`
There are a few remaining things I'd like to add, but this is already a big enough PR. I will add the following once this one is merged in:
- A `CenterCropSquareReturnTransform` transformation that can be used for COCO eval. Currently with SDXL training we do random crop for the eval dataset as well