How does the training of gligen work? #10709

Putzzmunta · 2025-02-03T14:13:29Z

Hi everybody,
I'm having a hard time understanding how and why the training script in the GLIGEN example works.

In `StableDiffusionGLIGENTextImagePipeline`, the following attributes are listed for the cross attention:

{
    "boxes": boxes,
    "masks": masks,
    "phrases_masks": phrases_masks,
    "image_masks": image_masks,
    "phrases_embeddings": phrases_embeddings,
    "image_embeddings": image_embeddings,
}

Whereas in `train_gligen_text.py`, the following ones are listed:

cross_attention_kwargs["gligen"] = {
    "boxes": batch["boxes"],
    "positive_embeddings": batch["text_embeddings_before_projection"],
    "masks": batch["masks"],
}

How does this work when the attributes are different?
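
To make the question concrete: one way both dictionaries could be accepted is if the `"gligen"` dict is unpacked as keyword arguments into a single function whose embedding parameters are all optional. Below is a minimal sketch of that idea only. `position_net`, `describe_branch`, and the returned labels are hypothetical names made up for illustration, not the actual diffusers code, and I have not confirmed that the library works this way.

```python
def position_net(boxes, masks, positive_embeddings=None,
                 phrases_masks=None, image_masks=None,
                 phrases_embeddings=None, image_embeddings=None):
    """Hypothetical stand-in for a grounding projection (assumption, not diffusers)."""
    # Text-only grounding: a single set of embeddings is supplied.
    if positive_embeddings is not None:
        return "text-only"
    # Text-image grounding: separate phrase and image embeddings are supplied.
    if phrases_embeddings is not None and image_embeddings is not None:
        return "text-image"
    raise ValueError("no grounding embeddings were provided")


def describe_branch(gligen_kwargs):
    # Unpack the dict as keyword arguments; keys that are absent keep their None defaults.
    return position_net(**gligen_kwargs)


# Placeholder values; only the key names mirror the two snippets above.
training_kwargs = {"boxes": [], "positive_embeddings": [], "masks": []}
pipeline_kwargs = {
    "boxes": [], "masks": [],
    "phrases_masks": [], "image_masks": [],
    "phrases_embeddings": [], "image_embeddings": [],
}

print(describe_branch(training_kwargs))
print(describe_branch(pipeline_kwargs))
```

If something like this is what happens, the two key sets would simply select different branches of the same module. Is that the right way to read it?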

Thank you all in advance :)

Replies: 0 comments
