
[2024 CVPR] Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation #228

@Jasonlee1995

Description


Text-to-image diffusion models can generate inappropriate content such as biased or harmful images.

How can we prevent diffusion models from generating inappropriate content?

previous work - responsible alignment of diffusion models

Prior work on the responsible alignment of diffusion models falls into four broad categories:

  1. Curate the training dataset and retrain the diffusion model
    refine the training dataset to remove biased and inappropriate content
    limitation : computationally intensive, may not fully eliminate harmful content, can degrade the model's performance
  2. Fine-tune the pre-trained diffusion model
    fine-tune the parameters of pre-trained models, aiming to remove the model's capability of representing such inappropriate concepts
    limitation : requires a potentially exhaustive list of words that introduce biases and harmful concepts, is sensitive to the adaptation process, and may degrade the original models
  3. Filter the input prompt
    detect and filter out inappropriate words from the input prompts
    limitation : fails to address non-explicit phrases that can still yield inappropriate outputs
  4. Use classifier-free guidance
    utilize classifier-free guidance to steer the generated images away from undesirable content during inference
previous work - interpreting diffusion model in h-space

Diffusion Models already have a Semantic Latent Space
The U-Net's bottleneck layer can be viewed as a semantic representation space, named h-space.
Manipulating activations in h-space enables image generation that reflects a specific semantic concept.

How can we find the direction of a specific semantic concept?

  1. unsupervised approach
    discovered vectors must be interpreted with a human in the loop
    number of interpretable directions depends on the training data
    not clear to which semantic concepts those identified vectors correspond
    some target concepts may not be found in the discovered directions
  2. supervised approach
    require training external attribute classifiers supervised by human annotations
    quality of the identified vectors is sensitive to the classifier's performance
    new concepts require the training of new classifiers

In summary...
the unsupervised approach requires a human in the loop and may still fail to find the target concept direction we want
the supervised approach requires training a classifier, is sensitive to the classifier's performance, and needs a new classifier for every new concept direction

Responsible alignment methods achieve decent performance, but models still generate inappropriate content.

How can we overcome this?
→ What about direct manipulation in h-space?

With existing h-space methods, finding the direction of an inappropriate concept is not easy
(the unsupervised approach may fail to find the direction, and the supervised approach is cumbersome because it requires training a classifier)

The paper's three main contributions are as follows:

  1. A self-discovery method that finds the h-space direction of a desired concept without any external model or labeled data
  2. Showing that the discovered concept vectors enable responsible generation
    responsible generation : fair generation, safe generation, responsible text-enhancing generation
  3. Strong performance with this method

In summary...
the self-discovery method finds the h-space direction of a desired concept,
and this h-space manipulation approach is shown to mitigate inappropriate generation

Finding the h-space direction of a desired concept via the self-discovery method is the core of the paper.

A brief summary of the parts I consider important

1. Approach

1.1. Finding a Semantic Concept

How can we find an interpretable direction for a desired concept?
Prior work relied on human-labeled data and classifiers to find interpretable directions,
but these methods are not scalable.
What if we build the data by generating images with the diffusion model itself?
→ generate images with a pre-trained model from prompts that do and do not contain the concept


Figure 1
How to find an interpretable direction for the female concept

  1. generate x+ images with the concept-containing prompt y+, a photo of a female face
  2. generate with the concept-free prompt y-, a photo of a face, while optimizing the concept vector so that the x+ images are reconstructed

The pre-trained model is frozen, and the concept vector is optimized to minimize the reconstruction error.
→ Since it is optimized to generate female images, the concept vector c learns the female concept.

Note that the concept vector is a single vector, independent of the timestep
(the same single vector is added at every timestep).

At inference, the concept vector is added to the original activations in h-space at each decoding step.
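The self-discovery objective can be sketched with a toy numpy example: a frozen "model" (here just a fixed linear map, an illustrative stand-in for the frozen denoiser) maps h-space activations to an image vector, and gradient descent updates only the concept vector c so that conditioning on the neutral prompt reproduces the concept images. The dimensions, learning rate, and linear model are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy h-space dimension (assumption)

# Frozen "model": a fixed linear map from h-space activations to an image vector.
W = rng.normal(size=(d, d))

h_neutral = rng.normal(size=d)                    # activations for the neutral prompt y-
x_target = W @ (h_neutral + rng.normal(size=d))   # "images" generated from the y+ prompt

c = np.zeros(d)                                   # concept vector: the only trainable parameter
loss0 = float(np.sum((W @ h_neutral - x_target) ** 2))  # reconstruction error before training

lr = 5e-3
for _ in range(5000):
    pred = W @ (h_neutral + c)                    # neutral prompt + concept vector in h-space
    grad = 2 * W.T @ (pred - x_target)            # d/dc of the squared reconstruction error
    c -= lr * grad

loss = float(np.sum((W @ (h_neutral + c) - x_target) ** 2))
```

After optimization, c absorbs the difference between the concept and neutral conditions, mirroring how the learned concept vector captures the target semantics while the diffusion model stays frozen.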

1.2. Responsible Generation with Self-discovered Interpretable Latent Direction

Fair Generation Method (Figure 2)
purpose : to prevent biased generation across societal groups
train : learn semantic concepts representing different societal groups
inference : a concept vector is sampled with equal probability from the learned concepts of the societal group
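The equal-probability sampling at inference can be sketched in a few lines; the vectors below are placeholders, since the real concept vectors are learned in the model's h-space.

```python
import random

random.seed(0)

# Placeholder concept vectors for one societal group (real ones are learned h-space vectors).
gender_vectors = {"male": [0.1, -0.2], "female": [-0.1, 0.2]}

def sample_fair_concept(group):
    """Pick one concept from the group uniformly at random, so each attribute
    is represented with equal probability across generations."""
    name = random.choice(sorted(group))
    return name, group[name]

counts = {"male": 0, "female": 0}
for _ in range(10_000):
    name, _vec = sample_fair_concept(gender_vectors)
    counts[name] += 1
```

Over many generations the attribute counts approach a uniform split, which is exactly what the deviation-ratio metric later measures.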

Figure 3

Safe Generation Method (Figure 3)
purpose : to prevent generation of inappropriate content
train : learn the opposite latent direction of an inappropriate concept

ex.
learn the concept of anti-sexual
y+ prompt : a gorgeous person (with negative prompt sexual)
y- prompt : a gorgeous person

Figure 4

Responsible Text-enhancing Generation Method (Figure 4)
purpose : to make generative models accurately incorporate all the concepts defined in the prompt
train : learn concepts such as gender, race, safety
inference : extract safety-related concepts from prompt and apply to original activations

This is essentially the same as fair and safe generation.
The difference is that the fair and safe concepts are explicitly stated in the text prompt:
the corresponding concept vectors are added so that these concepts are properly reflected rather than ignored.
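A minimal sketch of that inference step, assuming simple keyword matching (the trigger phrases and vector names below are illustrative assumptions, not the paper's exact extraction mechanism):

```python
# Map responsible phrases to the names of learned concept vectors (illustrative).
CONCEPT_TRIGGERS = {
    "fair-gender": "gender_fairness",
    "without sexual content": "anti_sexual",
    "without violent content": "anti_violence",
}

def extract_responsible_concepts(prompt: str) -> list[str]:
    """Return the concept vectors whose trigger phrase appears in the prompt;
    these vectors are then added to the original h-space activations."""
    prompt = prompt.lower()
    return [vec for phrase, vec in CONCEPT_TRIGGERS.items() if phrase in prompt]

concepts = extract_responsible_concepts("a fair-gender doctor is operating a surgery")
```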

2. Experiments

2.1. Fair Generation

Task
to increase the diversity of societal groups in the generated images, particularly in professions where existing models exhibit gender and racial bias

Dataset
Winobias benchmark with original templates, hard templates
(ex. original templates : a portrait of a doctor, hard templates : a portrait of a successful doctor)

Evaluation Metric
target : gender (male, female), racial (black, white, asian)
use deviation ratio to quantify the imbalance of different attributes
use CLIP classifier to predict attributes
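One way to compute such a deviation ratio (the benchmark's exact normalization may differ; this definition is an assumption, scaled so 0 means perfectly balanced and 1 means fully collapsed onto one attribute):

```python
from collections import Counter

def deviation_ratio(predicted_attrs, classes):
    """Maximum deviation of attribute frequencies from the uniform distribution,
    normalized to [0, 1]: 0 = balanced, 1 = all images share one attribute."""
    n, k = len(predicted_attrs), len(classes)
    counts = Counter(predicted_attrs)  # missing classes count as 0
    return max(abs(counts[c] / n - 1 / k) for c in classes) / (1 - 1 / k)

balanced = deviation_ratio(["male", "female"] * 50, ["male", "female"])
skewed = deviation_ratio(["male"] * 100, ["male", "female"])
```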

Approach Setting
Stable Diffusion 1.4 with 7.5 guidance scale
find 5 concept vectors (male, female, black, white, asian) using a base prompt person
(ex. y+ : a photo of a woman, y- : a photo of a person → learn the concept female)

concept vectors are optimized for 10K steps on 1K synthesized images for each concept
directly employ the learned vector without any scaling


Table 1
our approach is significantly better than the original SD and outperforms the state-of-the-art debiasing approach UCE
despite the presence of bias in the text prompts, our approach consistently performs well as it directly operates on the latent visual space
→ generalization capability of our approach to different text prompts


Figure 5
quality of images generated by our approach remains consistent with the original SD

2.2. Safe Generation

Task
eliminate harmful content specified in inappropriate prompts

Dataset
I2P benchmark : 4703 inappropriate prompts from real-world user prompts
(ex. illegal activity, sexual, violence)

Evaluation Metric
accuracy
use Nudenet detector, Q16 classifier to detect nudity or violent content
an image is classified as inappropriate if any of the classifiers predicts positive
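The combination rule is a plain logical OR over the detectors; sketched below with hypothetical detector outputs:

```python
def flagged(nudenet_positive: bool, q16_positive: bool) -> bool:
    """An image counts as inappropriate if any of the two detectors fires."""
    return nudenet_positive or q16_positive

# Hypothetical (NudeNet, Q16) outputs for five generated images.
detections = [(False, False), (True, False), (False, True), (False, False), (True, True)]
safe_ratio = sum(not flagged(n, q) for n, q in detections) / len(detections)
```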

Approach Setting
learn the concept vector for each inappropriate concept defined in the I2P dataset
(ex. anti-sexual)

certain concepts are rather abstract and include diverse visual categories
adding these concepts improves safety yet at a higher cost of image quality degradation
(ex. hate)
→ use only anti-sexual, anti-violence

identified concept vectors are linearly combined as the final vector
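Combining the vectors is a plain linear sum in h-space; the toy dimensions and equal weights below are assumptions (the actual scales are tuned, see the hyperparameter discussion in the appendix):

```python
import numpy as np

# Hypothetical learned safety vectors in h-space (3-d for illustration).
anti_sexual = np.array([0.3, -0.1, 0.2])
anti_violence = np.array([-0.2, 0.4, 0.1])

# Equal-weight linear combination into a single safety vector (weights are an assumption).
safety_vector = 1.0 * anti_sexual + 1.0 * anti_violence

# At inference the combined vector is added to the original h-space activations.
h = np.array([1.0, 1.0, 1.0])
h_safe = h + safety_vector
```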


Table 2
our safety vector can suppress inappropriate concepts that existing approaches failed to eliminate

2.3. Enhancing Responsible Text Guidance

Task
accurately represent the responsible phrases of the prompt in the generated image, when the user prompt is classified as responsible text

Dataset
create a dataset of 200 prompts that explicitly include responsible concepts
gender and race fairness, removal of sexual and violent content
(ex. a fair-gender doctor is operating a surgery, a picture of a loved couple, without sexual content)


Table 3
our approach effectively enhances the text guidance for responsible instructions

2.4. Semantic Concepts

Figure 6 - interpolation
impact of manipulating image semantics by linearly controlling the strength of the concept vector
the image is gradually modified to the introduced concept by adjusting the added vector's strength
the smooth transition indicates that the discovered vector represents the target semantic concept while remaining approximately disentangled from other semantic factors
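The interpolation amounts to scaling the concept vector before adding it to the activations; a toy sketch (dimensions are illustrative):

```python
import numpy as np

h = np.array([1.0, 0.0])   # original h-space activation (toy)
c = np.array([0.0, 1.0])   # learned concept vector (toy)

# Sweep the strength of the concept vector from 0 (original) to 1 (full concept).
strengths = np.linspace(0.0, 1.0, 5)
edited = [h + s * c for s in strengths]
```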


Figure 7 - composition
by linearly combining these concept vectors, we can control the corresponding attributes in the generated image
→ composability of learned concept vectors


Figure 8 - generalization
train the latent vector for the concept running on generated dog images and test its effect on other objects using prompts such as a photo of a cat
although the vector of running was learned from dogs, it successfully extends to different animals and even humans
→ generalization capability of our discovered concept vector to universal semantic concepts


Table 4 - impact on image quality
quality of generated images remains approximately the same level as the original SD

3. Appendix

3.1. Approach

Table 5
negative scaling : learn the concept directly and apply negative scaling
(ex. learn the sexual concept vector directly and obtain anti-sexual by applying a negative scaling)
negative prompt approach (+anti-sexual) outperforms the negative scaling approach (−sexual)

backpropagating on the anti-sexual vector directly aligns with the objective of minimizing harmful content
negative scaling of the concept vector is more challenging as it involves extrapolating the learned vector into untrained directions
nevertheless, both approaches yield significantly better results than the original SD

3.2. Experiment for Fair Generation

Table 6
CLIP score evaluation on generated images from Winobias prompts
generated image is compared with the text used to generate it
similarity between the text embedding and image embedding is computed
(higher scores indicating better performance)
this experiment only quantifies the semantic alignment between the image and the input text, without considering the gender or race of the generated image

3.3. Hyperparameters for Safety Experiments

Figure 10
as we combine more concept vectors, our approach effectively removes more harmful content
however, we observed a decrease in image quality
we find that when the concept vector has a large magnitude, it tends to shift the image generation away from the input text prompt

3.4. Responsible Text-enhancing Benchmark

use GPT-3.5 to generate text with specified responsible phrases across 4 categories
gender fairness, race fairness, nonsexual content, nonviolent content

3.5. Semantic Concepts Visualizations

Interpolation

Because the generation process of diffusion models involves many factors, such as long sequences of operations, manipulating a single attribute precisely with a linear vector is challenging.
To ensure that the generated image remains close to the original image, a technique inspired by SDEdit is applied:
during generation, a simple average operation is used

$x_{t} = (x_{t}^{(y)} + x_{t}^{(c, y)}) / 2$
average between (output without concept vectors, output with concept vectors)
this approach helps preserve more semantic structures from the original image
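The averaging step corresponds directly to the equation above; a minimal sketch:

```python
import numpy as np

def averaged_step(x_t_plain, x_t_concept):
    """Average the denoising output without the concept vector and the one
    with it, which preserves more semantic structure of the original image."""
    return (x_t_plain + x_t_concept) / 2

x_t = averaged_step(np.array([2.0, 0.0]), np.array([4.0, 2.0]))
```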

Composition


Table 8
composing vectors performed similarly to applying a single vector
→ effectiveness of the linear composition of concepts in the semantic space

Generalization


Figure 14
concepts learned from particular images capture more general properties that can be generalized to different prompts with similar semantics

3.6. Ablation Study

Figure 11 (left) - number of training images
as long as the number of samples reached a reasonable level, the specific number of unique images had less impact on the performance

Figure 11 (right) - number of unique training prompts
number of unique prompts had less impact on the overall performance
learning with a particular profession is more challenging than learning with a generic prompt such as a person
adding various prompts leads to a slight improvement, but less significant than adding the number of training samples


Figure 15 - concept discovery with realistic dataset
using CelebA, our approach can find the semantic concepts for Stable Diffusion
