EDiT is a diffusion transformer (DiT) model architecture. For now, the "E" stands for "edited", since it supports text prompts instead of class labels.
EDiT uses SDXL-VAE to encode images and CLIP to encode text.
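To give a feel for what the image encoder hands to the transformer, here is a small shape calculation. It is only a sketch under stated assumptions, not code from this repository: it assumes the usual SDXL-VAE layout (8x spatial downsampling, 4 latent channels) and a DiT-style patch size of 2. The function names are hypothetical, and the values EDiT actually uses may differ.

```python
# Sketch only: shape arithmetic for a VAE latent fed to a DiT-style model.
# Assumptions (not taken from this repository): 8x VAE downsampling,
# 4 latent channels, patch size 2.

def latent_shape(image_size: int, latent_channels: int = 4, vae_downsample: int = 8):
    """Return (channels, height, width) of the latent for a square image."""
    if image_size % vae_downsample != 0:
        raise ValueError("image_size must be divisible by the VAE downsampling factor")
    side = image_size // vae_downsample
    return (latent_channels, side, side)


def latent_token_count(image_size: int, vae_downsample: int = 8, patch_size: int = 2) -> int:
    """Return how many patch tokens the transformer sees for a square image."""
    _, side, _ = latent_shape(image_size, vae_downsample=vae_downsample)
    if side % patch_size != 0:
        raise ValueError("latent side must be divisible by the patch size")
    return (side // patch_size) ** 2


if __name__ == "__main__":
    print(latent_shape(256))        # (4, 32, 32)
    print(latent_token_count(256))  # 256
```

Under these assumptions a 256x256 image becomes a 4x32x32 latent and 256 tokens, and the token count grows by 4x each time the image side doubles.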
This repository contains:
- 🪐 A simple PyTorch implementation of EDiT
- 🛸 A training script for ImageNet with text prompts
Please refer to DiT and PixArt-α.
accelerate launch --mixed_precision fp16 train.py --data_path /path/to/ImageNet/train
- Support text prompts for DiT
- Training script using accelerate
- Gradio for inference
EDiT has been greatly inspired by the following amazing works and teams: