This repository is a mini-project for parameter-efficient finetuning with Orthogonal Finetuning (OFT) on a pretrained text-to-image model.
The downstream task is subject-driven generation: teaching a diffusion model to generate a specific plush dog toy from a small set of personal photos.
The goal is to learn a new visual concept token, sksdog, that represents the user's plush dog, and then compare image generation results:
- before finetuning
- after OFT finetuning
- Base task: subject-driven generation with Stable Diffusion
- PEFT method: OFT
- Dataset: 8-15 curated images of the same plush dog
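For orientation, OFT is usually described as keeping the pretrained weight frozen and learning an orthogonal matrix that multiplies it, often parameterized with a Cayley transform of a skew-symmetric matrix. The sketch below is a toy 2x2 illustration of that idea under those assumptions. It is not the code in `train_oft_dreambooth.py`, and the helper names and numbers are made up for the example.

```python
# Toy 2x2 illustration of an orthogonal weight update.
# Assumption: Cayley parameterization R = (I + Q)(I - Q)^-1 with Q skew-symmetric.
# This is NOT the repository's training code; names and values are hypothetical.

def cayley_2x2(a):
    """Return R = (I + Q)(I - Q)^-1 for Q = [[0, a], [-a, 0]] in closed form."""
    scale = 1.0 / (1.0 + a * a)
    return [
        [(1.0 - a * a) * scale, 2.0 * a * scale],
        [-2.0 * a * scale, (1.0 - a * a) * scale],
    ]


def matmul(x, y):
    """Multiply two small matrices given as nested lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def transpose(x):
    """Transpose a matrix given as nested lists."""
    return [list(row) for row in zip(*x)]


# Frozen "pretrained" weight (made-up numbers for illustration).
W = [[0.5, -1.0], [2.0, 0.25]]

# The only trainable parameter in this toy case is the scalar a.
R = cayley_2x2(0.3)

# Adapted weight: the pretrained weight is rotated, not overwritten.
W_adapted = matmul(R, W)
```

With `a = 0` the transform is the identity, so in this toy setup training would start exactly from the pretrained weight, and because `R` is orthogonal the column norms of `W` are preserved.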
data/
  plush_dog/
    images/                  # put training images here
    captions/                # one .txt caption per image
    metadata.csv             # image-to-caption mapping
configs/
  oft_plush_dog.yaml         # editable experiment config
scripts/
  prepare_dataset.py         # checks files and builds metadata
  generate_captions.py       # bootstrap caption files
  train_oft_dreambooth.py    # single-device OFT training script
  sample_prompts.txt         # evaluation prompts
report/
  report_template.md         # 3-page report draft
requirements.txt
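As a rough picture of what a metadata-building step could look like, the sketch below pairs each image in `images/` with a caption file of the same stem in `captions/` and writes `metadata.csv`. The function name `build_metadata`, the column names `file_name` and `text`, and the accepted extensions are assumptions for illustration; the actual `prepare_dataset.py` may differ.

```python
import csv
from pathlib import Path

# Assumed set of accepted image formats (hypothetical, adjust as needed).
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def build_metadata(dataset_dir):
    """Pair images/<stem>.<ext> with captions/<stem>.txt and write metadata.csv.

    Returns (rows, missing), where missing lists images without a caption file.
    """
    dataset_dir = Path(dataset_dir)
    rows, missing = [], []
    for image in sorted((dataset_dir / "images").iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue  # skip non-image files
        caption_file = dataset_dir / "captions" / f"{image.stem}.txt"
        if not caption_file.is_file():
            missing.append(image.name)
            continue
        caption = caption_file.read_text(encoding="utf-8").strip()
        # Column names are an assumption, not taken from the repository.
        rows.append({"file_name": f"images/{image.name}", "text": caption})

    with open(dataset_dir / "metadata.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["file_name", "text"])
        writer.writeheader()
        writer.writerows(rows)
    return rows, missing
```

Calling `build_metadata("data/plush_dog")` would then report any images that still need a caption before training.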
Use your plush dog photos, but start from a clean core identity set:
- keep sharp images with the dog clearly visible
- prefer single-subject images
- keep diverse backgrounds and scales
- avoid heavy blur, strong occlusion, or multiple toys in one frame for the first run
Good first-stage target:
- the orange/brown smiling plush dog
- mostly front-facing or three-quarter views
- different scenes such as grass, flowers, indoor desk, beach, park
Use a rare identifier token such as:
sksdog
Example caption:
a photo of sksdog, a cute orange-brown plush dog toy, standing in the grass
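The caption pattern above can be sketched as a small helper, together with a check that every caption actually mentions the identifier token. Both function names are hypothetical and only illustrate the convention; `generate_captions.py` may build captions differently.

```python
import re

TOKEN = "sksdog"  # rare identifier token from this README


def make_caption(scene, token=TOKEN, description="a cute orange-brown plush dog toy"):
    """Compose a caption as 'a photo of <token>, <description>, <scene>'."""
    return f"a photo of {token}, {description}, {scene}"


def captions_missing_token(captions, token=TOKEN):
    """Return the captions that do not contain the token as a whole word."""
    pattern = re.compile(rf"\b{re.escape(token)}\b")
    return [c for c in captions if not pattern.search(c)]


example = make_caption("standing in the grass")
# Captions without the token would not tie the image to the new concept.
needs_fix = captions_missing_token([example, "a plush dog on a desk"])
```

Running the check over all files in `data/plush_dog/captions/` before training is a cheap way to catch captions that were edited by hand and lost the token.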
- Install dependencies:
    pip install -r requirements.txt
- Put curated training images into data/plush_dog/images/.
- Run caption bootstrapping if needed:
    python3 scripts/generate_captions.py
- Review and edit the generated captions in data/plush_dog/captions/.
- Build metadata:
    python3 scripts/prepare_dataset.py
- Edit configs/oft_plush_dog.yaml if needed.
- Run training:
    python3 scripts/train_oft_dreambooth.py --config configs/oft_plush_dog.yaml
- Check outputs in outputs/oft-plush-dog/:
    - loss_curve.png
    - validation/
    - final/unet_oft/
    - final/text_encoder_oft/
    - final/tokenizer/
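After a run, the output entries listed above can be checked with a few lines of standard-library Python. This is a convenience sketch, not part of the repository; the function name `missing_outputs` is hypothetical, and the expected paths are simply the ones named in this README.

```python
from pathlib import Path

# Output entries named in this README, relative to the run directory.
EXPECTED = [
    "loss_curve.png",
    "validation",
    "final/unet_oft",
    "final/text_encoder_oft",
    "final/tokenizer",
]


def missing_outputs(output_dir="outputs/oft-plush-dog"):
    """Return the expected entries that do not exist under output_dir."""
    root = Path(output_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

An empty list from `missing_outputs()` suggests the run wrote everything this README mentions; anything returned points at a step that may have failed or been skipped.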
The report should cover:
- motivation for choosing subject-driven generation
- description of the plush dog dataset
- OFT training configuration
- training loss curve
- qualitative results before and after finetuning
- successes and failure cases
This repository is intentionally lightweight so it can be adapted to:
- local GPU training
- school server training
- Colab or another notebook workflow
To train on your own photos, place the selected images into data/plush_dog/images/, then refine the captions and training command as needed.
The repository already contains a first-pass curated set of 10 plush dog images:
- source mapping: data/plush_dog/selected_images.csv
- captions: data/plush_dog/captions/
- metadata: data/plush_dog/metadata.csv
You can use this set directly for the first experiment.