WenxinFan/GenAI_Miniproject

OFT Mini-Project: Personalized Plush Dog Generation

This repository is a mini-project for parameter-efficient finetuning with Orthogonal Finetuning (OFT) on a pretrained text-to-image model.

The downstream task is subject-driven generation: teaching a diffusion model to generate a specific plush dog toy from a small set of personal photos.

Project Goal

Learn a new visual concept token, sksdog, that represents the user's plush dog, then compare image generation results:

  • before finetuning
  • after OFT finetuning
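A before/after comparison is easiest to read when both models see the same prompts and the same random seeds. The sketch below builds such a plan. It assumes that scripts/sample_prompts.txt holds one prompt per line and that lines starting with # are comments; neither is confirmed by the repository, and the helper names parse_prompts and comparison_plan are hypothetical.

```python
from itertools import product


def parse_prompts(text: str) -> list[str]:
    """Return the non-empty, non-comment lines of a prompt file's contents.

    Assumption: one prompt per line, '#' starts a comment line.
    """
    prompts = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            prompts.append(line)
    return prompts


def comparison_plan(prompts: list[str], seeds: list[int]) -> list[tuple[str, int]]:
    """Pair every prompt with every seed.

    Running the same (prompt, seed) pairs through the base model and the
    finetuned model keeps the comparison controlled.
    """
    return list(product(prompts, seeds))
```

Each (prompt, seed) pair would then be rendered twice, once with the base model and once with the OFT-finetuned model, and the two images placed side by side.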

Recommended Setup

  • Base task: subject-driven generation with Stable Diffusion
  • PEFT method: OFT
  • Dataset: 8-15 curated images of the same plush dog

Repository Layout

data/
  plush_dog/
    images/                 # put training images here
    captions/               # one .txt caption per image
    selected_images.csv     # source mapping for the curated images
    metadata.csv            # image-to-caption mapping
configs/
  oft_plush_dog.yaml        # editable experiment config
scripts/
  prepare_dataset.py        # checks files and builds metadata
  generate_captions.py      # bootstrap caption files
  train_oft_dreambooth.py   # single-device OFT training script
  sample_prompts.txt        # evaluation prompts
report/
  report_template.md        # 3-page report draft
requirements.txt
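
The layout above implies a simple contract: every file in images/ has a same-named .txt file in captions/, and metadata.csv records the pairing. The following is a minimal sketch of that check, not the actual contents of scripts/prepare_dataset.py. The function name build_metadata, the accepted image suffixes, and the CSV column names file_name and text are assumptions made for illustration.

```python
import csv
from pathlib import Path

# Assumed set of accepted image formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def build_metadata(root: Path) -> list[tuple[str, str]]:
    """Pair each image under root/images with its caption and write root/metadata.csv."""
    images_dir = root / "images"
    captions_dir = root / "captions"
    rows = []
    for image in sorted(images_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not an image
        caption_file = captions_dir / f"{image.stem}.txt"
        if not caption_file.is_file():
            raise FileNotFoundError(f"missing caption for {image.name}")
        caption = caption_file.read_text(encoding="utf-8").strip()
        if not caption:
            raise ValueError(f"empty caption for {image.name}")
        rows.append((image.name, caption))

    with open(root / "metadata.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file_name", "text"])  # assumed column names
        writer.writerows(rows)
    return rows
```

Calling build_metadata(Path("data/plush_dog")) from the repository root would fail early on a missing or empty caption instead of letting the problem surface during training.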

Dataset Curation Advice

Use your own plush dog photos, but start from a small, clean set that clearly establishes the toy's identity:

  • keep sharp images with the dog clearly visible
  • prefer single-subject images
  • keep diverse backgrounds and scales
  • avoid heavy blur, strong occlusion, or multiple toys in one frame for the first run

Good first-stage target:

  • the orange/brown smiling plush dog
  • mostly front-facing or three-quarter views
  • different scenes such as grass, flowers, indoor desk, beach, park

Suggested Training Token

Use a rare identifier token such as:

  • sksdog

Example caption:

a photo of sksdog, a cute orange-brown plush dog toy, standing in the grass
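
Caption bootstrapping can be as simple as writing a template caption next to every image that does not have one yet, then editing each file by hand to add the scene. The sketch below illustrates that idea under stated assumptions; it is not the code in scripts/generate_captions.py. The function name bootstrap_captions, the template string, and the accepted suffixes are hypothetical.

```python
from pathlib import Path

# Assumed set of accepted image formats.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}
# Template modeled on the example caption; scene details are added by hand later.
TEMPLATE = "a photo of {token}, a cute orange-brown plush dog toy"


def bootstrap_captions(root: Path, token: str = "sksdog") -> list[Path]:
    """Write a starter caption for every image under root/images that lacks one."""
    images_dir = root / "images"
    captions_dir = root / "captions"
    captions_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for image in sorted(images_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        caption_file = captions_dir / f"{image.stem}.txt"
        if caption_file.exists():
            continue  # never overwrite a caption that was already edited
        caption_file.write_text(TEMPLATE.format(token=token) + "\n", encoding="utf-8")
        created.append(caption_file)
    return created
```

Skipping existing files makes the step safe to rerun after new images are added.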

Quick Start

  1. Install dependencies:
pip install -r requirements.txt
  2. Put curated training images into data/plush_dog/images/.
  3. Run caption bootstrapping if needed:
python3 scripts/generate_captions.py
  4. Review and edit the generated captions in data/plush_dog/captions/.
  5. Build metadata:
python3 scripts/prepare_dataset.py
  6. Edit configs/oft_plush_dog.yaml if needed.
  7. Run training:
python3 scripts/train_oft_dreambooth.py --config configs/oft_plush_dog.yaml
  8. Check outputs in outputs/oft-plush-dog/:
      • loss_curve.png
      • validation/
      • final/unet_oft/
      • final/text_encoder_oft/
      • final/tokenizer/
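
After training, a quick check that every expected artifact exists can catch a run that stopped early. This is a minimal sketch: the relative paths come from the output list above, while the helper name missing_outputs is hypothetical and not part of the repository.

```python
from pathlib import Path

# Artifacts expected under the output directory after a completed run.
EXPECTED_OUTPUTS = [
    "loss_curve.png",
    "validation",
    "final/unet_oft",
    "final/text_encoder_oft",
    "final/tokenizer",
]


def missing_outputs(output_dir: Path) -> list[str]:
    """Return the expected artifacts that do not exist under output_dir."""
    return [rel for rel in EXPECTED_OUTPUTS if not (output_dir / rel).exists()]
```

For example, missing_outputs(Path("outputs/oft-plush-dog")) returns an empty list when the run produced everything listed.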

What To Show In The Report

  • motivation for choosing subject-driven generation
  • description of the plush dog dataset
  • OFT training configuration
  • training loss curve
  • qualitative results before and after finetuning
  • successes and failure cases

Notes

This repository is intentionally lightweight so it can be adapted to:

  • local GPU training
  • school server training
  • Colab or another notebook workflow

To train on a different subject, place your selected images into data/plush_dog/images/, then refine the captions and training commands to match.

Current Dataset Status

The repository already contains a first-pass curated set of 10 plush dog images:

  • source mapping: data/plush_dog/selected_images.csv
  • captions: data/plush_dog/captions/
  • metadata: data/plush_dog/metadata.csv

You can use this set directly for the first experiment.
