
CLIP-guided text GAN

Code for our paper: CgT-GAN: CLIP-guided Text GAN for Image Captioning

All pre-processed data and pretrained models are released on BaiduPan.

The paper has been accepted by ACM MM 2023.

Dataset

In the examples below, all data is placed under ~/data.

MSCOCO Dataset

  1. Download the COCO images (train & val/test) and put the train2014 and val2014 folders in ~/data/coco (the COCO root directory).
  2. Download the COCO annotations (BaiduPan or GoogleDrive) and put all JSON files in ~/data/coco/annotations.

Example layout of the COCO root directory:

./data/coco
--train2014
--val2014
--annotations
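
If you want to sanity-check this layout before preprocessing, a minimal Python snippet along these lines can help (paths as above; the annotation file names are taken from the training and evaluation commands further below):

    import os

    coco_root = os.path.expanduser("~/data/coco")

    # Image folders and the annotations folder expected by the preprocessing scripts.
    for folder in ["train2014", "val2014", "annotations"]:
        path = os.path.join(coco_root, folder)
        print(f"{path}: {'found' if os.path.isdir(path) else 'MISSING'}")

    # Annotation files referenced by the training/evaluation commands.
    for ann in ["val_caption_coco_format.json", "test_caption_coco_format.json"]:
        path = os.path.join(coco_root, "annotations", ann)
        print(f"{path}: {'found' if os.path.isfile(path) else 'MISSING'}")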

Flickr30K Dataset

  1. Download the images (train/val/test) and put the flickr30k_images folder in ~/data/Flickr30k.
  2. Download the annotations and put all JSON files in ~/data/Flickr30k/annotations.

Text Corpus

Place all external data in ~/data/external for subsequent processing.

Data Preprocessing

We take the MSCOCO dataset and the GCC external corpus as an example.

  • Extract CLIP embeddings for COCO images and captions. The extracted embedding pkl files will be saved in ~/data/coco.
    python preprocess/coco/coco_train_images.py
    python preprocess/coco/coco_train_captions.py
    python preprocess/coco/coco_val-test.py
    
  • Extract CLIP embeddings for GCC captions. The pkl files will be saved in ~/data/external.
    python preprocess/external/gcc_external_captions.py
    
  • Then generate the aggregated textual embeddings (a conceptual sketch of this step follows the list).
    python preprocess/generate_embeddings.py --image_pkl ./data/coco/coco_ViT-L_14_train_images.pkl --caption_pkl ./data/coco/coco_ViT-L_14_train_captions.pkl --image_dataset coco --caption_corpus coco --t 100
    python preprocess/generate_embeddings.py --image_pkl ./data/coco/coco_ViT-L_14_train_images.pkl --caption_pkl ./data/external/gcc_ViT-L_14_external_captions.pkl --image_dataset coco --caption_corpus gcc --t 175
    
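Conceptually, the aggregation builds one textual embedding per image by weighting caption embeddings with their CLIP similarity to the image, sharpened by the temperature --t. The snippet below is a minimal standalone sketch of that idea, not the repository's generate_embeddings.py: the function name is ours, the actual pkl schema and aggregation details in the script may differ, and reading --t as a softmax temperature is an assumption.

    import numpy as np

    def aggregate_text_embeddings(image_embs, caption_embs, t=100.0):
        # image_embs:   (N_img, D) CLIP image embeddings
        # caption_embs: (N_cap, D) CLIP caption embeddings
        # t:            softmax temperature over captions (assumed meaning of --t)
        img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
        cap = caption_embs / np.linalg.norm(caption_embs, axis=1, keepdims=True)
        sims = img @ cap.T                                # (N_img, N_cap) cosine similarities
        logits = t * sims
        logits -= logits.max(axis=1, keepdims=True)       # numerical stability
        weights = np.exp(logits)
        weights /= weights.sum(axis=1, keepdims=True)     # softmax over captions
        return weights @ cap                              # (N_img, D) aggregated text embeddings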

Initialization

  • Initialize generator using COCO Captions:
python initialization.py --output_dir path/to/save/folder --data ./data/coco/coco_ViT-L_14_train_captions.pkl
  • Initialize generator using GCC Captions:
python initialization.py --output_dir path/to/save/folder --data ./data/external/gcc_ViT-L_14_external_captions.pkl

Training

  • Train the model under the MSCOCO images <-> MSCOCO captions setting:
gpus=0,1
CUDA_VISIBLE_DEVICES=$gpus nohup python -m torch.distributed.launch \
--master_port 17527 \
--nproc_per_node 2 cgtgan.py \
--output_dir path/to/save/folder \
--generator_init path/to/init/model.pt \
--data_train ./data/coco/coco_images_coco_captions_ViT-L_14_100.pkl \
--data_val ./data/coco/coco_ViT-L_14_val.pkl \
--data_test ./data/coco/coco_ViT-L_14_test.pkl \
--text_corpus ./data/coco/coco_train_sentences.pkl \
--gt_val ./data/coco/annotations/val_caption_coco_format.json \
--gt_test ./data/coco/annotations/test_caption_coco_format.json \
--do_train \
--epochs 50 \
> coco.out &
  • Train the model under the MSCOCO images <-> GCC captions setting:
gpus=0,1
mkdir ./output/gcc
CUDA_VISIBLE_DEVICES=$gpus nohup python -m torch.distributed.launch \
--master_port 17528 \
--nproc_per_node 2 cgtgan.py \
--output_dir path/to/save/folder \
--generator_init path/to/init/model.pt \
--data_train ./data/external/coco_images_gcc_captions_ViT-L_14_175.pkl \
--data_val ./data/coco/coco_ViT-L_14_val.pkl \
--data_test ./data/coco/coco_ViT-L_14_test.pkl \
--text_corpus ./data/external/gcc_external_sentences.pkl \
--gt_val ./data/coco/annotations/val_caption_coco_format.json \
--gt_test ./data/coco/annotations/test_caption_coco_format.json \
--do_train \
--epochs 80 \
> gcc.out &
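
For intuition about the CLIP guidance during training: roughly speaking, the generator is encouraged to produce captions that CLIP judges similar to the input image (see the paper for the exact reward, which also involves the discriminator). The snippet below is only a conceptual sketch of such an image-text similarity score with the openai/CLIP package; the image path and the caption are placeholders, and this is not the reward computation in cgtgan.py.

    import clip
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-L/14", device=device)

    # Placeholder inputs: one image and one generated caption.
    image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
    text = clip.tokenize(["a man riding a horse on the beach"]).to(device)

    with torch.no_grad():
        image_feat = model.encode_image(image).float()
        text_feat = model.encode_text(text).float()
        image_feat = image_feat / image_feat.norm(dim=-1, keepdim=True)
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
        similarity = (image_feat * text_feat).sum(-1)     # cosine similarity in [-1, 1]

    print(similarity.item())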

Evaluation

  • Test a model checkpoint on the MSCOCO test split:
python -u cgtgan.py \
--output_dir path/to/save/folder \
--generator_init path/to/checkpoint/model.pt \
--data_test ./data/coco/coco_ViT-L_14_test.pkl \
--gt_test ./data/coco/annotations/test_caption_coco_format.json \
--do_eval
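
The --do_eval run scores the checkpoint against the ground-truth file given by --gt_test. If you also want to score a file of generated captions yourself, the standard pycocoevalcap toolkit can be used; the results-file path below is a placeholder, and the file is expected to be a list of {"image_id": ..., "caption": ...} entries in COCO results format.

    from pycocotools.coco import COCO
    from pycocoevalcap.eval import COCOEvalCap

    # Ground-truth annotations in COCO caption format (see the paths above).
    coco = COCO("./data/coco/annotations/test_caption_coco_format.json")
    # Placeholder results file with the model's generated captions.
    coco_res = coco.loadRes("path/to/generated_captions.json")

    evaluator = COCOEvalCap(coco, coco_res)
    evaluator.params["image_id"] = coco_res.getImgIds()  # score only the predicted images
    evaluator.evaluate()

    for metric, score in evaluator.eval.items():
        print(f"{metric}: {score:.3f}")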

Cite

@article{2308.12045,
Author = {Jiarui Yu and Haoran Li and Yanbin Hao and Bin Zhu and Tong Xu and Xiangnan He},
Title = {CgT-GAN: CLIP-guided Text GAN for Image Captioning},
Year = {2023},
Eprint = {arXiv:2308.12045},
Doi = {10.1145/3581783.3611891},
}
