Code for our paper: CgT-GAN: CLIP-guided Text GAN for Image Captioning
All pre-processed data and pretrained models are released on BaiduPan.
The paper has been accepted by ACM MM 2023.
In the examples below, all data is assumed to be placed under ~/data.
- Download COCO images: train & val/test, and put the train2014 and val2014 folders in ~/data/coco (the COCO root directory).
- Download COCO annotations: BaiduPan or GoogleDrive, and put all json files in ~/data/coco/annotations.
Example layout of the COCO root directory:
./data/coco
--train2014
--val2014
--annotations
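A quick sanity check of this layout before preprocessing (a minimal sketch; data_root is an assumption, adjust it if you placed the data elsewhere):
import os

# Verify the COCO directory layout described above.
data_root = os.path.expanduser("~/data/coco")  # assumption: adjust to your data root

for sub in ("train2014", "val2014", "annotations"):
    path = os.path.join(data_root, sub)
    if os.path.isdir(path):
        print(f"{path}: OK ({len(os.listdir(path))} files)")
    else:
        print(f"{path}: MISSING")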
- Download Flickr30k images: train/val/test, and put the flickr30k_images folder in ~/data/Flickr30k.
- Download Flickr30k annotations: annotations, and put all json files in ~/data/Flickr30k/annotations.
- ShutterStock: download from UIC.
- Google Conceptual Captions: download Train_GCC-training.tsv.
Place all external data in ~/data/external for subsequent processing.
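To confirm the GCC tsv downloaded correctly, you can peek at its first rows (a minimal sketch; the exact file location under ~/data/external is an assumption):
import os

# Print the first few raw rows of the GCC tsv to confirm the download is intact.
tsv_path = os.path.expanduser("~/data/external/Train_GCC-training.tsv")  # assumed location
with open(tsv_path, encoding="utf-8") as f:
    for _ in range(3):
        print(f.readline().rstrip("\n"))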
We take the MSCOCO dataset and the GCC external corpus as an example.
- Extract CLIP embeddings for COCO images and captions. All extracted embedding pkl files will be saved in ~/data/coco.
python preprocess/coco/coco_train_images.py
python preprocess/coco/coco_train_captions.py
python preprocess/coco/coco_val-test.py
- Extract CLIP embeddings for GCC captions. The pkl files will be saved in ~/data/external.
python preprocess/external/gcc_external_captions.py
- Then generate the aggregated textual embeddings (a sketch of what this step computes follows below).
python preprocess/generate_embeddings.py --image_pkl ./data/coco/coco_ViT-L_14_train_images.pkl --caption_pkl ./data/coco/coco_ViT-L_14_train_captions.pkl --image_dataset coco --caption_corpus coco --t 100
python preprocess/generate_embeddings.py --image_pkl ./data/coco/coco_ViT-L_14_train_images.pkl --caption_pkl ./data/external/gcc_ViT-L_14_external_captions.pkl --image_dataset coco --caption_corpus gcc --t 175
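The sketch below illustrates the idea behind these two steps: ViT-L/14 CLIP features are extracted for images and corpus captions, and each image gets an aggregated textual embedding built from caption embeddings weighted by a temperature-scaled softmax over image-text similarities (with the temperature playing the role of --t). This is an illustration only, not the repo's implementation; the image path, the caption list, and the exact weighting/normalization are assumptions.
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)

# A tiny stand-in for the caption corpus (COCO or GCC sentences).
captions = ["a dog running on the beach", "a plate of food on a table"]

with torch.no_grad():
    # CLIP text embeddings for the corpus captions, L2-normalized.
    text_feat = model.encode_text(clip.tokenize(captions).to(device)).float()
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    # CLIP image embedding for one training image ("example.jpg" is a placeholder).
    image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
    img_feat = model.encode_image(image).float()
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)

    # Aggregate caption embeddings with a temperature-scaled softmax over
    # image-text similarities; t corresponds to the --t flag above (assumption).
    t = 100.0
    weights = (t * img_feat @ text_feat.T).softmax(dim=-1)  # (1, num_captions)
    agg_text = weights @ text_feat                          # (1, embed_dim)
    agg_text = agg_text / agg_text.norm(dim=-1, keepdim=True)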
- Initialize generator using COCO Captions:
python initialization.py --output_dir path/to/save/folder --data ./data/coco/coco_ViT-L_14_train_captions.pkl
- Initialize generator using GCC Captions:
python initialization.py --output_dir path/to/save/folder --data ./data/external/gcc_ViT-L_14_external_captions.pkl
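Before initialization (and later training), it can save time to check that the preprocessed pkl files load cleanly (a minimal sketch; nothing about the pkl's internal structure is assumed):
import pickle

# Quick check that a preprocessed pkl file exists and unpickles without error.
pkl_path = "./data/coco/coco_ViT-L_14_train_captions.pkl"
with open(pkl_path, "rb") as f:
    data = pickle.load(f)

print(type(data))
print(len(data) if hasattr(data, "__len__") else data)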
- Train the model under the MSCOCO images <-> MSCOCO captions setting:
gpus=0,1
CUDA_VISIBLE_DEVICES=$gpus nohup python -m torch.distributed.launch \
--master_port 17527 \
--nproc_per_node 2 cgtgan.py \
--output_dir path/to/save/folder \
--generator_init path/to/init/model.pt \
--data_train ./data/coco/coco_images_coco_captions_ViT-L_14_100.pkl \
--data_val ./data/coco/coco_ViT-L_14_val.pkl \
--data_test ./data/coco/coco_ViT-L_14_test.pkl \
--text_corpus ./data/coco/coco_train_sentences.pkl \
--gt_val ./data/coco/annotations/val_caption_coco_format.json \
--gt_test ./data/coco/annotations/test_caption_coco_format.json \
--do_train \
--epochs 50 \
> coco.out &
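The --gt_val/--gt_test files above are expected to be COCO-format caption annotations (as their names suggest). A quick way to verify they load is pycocotools (a minimal sketch; it assumes pycocotools is installed, which is not necessarily a dependency of this repo):
from pycocotools.coco import COCO

# Check that the ground-truth caption files parse as COCO-format annotations.
for split in ("val", "test"):
    ann_file = f"./data/coco/annotations/{split}_caption_coco_format.json"
    coco = COCO(ann_file)
    print(split, len(coco.getImgIds()), "images,", len(coco.getAnnIds()), "captions")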
- Train the model under the MSCOCO images <-> GCC captions setting:
gpus=0,1
mkdir ./output/gcc
CUDA_VISIBLE_DEVICES=$gpus nohup python -m torch.distributed.launch \
--master_port 17528 \
--nproc_per_node 2 cgtgan.py \
--output_dir path/to/save/folder \
--generator_init path/to/init/model.pt \
--data_train ./data/external/coco_images_gcc_captions_ViT-L_14_175.pkl \
--data_val ./data/coco/coco_ViT-L_14_val.pkl \
--data_test ./data/coco/coco_ViT-L_14_test.pkl \
--text_corpus ./data/external/gcc_external_sentences.pkl \
--gt_val ./data/coco/annotations/val_caption_coco_format.json \
--gt_test ./data/coco/annotations/test_caption_coco_format.json \
--do_train \
--epochs 80 \
> gcc.out &
- Test a model checkpoint on the MSCOCO test split:
python -u cgtgan.py \
--output_dir path/to/save/folder \
--generator_init path/to/checkpoint/model.pt \
--data_test ./data/coco/coco_ViT-L_14_test.pkl \
--gt_test ./data/coco/annotations/test_caption_coco_format.json \
--do_eval
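To score a file of generated captions offline against the same ground truth, the standard COCO caption toolkit can be used (a minimal sketch; "results.json" is a placeholder for a COCO-style results file of {"image_id", "caption"} entries, and whether cgtgan.py writes such a file is an assumption):
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

# Offline COCO-style caption evaluation with pycocoevalcap.
coco = COCO("./data/coco/annotations/test_caption_coco_format.json")
coco_res = coco.loadRes("results.json")  # placeholder results file

evaluator = COCOEvalCap(coco, coco_res)
evaluator.params["image_id"] = coco_res.getImgIds()  # score only captioned images
evaluator.evaluate()

for metric, score in evaluator.eval.items():
    print(f"{metric}: {score:.3f}")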
@article{2308.12045,
  Author = {Jiarui Yu and Haoran Li and Yanbin Hao and Bin Zhu and Tong Xu and Xiangnan He},
  Title = {CgT-GAN: CLIP-guided Text GAN for Image Captioning},
  Year = {2023},
  Eprint = {arXiv:2308.12045},
  Doi = {10.1145/3581783.3611891},
}