TransferCVLM is a method for efficiently transferring knowledge from an existing vision-language pre-trained model to a combination of two unimodal pre-trained models, one for vision and one for language.
Python==3.8
PyTorch==1.10.0
transformers==4.35.0
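Assuming a standard conda/pip setup, the listed versions could be installed as follows (the environment name is a placeholder, and the exact PyTorch wheel variant, e.g. CPU vs. CUDA, is an assumption):

```shell
# Hypothetical environment setup with the tested versions.
conda create -n transfercvlm python=3.8
conda activate transfercvlm
# Pin the versions listed above (pip package names assumed).
pip install torch==1.10.0 transformers==4.35.0
```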
i) Run "run_flava_{TASK}.py" or "run_git_{TASK}.py" to obtain the teacher model.
ii) Run "run_cvlm_{TASK}.py" or "run_cvlm_captioning_{model}.py" to obtain the fine-tuned CVLM model. (Phase 1)
iii) Run "transfer_flava2cvlm_{task}.py" or "transfer_git2cvlm_captioning_{model}.py" to obtain the final model. (Phase 2) Requires the results of steps i) and ii).
iv) Run "transfer_cvlm2cvlm_{task}.py" to obtain the Phase 2C model described in Sections 2.3 and 3.4. Requires the results of step iii) and new step i) results.
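As a sketch, an end-to-end run for a single task might look like the following; the task name "vqa" is a placeholder for {TASK}/{task} and is not taken from the repository:

```shell
# Hypothetical end-to-end pipeline for a task named "vqa" (placeholder name).
# i)  Obtain the teacher model.
python run_flava_vqa.py
# ii) Phase 1: obtain the fine-tuned CVLM model.
python run_cvlm_vqa.py
# iii) Phase 2: transfer knowledge from the teacher into the CVLM model.
python transfer_flava2cvlm_vqa.py
# iv) Optional: CVLM-to-CVLM transfer to obtain the Phase 2C model.
python transfer_cvlm2cvlm_vqa.py
```

Each step reads the checkpoints produced by the steps it depends on, so the scripts are run in this order.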
@inproceedings{choi2024transfercvlm,
title={TransferCVLM: Transferring Cross-Modal Knowledge for Vision-Language Modeling},
author={Choi, Dongha and Kim, Jung Jae and Lee, Hyunju},
booktitle={Findings of the Association for Computational Linguistics: EMNLP 2024},
pages={16733--16746},
year={2024}
}
