Welcome to VELOCITI! This repository provides code for evaluating models on VELOCITI, along with a Jupyter notebook to visualize all the data in the benchmark.
⭐️ For instant visualization of data samples, please visit our Project Page
Create an environment with your preferred environment manager, and install the requirements via
cd VELOCITI
# activate your conda or venv environment
pip install -r environments/clip_vis_requirements.txt
The code is tested with Python 3.10.14.
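A quick sanity check of the interpreter version (a minimal sketch; it only verifies the Python version, not the installed packages):

import sys

# VELOCITI is tested with Python 3.10.14; warn if the active interpreter differs.
if sys.version_info[:2] != (3, 10):
    print(f"Warning: running Python {sys.version.split()[0]}, but the code is tested with 3.10.14")
else:
    print(f"Python version OK: {sys.version.split()[0]}")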
- The data is available Here as a .zip file.
- Either manually visit the link and download velociti_data.zip into the root of this directory, or
- Download velociti_data.zip via gdown:

pip install gdown
cd VELOCITI
Then, in a Python terminal or script, run:
import gdown
gdown.download('https://drive.google.com/uc?id=1aKxJL-xv6rS9ChqeLtokKIXBGaRMdD9w', 'velociti_data.zip')
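To unzip from Python instead of the command line, a minimal sketch using the standard zipfile module (assuming velociti_data.zip sits in the repository root):

import zipfile

# Extract velociti_data.zip into the repository root, preserving the directory layout.
with zipfile.ZipFile("velociti_data.zip") as zf:
    zf.extractall(".")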
Unzip the data, and you should see the directory structure as below.
.
├── LICENSE.txt
├── data
│   ├── action_adv.json
│   ├── action_bind.json
│   ├── action_mod.json
│   ├── agent_bind.json
│   ├── agent_iden.json
│   ├── control.json
│   ├── coref.json
│   ├── frames [900 entries]
│   ├── pos_caps.json
│   ├── sequence.json
│   ├── vidsitu_dict.json
│   └── videos
│       ├── velociti_videos_10s [900 entries]
│       └── velociti_videos_4s [900 entries]
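To quickly confirm the data extracted correctly, a minimal sketch that reads a few of the JSON files and reports their sizes (the exact schema of each file is best explored through the provided notebook):

import json
from pathlib import Path

data_dir = Path("data")
for name in ["pos_caps.json", "agent_iden.json", "control.json"]:
    # Load each annotation file and report how many top-level entries it contains.
    with open(data_dir / name) as f:
        entries = json.load(f)
    print(f"{name}: {len(entries)} top-level entries")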
You are now ready to browse the provided Jupyter notebook!
Activate the same environment set up above, and install the requirements:
conda activate velo
pip install -r environments/requirements.txt
The ViFi-CLIP model has to be manually downloaded and placed within the .hfcache folder (the default cache directory used in the code) in the root directory of this repository. Specifically, this link may be used. All the remaining models will be downloaded automatically by the script into the .hfcache directory in the root folder.
Note: CLIP-ViP may be slow to download; the script downloads the model and might take some time to do so. The model is available here, if required.
If you wish to use a different path for the cache files, modify the main_eval.py file accordingly, and adjust the cache path locations of the model files above as well.
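Before launching the evaluation, it can also help to confirm that the cache directory exists and that the manually placed ViFi-CLIP checkpoint is inside it. A minimal sketch (the expected file names depend on the checkpoint you downloaded, so this only lists the cache contents):

from pathlib import Path

cache_dir = Path(".hfcache")  # default cache directory in the repository root
if not cache_dir.is_dir():
    raise SystemExit(f"{cache_dir} not found; create it and place the ViFi-CLIP checkpoint inside it")

# List the cached entries so you can verify the ViFi-CLIP files are present.
for entry in sorted(cache_dir.iterdir()):
    print(entry.name)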
After ensuring the above directory structure, simply run
python main_eval.py --num_workers 4 \
--all \
--output output \
--exhaustive_log \
--seed 1000
This will download all the models and run the evaluation on each of them.
If a specific model is to be evaluated (say, clip_B_32), then run:
python main_eval.py --num_workers 4 \
--model clip_B_32 \
--output output \
--exhaustive_log \
--seed 1000
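To evaluate several specific models in one go, a minimal sketch that wraps the command above with subprocess (only clip_B_32 is confirmed above; extend the list with whatever other model identifiers main_eval.py accepts):

import subprocess

# Only clip_B_32 is shown; add other identifiers supported by main_eval.py as needed.
models = ["clip_B_32"]

for model in models:
    subprocess.run(
        [
            "python", "main_eval.py",
            "--num_workers", "4",
            "--model", model,
            "--output", "output",
            "--exhaustive_log",
            "--seed", "1000",
        ],
        check=True,  # stop immediately if a run fails
    )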
The exhaustive_log flag saves the output for every sample in the benchmark. If that level of logging is not required, simply run the evaluation without it:
python main_eval.py --num_workers 4 \
--all \
--output output \
--seed 1000
To establish that the captions in VELOCITI indeed require the visual modality and cannot simply be solved by a text-only reasoning model, we evaluate VERA, a plausibility estimation model, and provide the corresponding script.
The scripts, along with some more instructions, are in the VERA/ folder.
We also provide scripts for evaluating the Video-LLMs used in our work. These scripts may need to be updated as their original repositories change. Although cloning this repo and running inference on these models from here might work, we recommend cloning each model as a separate project and setting up a separate environment as per its documentation.
Video-LLaVA can be evaluated by running the command below with the test name. Scripts are provided in the video_llava/ folder.
python video_llava_eval.py --test ivat \
--output output \
--seed 1000
For PLLaVA, please refer to the pllava/ folder.
For VideoCon, please check the videocon/ folder.
For Gemini, please check the gemini/ folder.
If you find our work useful, please cite it as below:
@article{velociti,
title={VELOCITI: Can Video-Language Models Bind Semantic Concepts Through Time?},
author={Saravanan, Darshana and Singh, Darshan and Gupta, Varun and Khan, Zeeshan and Gandhi, Vineet and Tapaswi, Makarand},
journal={arXiv:2406.10889},
year={2024}
}
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.