The VQA_CL (Visual Question Answering in Continual Learning) benchmark is designed to evaluate the ability of models to perform visual question answering tasks in a continual learning setting. This means the model is expected to learn from a sequence of VQA datasets, corresponding to different visual domains or task types, without catastrophically forgetting previously learned knowledge. This benchmark helps assess model adaptability, knowledge retention, and forward transfer capabilities in evolving VQA scenarios.
Please download the images from the respective datasets using the links below.
The JSON annotation files for the train/test splits of each dataset (e.g., scienceqa_train.json, vqav2_test.json, etc.) can be downloaded from the Hugging Face Hub at: ecnu-icalk/VQA_CL.
| Image Source | Download Path |
|---|---|
| COCO | train2014, test2015, val2014 |
| ScienceQA | images |
| VizWiz | train, val, test |
The benchmark expects a specific directory structure for the images and JSON annotation files.
The JSON files (e.g., scienceqa_train.json, vizwiz_test.json, vqav2_train.json) are structured as a list of objects, where each object represents a VQA instance. These JSON files are located within their respective dataset subdirectories (e.g., scienceqa/scienceqa_train.json, vqav2/vqav2_train.json).
Each object typically contains:

- `"image"`: A string representing the relative path to the image file. This path is relative to the directory containing the JSON file itself. For instance, if `scienceqa_train.json` is in the `benchmark/VQA_CL/scienceqa/` directory, an `image` path like `"./scienceqa/images/train/1/image.png"` inside it will point to `benchmark/VQA_CL/scienceqa/scienceqa/images/train/1/image.png`.
- `"conversations"`: A list of dialogue turns. Each turn is an object with:
  - `"from"`: A string indicating the source of the text, either `"human"` (for questions) or `"gpt"` (for answers).
  - `"value"`: A string containing the actual question or answer text. Questions may include an `<image>` token, which is a placeholder for the visual context.


Example JSON structure (from an entry in `vqav2_train.json`, assuming it is located in `benchmark/VQA_CL/vqav2/`). The `//` comments are annotations for the reader and are not valid JSON:

    [
      {
        "image": "./COCO2014/train2014/COCO_train2014_000000458752.jpg", // Path relative to benchmark/VQA_CL/vqav2/
        "conversations": [
          {
            "from": "human",
            "value": "<image>\nWhat is this photo taken looking through?\nAnswer the question using a single word or phrase."
          },
          {
            "from": "gpt",
            "value": "net"
          }
          // ... more conversation turns ...
        ]
      }
      // ... more VQA instances ...
    ]

It is recommended to organize your downloaded image data as follows. JSON files should be placed in their respective dataset subdirectories within `benchmark/VQA_CL/`.

    benchmark/
    └── VQA_CL/
        ├── README.md
        ├── scienceqa/                      # Subdirectory for ScienceQA dataset
        │   ├── scienceqa_train.json
        │   ├── scienceqa_test.json
        │   └── scienceqa/                  # Image directory for ScienceQA, as referenced by paths in its JSONs
        │       └── images/
        │           ├── train/
        │           │   ├── 1/
        │           │   │   └── image.png
        │           │   └── ... (other ScienceQA train images)
        │           └── test/
        │               ├── 5/
        │               │   └── image.png
        │               └── ... (other ScienceQA test images)
        ├── vqav2/                          # Subdirectory for VQAv2 dataset
        │   ├── vqav2_train.json
        │   ├── vqav2_test.json
        │   └── COCO2014/                   # Image directory for COCO, as referenced by paths in VQAv2 JSONs
        │       ├── train2014/
        │       │   ├── COCO_train2014_000000458752.jpg
        │       │   └── ... (other COCO train images)
        │       └── val2014/
        │           ├── COCO_val2014_000000262148.jpg
        │           └── ... (other COCO val images)
        └── vizwiz/                         # Subdirectory for VizWiz dataset
            ├── vizwiz_train.json
            ├── vizwiz_test.json
            └── VizWiz/                     # Image directory for VizWiz, as referenced by paths in its JSONs
                ├── train/
                │   ├── VizWiz_train_00000000.jpg
                │   └── ... (other VizWiz train images)
                ├── val/
                │   ├── VizWiz_val_00000000.jpg
                │   └── ... (other VizWiz val images)
                └── test/
                    └── ... (VizWiz test images, if applicable)

Explanation:

- The main JSON annotation files (e.g., `scienceqa_train.json`, `vqav2_train.json`) are located within dataset-specific subdirectories like `benchmark/VQA_CL/scienceqa/`, `benchmark/VQA_CL/vqav2/`, etc.
- The `image` paths within the JSON files (e.g., `"./scienceqa/images/train/1/image.png"` in `scienceqa_train.json`) are relative to the location of the JSON file itself.
- COCO (for VQAv2):
  - VQAv2 JSON files (e.g., `vqav2_train.json`) are in `benchmark/VQA_CL/vqav2/`.
  - Image paths like `"./COCO2014/train2014/..."` inside these JSONs mean COCO images should be extracted into `benchmark/VQA_CL/vqav2/COCO2014/`.
  - So, `train2014.zip` contents go into `benchmark/VQA_CL/vqav2/COCO2014/train2014/`, and `val2014.zip` contents go into `benchmark/VQA_CL/vqav2/COCO2014/val2014/`.
- ScienceQA:
  - ScienceQA JSON files (e.g., `scienceqa_train.json`) are in `benchmark/VQA_CL/scienceqa/`.
  - Image paths like `"./scienceqa/images/train/..."` inside these JSONs mean ScienceQA images should be extracted such that the `images` directory is under `benchmark/VQA_CL/scienceqa/scienceqa/`.
  - So, the downloaded `images` directory (containing `train/` and `test/` subfolders) from ScienceQA should be placed at `benchmark/VQA_CL/scienceqa/scienceqa/images/`.
- VizWiz:
  - VizWiz JSON files (e.g., `vizwiz_train.json`) are in `benchmark/VQA_CL/vizwiz/`.
  - Image paths like `"./VizWiz/train/..."` inside these JSONs mean VizWiz images should be extracted into `benchmark/VQA_CL/vizwiz/VizWiz/`.
  - So, `train.zip` contents go into `benchmark/VQA_CL/vizwiz/VizWiz/train/`, `val.zip` contents go into `benchmark/VQA_CL/vizwiz/VizWiz/val/`, and `test.zip` contents go into `benchmark/VQA_CL/vizwiz/VizWiz/test/`.

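
The COCO and VizWiz extraction steps above can be sketched with the standard `zipfile` module. This is an illustrative helper only: the name `extract_flat` is hypothetical, and it assumes each archive holds a flat set of image files, possibly wrapped in a single top-level folder. It writes every file directly into the target directory and discards any folder prefix stored in the archive, so it is not suitable for ScienceQA, whose images sit in per-question subfolders.

```python
import shutil
import zipfile
from pathlib import Path


def extract_flat(archive, target_dir):
    """Copy every file in `archive` directly into `target_dir`.

    Hypothetical helper: folder prefixes inside the archive are dropped,
    so "train2014/x.jpg" and "x.jpg" both end up as target_dir/x.jpg.
    Returns the number of files written.
    """
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    count = 0
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Keep only the file name; this also avoids writing outside target_dir.
            name = Path(info.filename).name
            with zf.open(info) as src, open(target_dir / name, "wb") as dst:
                shutil.copyfileobj(src, dst)
            count += 1
    return count
```

A call such as `extract_flat("train2014.zip", "benchmark/VQA_CL/vqav2/COCO2014/train2014")` would then place the COCO training images where the VQAv2 JSON files expect them, assuming the archive was downloaded to the current directory.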
Ensure the image paths in your local setup match this structure for the benchmark to function correctly.
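
One way to check a local setup is to walk every annotation file and confirm that each referenced image exists. The sketch below is not shipped with the benchmark; `find_missing_images` is a hypothetical name, and it assumes the JSON files sit exactly one level below the benchmark root, as in the layout above. Entries without an `"image"` key are skipped.

```python
import json
from pathlib import Path


def find_missing_images(root):
    """Return (json_file, image_path) pairs whose image file does not exist.

    Hypothetical helper for illustration; not part of the benchmark itself.
    """
    missing = []
    # Assumption: annotation files live at <root>/<dataset>/<name>.json.
    for json_path in sorted(Path(root).glob("*/*.json")):
        with json_path.open(encoding="utf-8") as f:
            instances = json.load(f)
        for inst in instances:
            rel = inst.get("image")
            if rel is None:
                continue
            # Image paths are relative to the directory holding the JSON file.
            image_path = json_path.parent / rel
            if not image_path.is_file():
                missing.append((json_path, image_path))
    return missing
```

Calling `find_missing_images("benchmark/VQA_CL")` and printing the result gives a list of files to fix; an empty list means every referenced image was found.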