The data structure of the Hugging Face dataset is as follows:
data/
├── 3_back/
│ ├── 006_0.jpg
│ ├── 006_1.jpg
│ ├── 006_2.jpg
│ ├── 006_prompt.txt
│ ├── 014_0.jpg
│ ├── 014_1.jpg
│ ├── 014_2.jpg
│ ├── 014_prompt.txt
│ └── ...
├── 3_global/
│ └── ...
├── 3_local/
│ └── ...
└── ...
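The layout above can be traversed programmatically. Below is a minimal sketch (not part of the official repo) that pairs each `*_prompt.txt` with its reference images; the `load_tasks` helper and its return shape are assumptions for illustration.

```python
# Sketch: walk the MultiBanana layout and pair each prompt file with
# its reference images. Helper name and return format are illustrative.
from pathlib import Path

def load_tasks(base_dir):
    """Return a list of (category, task_id, prompt_text, reference_image_paths)."""
    tasks = []
    for category in sorted(Path(base_dir).iterdir()):
        if not category.is_dir():
            continue
        for prompt_file in sorted(category.glob("*_prompt.txt")):
            task_id = prompt_file.name.removesuffix("_prompt.txt")
            # Reference images share the task id: e.g. 006_0.jpg, 006_1.jpg, ...
            refs = sorted(category.glob(f"{task_id}_[0-9].jpg"))
            tasks.append((category.name, task_id,
                          prompt_file.read_text().strip(), refs))
    return tasks
```

For example, with the tree above, `load_tasks("./data")` would yield one entry per prompt file in `3_back/`, `3_global/`, `3_local/`, and so on.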
Download the MultiBanana dataset:
git clone https://huggingface.co/datasets/kohsei/MultiBanana-Benchmark ./data
git clone [email protected]:matsuolab/multibanana.git
cd multibanana
conda create -n multibanana python=3.12
conda activate multibanana
pip install -r requirements.txt
Generated images are expected to be saved in the same directory, with the _generated suffix.
data/
├── 3_back/
│ ├── 006_0.jpg
│ ├── 006_1.jpg
│ ├── 006_2.jpg
│ ├── 006_prompt.txt
│ ├── 006_generated.jpg
│ ├── 014_0.jpg
│ ├── 014_1.jpg
│ ├── 014_2.jpg
│ ├── 014_prompt.txt
│ ├── 014_generated.jpg
│ └── ...
├── 3_global/
│ └── ...
├── 3_local/
│ └── ...
└── ...
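One way to place outputs in the expected location is to derive the `_generated` path from the prompt file. The helper below is a sketch, not the repo's code; `save_generated` assumes your generation call returns raw JPEG bytes.

```python
# Sketch: map a prompt file to its expected output path and save the
# generated image next to the references. Helper names are illustrative.
from pathlib import Path

def generated_path(prompt_file):
    """E.g. data/3_back/006_prompt.txt -> data/3_back/006_generated.jpg."""
    p = Path(prompt_file)
    task_id = p.name.removesuffix("_prompt.txt")
    return p.with_name(f"{task_id}_generated.jpg")

def save_generated(prompt_file, image_bytes):
    """Write JPEG bytes to the _generated path and return it."""
    out = generated_path(prompt_file)
    out.write_bytes(image_bytes)
    return out
```

Saving via this mapping keeps each generated image in the same category folder as its references, which is the layout the judge script expects.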
We use gemini-2.5-flash via the Google GenAI SDK and gpt-5 via the OpenAI SDK.
Set your API keys in .env as follows:
OPENAI_API_KEY=...
GEMINI_API_KEY=...
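If you prefer not to add a dependency such as python-dotenv, a minimal loader for this two-line `.env` file can be written by hand. This is a standalone sketch, not the repo's loading logic:

```python
# Sketch: minimal .env loader (the repo may use python-dotenv instead).
import os

def load_env(path=".env"):
    """Read KEY=VALUE lines into os.environ, skipping blanks and comments."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: a key already exported in the shell wins.
            os.environ.setdefault(key.strip(), value.strip())
```

After calling `load_env()`, `OPENAI_API_KEY` and `GEMINI_API_KEY` are available via `os.environ` for the respective SDK clients.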
Run
# Gemini
python judge.py --base_dir ./data --model gemini --batch_size 32 --output_dir ./results
# GPT
python judge.py --base_dir ./data --model gpt --batch_size 32 --output_dir ./results
This will evaluate all generated images and save the results in {number}_{model}_judge.txt files (e.g., 006_gemini_judge.txt).
Apache-2.0 license
This benchmark partially incorporates a subset of images from the LAION-5B dataset. We acknowledge and thank the LAION team for making such a valuable large-scale dataset openly available to the research community.
@inproceedings{oshima2025multibanana,
  title={MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation},
  author={Yuta Oshima and Daiki Miyake and Kohsei Matsutani and Yusuke Iwasawa and Masahiro Suzuki and Yutaka Matsuo and Hiroki Furuta},
  year={2025},
  eprint={2511.22989},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.22989},
}