MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation

arXiv paper: 2511.22989 (https://arxiv.org/abs/2511.22989)

(Figure: task example)

Dataset

The data on the Hugging Face dataset is structured as follows:

data/
├── 3_back/
│   ├── 006_0.jpg
│   ├── 006_1.jpg
│   ├── 006_2.jpg
│   ├── 006_prompt.txt
│   ├── 014_0.jpg
│   ├── 014_1.jpg
│   ├── 014_2.jpg
│   ├── 014_prompt.txt
│   └── ...
├── 3_global/
│   └── ...
├── 3_local/
│   └── ...
└── ...

Download the MultiBanana dataset with:

git clone https://huggingface.co/datasets/kohsei/MultiBanana-Benchmark ./data
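Each sample consists of several numbered reference images plus a *_prompt.txt file. As a minimal sketch of reading the layout above (iter_samples is a hypothetical helper, not part of this repository), the samples can be enumerated like this:

from pathlib import Path

def iter_samples(base_dir: str = "./data"):
    """Yield one dict per sample: category, sample id, reference image paths, and prompt."""
    for category_dir in sorted(Path(base_dir).iterdir()):
        if not category_dir.is_dir():
            continue
        for prompt_file in sorted(category_dir.glob("*_prompt.txt")):
            sample_id = prompt_file.stem.removesuffix("_prompt")   # e.g. "006"
            references = sorted(category_dir.glob(f"{sample_id}_[0-9].jpg"))
            yield {
                "category": category_dir.name,                     # e.g. "3_back"
                "sample_id": sample_id,
                "references": references,
                "prompt": prompt_file.read_text().strip(),
            }

for sample in iter_samples("./data"):
    print(sample["category"], sample["sample_id"], len(sample["references"]))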

Setup

git clone [email protected]:matsuolab/multibanana.git
cd multibanana

conda create -n multibanana python=3.12
conda activate multibanana

pip install -r requirements.txt

Evaluation

Generated images are expected to be saved in the same directory as their references, using the _generated suffix:

data/
├── 3_back/
│   ├── 006_0.jpg
│   ├── 006_1.jpg
│   ├── 006_2.jpg
│   ├── 006_prompt.txt
│   ├── 006_generated.jpg
│   ├── 014_0.jpg
│   ├── 014_1.jpg
│   ├── 014_2.jpg
│   ├── 014_prompt.txt
│   ├── 014_generated.jpg
│   └── ...
├── 3_global/
│   └── ...
├── 3_local/
│   └── ...
└── ...
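How these files are produced is up to you; the sketch below only illustrates the expected naming convention (generate_image stands in for your own multi-reference model, and Pillow is assumed for saving images; neither is prescribed by this repository):

from pathlib import Path
from PIL import Image  # assumes Pillow is available

def generate_image(prompt: str, references: list[Path]) -> Image.Image:
    """Placeholder: plug in your own multi-reference text-to-image model here."""
    raise NotImplementedError

base_dir = Path("./data")
for prompt_file in sorted(base_dir.glob("*/*_prompt.txt")):
    sample_id = prompt_file.stem.removesuffix("_prompt")
    references = sorted(prompt_file.parent.glob(f"{sample_id}_[0-9].jpg"))
    image = generate_image(prompt_file.read_text().strip(), references)
    image.save(prompt_file.parent / f"{sample_id}_generated.jpg")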

We use gemini-2.5-flash via the Google GenAI SDK and gpt-5 via the OpenAI SDK.

Please set your API keys in .env as follows:

OPENAI_API_KEY=...
GEMINI_API_KEY=...
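The keys are read from the environment at runtime. As a minimal sketch of how such a .env file is commonly loaded (assuming python-dotenv, which is a typical choice rather than something this README specifies):

import os
from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()                              # reads .env from the current directory
openai_key = os.environ["OPENAI_API_KEY"]
gemini_key = os.environ["GEMINI_API_KEY"]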

Run

# Gemini
python judge.py --base_dir ./data --model gemini --batch_size 32 --output_dir ./results

# GPT
python judge.py --base_dir ./data --model gpt --batch_size 32 --output_dir ./results

This will evaluate all generated images and save the results in {number}_{model}_judge.txt files (e.g., 006_gemini_judge.txt).
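For a quick summary across samples, the judge files can be aggregated with a short script. The sketch below is hypothetical: it assumes each *_judge.txt under --output_dir ends with a numeric score on its last line, but the actual file format is defined by judge.py and may differ.

from collections import defaultdict
from pathlib import Path

scores = defaultdict(list)
for judge_file in Path("./results").rglob("*_gemini_judge.txt"):
    last_line = judge_file.read_text().strip().splitlines()[-1]
    try:
        scores[judge_file.parent.name].append(float(last_line))   # group by directory
    except ValueError:
        print(f"Could not parse a score from {judge_file}")

for group, values in sorted(scores.items()):
    print(f"{group}: mean {sum(values) / len(values):.3f} over {len(values)} files")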

License

Apache-2.0 license

Acknowledgement

This benchmark incorporates a subset of images from the LAION-5B dataset. We acknowledge and thank the LAION team for making such a valuable large-scale dataset openly available to the research community.

Citation

@misc{oshima2025multibanana,
  title={MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation},
  author={Yuta Oshima and Daiki Miyake and Kohsei Matsutani and Yusuke Iwasawa and Masahiro Suzuki and Yutaka Matsuo and Hiroki Furuta},
  year={2025},
  eprint={2511.22989},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2511.22989},
}
