Materials characterization plays a key role in understanding the processing–microstructure–property relationships that guide material design and optimization. While multimodal large language models (MLLMs) have shown promise in generative and predictive tasks, their ability to interpret real-world characterization imaging data remains underexplored.
MatCha is the first benchmark designed specifically for materials characterization image understanding. It provides a comprehensive evaluation framework that reflects real challenges faced by materials scientists.
- 1,500 expert-level questions focused on materials characterization.
- Covers 4 stages of materials research across 21 distinct tasks.
- Tasks designed to mimic real-world scientific challenges.
- Provides the first systematic evaluation of MLLMs on materials characterization.
```
MatCha/
├── MatCha_Data/data/    # dataset files; needs to be downloaded from Hugging Face (see below)
└── src/
    ├── lf_model_cfg/
    ├── eval.py
    ├── models.py
    ├── score.py
    └── utils.py
```
The dataset is available on 🤗 Hugging Face as FreedomIntelligence/MatCha; see the dataset card for license details.
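If the repository is stored in a format the `datasets` library can parse, the benchmark can also be loaded programmatically. This is a minimal sketch; the split and field names are assumptions and may differ from the actual dataset card:

```python
from datasets import load_dataset

# Pull MatCha straight from the Hub; files are cached locally.
ds = load_dataset("FreedomIntelligence/MatCha")
print(ds)  # inspect the available splits and features
```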
Follow the steps below to get started with the evaluation.
First, clone the repository:

```bash
git clone https://github.com/FreedomIntelligence/MatCha
cd MatCha
```

Then download the dataset from Hugging Face:

```bash
huggingface-cli download \
    --repo-type dataset \
    --resume-download \
    FreedomIntelligence/MatCha \
    --local-dir MatCha_Data
```

This downloads the complete dataset (question files and images) into `MatCha_Data`.
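The same download can also be scripted with the `huggingface_hub` Python API; the sketch below mirrors the CLI call above using `snapshot_download`:

```python
from huggingface_hub import snapshot_download

# Mirror the CLI call: fetch the full MatCha dataset repo (files + images)
# into MatCha_Data. Recent huggingface_hub versions resume interrupted
# downloads automatically, so no explicit resume flag is needed.
snapshot_download(
    repo_id="FreedomIntelligence/MatCha",
    repo_type="dataset",
    local_dir="MatCha_Data",
)
```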
Run the evaluation, for example with GPT-4o in the zero-shot setting:

```bash
cd ./src/
python eval.py \
    --model gpt-4o \
    --method zero-shot
```
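The scoring step below aggregates per-question outputs into final metrics. For intuition only, here is a hypothetical sketch of multiple-choice accuracy computation over a JSONL results file; the `prediction` and `answer` field names are assumptions, not the schema `score.py` actually uses:

```python
import json

def accuracy(output_path: str) -> float:
    """Fraction of records whose predicted choice matches the gold answer."""
    correct = total = 0
    with open(output_path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)  # one model output per line
            total += 1
            # Compare the predicted choice letter with the gold label.
            if record["prediction"].strip().upper() == record["answer"].strip().upper():
                correct += 1
    return correct / total if total else 0.0

print(f"accuracy: {accuracy('path/to/output/file'):.3f}")
```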
Finally, score the generated outputs:

```bash
python score.py \
    --output_path path/to/output/file
```

If you find our work helpful, please use the following citation.
```bibtex
@misc{lai2025matcha,
      title={Can Multimodal LLMs See Materials Clearly? A Multimodal Benchmark on Materials Characterization},
      author={Zhengzhao Lai and Youbin Zheng and Zhenyang Cai and Haonan Lyu and Jinpu Yang and Hongqing Liang and Yan Hu and Benyou Wang},
      year={2025},
      eprint={2509.09307},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.09307},
}
```