Materials characterization plays a key role in understanding the processing–microstructure–property relationships that guide material design and optimization. While multimodal large language models (MLLMs) have shown promise in generative and predictive tasks, their ability to interpret real-world characterization imaging data remains underexplored.
MatCha is the first benchmark designed specifically for materials characterization image understanding. It provides a comprehensive evaluation framework that reflects real challenges faced by materials scientists.
- 1,500 expert-level questions focused on materials characterization.
- Covers 4 stages of materials research across 21 distinct tasks.
- Tasks designed to mimic real-world scientific challenges.
- Provides the first systematic evaluation of MLLMs on materials characterization.
```
MatCha/
├── MatCha_Data/data/    # dataset files; needs to be downloaded from Hugging Face (see below)
└── src/
    ├── lf_model_cfg/
    ├── eval.py
    ├── models.py
    ├── score.py
    └── utils.py
```
The dataset is available on 🤗 Hugging Face as FreedomIntelligence/MatCha; see the dataset card for license details.
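If the repository is stored in a format the `datasets` library can parse, the benchmark can also be loaded programmatically. This is a minimal sketch; the split and field names are assumptions and may differ from the actual dataset card:

```python
from datasets import load_dataset

# Pull MatCha straight from the Hub; files are cached locally.
ds = load_dataset("FreedomIntelligence/MatCha")
print(ds)  # inspect the available splits and features
```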
Follow the steps below to get started with the evaluation.
First, clone the repository:

```bash
git clone https://github.com/FreedomIntelligence/MatCha
cd MatCha
```

Then download the dataset from Hugging Face:

```bash
huggingface-cli download \
    --repo-type dataset \
    --resume-download \
    FreedomIntelligence/MatCha \
    --local-dir MatCha_Data
```

This downloads the complete dataset (question files and images) into `MatCha_Data`.
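The same download can also be scripted with the `huggingface_hub` Python API; the sketch below mirrors the CLI call above using `snapshot_download`:

```python
from huggingface_hub import snapshot_download

# Mirror the CLI call: fetch the full MatCha dataset repo (files + images)
# into MatCha_Data. Recent huggingface_hub versions resume interrupted
# downloads automatically, so no explicit resume flag is needed.
snapshot_download(
    repo_id="FreedomIntelligence/MatCha",
    repo_type="dataset",
    local_dir="MatCha_Data",
)
```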
Run the evaluation, for example with GPT-4o in the zero-shot setting:

```bash
cd ./src/
python eval.py \
    --model gpt-4o \
    --method zero-shot
```
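The scoring step below aggregates per-question outputs into final metrics. For intuition only, here is a hypothetical sketch of multiple-choice accuracy computation over a JSONL results file; the `prediction` and `answer` field names are assumptions, not the schema `score.py` actually uses:

```python
import json

def accuracy(output_path: str) -> float:
    """Fraction of records whose predicted choice matches the gold answer."""
    correct = total = 0
    with open(output_path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)  # one model output per line
            total += 1
            # Compare the predicted choice letter with the gold label.
            if record["prediction"].strip().upper() == record["answer"].strip().upper():
                correct += 1
    return correct / total if total else 0.0

print(f"accuracy: {accuracy('path/to/output/file'):.3f}")
```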
Finally, score the generated outputs:

```bash
python score.py \
    --output_path path/to/output/file
```

If you find our work helpful, please use the following citation.
```bibtex
@misc{lai2025matcha,
      title={Can Multimodal LLMs See Materials Clearly? A Multimodal Benchmark on Materials Characterization},
      author={Zhengzhao Lai and Youbin Zheng and Zhenyang Cai and Haonan Lyu and Jinpu Yang and Hongqing Liang and Yan Hu and Benyou Wang},
      year={2025},
      eprint={2509.09307},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.09307},
}
```