The Evaluation Suite of Xiaomi MiMo-VL

To promote rigorous, reproducible, and thinking-oriented evaluation of Vision-Language Models (VLMs), we open-source our evaluation suite for MiMo-VL and beyond.

Built on top of the excellent lmms-eval framework, we introduce several improvements in model integration, evaluation protocol, and task coverage to better support the next generation of reasoning-capable VLMs.

📰 News

[25/08/08] We updated our evaluation framework alongside the release of MiMo-VL-7B-SFT-2508 and MiMo-VL-7B-RL-2508. New features include:

  • Additional GUI action benchmarks, AndroidControl and CAGUI (evaluated with --model mimo_agent)
  • Additional benchmarks for video spatial reasoning (VSI-Bench), physics reasoning (PhysReason), multi-modal long-context understanding (MMLongBench), and multi-modal instruction following (MM-IFEval)
  • Support for no_think evaluation via the model argument disable_thinking_user=True
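Under lmms-eval's standard CLI, an agent-benchmark run with thinking disabled might look like the sketch below. Only --model mimo_agent and disable_thinking_user=True come from this README; the task name, checkpoint id, and output path are illustrative placeholders.

```shell
# Hypothetical invocation: task name, checkpoint, and paths are placeholders.
python -m lmms_eval \
    --model mimo_agent \
    --model_args pretrained=XiaomiMiMo/MiMo-VL-7B-RL-2508,disable_thinking_user=True \
    --tasks android_control \
    --batch_size 1 \
    --output_path ./logs/
```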

🔧 Key Features

1. ⚙️ MiVLLM: A vLLM-based Model Wrapper for MiMo-VL

We introduce a new MiVLLM model class, built on lmms-eval's original VLLM class and tailored for MiMo-VL. Compared to the original implementation, it:

  • Greatly improves data loading efficiency
  • Enables fine-grained control over image and video preprocessing
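The README does not expose MiVLLM's preprocessing API, but the kind of fine-grained control described can be sketched with a uniform video-frame sampler; the function name and signature below are hypothetical, not part of the actual codebase.

```python
def sample_frame_indices(total_frames: int, max_frames: int) -> list[int]:
    """Uniformly sample up to max_frames frame indices from a video clip.

    Illustrative only: MiVLLM's real preprocessing hooks are not shown in
    this README. This sketches the sort of per-video control described,
    keeping the first and last frame when downsampling.
    """
    if max_frames <= 1:
        return [0]
    if total_frames <= max_frames:
        return list(range(total_frames))
    # Evenly spaced indices spanning the full clip.
    step = (total_frames - 1) / (max_frames - 1)
    return [round(i * step) for i in range(max_frames)]
```

Exposing a knob like max_frames per task (rather than a global default) lets long-video benchmarks trade context length against temporal coverage.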

2. 🧠 Adaptation to Thinking VLMs

The original lmms-eval tasks were designed for non-thinking VLMs: they prompt directly for short answers and compare outputs without post-processing. We redesigned this pipeline to support reasoning-intensive models:

  • Introduce a unified \boxed{} output format using the prompt: Put your final answer in \boxed{}.
  • Extend max_new_tokens to 32768 to allow the model to reason before answering
  • Automatically extract predictions from the final \boxed{} output

3. 📏 Refined Open-ended Evaluation Metrics

For open-ended tasks such as DocVQA, InfoVQA, ChartQA, and OCRBench, we calculate accuracy using GPT-4o as the evaluator. This improves the fidelity of evaluation for free-form answers and better reflects model capabilities.
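An LLM-judge accuracy metric of this kind typically builds a judging prompt per example and maps the judge's reply to a binary score. The suite's actual judge prompt is not shown in this README; the two helpers below are a hedged sketch of that pattern (the real GPT-4o call would go through the OpenAI API and is omitted here).

```python
def build_judge_prompt(question: str, reference: str, prediction: str) -> str:
    """Assemble a judging prompt (illustrative wording, not the suite's)."""
    return (
        "Judge whether the prediction answers the question correctly, "
        "given the reference answer. Reply with exactly 'Correct' or "
        "'Incorrect'.\n"
        f"Question: {question}\n"
        f"Reference: {reference}\n"
        f"Prediction: {prediction}"
    )

def parse_verdict(judge_reply: str) -> bool:
    """Map the judge's free-form reply to a binary correctness score."""
    return judge_reply.strip().lower().startswith("correct")
```

Compared with exact string match, a judge tolerates paraphrases and formatting differences in free-form answers, at the cost of depending on the judge model's consistency.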

4. 🧩 20+ New Tasks for Comprehensive Evaluation

We contribute over 20 new evaluation tasks covering:

  • General vision-language understanding
  • Math and logic reasoning
  • GUI understanding and grounding
  • Video understanding and reasoning

👉 A complete list of supported tasks is available here.

Usage

Installation

git clone https://github.com/XiaomiMiMo/lmms-eval
cd lmms-eval
pip install -e . && pip uninstall -y opencv-python-headless
pip install -r requirements.txt

Evaluation Script

bash mimovl_docs/eval_mimo_vl.sh

Reproduction of MiMo-VL-7B-SFT results in our technical report can be found here.

Citations

@misc{coreteam2025mimovl,
    title={MiMo-VL Technical Report},
    author={{Xiaomi LLM-Core Team}},
    year={2025},
    url={https://github.com/XiaomiMiMo/MiMo-VL}
}

@misc{mimovleval2025,
    title={The Evaluation Suite of Xiaomi MiMo-VL},
    author={LLM-Core Xiaomi},
    year={2025},
    url={https://github.com/XiaomiMiMo/lmms-eval}
}
