To promote rigorous, reproducible, and thinking-oriented evaluation of Vision-Language Models (VLMs), we open-source our evaluation suite for MiMo-VL and beyond.
Built on top of the excellent lmms-eval framework, we introduce several improvements in model integration, evaluation protocol, and task coverage to better support the next generation of reasoning-capable VLMs.
[25/08/08] We update our evaluation framework along with the release of MiMo-VL-7B-SFT-2508 and MiMo-VL-7B-RL-2508. New features include:
- Additional GUI action benchmarks AndroidControl and CAGUI (evaluated with `--model mimo_agent`)
- Additional evaluation benchmarks for video spatial reasoning (VSI-Bench), physics reasoning (PhysReason), multimodal long-context understanding (MMLongBench), and multimodal instruction following (MM-IFEval)
- `no_think` evaluation, enabled by adding the model argument `disable_thinking_user=True`
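As an illustrative sketch only, a `no_think` run could combine these pieces as below. The task name, Hugging Face model id, and exact `--model_args` are assumptions, not our exact configuration; see `mimovl_docs/eval_mimo_vl.sh` for the invocation we actually use.

```shell
# Hypothetical invocation sketch (task/model names are assumptions):
python3 -m lmms_eval \
    --model mimo_agent \
    --model_args pretrained=XiaomiMiMo/MiMo-VL-7B-RL-2508,disable_thinking_user=True \
    --tasks android_control \
    --batch_size 1 \
    --output_path ./logs/
```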
We introduce a new MiVLLM model class based on the original VLLM class in lmms-eval, which is tailored for MiMo-VL. Compared to the original implementation, it:
- Greatly improves data loading efficiency
- Enables fine-grained control over image and video preprocessing
The original lmms-eval tasks were designed for non-thinking VLMs: they prompt directly for short answers and compare outputs without post-processing. We redesign this process to support reasoning-intensive models:
- Introduce a unified `\boxed{}` output format using the prompt: "Put your final answer in `\boxed{}`."
- Extend `max_new_tokens` to 32768 so the model can reason before answering
- Automatically extract predictions from the final `\boxed{}` output
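The extraction step above can be sketched in a few lines. This is a simplified illustration that handles nested braces; the parser in the evaluation suite may cover more edge cases.

```python
def extract_boxed(text: str):
    """Return the contents of the last \\boxed{...} in `text`,
    handling nested braces (e.g. \\boxed{\\frac{1}{2}}).
    Simplified sketch of the answer post-processing step."""
    start = text.rfind(r"\boxed{")
    if start == -1:
        return None
    i = start + len(r"\boxed{")
    depth = 1
    out = []
    while i < len(text) and depth:
        c = text[i]
        if c == "{":
            depth += 1
        elif c == "}":
            depth -= 1
            if depth == 0:
                break
        out.append(c)
        i += 1
    # Unbalanced braces mean the answer was truncated; report no answer.
    return "".join(out) if depth == 0 else None

reply = "Let me think... 12 * 3 = 36, so the answer is \\boxed{36}."
print(extract_boxed(reply))  # -> 36
```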
For open-ended tasks such as DocVQA, InfoVQA, ChartQA, and OCRBench, we calculate accuracy using GPT-4o as the evaluator. This improves the fidelity of evaluation for free-form answers and better reflects model capabilities.
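A minimal sketch of this LLM-as-judge scoring, assuming a binary correct/incorrect verdict. The prompt wording and the `complete` callable are illustrative assumptions; the suite's actual judge prompt and scoring rubric may differ.

```python
def judge_prompt(question: str, reference: str, prediction: str) -> str:
    # Hypothetical judge prompt; the real one may ask for partial credit.
    return (
        "You are grading a visual question answering response.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {prediction}\n"
        "Reply with exactly 'correct' or 'incorrect'."
    )

def judge(question, reference, prediction, complete) -> bool:
    """`complete` maps a prompt string to the judge model's reply,
    e.g. a thin wrapper around a GPT-4o chat-completions call."""
    verdict = complete(judge_prompt(question, reference, prediction))
    return verdict.strip().lower().startswith("correct")

# Offline demo with a trivial stand-in for the judge model:
fake = lambda p: "correct" if "Model answer: 1,024" in p else "incorrect"
print(judge("How many units were sold?", "1,024", "1,024", fake))  # -> True
```

Injecting the completion function keeps the scoring logic testable without network access; in production it would wrap the GPT-4o API.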
We contribute over 20 new evaluation tasks covering:
- General vision-language understanding
- Math and logic reasoning
- GUI understanding and grounding
- Video understanding and reasoning
👉 A complete list of supported tasks is available here.
git clone https://github.com/XiaomiMiMo/lmms-eval
cd lmms-eval
pip install -e . && pip uninstall -y opencv-python-headless
pip install -r requirements.txt

Then run the evaluation script:

bash mimovl_docs/eval_mimo_vl.sh

Reproduction of the MiMo-VL-7B-SFT results in our technical report can be found here.
@misc{coreteam2025mimovl,
title={MiMo-VL Technical Report},
author={{Xiaomi LLM-Core Team}},
year={2025},
url={https://github.com/XiaomiMiMo/MiMo-VL},
}
@misc{mimovleval2025,
title={The Evaluation Suite of Xiaomi MiMo-VL},
  author={{Xiaomi LLM-Core Team}},
year={2025},
url={https://github.com/XiaomiMiMo/lmms-eval}
}