An Online Evaluation Framework for Mobile GUI Agents

Requirements and Installation

This work has been tested in the following environment:

python == 3.10
uiautomator2
PIL
cv2
numpy
csv
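
Before running the framework, it can help to check that each listed module is importable. The snippet below is a minimal sketch, not part of this repository: the helper name `missing_modules` is hypothetical, and it assumes the import names listed above (`PIL` is normally provided by the Pillow package, `cv2` by an OpenCV Python package, and `csv` ships with the Python standard library).

```python
import importlib.util
import sys

# Import names taken from the requirements list above.
REQUIRED = ["uiautomator2", "PIL", "cv2", "numpy", "csv"]


def missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_modules(REQUIRED)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

An empty result means every listed module was found; any name that is printed still needs to be installed.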

Note for Mobile Agent V2

If you plan to use Mobile Agent V2, please follow the official environment configuration instructions provided here.

Supported Models

Model	Model Name	Organization
UI-TARS-1.5	`uitars_1_5`	ByteDance
GPT-4o	`gpt4o`	OpenAI
M3A (from AndroidWorld)	`m3a`	Google
Mobile Agent V2	`mobileagentv2`	Qwen

Model Deployment

Trajectory Collection

python3 run.py --mode interact --config config/interact.conf --subset base --output results/uitars1.5_1114

Parameters

mode [interact, evaluate]: Running mode. "interact" makes the agent interact with the environment to collect GUI trajectories; "evaluate" evaluates existing GUI trajectories.
config: Path to the configuration file that contains all other parameter settings.
subset [base, long-tail, long-horizon, gui-reasoning, noise-robust]: Subset of benchmark tasks to run. Different subsets test different agent abilities.
output: Directory where the collected trajectory data will be saved (e.g., "results/round1").
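
The four options above can be pictured as a small `argparse` declaration. This is only a sketch of how such a command line might be declared, not the actual contents of `run.py`; the function name `build_parser` and the choice of which options are required are assumptions.

```python
import argparse

# Subset names as listed in the parameter description above.
SUBSETS = ["base", "long-tail", "long-horizon", "gui-reasoning", "noise-robust"]


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the run.py options")
    parser.add_argument("--mode", choices=["interact", "evaluate"], required=True)
    parser.add_argument("--config", required=True, help="path to the config file")
    # Assumption: subset and output only matter in interact mode, so they are optional here.
    parser.add_argument("--subset", choices=SUBSETS, default=None)
    parser.add_argument("--output", default=None, help="directory for trajectories")
    return parser


args = build_parser().parse_args(
    ["--mode", "interact", "--config", "config/interact.conf",
     "--subset", "base", "--output", "results/round1"]
)
print(args.mode, args.config, args.subset, args.output)
```

Restricting `--mode` and `--subset` with `choices` makes a mistyped value fail immediately instead of partway through a run.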

Config file settings

device

id (str): The device ID of the physical device. Run "adb devices" in a terminal to obtain it.
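
If the device ID needs to be looked up from a script, the output of `adb devices` can be parsed directly. The sketch below assumes `adb` is on the PATH and that its output lists one device per line as a serial followed by a state; the helper names `parse_adb_devices` and `list_device_ids` are hypothetical and not part of this repository.

```python
import subprocess


def parse_adb_devices(output):
    """Return the serials of devices reported in the ready 'device' state."""
    serials = []
    for line in output.splitlines():
        parts = line.split()
        # Device lines look like "<serial> <state>"; the header line and
        # entries such as "unauthorized" or "offline" are skipped.
        if len(parts) >= 2 and parts[1] == "device":
            serials.append(parts[0])
    return serials


def list_device_ids():
    """Run `adb devices` and return the connected device IDs."""
    result = subprocess.run(
        ["adb", "devices"], capture_output=True, text=True, check=True
    )
    return parse_adb_devices(result.stdout)
```

Calling `list_device_ids()` on a machine with one authorized device should return a single-element list whose value can be used as `id`.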

model

name (str): Name of the model/agent used for trajectory collection (e.g., "uitars_1_5", "gpt4o").
url (str): URL of the API endpoint that serves the model.

inference

max_steps (int): Maximum number of interaction steps allowed for completing a single task before timing out.
back_times (int): Number of retry attempts allowed when the agent encounters an error or dead end.
sleep_seconds_per_act (int): Waiting time in seconds between consecutive actions.

task

task_file (str): Path to the file containing the list of tasks to be executed for trajectory collection.
output (str): Specific output path for saving individual task results and trajectories.
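
To make the settings above concrete, here is a sketch of how such a file could be laid out and read. It assumes the `.conf` file is INI-style and readable with Python's `configparser`, which this README does not confirm; every value (device ID, URL, step counts, and the `data/tasks.csv` path) is a placeholder, and `load_interact_config` is a hypothetical helper.

```python
import configparser

# Assumption: INI-style layout with one section per group of settings.
# All values are placeholders for illustration only.
EXAMPLE_CONF = """
[device]
id = YOUR_DEVICE_ID

[model]
name = uitars_1_5
url = http://localhost:8000/v1

[inference]
max_steps = 20
back_times = 3
sleep_seconds_per_act = 2

[task]
task_file = data/tasks.csv
output = results/round1
"""


def load_interact_config(text):
    """Parse the settings documented above into a flat dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "device_id": parser.get("device", "id"),
        "model_name": parser.get("model", "name"),
        "model_url": parser.get("model", "url"),
        "max_steps": parser.getint("inference", "max_steps"),
        "back_times": parser.getint("inference", "back_times"),
        "sleep_seconds_per_act": parser.getint("inference", "sleep_seconds_per_act"),
        "task_file": parser.get("task", "task_file"),
        "output": parser.get("task", "output"),
    }


config = load_interact_config(EXAMPLE_CONF)
print(config["model_name"], config["max_steps"])
```

Reading the integer settings with `getint` surfaces a malformed value when the file is loaded instead of later in the interaction loop.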

Evaluation

python3 run.py --mode evaluate --config config/evaluate.conf

Config file settings

evaluation

type (str): Evaluation type (e.g., "xpath").
trajectory (str): Path to GUI trajectories to be evaluated. (e.g., "results/round1").
rule (str): Path to the rule file. (e.g., "data/top12.csv").
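
Before starting an evaluation run, it may be useful to confirm that the configured paths exist. The sketch below assumes an INI-style file with an `[evaluation]` section, which this README does not confirm; the helper names `load_eval_config` and `missing_paths` are hypothetical, and the values are the examples given above.

```python
import configparser
from pathlib import Path

# Assumption: INI-style layout; values reuse the examples from the list above.
EXAMPLE_EVAL_CONF = """
[evaluation]
type = xpath
trajectory = results/round1
rule = data/top12.csv
"""


def load_eval_config(text):
    """Parse the evaluation settings into a dictionary with Path objects."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["evaluation"]
    return {
        "type": section["type"],
        "trajectory": Path(section["trajectory"]),
        "rule": Path(section["rule"]),
    }


def missing_paths(cfg):
    """Return the configured paths that do not exist on disk."""
    return [str(p) for p in (cfg["trajectory"], cfg["rule"]) if not p.exists()]


eval_cfg = load_eval_config(EXAMPLE_EVAL_CONF)
print(eval_cfg["type"], missing_paths(eval_cfg))
```

An empty list from `missing_paths` means both the trajectory directory and the rule file were found.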

License

The dataset of this project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

The source code of this project is licensed under the Apache 2.0 license.

Summary of Terms

Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made.
NonCommercial: You may not use the material for commercial purposes.
ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

Citation

If you find the resources in this repository helpful, please cite as:

@article{wu2026mobilebench,
  title={MobileBench-OL: A Comprehensive Chinese Benchmark for Evaluating Mobile GUI Agents in Real-World Environment},
  author={Wu, Qinzhuo and Yang, Zhizhuo and Li, Hanhao and Gao, Pengzhi and Liu, Wei and Luan, Jian},
  journal={arXiv preprint arXiv:2601.20335},
  year={2026}
}
