Please follow the [instructions](./data/README.md) to download and preprocess the datasets.

## Development

### Configuration

Please update the configuration files or objects with your own information:

- **[dataset_info.json](./config/dataset_info.json)**: configure dataset paths and settings
- **[guieval/config.py](./guieval/config.py)**: the `DATASET` definition, for clear type annotations and static checking
- **[model_paths.json](./config/model_paths.json)**: configure default model paths for supported models
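As a purely hypothetical illustration (the field names below are assumptions, not the project's actual schema — see the file itself for the real format), a `dataset_info.json` entry might look like:

```json
{
  "cagui_agent": {
    "_comment": "field names here are illustrative only, not the project's actual schema",
    "data_path": "data/cagui_agent",
    "split": "test"
  }
}
```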

### Model Core Implementation

- **[ur_model.py](./guieval/models/ur_model.py)**: implement your model's core methods
- **[__init__.py](./guieval/models/__init__.py)**: register your model
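A minimal sketch of what these two steps could look like, assuming a simple name-to-class registry; the real base class and registration mechanism live under `guieval/models/` and may differ:

```python
# Hypothetical sketch of a GUIEvalKit model wrapper; the actual interface
# in guieval/models/ur_model.py may define different method names.

class YourModel:
    """Minimal shape a model wrapper is assumed to have."""

    def __init__(self, model_path: str):
        # model_path would typically come from config/model_paths.json
        self.model_path = model_path

    def predict(self, instruction: str, screenshot_path: str) -> str:
        # Map one GUI step (instruction + screenshot) to an action string.
        # A real implementation would run the model here; this is a stub.
        return '{"action": "CLICK", "point": [500, 500]}'


# Registration is sketched here as a name -> class mapping, so the runner
# can resolve a model name to a class (an assumption, not the actual API).
MODEL_REGISTRY = {"your-model": YourModel}
```

With such a registry, the evaluation entry point could instantiate the wrapper by name and call `predict` once per step.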
## Evaluation

### Quick Start

You can use the provided `run.sh` script as a template, or run directly with Python:

```bash
python3 run.py all \
    --setup.datasets cagui_agent \
    --setup.model.model_name agentcpm-gui-8b \
    --setup.eval_mode offline_rule \
    --setup.vllm_mode online
```
**Please check [here](./docs/results.md) for the detailed evaluation results.**

## Acknowledgement

This repo benefits from [AgentCPM-GUI/eval](https://github.com/OpenBMB/AgentCPM-GUI/tree/main/eval) and [VLMEvalKit](https://github.com/open-compass/VLMEvalKit). Thanks for their wonderful work.