Different evaluation results for UI-TARS-1.5-7B

We evaluated UI-TARS-1.5-7B locally, and our results differ significantly from the numbers reported in the table. Our average grounding accuracy is 73.16. The basic and advanced accuracies for the fine-grained categories differ as well.

[uitars-1.5-7b-local_GUIElementGrounding_score.json](https://github.com/user-attachments/files/21920724/uitars-1.5-7b-local_GUIElementGrounding_score.json)



Different evaluation results for UI-TARS-1.5-7B #5
