How to evaluate the performance of Kimi-VL on GUI tasks? #65

Thank you for your outstanding work and for open-sourcing such a great model!

I noticed that the technical report included the OSWorld benchmark. Would it be possible to release the prompts that were used for these evaluations?

Additionally, I would like to know whether the model can be used for GUI agent tasks on Android (such as the AndroidWorld benchmark) and on the web (such as the WebVoyager benchmark). If so, could you please recommend some prompts for these scenarios?

Thank you very much!
