How to evaluate the performance of Kimi-VL on GUI tasks? #65

@bill4689

Description

Thank you for your outstanding work and for open-sourcing such a great model!

I noticed that the technical report includes results on the OSWorld benchmark. Would it be possible to release the prompts used for these evaluations?

Additionally, I would like to know whether the model can be used on Android (for example, with the AndroidWorld benchmark) and on the web (for example, with the WebVoyager benchmark). If so, could you please recommend prompts for these scenarios?

Thank you very much!
