Evaluation results of Qwen-VL on the AITW dataset #43

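Hello authors,
Thank you very much for your work and for the dataset. I would like to ask how you constructed the inference prompt/instruction when evaluating Qwen-VL on the sequential action task. When I test Qwen-VL directly with the code in aitw_test.py, the output format does not match the format defined by the action space, so the model's performance cannot be evaluated properly. Did you use a prompt that constrains the output format during evaluation?
The screenshots below show the input and output of Qwen-VL during local inference: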
<img width="778" alt="image" src="https://github.com/user-attachments/assets/31849e2e-bd83-402e-8216-81263ade493d">
<img width="778" alt="image" src="https://github.com/user-attachments/assets/15678d7a-b79a-479c-8294-ede728b61912">
<img width="778" alt="image" src="https://github.com/user-attachments/assets/cb243e7f-3ab7-4832-9f84-ac71840385e9">
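
To make the question concrete, here is a minimal sketch of the kind of workaround I have in mind. It is not taken from aitw_test.py and may differ from what the authors did. It appends a format hint to the prompt and then extracts the first dict-like span from the free-form output. The names `FORMAT_HINT`, `build_prompt`, and `parse_action`, as well as the `action_type` / `click_point` fields and the example values, are assumptions made for illustration only.

```python
import ast
import re

# Hypothetical format hint; the real action schema may differ.
FORMAT_HINT = (
    "Answer with a single Python dict and nothing else, for example: "
    '{"action_type": 4, "click_point": (0.5, 0.5)}'
)


def build_prompt(instruction, previous_actions):
    """Compose a prompt that ends with the (assumed) output-format hint."""
    history = ", ".join(previous_actions) if previous_actions else "none"
    return (
        f"Instruction: {instruction}. Previous actions: {history}. "
        f"{FORMAT_HINT}"
    )


def parse_action(model_output):
    """Return the first dict literal containing 'action_type', or None."""
    # Scan for brace-delimited spans without nested braces.
    for match in re.finditer(r"\{[^{}]*\}", model_output):
        try:
            value = ast.literal_eval(match.group(0))
        except (ValueError, SyntaxError, TypeError):
            continue  # not a valid Python literal; try the next span
        if isinstance(value, dict) and "action_type" in value:
            return value
    return None
```

With a parser like this, outputs that contain extra explanation around the dict could still be scored, and outputs with no parsable action could be counted as failures rather than crashing the evaluation. I would still like to know which prompt was actually used so that the numbers are comparable.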

