Unable to Reproduce LLAVA-NeXT-Video Metrics on ScanQA

I hope you're doing well. I’m working with your project and trying to reproduce the LLAVA-NeXT-Video results on the ScanQA (val) dataset as reported in the table: 46.2 for C, 9.8 for B-4, 9.1 for M, 27.8 for R, and 18.7 for EM@1. However, my experiments do not reach these numbers.

Could you please provide more details on how these results were obtained? Specifically, I’d like to know:

- The exact configuration or version of LLAVA-NeXT-Video used for testing on ScanQA.
- Any preprocessing steps, hyperparameters, or other settings that are critical for reproducing these results.
- Whether any additional fine-tuning or modifications were applied to the model before evaluation.

I’d greatly appreciate any guidance you can offer to help me align my results with those in the table. Thank you for your time and for sharing your amazing work!

Best regards,

![Image](https://github.com/user-attachments/assets/2c2df560-79b0-453c-a47d-f2d9cbb75b02)
