I hope you're doing well. I’m currently working with your project and trying to reproduce the results for LLaVA-NeXT-Video on the ScanQA (val) dataset reported in the table: 46.2 CIDEr (C), 9.8 BLEU-4 (B-4), 9.1 METEOR (M), 27.8 ROUGE (R), and 18.7 EM@1. However, I have been unable to reach these numbers in my experiments.
Could you please provide more details on how these results were obtained? Specifically, I’d like to know:
- The exact configuration or version of LLaVA-NeXT-Video used for testing on ScanQA.
- Any preprocessing steps, hyperparameters, or other settings that are critical for reproducing these results.
- Whether any additional fine-tuning or modifications were applied to the model before evaluation.
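One detail that may account for part of the gap is how answers are normalized before EM@1 is scored. The following is only a minimal sketch of one plausible scoring scheme, assuming lowercasing, punctuation stripping, and a match against any of the reference answers. The helper names `normalize_answer` and `em_at_1` are hypothetical and are not taken from the project's evaluation code, which may differ.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace (assumed scheme)."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def em_at_1(predictions, references):
    """Percentage of top-1 predictions that exactly match any reference answer.

    predictions: list of predicted answer strings (one per question)
    references:  list of lists of ground-truth answer strings
    """
    if not predictions:
        return 0.0
    hits = 0
    for pred, refs in zip(predictions, references):
        normalized_refs = {normalize_answer(r) for r in refs}
        if normalize_answer(pred) in normalized_refs:
            hits += 1
    return 100.0 * hits / len(predictions)


# Toy data for illustration only, not taken from ScanQA.
preds = ["Brown.", "on the table", "two"]
refs = [["brown", "dark brown"], ["next to the table"], ["2", "two"]]
score = em_at_1(preds, refs)
print(f"EM@1: {score:.1f}")
```

If the reported numbers use a different normalization (or none), knowing that would help explain differences in EM@1 specifically.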
I’d greatly appreciate any guidance you can offer to help me align my results with those in the table. Thank you for your time and for sharing your amazing work!
Best regards,
