IG for VQA using VLMs from transformers #1494

Open
nicokossmann opened this issue Jan 27, 2025 · 0 comments

❓ Questions and Help

Hi there @sarahtranfb 👋,

I would like to use Integrated Gradients (IG) over both the image and text inputs to explain the predictions of VLMs such as LLaVA-OneVision or Phi-3.5-vision-instruct on a VQA task with multiple images.
I have already created a Google Colab notebook for a simple model from Transformers. However, I cannot pass pixel_values and inputs_embeds together when using the LLaVA model. Do you have any ideas on how to overcome this?
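
One workaround I was considering (not verified, so please correct me if this is the wrong direction) is to attribute at the embedding layer with LayerIntegratedGradients, so that the forward call keeps receiving input_ids together with pixel_values and I never have to build inputs_embeds myself. A rough sketch, where the checkpoint name, the baseline construction and the target handling are just my own assumptions:

```python
import torch
from transformers import AutoProcessor, LlavaForConditionalGeneration
from captum.attr import LayerIntegratedGradients

model_id = "llava-hf/llava-1.5-7b-hf"  # placeholder checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)
model.eval()

inputs = processor(images=image, text=prompt, return_tensors="pt")  # image/prompt defined elsewhere
with torch.no_grad():
    target_token_id = int(model(**inputs).logits[:, -1].argmax())   # next-token prediction to explain

def forward_func(input_ids, pixel_values, attention_mask, target_token_id):
    # input_ids + pixel_values are passed as usual; inputs_embeds is never constructed here
    logits = model(input_ids=input_ids,
                   pixel_values=pixel_values,
                   attention_mask=attention_mask).logits
    return logits[:, -1, target_token_id]  # score of the answer token at the last position

# integrate along a path in the token-embedding space of the language model
lig = LayerIntegratedGradients(forward_func, model.get_input_embeddings())

# baseline: pad out the question tokens but keep the image placeholder tokens,
# so the image-feature merge inside the model still lines up
# (attribute/pad-token names may differ between transformers versions)
baseline_ids = inputs["input_ids"].clone()
is_image_token = baseline_ids == model.config.image_token_index
baseline_ids[~is_image_token] = processor.tokenizer.pad_token_id

attributions = lig.attribute(
    inputs=inputs["input_ids"],
    baselines=baseline_ids,
    additional_forward_args=(inputs["pixel_values"],
                             inputs["attention_mask"],
                             target_token_id),
)
token_attr = attributions.sum(dim=-1)  # one attribution score per input token
```

For the image side I would then try a second LayerIntegratedGradients over the vision tower / multi-modal projector, but I am not sure whether that is the recommended way in Captum.
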
Also, the attributions for the dandelin/vilt-b32-finetuned-vqa model don't look quite right (see the screenshot below).

[Attached screenshot: attribution visualization for dandelin/vilt-b32-finetuned-vqa]

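For ViLT, is something along these lines the intended pattern, or could a wrong setup on my side explain the attributions above? The `model.vilt.embeddings.text_embeddings` layer path and the pad-token baseline are assumptions I took from the current transformers source:

```python
import torch
from transformers import ViltProcessor, ViltForQuestionAnswering
from captum.attr import LayerIntegratedGradients

processor = ViltProcessor.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model = ViltForQuestionAnswering.from_pretrained("dandelin/vilt-b32-finetuned-vqa")
model.eval()

encoding = processor(image, question, return_tensors="pt")      # image/question defined elsewhere
with torch.no_grad():
    target = int(model(**encoding).logits.argmax(-1))           # predicted answer index

def forward_func(input_ids, pixel_values, attention_mask, token_type_ids):
    return model(input_ids=input_ids,
                 pixel_values=pixel_values,
                 attention_mask=attention_mask,
                 token_type_ids=token_type_ids).logits

# attribute the question tokens at the text-embedding layer instead of the raw ids
lig = LayerIntegratedGradients(forward_func, model.vilt.embeddings.text_embeddings)

baseline_ids = encoding["input_ids"].clone()
baseline_ids[:, 1:-1] = processor.tokenizer.pad_token_id         # keep [CLS]/[SEP]

attributions = lig.attribute(
    inputs=encoding["input_ids"],
    baselines=baseline_ids,
    additional_forward_args=(encoding["pixel_values"],
                             encoding["attention_mask"],
                             encoding["token_type_ids"]),
    target=target,
)
token_attr = attributions.sum(dim=-1)                            # one score per question token
```
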
I have also described the problem in the PyTorch forums.
