How to input multiple images using VLM-R1 #135

Open
WuYeeh opened this issue Mar 8, 2025 · 3 comments

Comments

@WuYeeh

WuYeeh commented Mar 8, 2025

This is very good work, but I have run into a problem. When I use VLM-R1 to train Qwen2.5-VL on tasks that require multiple input images, the images do not seem to make it into the Trainer. LLaMA-Factory makes it easy to pass in multiple images. How should I modify the training code in VLM-R1? Looking forward to your reply, thank you.

@snakeztc
Contributor

snakeztc commented Mar 9, 2025

We are working on supporting this feature. Stay tuned.

@gmyFighting

In grpo_trainer.py, the 'image' variable is a list. You simply need to add all your images to this list.
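For anyone who lands here, a minimal sketch of what that looks like on the data side. The field names ("image", "problem", "solution") are assumptions based on the usual VLM-R1 example format and may differ from your dataset script; the point is only that the "image" entry can hold a list of images rather than a single one:

```python
# Minimal sketch -- field names are assumptions, not necessarily VLM-R1's exact schema.
from PIL import Image

def build_example(image_paths, question, answer):
    # Load every image and keep them together in one list,
    # so the trainer's "image" entry carries all of them.
    images = [Image.open(p).convert("RGB") for p in image_paths]
    return {
        "image": images,      # a list of images instead of a single image
        "problem": question,  # prompt text (name assumed)
        "solution": answer,   # ground truth used by the reward function (name assumed)
    }

example = build_example(
    ["view_front.jpg", "view_back.jpg"],
    "Which of the two views shows the red car?",
    "the front view",
)
```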

@WuYeeh
Author

WuYeeh commented Mar 10, 2025

@gmyFighting Thanks, I get it. But I would like to know: if I want to place my images at designated positions in the prompt (different images at different positions), can I add an identifier at the desired location, as in LLaMA-Factory?
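For reference (and this is not something the current trainer does for you, per the maintainer's reply above), the underlying Qwen2.5-VL processor already supports interleaving images with text through its chat template, which is the mechanism such an identifier would map to. A sketch using the standard transformers API; the model id and file names are only examples:

```python
# Illustration of interleaved image placeholders with the Qwen2.5-VL processor.
# VLM-R1's trainer would need to build messages like this itself; this is not
# claiming the repo already does so.
from transformers import AutoProcessor
from PIL import Image

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")

messages = [{
    "role": "user",
    "content": [
        {"type": "text",  "text": "Compare the two views."},
        {"type": "image"},   # first image is inserted here
        {"type": "text",  "text": "versus"},
        {"type": "image"},   # second image is inserted here
        {"type": "text",  "text": "Which one shows the red car?"},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images = [Image.open("view_front.jpg"), Image.open("view_back.jpg")]
inputs = processor(text=[text], images=images, return_tensors="pt")
```

The images are matched to the image placeholders in the order they appear in the list, so putting placeholders at different positions in the content controls where each image lands in the prompt.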
