Instruction Fine-Tuning with Multimodal LLMs #3647
Unanswered
adrielkuek asked this question in Q&A

Hi, just curious to know if anybody has tried using Ludwig for instruction tuning with multimodal LLMs? There have been a couple of promising works extending base LLM capabilities to encode visual information and achieve multimodal understanding (OpenFlamingo, LLaVA, MiniGPT-4, Otter, InstructBLIP, etc.). We are interested in using Ludwig's training optimisations to lower the cost of fine-tuning MM-LLMs, and would be keen to understand and learn more from folks who have attempted something similar.

Thank you very much for the advice!

Replies: 1 comment 1 reply

- Hey @adrielkuek, I did have a PR exploring LLaVA integration a few months back. Things have come along a bit since then, and now there's IDEFICS, which would be a pretty straightforward one to add to Ludwig. Let me file an issue to track this.
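For context on what the Ludwig side of this looks like today, below is a minimal sketch of a text-only instruction fine-tuning config using Ludwig's LLM support (LoRA adapter plus 4-bit quantization, which are the main cost-lowering levers the question asks about). The base model, column names, and hyperparameter values are illustrative assumptions, not a tested recipe; a multimodal model such as IDEFICS would additionally need an image input feature, which is exactly the integration discussed above and not shown here.

```yaml
# Sketch of a Ludwig LLM fine-tuning config (assumed base model and
# dataset columns "instruction"/"response" are placeholders).
model_type: llm
base_model: meta-llama/Llama-2-7b-hf

input_features:
  - name: instruction
    type: text

output_features:
  - name: response
    type: text

# Parameter-efficient fine-tuning to reduce memory and cost.
adapter:
  type: lora

# 4-bit quantization (QLoRA-style) to fit larger models on one GPU.
quantization:
  bits: 4

trainer:
  type: finetune
  learning_rate: 0.0001
  batch_size: 1
  gradient_accumulation_steps: 16
  epochs: 3
```

Such a config would typically be run with the Ludwig CLI, e.g. `ludwig train --config config.yaml --dataset data.json`, against a dataset whose columns match the declared input and output features.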