Instruction Fine-Tuning with Multimodal LLMs #3647
Unanswered
adrielkuek asked this question in Q&A

Hi, just curious to know if anybody has tried using Ludwig for instruction tuning with multimodal LLMs? There have been a couple of promising works extending base LLM capabilities to encode visual information and achieve multimodal understanding (OpenFlamingo, LLaVA, MiniGPT-4, Otter, InstructBLIP, etc.). We are interested in using Ludwig's training optimisations to lower the cost of fine-tuning MM-LLMs, and would be keen to understand and learn more from folks who have attempted something similar.

Thank you very much for the advice!

Replies: 1 comment 1 reply

- Hey @adrielkuek, I did have a PR exploring LLaVA integration a few months back. Things have come along a bit since then, and now there's IDEFICS, which would be a pretty straightforward one to add to Ludwig. Let me file an issue to track this.
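For context on what the Ludwig side of this looks like today, below is a minimal sketch of a text-only instruction fine-tuning config using Ludwig's LLM support (LoRA adapter plus 4-bit quantization, which are the main cost-lowering levers the question asks about). The base model, column names, and hyperparameter values are illustrative assumptions, not a tested recipe; a multimodal model such as IDEFICS would additionally need an image input feature, which is exactly the integration discussed above and not shown here.

```yaml
# Sketch of a Ludwig LLM fine-tuning config (assumed base model and
# dataset columns "instruction"/"response" are placeholders).
model_type: llm
base_model: meta-llama/Llama-2-7b-hf

input_features:
  - name: instruction
    type: text

output_features:
  - name: response
    type: text

# Parameter-efficient fine-tuning to reduce memory and cost.
adapter:
  type: lora

# 4-bit quantization (QLoRA-style) to fit larger models on one GPU.
quantization:
  bits: 4

trainer:
  type: finetune
  learning_rate: 0.0001
  batch_size: 1
  gradient_accumulation_steps: 16
  epochs: 3
```

Such a config would typically be run with the Ludwig CLI, e.g. `ludwig train --config config.yaml --dataset data.json`, against a dataset whose columns match the declared input and output features.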