New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Training VLM #74

Open

faithfulnguyen opened this issue Feb 18, 2025 · 1 comment

faithfulnguyen commented Feb 18, 2025

Hi thanks for sharing your work, I have a small dataset including images and descriptions, Can I use this code for training on my dataset?

Collaborator

SumanthRH commented Feb 18, 2025

Hi!

Could you highlight which recipe you're trying to extend with image data? We've summarized all the recipes for the different models here:

Currently, the training code is made up of two forks :

LlamaFactory - Used for Sky-T1-32B-Preview and Sky-T1-32B-Flash. Llamafactory supports using image data.
VERL - Used for Sky-T1-mini. AFAIK VeRL is text-only, so you might have to customize the code for working with VLMs.

We're also actively working on cleaning up our training code.

Hope that helps!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment