Add the new multi-modal models of Mistral AI: mistral-small-3.1-24b & pixtral-12b #3535

Open
SuperPat45 opened this issue Sep 12, 2024 · 10 comments
Labels
enhancement New feature or request roadmap

Comments

@SuperPat45

SuperPat45 commented Sep 12, 2024

Add the new multi-modal models of Mistral AI, mistral-small-3.1-24b and pixtral-12b:

https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503
https://huggingface.co/mistral-community/pixtral-12b-240910

@SuperPat45 SuperPat45 added the enhancement New feature or request label Sep 12, 2024
@AlexM4H

AlexM4H commented Sep 13, 2024

Since yesterday, vLLM has had InternVL2 support. :-)

vllm-project/vllm/releases/tag/v0.6.1

@mudler mudler added the roadmap label Sep 13, 2024
@mudler
Owner

mudler commented Sep 13, 2024

I guess that would already work with llama.cpp GGUF models if/when support lands there (see also ggml-org/llama.cpp#9440).

I'd broaden the focus of this issue to adding generic multimodal support with vLLM; examples:

https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_pixtral.py
https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language_multi_image.py
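Those vLLM scripts cover offline inference; on the serving side, multimodal requests to an OpenAI-compatible endpoint (as LocalAI and vLLM both expose) are typically expressed with `image_url` content parts in the chat messages. A minimal sketch of building such a request payload — the model name, prompt, and image bytes below are illustrative, not from this thread:

```python
import base64
import json


def build_vision_chat_request(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build an OpenAI-style vision chat request with an inline base64 image."""
    # Encode the raw image bytes as a base64 data URI, the form accepted
    # by the "image_url" content part in the OpenAI chat format.
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
    }


# Hypothetical example: the model name is a placeholder, and the bytes
# stand in for a real PNG file read from disk.
req = build_vision_chat_request("pixtral-12b", "Describe this image.", b"\x89PNG fake")
print(json.dumps(req)[:80])
```

The same payload shape works regardless of which backend (vLLM, llama.cpp, etc.) serves the model, which is one argument for keeping this issue generic rather than per-model.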

@AlexM4H

AlexM4H commented Sep 26, 2024

vLLM already has Llama 3.2 support: vllm-project/vllm#8811

Georgi wrote two weeks ago:
"Not much has changed since the issue was created. We need contributions to improve the existing vision code and people to maintain it. There is interest to reintroduce full multimodal support, but there are other things with higher priority that are currently worked upon by the core maintainers of the project."
(ggml-org/llama.cpp#8010 (comment))

@mudler
Owner

mudler commented Sep 26, 2024

See also: ggml-org/llama.cpp#9455

@AlexM4H

AlexM4H commented Sep 26, 2024

BTW: "(Coming very soon) 11B and 90B Vision models

11B and 90B models support image reasoning use cases, such as document-level understanding including charts and graphs and captioning of images."

(https://ollama.com/blog/llama3.2)

@mudler
Owner

mudler commented Sep 26, 2024

BTW: "(Coming very soon) 11B and 90B Vision models

11B and 90B models support image reasoning use cases, such as document-level understanding including charts and graphs and captioning of images."

(https://ollama.com/blog/llama3.2)

That would be interesting to see, given that upstream (llama.cpp) is still working on it: ggml-org/llama.cpp#9643

@AlexM4H

AlexM4H commented Sep 26, 2024

It seems they are working on that independently: ollama/ollama#6963

@mudler
Owner

mudler commented Sep 26, 2024

It seems they are working on that independently: ollama/ollama#6963

That looks like only the Golang side of things to fit the images; the real backend changes seem to be in ollama/ollama#6965

@AlexM4H

AlexM4H commented Sep 26, 2024

It seems they are working on that independently: ollama/ollama#6963

That looks like only the Golang side of things to fit the images; the real backend changes seem to be in ollama/ollama#6965

Oh, yes. Wrong link.

@SuperPat45 SuperPat45 changed the title Add the new Multi-Modal model of mistral AI: pixtral-12b Add the new Multi-Modal model of mistral AI: mistral-small-3.1-24b & pixtral-12b Apr 8, 2025
@SuperPat45
Author

Mistral-small-3.1 with vision is now supported in Ollama via this PR: ollama/ollama#10099
