Add support for DeepseekAI's DeepseekVL #36248
Conversation
@zucchini-nlp, @Rocketknight1, @Cyrilvallez The [...]

    from transformers import SamConfig, SamModel
    config = SamConfig()
    model = SamModel(config)

I think that we should rename [...] Otherwise, we would have to copy all the classes that build [...] If you think having a [...] Btw, final results would look like this:

    from transformers import SamVisionConfig, SamVisionModel
    config = SamVisionConfig()
    model = SamVisionModel(config)

and [...]
@geetu040 we had a similar situation with ideficsVision, afair. Yes, in that case we can just make it public and add it to the docs. Renaming, though, would be breaking; imo we can leave the name as is.
@zucchini-nlp is it okay to do it in the same PR, or should I create a new one?
@geetu040 imo a new PR will make it easier for us to iterate and review.
Hi @zucchini-nlp, I am working on the [...] Can you please answer these 2 questions: [...]
@geetu040 no, that is not expected to have different shapes. Usually, using SDPA attention means that no [...] I see that the weights are calculated on top of SDPA by a manual matmul of key and query, which imo defeats the purpose of using SDPA in the first place. Can you remove the returned attention and raise a warning similar to what is done in ViT?
@zucchini-nlp sure, I'll do that.
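For readers following the SDPA point above, here is a minimal sketch of the general pattern being requested, assuming PyTorch 2.x. It is not the code from this PR or from ViT: the helper name `attention_forward` and the warning text are hypothetical. The fused `scaled_dot_product_attention` call does not return attention weights, so the sketch returns `None` for them on that path, and only when weights are explicitly requested does it warn and fall back to a manual (eager) computation.

```python
import warnings

import torch
import torch.nn.functional as F


def attention_forward(query, key, value, output_attentions=False):
    """Hypothetical helper returning (output, attn_weights or None)."""
    if output_attentions:
        # The fused kernel cannot return weights, so warn and compute them eagerly.
        warnings.warn(
            "SDPA does not return attention weights; falling back to eager attention.",
            UserWarning,
        )
        scale = query.shape[-1] ** -0.5
        scores = torch.matmul(query, key.transpose(-2, -1)) * scale
        weights = torch.softmax(scores, dim=-1)
        return torch.matmul(weights, value), weights
    # Fast path: fused attention, no weights available.
    return F.scaled_dot_product_attention(query, key, value), None


torch.manual_seed(0)
# Shapes are (batch, heads, sequence, head_dim); values are random placeholders.
q, k, v = (torch.randn(1, 2, 4, 8) for _ in range(3))

fused_out, fused_weights = attention_forward(q, k, v)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    eager_out, eager_weights = attention_forward(q, k, v, output_attentions=True)
```

Both paths should produce the same output tensor up to floating-point tolerance. The design point is that the manual key/query matmul happens only on the fallback path, so the fused path keeps its speed advantage.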
What does this PR do?
Fixes #36110
This PR adds DeepseekAI's DeepseekVL model to Hugging Face Transformers.
DeepseekVL is an open-source Vision-Language (VL) model designed for real-world vision and language understanding applications. It has general multimodal understanding capabilities and can handle logical diagrams, web pages, formula recognition, scientific literature, natural images, and embodied intelligence in complex scenarios.
Relevant Links
CC: @Benjamin-eecs, @RERV (GitHub contributors of deepseek-ai/DeepSeek-VL)
Before submitting
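Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.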
Who can review?
@ArthurZucker, @Rocketknight1, @Cyrilvallez, @zucchini-nlp
TODOs