feat: add dolphin #1772
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit
Wonderful, this rule succeeded.
Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
03ce2d2 to f16de96
@geoHeil Thanks for the addition. Looks almost ready for me. Note that here we also need
@geoHeil I think this is not how Dolphin works. In essence, you need to do a double pass (see here):
In this sense, Dolphin makes it possible to obtain the bounding boxes, something the other VLMs cannot do.
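To make the double pass concrete, here is a minimal sketch, not the Dolphin or docling implementation. It assumes the first pass returns layout elements with bounding boxes and the second pass runs once per cropped element. The `Vlm` callable, the prompts, the `label:l,t,r,b` layout string, and `fake_vlm` are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

BBox = Tuple[int, int, int, int]    # (left, top, right, bottom) in pixels
Image = List[List[int]]             # toy stand-in for a page image: rows of pixels
Vlm = Callable[[Image, str], str]   # hypothetical interface: (image, prompt) -> text


@dataclass
class Element:
    label: str
    bbox: BBox
    text: str = ""


def crop(image: Image, bbox: BBox) -> Image:
    """Cut the region described by bbox out of the toy image."""
    left, top, right, bottom = bbox
    return [row[left:right] for row in image[top:bottom]]


def parse_layout(raw: str) -> List[Element]:
    """Parse a hypothetical 'label:l,t,r,b;label:l,t,r,b' layout string."""
    elements = []
    for item in filter(None, raw.split(";")):
        label, coords = item.split(":")
        left, top, right, bottom = (int(v) for v in coords.split(","))
        elements.append(Element(label=label, bbox=(left, top, right, bottom)))
    return elements


def two_pass(page: Image, vlm: Vlm) -> List[Element]:
    # Pass 1: a single prediction over the whole page yields the layout boxes.
    elements = parse_layout(vlm(page, "Parse the layout of this page."))
    # Pass 2: one prediction per element, on the cropped region only.
    for el in elements:
        el.text = vlm(crop(page, el.bbox), f"Read the {el.label} in this image.")
    return elements


def fake_vlm(image: Image, prompt: str) -> str:
    """Deterministic stub so the sketch runs without a real model."""
    if "layout" in prompt:
        return "title:0,0,4,1;para:0,1,4,3"
    return f"<{len(image)}x{len(image[0])} crop>"


page = [[0] * 4 for _ in range(3)]  # 3 rows x 4 columns
result = two_pass(page, fake_vlm)
```

The point of the split is that the bounding boxes from the first pass survive into the final output, which a single page-level prediction would not give you.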
Are well-working prompts for these 2 tasks already known? Should they be added (either as examples or somewhere else)?
Signed-off-by: Georg Heiler <[email protected]>
Signed-off-by: Georg Heiler <[email protected]>
Signed-off-by: Georg Heiler <[email protected]>
63dc7e7 to 448c932
Yes, if you look here, this is essentially what is happening. This breaks a bit with our current VLM pipeline, which processes each page with a single prediction. It would be good to have a "two-shot VLM pipeline" that supports this. @cau-git @dolfim-ibm fyi: ^^
How do we move forward here? Should we get this merged and then explore a second PR for a separate VLM pipeline?
I want to check it out and run it myself first. My main worry is that we don't showcase the rich output that Dolphin currently provides (with layout boxes). But this PR might be good enough as a starting point. I would just love to give all the credit to the team that built the Dolphin model.
@geoHeil I would like to merge this asap, but I see we fail the MyPy check (https://github.com/docling-project/docling/actions/runs/15699814942/job/44748636363?pr=1772). Could you fix this quickly?
Signed-off-by: Georg Heiler <[email protected]>
✅ DCO Check Passed
Thanks @geoHeil, all your commits are properly signed off. 🎉
Please re-run CI. It should work now.
Codecov Report
Attention: Patch coverage is
Issue resolved by this Pull Request:
Resolves #1622
Checklist: