feat: add dolphin #1772

Open

geoHeil wants to merge 4 commits into main from feat/add-dolphin

Conversation

@geoHeil geoHeil commented Jun 14, 2025

Issue resolved by this Pull Request:
Resolves #1622

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

mergify bot commented Jun 14, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
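
For illustration, the rule can be checked locally with Python's re module. This is a minimal sketch: the pattern is copied verbatim from the rule above, and the example titles are this PR's actual title plus made-up counterexamples.

```python
import re

# Pattern copied from the merge-protection rule above (anchored to the start of the PR title).
TITLE_RULE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)

assert TITLE_RULE.match("feat: add dolphin")        # this PR's title: accepted
assert TITLE_RULE.match("feat(vlm)!: add dolphin")  # optional scope and breaking-change marker: accepted
assert not TITLE_RULE.match("add dolphin support")  # missing type prefix: rejected
```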

@geoHeil geoHeil changed the title from "prepare for dolfin" to "feat: add dolphin" on Jun 14, 2025
@geoHeil geoHeil force-pushed the feat/add-dolphin branch from 03ce2d2 to f16de96 on June 14, 2025 07:00
@dolfim-ibm
Contributor

@geoHeil Thanks for the addition. It looks almost ready to me.

Note that here we also need:

  1. the commits to be signed off
  2. the code to be formatted with the pre-commit hooks, i.e. uv run pre-commit run --all-files.

@PeterStaar-IBM
Contributor

@geoHeil I think this is not how Dolphin works. In essence, you need to do a double pass (see here):

  1. Run the model once to get the layout
    • then parse the layout output to obtain bboxes and labels
    • crop the element images from the original page image
  2. Run the model again (with a different prompt) to obtain the OCR-ed parts

In this sense, Dolphin allows obtaining the bboxes, something the other VLMs cannot do.
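
To make the double pass concrete, here is a minimal sketch following the generic VisionEncoderDecoder usage in Hugging Face transformers. The checkpoint id is the public ByteDance/Dolphin model; the prompt strings, the layout-parsing placeholder, and the generation settings are assumptions, not the code from this PR or the exact Dolphin demo.

```python
# Sketch only: double-pass Dolphin inference (layout pass, then per-element OCR on crops).
from PIL import Image
from transformers import AutoProcessor, VisionEncoderDecoderModel

processor = AutoProcessor.from_pretrained("ByteDance/Dolphin")
model = VisionEncoderDecoderModel.from_pretrained("ByteDance/Dolphin")


def run_pass(image: Image.Image, prompt: str) -> str:
    """One generation pass: encode the image and condition the decoder on a text prompt."""
    pixel_values = processor(images=image, return_tensors="pt").pixel_values
    prompt_ids = processor.tokenizer(
        prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids
    output_ids = model.generate(
        pixel_values=pixel_values, decoder_input_ids=prompt_ids, max_length=4096
    )
    return processor.tokenizer.decode(output_ids[0], skip_special_tokens=True)


def parse_layout(raw: str) -> list[tuple[str, tuple[int, int, int, int]]]:
    """Placeholder: turn Dolphin's raw layout output into (label, bbox) pairs.
    The real parsing of the structured layout string is not reproduced here."""
    return []


page = Image.open("page.png").convert("RGB")

# Pass 1: full-page layout prediction (prompt string is a placeholder).
layout_raw = run_pass(page, "Parse the reading order of this document.")
elements = parse_layout(layout_raw)

# Pass 2: crop each element from the original page and run element-level recognition.
results = []
for label, bbox in elements:
    crop = page.crop(bbox)
    text = run_pass(crop, "Read text in the image.")  # placeholder prompt; varies by label
    results.append({"label": label, "bbox": bbox, "text": text})
```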

@geoHeil
Author

geoHeil commented Jun 16, 2025

@geoHeil I think this is not how Dolphin works. In essence, you need to do a double pass (see here):

1. Run the model once to get the layout

   * then parse the layout output to obtain bboxes and labels
   * crop the element images from the original page image

2. Run the model again (with a different prompt) to obtain the OCR-ed parts

In this sense, Dolphin allows obtaining the bboxes, something the other VLMs cannot do.

Are well-working prompts for these two tasks already known? Should they be added (either as examples or somewhere else)?

geoHeil added 3 commits June 17, 2025 08:25
@geoHeil geoHeil force-pushed the feat/add-dolphin branch from 63dc7e7 to 448c932 on June 17, 2025 06:25
@PeterStaar-IBM
Contributor

@geoHeil I think this is not how Dolphin works. In essence, you need to do a double pass (see here):

1. Run the model once to get the layout

   * then parse the layout output to obtain bboxes and labels
   * crop the element images from the original page image

2. Run the model again (with a different prompt) to obtain the OCR-ed parts

In this sense, Dolphin allows obtaining the bboxes, something the other VLMs cannot do.

Are well-working prompts for these two tasks already known? Should they be added (either as examples or somewhere else)?

Yes, if you look here,

  1. parse layout: https://github.com/bytedance/Dolphin/blob/master/demo_page.py#L63
  2. process elements: https://github.com/bytedance/Dolphin/blob/master/demo_page.py#L94

this is what is essentially happening.
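
To make that concrete, here is a rough sketch of the per-stage prompt selection. The exact prompt strings and label names are assumptions recalled from the linked demo, so verify them against demo_page.py before relying on them.

```python
# Sketch only: prompt selection for the two stages (strings are assumptions, check demo_page.py).
LAYOUT_PROMPT = "Parse the reading order of this document."

# Second stage: pick an element-level prompt based on the label from the layout pass.
ELEMENT_PROMPTS = {
    "table": "Parse the table in the image.",
    "formula": "Read formula in the image.",
}
DEFAULT_ELEMENT_PROMPT = "Read text in the image."


def prompt_for(label: str) -> str:
    """Return the second-pass prompt for a layout element label."""
    return ELEMENT_PROMPTS.get(label, DEFAULT_ELEMENT_PROMPT)
```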

In a sense, this breaks a bit with our current VLM pipeline, which processes a page with a single prediction. It would be good to actually have a "two-shot VLM pipeline" which would support this.
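
A rough outline of what such a "two-shot VLM pipeline" could look like is sketched below; the class and function names are hypothetical and do not correspond to existing Docling APIs.

```python
# Hypothetical outline of a "two-shot" VLM pipeline; none of these names exist in docling today.
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, List, Tuple

from PIL import Image


@dataclass
class LayoutElement:
    label: str
    bbox: Tuple[int, int, int, int]
    text: str | None = None


class TwoShotVlmPipeline:
    """First shot predicts the page layout; second shot recognizes each cropped element."""

    def __init__(
        self,
        predict: Callable[[Image.Image, str], str],       # one VLM generation: (image, prompt) -> raw text
        parse_layout: Callable[[str], List[LayoutElement]],
        layout_prompt: str,
        element_prompt: Callable[[str], str],             # maps an element label to its prompt
    ):
        self.predict = predict
        self.parse_layout = parse_layout
        self.layout_prompt = layout_prompt
        self.element_prompt = element_prompt

    def __call__(self, page: Image.Image) -> List[LayoutElement]:
        # Shot 1: layout prediction over the full page.
        elements = self.parse_layout(self.predict(page, self.layout_prompt))
        # Shot 2: per-element recognition on crops of the original page.
        for el in elements:
            el.text = self.predict(page.crop(el.bbox), self.element_prompt(el.label))
        return elements
```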

@cau-git @dolfim-ibm fyi: ^^

@geoHeil
Author

geoHeil commented Jun 18, 2025

How do we move forward here? Should we get this merged? And then explore a 2nd PR for a separate VLM pipeline?

@PeterStaar-IBM
Contributor

How do we move forward here? Should we get this merged? And then explore a 2nd PR for a separate VLM pipeline?

I want to check it out and run it myself first. My main worry is that we don't showcase the rich output that Dolphin currently provides (with layout boxes). But this PR might be good enough as a starting point. I would just love to give all the credit to the team that built the Dolphin model.

@PeterStaar-IBM
Contributor

@geoHeil I would like to merge this ASAP, but I see we fail the MyPy check (https://github.com/docling-project/docling/actions/runs/15699814942/job/44748636363?pr=1772).

Could you fix this quickly?


DCO Check Passed

Thanks @geoHeil, all your commits are properly signed off. 🎉

@geoHeil
Author

geoHeil commented Jun 30, 2025

Please re-run CI - it should work now.

codecov bot commented Jun 30, 2025

Codecov Report

Attention: Patch coverage is 14.28571% with 6 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| .../models/vlm_models_inline/hf_transformers_model.py | 0.00% | 6 Missing ⚠️ |


Successfully merging this pull request may close these issues.

[Feature Request] Add ByteDance/Dolphin model for Docling