
[Feature Request] Add ByteDance/Dolphin model for Docling #1622

Open
@NeroHin

Description

Requested feature

Add ByteDance/Dolphin to Docling as a custom document parsing model.
...

Alternatives

Document image parsing is challenging due to its complexly intertwined elements such as text paragraphs, figures, formulas, and tables. Dolphin addresses these challenges through a two-stage approach:

🔍 Stage 1: Comprehensive page-level layout analysis by generating an element sequence in natural reading order
🧩 Stage 2: Efficient parallel parsing of document elements using heterogeneous anchors and task-specific prompts
Dolphin achieves promising performance across diverse page-level and element-level parsing tasks while ensuring superior efficiency through its lightweight architecture and parallel parsing mechanism.
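A rough sketch of the two-stage flow in plain Python may help reviewers picture the integration. This is not Dolphin's code. `LayoutElement`, `analyze_layout`, `parse_element`, `parse_page` and the prompt strings in `PROMPTS` are hypothetical stand-ins, and a real implementation would replace both stages with calls to the model. It only illustrates the control flow. Stage 1 yields elements in reading order. Stage 2 then parses them concurrently, each with a prompt picked by element type and anchored by its bounding box.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class LayoutElement:
    kind: str                        # e.g. "text", "table", "formula", "figure"
    bbox: tuple[int, int, int, int]  # hypothetical (x0, y0, x1, y1) anchor on the page
    order: int                       # position in reading order assigned by Stage 1


# Hypothetical task-specific prompts per element type (NOT Dolphin's actual prompts).
PROMPTS = {
    "text": "Read text in the image.",
    "table": "Parse the table in the image.",
    "formula": "Read formula in the image.",
}


def analyze_layout(page):
    """Stage 1 stand-in: return the page's elements in reading order.

    A real model would generate this sequence from the page image; here the
    page is already annotated, so we just sort by the `order` field.
    """
    return sorted(page, key=lambda e: e.order)


def parse_element(element):
    """Stage 2 stand-in: choose a prompt by element type and 'decode' the crop."""
    prompt = PROMPTS.get(element.kind)
    if prompt is None:
        # Element types without a prompt (e.g. figures) are passed through unparsed.
        return (element.order, element.kind, None)
    return (element.order, element.kind, f"{prompt} @ {element.bbox}")


def parse_page(page, max_workers=4):
    elements = analyze_layout(page)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, so output stays in reading order
        return list(pool.map(parse_element, elements))


if __name__ == "__main__":
    page = [
        LayoutElement("table", (0, 300, 600, 500), 2),
        LayoutElement("text", (0, 0, 600, 280), 1),
        LayoutElement("figure", (0, 520, 600, 800), 3),
    ]
    for order, kind, out in parse_page(page):
        print(order, kind, out)
```

Because elements are decoded independently once their anchors are known, Stage 2 parallelizes naturally. The reading order from Stage 1 is what keeps the reassembled page coherent.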

The model is implemented as a Hugging Face VisionEncoderDecoderModel for easy integration with the Transformers ecosystem.
...

Here are more examples with Dolphin:

[Image: additional Dolphin examples]
