[Feature Request] Add ByteDance/Dolphin model for Docling

### Requested feature
Add [ByteDance/Dolphin](https://github.com/bytedance/Dolphin) to Docling as a custom document parsing model.
...

### Alternatives

Document image parsing is challenging due to its complexly intertwined elements such as text paragraphs, figures, formulas, and tables. Dolphin addresses these challenges through a two-stage approach:

🔍 Stage 1: Comprehensive page-level layout analysis by generating an element sequence in natural reading order
🧩 Stage 2: Efficient parallel parsing of document elements using heterogeneous anchors and task-specific prompts

Dolphin achieves promising performance across diverse page-level and element-level parsing tasks while ensuring superior efficiency through its lightweight architecture and parallel parsing mechanism.
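
To make the two-stage flow concrete, here is a minimal sketch of its control flow in Python. It is not Dolphin's actual code: `Element`, `PROMPTS`, `analyze_layout`, `parse_element`, and `parse_page` are hypothetical names, both stages are stubbed with placeholder data, and a thread pool stands in for whatever parallel decoding the model really uses.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    order: int   # position in reading order (Stage 1 output)
    kind: str    # element type, e.g. "text", "table", "formula"
    bbox: tuple  # (x0, y0, x1, y1) region that serves as the anchor


# Hypothetical task-specific prompts, one per element type.
PROMPTS = {
    "text": "Read the text in this region.",
    "table": "Parse the table in this region.",
    "formula": "Read the formula in this region.",
}


def analyze_layout(page):
    """Stage 1 stand-in: return the page's elements in reading order.

    A real implementation would run the model on the page image.
    """
    return [
        Element(0, "text", (0, 0, 100, 20)),
        Element(1, "table", (0, 30, 100, 80)),
        Element(2, "formula", (0, 90, 100, 110)),
    ]


def parse_element(page, element):
    """Stage 2 stand-in: parse one element with its type-specific prompt."""
    prompt = PROMPTS[element.kind]
    # Placeholder result; a real model call would decode the cropped region.
    return f"[{element.kind} @ {element.bbox}] <- {prompt}"


def parse_page(page, max_workers=4):
    """Run Stage 1, then parse all elements concurrently."""
    elements = analyze_layout(page)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results stay in reading order.
        return list(pool.map(lambda e: parse_element(page, e), elements))
```

Calling `parse_page(page=None)` returns one placeholder string per element, in the order Stage 1 produced them. A Docling integration would presumably replace the two stubs with real model calls.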

The model is implemented as a Hugging Face VisionEncoderDecoderModel for easy integration with the Transformers ecosystem.
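
As a rough sketch of what that integration could look like, loading would go through the standard Transformers entry points. The checkpoint id `ByteDance/Dolphin` and the helper name `load_dolphin` are assumptions for illustration; check the model card for the exact id and any required processor settings.

```python
DEFAULT_MODEL_ID = "ByteDance/Dolphin"  # assumed Hub checkpoint id; verify before use


def load_dolphin(model_id=DEFAULT_MODEL_ID):
    """Load the processor and model through the standard Transformers API."""
    # Imported lazily so this snippet can be imported without transformers installed.
    from transformers import AutoProcessor, VisionEncoderDecoderModel

    processor = AutoProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    model.eval()  # inference mode: disables dropout
    return processor, model
```

Calling `load_dolphin()` downloads the weights on first use, so it needs network access and the `transformers` package.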
...



Here is another example with Dolphin:

![Image](https://github.com/user-attachments/assets/69d66946-6b62-4eff-a3d0-13079cc201e5)

[Feature Request] Add ByteDance/Dolphin model for Docling #1622