Description
Requested feature
Add ByteDance/Dolphin to Docling as a custom document parsing model.
...
Alternatives
Document image parsing is challenging because pages contain complex, intertwined elements such as text paragraphs, figures, formulas, and tables. Dolphin addresses these challenges with a two-stage approach:
🔍 Stage 1: Comprehensive page-level layout analysis that generates a sequence of elements in natural reading order
🧩 Stage 2: Efficient parallel parsing of document elements using heterogeneous anchors and task-specific prompts
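The two-stage flow described above can be sketched in plain Python to show how the stages fit together. This is a minimal illustration under stated assumptions, not Dolphin's code: `Element`, `analyze_layout`, `parse_element`, `parse_page`, the `PROMPTS` strings, and the `recognize` callback are all hypothetical stand-ins. Stage 1 is approximated by a top-to-bottom, left-to-right sort instead of a model that generates the reading order, and `recognize` stands in for the model call that decodes one element region given a prompt.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    kind: str                         # e.g. "text", "table", "formula", "figure"
    bbox: tuple[int, int, int, int]   # (x0, y0, x1, y1): the element's anchor on the page


# Hypothetical task-specific prompts, one per element type (Stage 2).
# These placeholder strings are assumptions, not Dolphin's actual prompts.
PROMPTS = {
    "text": "Read text in the image.",
    "table": "Parse the table in the image.",
    "formula": "Read the formula in the image.",
}


def analyze_layout(page):
    """Stage 1 stand-in: return elements in reading order.

    Assumption: reading order is approximated by sorting on the top edge,
    then the left edge. Dolphin generates this order with its model.
    """
    return sorted(page, key=lambda e: (e.bbox[1], e.bbox[0]))


def parse_element(element, recognize):
    """Stage 2: parse one anchored element with the prompt for its type."""
    prompt = PROMPTS.get(element.kind)
    if prompt is None:  # no prompt for this type (e.g. figures): keep the slot, skip parsing
        return (element.kind, None)
    return (element.kind, recognize(prompt, element.bbox))


def parse_page(page, recognize, max_workers=4):
    """Run Stage 1, then parse all elements concurrently (Stage 2).

    ThreadPoolExecutor.map yields results in input order, so the output
    keeps the reading order produced by Stage 1.
    """
    ordered = analyze_layout(page)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda e: parse_element(e, recognize), ordered))


if __name__ == "__main__":
    # Toy recognizer standing in for the model call.
    def fake_recognize(prompt, bbox):
        return f"{prompt} @ {bbox}"

    page = [
        Element("table", (0, 300, 500, 600)),
        Element("text", (0, 0, 500, 100)),
        Element("figure", (0, 150, 500, 280)),
    ]
    for kind, content in parse_page(page, fake_recognize):
        print(kind, content)
```

Because every element carries its own anchor and prompt after Stage 1, the Stage 2 calls are independent of each other, which is what makes parallel parsing possible; the thread pool here only illustrates that independence and says nothing about how Dolphin batches elements in practice.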
Dolphin achieves promising performance across diverse page-level and element-level parsing tasks while ensuring superior efficiency through its lightweight architecture and parallel parsing mechanism.
The model is implemented as a Hugging Face VisionEncoderDecoderModel for easy integration with the Transformers ecosystem.
...
Here are more examples with Dolphin: