Closed
Labels: bug (Something isn't working), pdf parsing (PDF issue related to docling-parse)
Description
Bug
The Docling Parse v4 backend (dlparse_v4) accumulates a large amount of memory when converting very long documents.
With the attached example, the process uses more than 20 GB.
The other PDF backends (both pypdfium2 and dlparse_v2) convert the whole document with a constant memory footprint of about 3.9 GB.
Document: pg4500.pdf
Steps to reproduce
docling -vv --no-ocr --image-export-mode=placeholder --pdf-backend=dlparse_v4 --to md --to json pg4500.pdf
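To put numbers on the comparison between backends, the command above can be wrapped in a small script that records the peak resident memory of each run. This is only a rough sketch, not part of Docling: the helper names (`build_command`, `maxrss_to_gb`, `run_and_measure`, `compare_backends`) are made up for illustration, it assumes a Unix-like system (`os.wait4` is not available on Windows) and Python 3.9+, and it assumes `pypdfium2` and `dlparse_v2` are accepted as `--pdf-backend` values in the same way as `dlparse_v4`.

```python
import os
import subprocess
import sys


def build_command(backend: str, pdf_path: str) -> list[str]:
    """Build the docling CLI call from the reproduction step for one backend."""
    return [
        "docling", "-vv", "--no-ocr",
        "--image-export-mode=placeholder",
        f"--pdf-backend={backend}",
        "--to", "md", "--to", "json",
        pdf_path,
    ]


def maxrss_to_gb(ru_maxrss: int, platform: str = sys.platform) -> float:
    """Convert ru_maxrss to GB; macOS reports bytes, Linux reports kilobytes."""
    unit = 1 if platform == "darwin" else 1024
    return ru_maxrss * unit / 1e9


def run_and_measure(cmd: list[str]) -> tuple[int, float]:
    """Run cmd and return (exit code, peak RSS in GB) for that child process."""
    proc = subprocess.Popen(cmd)
    # wait4 returns the resource usage of this specific child, so successive
    # runs do not contaminate each other's peak value.
    _, status, usage = os.wait4(proc.pid, 0)
    # Record the exit code ourselves, since we reaped the child directly.
    proc.returncode = os.waitstatus_to_exitcode(status)
    return proc.returncode, maxrss_to_gb(usage.ru_maxrss)


def compare_backends(pdf_path: str) -> None:
    """Print the peak memory of one conversion per backend (assumed names)."""
    for backend in ("dlparse_v4", "dlparse_v2", "pypdfium2"):
        code, peak_gb = run_and_measure(build_command(backend, pdf_path))
        print(f"{backend}: exit={code} peak_rss={peak_gb:.2f} GB")
```

Calling `compare_backends("pg4500.pdf")` runs the three conversions one after another and prints one line per backend. The figure is the peak over the whole run, so it shows whether memory grows but not where in the document the growth happens.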
Docling version
docling --version
Docling version: 2.44.0
Docling Core version: 2.44.1
Docling IBM Models version: 3.9.0
Docling Parse version: 4.1.0
Python: cpython-312 (3.12.9)
Platform: macOS-15.5-arm64-arm-64bit