Docling Parse v4 accumulating memory #2077

@dolfim-ibm

Bug

The Docling Parse v4 backend accumulates a large amount of memory when converting very long documents.

With the attached example, memory usage grows to more than 20 GB.

The other PDF backends (both pypdfium2 and dlparse_v2) convert the whole document with a constant memory footprint of about 3.9 GB.

Document: pg4500.pdf

Steps to reproduce

docling -vv --no-ocr --image-export-mode=placeholder --pdf-backend=dlparse_v4 --to md --to json pg4500.pdf
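To put numbers on the comparison between backends, the command above can be wrapped in a small script that records the peak resident memory of each run. This is only a sketch, not part of docling: `run_and_measure`, `docling_cmd` and `compare_backends` are hypothetical helper names, it assumes a Unix-like system (it relies on `os.wait4`) and Python 3.9+, and it assumes `pypdfium2` and `dlparse_v2` are accepted as `--pdf-backend` values in the same way as `dlparse_v4`.

```python
import os
import subprocess
import sys


def run_and_measure(cmd):
    """Run cmd as a child process; return (exit_code, peak_rss_mib) for that child only."""
    proc = subprocess.Popen(cmd)
    # os.wait4 reaps this specific child and returns its resource usage (Unix only).
    _, status, usage = os.wait4(proc.pid, 0)
    # Record the exit code on the Popen object, since we reaped the child ourselves.
    proc.returncode = os.waitstatus_to_exitcode(status)
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    scale = 1 if sys.platform == "darwin" else 1024
    return proc.returncode, usage.ru_maxrss * scale / 2**20


def docling_cmd(backend, pdf):
    """Build the reproduction command from this issue for a given backend."""
    return [
        "docling", "-vv", "--no-ocr", "--image-export-mode=placeholder",
        f"--pdf-backend={backend}", "--to", "md", "--to", "json", pdf,
    ]


def compare_backends(pdf, backends=("dlparse_v4", "dlparse_v2", "pypdfium2")):
    """Convert pdf once per backend and print the peak memory of each run."""
    for backend in backends:
        code, peak = run_and_measure(docling_cmd(backend, pdf))
        print(f"{backend}: exit={code} peak_rss={peak:.0f} MiB")
```

Calling `compare_backends("pg4500.pdf")` runs the three conversions one after another. Each run is measured in its own child process, so the peak of one backend does not leak into the next measurement.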

Docling version

docling --version
Docling version: 2.44.0
Docling Core version: 2.44.1
Docling IBM Models version: 3.9.0
Docling Parse version: 4.1.0
Python: cpython-312 (3.12.9)
Platform: macOS-15.5-arm64-arm-64bit

Labels

bug (Something isn't working), pdf parsing (PDF issue related to docling-parse)
