Open
Labels: bug (Something isn't working)
Description
Bug
I am trying to extract data from a table in a research article, and the results are not accurate. I have seen the same behaviour with other tables as well. Often some values are missing, and spurious characters such as '=' and '~' are inserted. Sometimes the digit '0' is misread as the letter 'o'.
I also tried Tesseract OCR to rule out the EasyOCR engine as the cause, but the results were even worse with Tesseract.
Is this a known bug in Docling when extracting values from tables?
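To make the symptoms above easier to count across several tables, a small triage pass over the extracted cells can help. The sketch below is a hypothetical helper, not part of Docling: it assumes each table has already been exported as a list of rows of strings (for example from a DataFrame), and it only encodes the three reported symptoms as rules. Empty cells and '~' can be legitimate in a real table, so its findings need manual review. The example cells are invented for illustration and are not taken from the paper.

```python
import re
from typing import List, Tuple

# Symptom 2: characters that appeared in cells where the PDF has none.
STRAY_SYMBOLS = set("=~")
# Symptom 3: a cell made only of an optional sign, digits, '.', and at least one o/O
# (e.g. "0.o5"). Cells with other letters, such as "Co3O4", do not match.
NUMBER_WITH_LETTER_O = re.compile(r"^[+-]?[0-9oO.]*[oO][0-9oO.]*$")


def find_suspicious_cells(rows: List[List[str]]) -> List[Tuple[int, int, str]]:
    """Return (row, col, reason) for each cell that matches a reported symptom.

    Hypothetical triage helper; it does not call Docling.
    """
    findings = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            text = value.strip()
            if not text:
                findings.append((r, c, "empty"))  # symptom 1: missing value
            elif STRAY_SYMBOLS & set(text):
                findings.append((r, c, "stray '=' or '~'"))
            elif NUMBER_WITH_LETTER_O.match(text) and any(ch.isdigit() for ch in text):
                findings.append((r, c, "letter o in number"))
    return findings


# Invented example cells (not from the paper).
table = [
    ["Sample", "a (Å)", "Eg (eV)"],
    ["S1", "4.o12", "2.31"],
    ["S2", "", "=2.45"],
]
print(find_suspicious_cells(table))
# [(1, 1, 'letter o in number'), (2, 1, 'empty'), (2, 2, "stray '=' or '~'")]
```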
...
Steps to reproduce
I used a table from this paper: https://doi.org/10.1016/j.jpcs.2024.112412
...
Docling version
2.44.0
...
Python version
3.12.3...
Please see the PDF showing which values are lost during extraction.
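One way to document the lost values precisely is to transcribe the table from the paper by hand and compare it cell by cell with the extracted version. The following is a minimal sketch under that assumption; `diff_tables` is a hypothetical name, and it assumes both tables share the same row and column layout (a merged or split column in the extraction would shift every later cell and show up as many "changed" entries).

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def diff_tables(expected: List[List[str]], extracted: List[List[str]]) -> Dict[str, List[Cell]]:
    """Compare a hand-transcribed table with an extracted one (hypothetical helper).

    Cells are compared after stripping whitespace. A cell is "missing" if it has a
    value in `expected` but is empty or absent in `extracted`; otherwise any
    difference is reported as "changed".
    """
    report: Dict[str, List[Cell]] = {"missing": [], "changed": []}
    for r, exp_row in enumerate(expected):
        got_row = extracted[r] if r < len(extracted) else []
        for c, exp in enumerate(exp_row):
            exp = exp.strip()
            got = got_row[c].strip() if c < len(got_row) else ""
            if exp and not got:
                report["missing"].append((r, c))
            elif exp != got:
                report["changed"].append((r, c))
    return report
```

For instance, `diff_tables([["a", "1.05"], ["b", "2.00"]], [["a", "1.o5"], ["b", ""]])` reports `(1, 1)` as missing and `(0, 1)` as changed.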
