How to tackle this error: PdfReadWarning: Object 16920 0 not defined #1846
-
Hi,
but it does not seem to be enough to prevent this warning/error. What are the best practices for dealing with or checking "corrupted" PDFs? Thanks for your feedback. PS: I also got this warning:
-
Because PDFs carry no internal consistency guarantee, previously undetected consistency errors may pop up after an apparently successful, harmless-looking open. In fact, all sorts of things can be wrong in a PDF: the page tree, the name tree, any single object (images, fonts, whatever).
Your code is more or less already part of PyMuPDF since the latest 1.19.x version: it checks (not only for PDFs!) whether the file exists and has a length > 0.
For some non-PDF file types, a few additional checks are also performed.
For PDFs, MuPDF performs additional checks at open time and automatically starts repair algorithms, for example to ensure that a usable PDF trailer exists. If it determines that the trailer is missing (which often happens because of incomplete downloads), it makes a complete scan of all xref objects to rebuild the xref table.
But it never walks through all of the PDF's internal structure unnecessarily, i.e. without a reason to be suspicious, which is good.
If …