Searchable PDFs Not Working #551

winman3000 · 2025-02-03T09:46:35Z

Searchable PDFs Not Working

Description

NAPS2 does not create searchable PDF files, even when the option to create searchable PDFs is selected in the settings.

Steps to Reproduce

1. Scan a document using NAPS2.
2. Select "Save PDF".
3. Open the saved PDF in Adobe Acrobat.

Expected Behavior

The PDF should contain selectable and searchable text.

Actual Behavior

The PDF is marked as searchable but only contains an image, making text selection and search impossible.

Environment

NAPS2 Version: 8.0.3+e7cf25fa120c30decd76c41030dfb65b1ae5c032
Operating System: Windows 11 Professional Version 24H2 (OS Build 26100.3037)
OCR Language: German

Additional Notes

The issue persists across different scanned documents.
The problem occurs regardless of the OCR language selection.

cyanfish · 2025-02-06T18:32:48Z

Can you see if there are any error logs? And attach a sample PDF with the issue here?

winman3000 · 2025-02-07T07:01:31Z

Unfortunately, there are no error logs; the text is simply exported. I have attached a file that I created with NAPS2. The document is marked as “searchable” for screen readers, but it contains only a single graphic. If you try to export it as text with Adobe Acrobat, no text comes out. This happens no matter which document you scan.

TestFile.pdf

cyanfish · 2025-02-08T04:36:58Z

That PDF is searchable for me, works fine with Adobe when I Ctrl+A and Ctrl+C.

winman3000 · 2025-02-08T11:27:49Z

Strange, but you can't read the PDF with a screen reader. It looks like a graphic. If I go to “Export as text” in Adobe Acrobat, no text is exported either. In other searchable PDF files, the text is exported.

winman3000 · 2025-02-09T17:06:48Z

I have now tested the problem with three different screen readers: JAWS, NVDA and Narrator, the screen reader from Microsoft that comes with Windows.

The quickest and easiest way to test it is with Narrator. It is important to follow the steps exactly as listed.

1. Start Narrator with CTRL+Windows+Enter.
2. Open the attached PDF file with Adobe Acrobat.
3. If necessary, confirm the accessibility settings.
4. Now read the document using the arrow keys.

Unfortunately, you cannot read the file as it is a graphic.

winman3000 · 2025-02-10T20:22:15Z

NAPS2 appears to generate PDF files that are not properly tagged. In PDF documents, tags are essential metadata elements that define the logical structure of the document. They help organize content hierarchically, specifying headings, paragraphs, lists, tables, and other structural elements. These tags are crucial for accessibility, as they enable screen readers and other assistive technologies to interpret and present the document's content correctly.

Without proper tagging, a PDF is essentially just a visual representation of the content rather than a structured document. This means that screen readers cannot navigate the text logically, making it difficult or impossible for visually impaired users to access the information. Even if OCR is applied to make the text selectable and searchable, the absence of proper tags prevents screen readers from reading the content in a meaningful way.

The issue with NAPS2’s PDFs suggests that while the software may perform OCR, it does not add the necessary structural tags to the document. As a result, these PDFs do not meet accessibility standards, such as those outlined in the PDF/UA (Universal Accessibility) specification. Ensuring that PDFs are correctly tagged is important not only for accessibility but also for improved document indexing and searchability.

Addressing this issue would significantly enhance the usability of PDFs generated by NAPS2, making them more accessible to all users, including those who rely on assistive technology.
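
For anyone who wants a quick check of whether a given PDF carries tagging markers at all, here is a minimal sketch in Python. It is only a heuristic, not how NAPS2 or Acrobat decide anything: it scans the raw file bytes for the `/StructTreeRoot` key and a `/Marked true` entry, which a tagged PDF normally has in its document catalog. The function name `tagging_hints` is made up for this example, and files that store the catalog inside a compressed object stream can produce false negatives.

```python
import re
import sys


def tagging_hints(pdf_bytes: bytes) -> dict:
    """Report whether the usual tagged-PDF markers appear in the raw bytes.

    Heuristic only: a byte scan cannot see keys stored in compressed
    object streams, so a False result is not proof that tags are missing.
    """
    return {
        # The structure tree root anchors the logical tag hierarchy.
        "struct_tree_root": b"/StructTreeRoot" in pdf_bytes,
        # /MarkInfo << /Marked true >> declares the file as tagged.
        "marked_true": re.search(rb"/Marked\s+true", pdf_bytes) is not None,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (file name is an example): python check_tags.py TestFile.pdf
    with open(sys.argv[1], "rb") as f:
        print(tagging_hints(f.read()))
```

If both values come back False for a file that Acrobat can still select text in, that would be consistent with a PDF that has an OCR text layer but no structure tags. A full accessibility checker is still needed for a reliable answer.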

cyanfish · 2025-02-11T01:21:52Z

Alas, ChatGPT can only make uninformed guesses. But I do have an idea of how it could maybe be fixed.

winman3000 · 2025-02-11T06:44:56Z

@cyanfish: Thank you!

This is actually the information I received from the JAWS developers and the user community. I only had the text generated by ChatGPT because I have problems formulating English texts.

cyanfish added compat ocr labels Feb 11, 2025
