Searchable PDFs Not Working #551
Can you check whether there are any error logs? And could you attach a sample PDF that shows the issue?
Unfortunately, there are no error logs; the PDF is simply exported without any error. I have attached a file that I created with NAPS2. The document is marked as "searchable" for screen readers, but it contains only a single image. If I export it as text with Adobe Acrobat, no text comes out. It doesn't matter which document I scan.
That PDF is searchable for me; it works fine in Adobe when I press Ctrl+A and then Ctrl+C.
Strange, but the PDF cannot be read with a screen reader; to the screen reader it looks like a graphic. If I choose "Export as text" in Adobe Acrobat, no text is exported either. With other searchable PDF files, the text is exported.
I have now tested the problem with three different screen readers: JAWS, NVDA, and Narrator, the screen reader that Microsoft ships with Windows. The quickest and easiest way to test it is with Narrator. It is important to follow the steps exactly.
Unfortunately, the file cannot be read, because the screen reader treats it as a graphic.
NAPS2 appears to generate PDF files that are not properly tagged. In PDF documents, tags are metadata elements that define the logical structure of the document. They organize content hierarchically, specifying headings, paragraphs, lists, tables, and other structural elements. These tags are crucial for accessibility, as they enable screen readers and other assistive technologies to interpret and present the document's content correctly.
Without proper tagging, a PDF is essentially just a visual representation of the content rather than a structured document. This means that screen readers cannot navigate the text logically, making it difficult or impossible for visually impaired users to access the information. Even if OCR is applied to make the text selectable and searchable, the absence of proper tags prevents screen readers from reading the content in a meaningful way.
The issue with NAPS2’s PDFs suggests that while the software may perform OCR, it does not add the necessary structural tags to the document. As a result, these PDFs do not meet accessibility standards such as those outlined in the PDF/UA (Universal Accessibility) specification. Ensuring that PDFs are correctly tagged is important not only for accessibility but also for better document indexing and searchability. Addressing this issue would significantly improve the usability of PDFs generated by NAPS2, making them more accessible to all users, including those who rely on assistive technology.
Alas, ChatGPT can only make uninformed guesses. But I do have an idea of how it might be fixed.
@cyanfish: Thank you! This is actually the information I received from the JAWS developers and the user community. I only had ChatGPT write the text because I have trouble writing in English.
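Whether a given PDF is tagged at all, which the tagging comment above hinges on, can be checked roughly without any PDF library. In a Tagged PDF the document catalog carries `/MarkInfo << /Marked true >>` and a `/StructTreeRoot` entry. The sketch below is a stdlib-only heuristic, assuming an unencrypted file whose streams are either uncompressed or FlateDecode-compressed. The helper name `tag_markers` is hypothetical, and the regex-based stream matching ignores `/Length`, so unusual files can fool it.

```python
import re
import zlib

# Heuristic "stream <EOL> ... endstream" matcher; ignores the /Length entry.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.S)


def tag_markers(pdf_bytes: bytes) -> dict:
    """Report whether a PDF looks tagged (hypothetical helper).

    Tagged PDFs carry /MarkInfo << /Marked true >> and /StructTreeRoot
    in the document catalog. The catalog may sit inside a compressed
    object stream (PDF 1.5+), so Flate streams are inflated as well.
    Encrypted PDFs are not handled.
    """
    chunks = [pdf_bytes]  # search the raw, uncompressed bytes too
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates trailing bytes or a cut-off checksum
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not Flate data (e.g. a JPEG image); skip it
    data = b"\n".join(chunks)
    return {
        "has_struct_tree": b"/StructTreeRoot" in data,
        "marked": re.search(rb"/Marked\s+true", data) is not None,
    }
```

Usage would be along the lines of `tag_markers(open("scan.pdf", "rb").read())`. Two `False` values suggest an untagged file; a library such as pypdf or qpdf would give a more reliable answer than byte scanning.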
Searchable PDFs Not Working
Description
NAPS2 does not create searchable PDF files, even when the option to create searchable PDFs is selected in the settings.
Steps to Reproduce
Expected Behavior
The PDF should contain selectable and searchable text.
Actual Behavior
The PDF is marked as searchable but only contains an image, making text selection and search impossible.
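In the OCR sense, a "searchable" PDF is a page image overlaid with a text layer, and OCR tools commonly draw that text invisibly using text rendering mode 3 (the `3 Tr` operator). The sketch below is a rough, stdlib-only way to tell an image-only scan from one with such a layer: it looks for text objects that show text (`BT` ... `Tj`/`TJ`) and for invisible rendering mode. It assumes unencrypted content streams that are uncompressed or FlateDecode-compressed; `text_layer_info` is a hypothetical helper, not part of NAPS2.

```python
import re
import zlib

# Heuristic "stream <EOL> ... endstream" matcher; ignores the /Length entry.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\n?endstream", re.S)
# A text object (BT) followed by a text-showing operator (Tj or TJ).
TEXT_SHOW_RE = re.compile(rb"\bBT\b.*?\bT[jJ]\b", re.S)
# Text rendering mode 3 = neither fill nor stroke, i.e. invisible text.
INVISIBLE_RE = re.compile(rb"\b3\s+Tr\b")


def text_layer_info(pdf_bytes: bytes) -> dict:
    """Guess whether a PDF has an (invisible) OCR text layer.

    Hypothetical helper: assumes unencrypted content streams that are
    either uncompressed or FlateDecode-compressed.
    """
    streams = []
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            streams.append(zlib.decompressobj().decompress(raw))
        except zlib.error:
            streams.append(raw)  # not Flate data; search it as-is
    return {
        "has_text_operators": any(TEXT_SHOW_RE.search(s) for s in streams),
        "invisible_text": any(INVISIBLE_RE.search(s) for s in streams),
    }
```

On an image-only scan both flags should come out `False`; on a PDF with an OCR overlay both are typically `True`. A `True` result only shows that text is present, not that a screen reader will announce it, which is the distinction the thread above turns on.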
Environment
Additional Notes