This sample shows how to recognize and extract text from non-searchable PDF documents using the Docotic.Pdf library and the Tesseract OCR engine.
Follow these steps to perform OCR when a PDF page does not contain searchable text:
- Save the page as a high-resolution image using Docotic.Pdf. Higher resolution generally leads to better recognition quality.
- Recognize the text in the image using the Tesseract OCR engine.
- Use the recognized text.
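The steps above can be sketched in C# roughly as follows. This is a minimal sketch, not the sample's exact code. It assumes the Docotic.Pdf package and the Tesseract .NET wrapper package are referenced, that an input file named `scanned.pdf` exists, and that a `tessdata` folder with the English data file sits next to the executable. The file names, the 300 DPI resolution, and the `OcrSketch` class name are illustrative assumptions. Docotic.Pdf may also require a license key to be registered before use; that step is omitted here.

```csharp
using System;
using System.IO;
using System.Text;
using BitMiracle.Docotic.Pdf;
using Tesseract;

public static class OcrSketch
{
    public static void Main()
    {
        var documentText = new StringBuilder();

        // "scanned.pdf" and "tessdata" are assumed names, adjust to your setup.
        using (var pdf = new PdfDocument("scanned.pdf"))
        using (var engine = new TesseractEngine("tessdata", "eng", EngineMode.LstmOnly))
        {
            for (int i = 0; i < pdf.PageCount; ++i)
            {
                PdfPage page = pdf.Pages[i];

                // Skip OCR when the page already contains searchable text.
                string searchableText = page.GetText();
                if (!string.IsNullOrWhiteSpace(searchableText))
                {
                    documentText.AppendLine(searchableText);
                    continue;
                }

                // Step 1: save the page as a high-resolution image.
                PdfDrawOptions options = PdfDrawOptions.Create();
                options.BackgroundColor = new PdfRgbColor(255, 255, 255);
                options.HorizontalResolution = 300; // assumed value
                options.VerticalResolution = 300;   // assumed value

                string pageImage = $"page_{i}.png";
                page.Save(pageImage, options);

                // Step 2: recognize the image with Tesseract.
                using (Pix img = Pix.LoadFromFile(pageImage))
                using (Page recognizedPage = engine.Process(img))
                {
                    // Step 3: use the recognized text.
                    documentText.AppendLine(recognizedPage.GetText());
                }

                // Remove the temporary image once it has been processed.
                File.Delete(pageImage);
            }
        }

        Console.WriteLine(documentText.ToString());
    }
}
```

The check for existing text keeps already searchable pages out of the OCR path, so only scanned pages pay the cost of rendering and recognition. To recognize another language, the `"eng"` argument would be replaced with the code of a language whose data file is present in the `tessdata` folder.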
If your documents contain text in languages other than English, provide the Language Data Files for Tesseract 4.00 for each of those languages.
Also make sure that the Visual C++ runtimes for Visual Studio 2015-2019 (both x86 and x64) are installed.