Hello,
I'm testing this program to convert medical books in Brazilian Portuguese. The books run to around 500-700 pages at good quality. I set up a dedicated box for this and installed everything I need to run pypdfocr (tesseract 3.03 and the other requirements). When I run it [1], everything looks fine, but the result is a file with the suffix _ocr.pdf that is only 306 bytes. Its content [2] shows nothing useful.
What could be wrong?
1 - Generating OCR of MyBookInPortuguese.pdf - 227 MegaBytes
root@vagrant-ubuntu-trusty-64:/vagrant# pypdfocr -v -l por MyBookInPortuguese.pdf
Starting conversion of MyBookInPortuguese.pdf
Running pdfimages to figure out DPI...
Using 300 DPI
Detected color
gs -q -dNOPAUSE -sDEVICE=jpeg -dJPEGQ=75 -r300 -sOutputFile="MyBookInPortuguese.pdf - 9ª Ed [ptbr+foto]_%d.jpg" "MyBookInPortuguese.pdf" -c quit
Skipping preprocess step
Checking tesseract version
tesseract -v
Created OCR'ed pdf as MyBookInPortuguese.pdf - 9ª Ed [ptbr+foto]_ocr.pdf
Cleaning up []
Cleaning up []
Cleaning up []
Cleaning up []
Cleaning up []
Completed conversion successfully to MyBookInPortuguese.pdf_ocr.pdf
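One thing worth checking in the log above: every "Cleaning up" line prints an empty list, and the Ghostscript output name contains square brackets (`[ptbr+foto]`). If the tool collects the generated page images with a glob-style pattern (an assumption, the thread does not confirm this), the brackets would be read as a character class and match nothing. A minimal sketch of that effect using only the Python standard library; the helper names and the shortened file name are hypothetical:

```python
import fnmatch
import glob


def naive_pattern(prefix):
    # Hypothetical helper: builds a pattern without escaping special characters.
    return prefix + "*.jpg"


def escaped_pattern(prefix):
    # glob.escape (Python 3.4+) neutralises '[', '*' and '?' in the prefix.
    return glob.escape(prefix) + "*.jpg"


# Shortened, illustrative file name modelled on the one in the log.
prefix = "MyBook - 9ª Ed [ptbr+foto]_"
page_image = prefix + "1.jpg"

# The unescaped pattern treats [ptbr+foto] as "one character from this set".
print(fnmatch.fnmatchcase(page_image, naive_pattern(prefix)))    # False
# The escaped pattern matches the literal brackets.
print(fnmatch.fnmatchcase(page_image, escaped_pattern(prefix)))  # True
```

If this is what is happening, renaming the input file so that it has no brackets would be a quick way to test the theory.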
2 - MyBookInPortuguese.pdf_ocr.pdf - 306 bytes
%PDF-1.3
1 0 obj
<<
/Kids [ ]
/Type /Pages
/Count 0
>>
endobj
2 0 obj
<<
/Producer (PyPDF2)
>>
endobj
3 0 obj
<<
/Type /Catalog
/Pages 1 0 R
>>
endobj
xref
0 4
0000000000 65535 f
0000000009 00000 n
0000000062 00000 n
0000000102 00000 n
trailer
<<
/Size 4
/Root 3 0 R
/Info 2 0 R
>>
startxref
151
%%EOF
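The dump above is a valid but empty PDF: the page tree has `/Kids [ ]` and `/Count 0`, so no pages were ever added. A rough sketch for spotting this automatically, assuming a small, uncompressed file like this one (it reads the raw bytes with a regular expression and is not a general PDF parser; the function name is hypothetical):

```python
import re


def page_count_from_raw_pdf(data):
    """Return /Count from the first '/Type /Pages' dictionary, or None.

    Only handles flat, uncompressed dictionaries such as the one in the dump.
    """
    for match in re.finditer(rb"<<(.*?)>>", data, re.DOTALL):
        body = match.group(1)
        if re.search(rb"/Type\s*/Pages\b", body):
            count = re.search(rb"/Count\s+(\d+)", body)
            return int(count.group(1)) if count else None
    return None


# The page tree object copied from the 306-byte output file.
sample = b"1 0 obj\n<<\n/Kids [ ]\n/Type /Pages\n/Count 0\n>>\nendobj\n"
print(page_count_from_raw_pdf(sample))  # 0
```

A result of 0 after a "successful" conversion means the OCR step produced no pages to merge.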
Did you install tesseract-data-por (the data files for the Portuguese language) in your distro?
At least on Arch Linux, the tesseract package supports only English by default. If you need other languages, you need to install the tesseract-data- package for your distro.
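To see which language packs tesseract actually has, independent of the distro, you can ask it with `tesseract --list-langs`. A small sketch that wraps the call, assuming `tesseract` is on the PATH; the function names are hypothetical, and both output streams are read because the stream used for the list differs between tesseract versions:

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blanks and the "List of available languages (N):" header.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.add(line)
    return langs


def installed_languages(tesseract_cmd="tesseract"):
    """Run tesseract and return the set of installed language codes."""
    result = subprocess.run(
        [tesseract_cmd, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return parse_list_langs(result.stdout + "\n" + result.stderr)
```

For example, `"por" in installed_languages()` should be true before running pypdfocr with `-l por`.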
For Portuguese language support on Arch Linux, you'll need to run the following command: