Describe the bug
When OCR is performed with --force-ocr, the output file is 5.44 times the size of the input file.
When the input file is first rewritten with Ghostscript (GS) and OCR is then performed on the rewritten file, the OCRed output is only slightly larger than the original input (about 17%).
Input file: 686 KB
OCRed input file: 3732 KB
Input file rewritten with GS: 699 KB
OCRed GS-rewritten file: 804 KB
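The reported ratios can be checked with a few lines of Python. The sizes are the KB figures from this issue; the `size_ratio` helper is illustrative only and not part of OCRmyPDF.

```python
# File sizes in KB, as reported in this issue.
sizes_kb = {
    "input": 686,
    "input_ocr": 3732,
    "gs_rewrite": 699,
    "gs_rewrite_ocr": 804,
}


def size_ratio(new_kb: float, old_kb: float) -> float:
    """Return new_kb / old_kb rounded to two decimal places."""
    return round(new_kb / old_kb, 2)


# OCR applied directly to the original file, compared with the original input.
direct = size_ratio(sizes_kb["input_ocr"], sizes_kb["input"])
# OCR applied after a Ghostscript rewrite, compared with the original input.
via_gs = size_ratio(sizes_kb["gs_rewrite_ocr"], sizes_kb["input"])

print(f"direct OCR: {direct}x, via Ghostscript: {via_gs}x")
```

With the reported sizes this gives 5.44 for direct OCR and 1.17 for the Ghostscript route.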
Steps to reproduce
1. Run ocrmypdf -v1 --output-type pdf --max-image-mpixels 1000 --tesseract-downsample-above 3508 --force-ocr in.pdf ocr.pdf
2. Observe that the output file is 5.44 times the size of the input file.
3. Rewrite the input with Ghostscript: gswin64.exe -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -sOutputFile=gs.pdf in.pdf
4. Run ocrmypdf -v1 --output-type pdf --max-image-mpixels 1000 --tesseract-downsample-above 3508 --force-ocr gs.pdf gs_ocr.pdf
5. Observe that the OCRed file is now only slightly larger than in.pdf.
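The steps can be scripted so both routes run with identical options. This is a sketch under stated assumptions: `ocrmypdf` and the Ghostscript executable are on PATH (`gswin64.exe` as in the report; on Linux and macOS the binary is usually `gs`), the helper names `build_commands` and `reproduce` are hypothetical, and 1 KB is assumed to be 1024 bytes.

```python
import os
import subprocess

# OCRmyPDF options exactly as used in steps 1 and 4 of this report.
OCR_OPTS = [
    "-v1",
    "--output-type", "pdf",
    "--max-image-mpixels", "1000",
    "--tesseract-downsample-above", "3508",
    "--force-ocr",
]


def build_commands(src: str, gs_exe: str = "gswin64.exe") -> list[list[str]]:
    """Return the argv lists for steps 1, 3 and 4, in that order."""
    return [
        # Step 1: OCR the original file.
        ["ocrmypdf", *OCR_OPTS, src, "ocr.pdf"],
        # Step 3: rewrite the original file with Ghostscript's pdfwrite device.
        [gs_exe, "-sDEVICE=pdfwrite", "-dBATCH", "-dNOPAUSE",
         "-sOutputFile=gs.pdf", src],
        # Step 4: OCR the Ghostscript-rewritten file.
        ["ocrmypdf", *OCR_OPTS, "gs.pdf", "gs_ocr.pdf"],
    ]


def reproduce(src: str = "in.pdf", gs_exe: str = "gswin64.exe") -> dict[str, float]:
    """Run each command (raising on failure) and return file sizes in KB."""
    for argv in build_commands(src, gs_exe):
        subprocess.run(argv, check=True)
    return {
        name: os.path.getsize(name) / 1024  # assumes 1 KB = 1024 bytes
        for name in (src, "ocr.pdf", "gs.pdf", "gs_ocr.pdf")
    }
```

For example, `reproduce("in.pdf", gs_exe="gs")` on Linux returns a dict of sizes that can be compared with the figures reported in this issue.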
Files
in.pdf (the same file as in #1361)
How did you download and install the software?
PyPI (pip, poetry, pipx, etc.)
OCRmyPDF version
16.4.3
Relevant log output
When run on the original file:
When run on the file rewritten with GS: