[Bug]: Output PDF is too large #1366

user1823 · 2024-08-02T12:33:48Z

Describe the bug

When OCR is performed with --force-ocr, the output file is 5.44× the size of the input file.

When the input file is first re-written with Ghostscript (GS) and OCR is then performed on the result, the OCRed file is only slightly larger than the original input file.

Original input file: 686 KB
After OCR: 3732 KB

Input re-written with GS: 699 KB
After OCR: 804 KB
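The ratios quoted here and in the logs below can be checked with a few lines of Python. This is a sketch: reading OCRmyPDF's "Total file size ratio" as input/output and "savings" as 1 - output/input is inferred from the logged numbers, not taken from OCRmyPDF's code, and size_report is a hypothetical helper.

```python
def size_report(input_kb: float, output_kb: float) -> tuple[float, float, float]:
    """Return (growth, ratio, savings) for one input/output pair.

    growth  = output / input    -> the "N× larger" figure
    ratio   = input / output    -> appears to match "Total file size ratio"
    savings = 1 - output/input  -> appears to match the "savings" percentage
    """
    growth = output_kb / input_kb
    return growth, 1 / growth, 1 - growth


# Sizes from the report, in KB.
for label, before, after in [("in.pdf", 686, 3732), ("gs.pdf", 699, 804)]:
    growth, ratio, savings = size_report(before, after)
    print(f"{label}: {growth:.2f}x larger, ratio {ratio:.2f}, savings {savings:.1%}")
```

With these sizes the sketch gives 5.44× (ratio 0.18) for in.pdf and 1.15× (ratio 0.87) for gs.pdf, matching the logs; the savings come out as -444.0% and -15.0% rather than the logged -444.1% and -15.1%, presumably because the sizes here are rounded to whole KB.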

Steps to reproduce

1. Run ocrmypdf -v1 --output-type pdf --max-image-mpixels 1000 --tesseract-downsample-above 3508 --force-ocr in.pdf ocr.pdf
2. Observe that the output file is 5.44 times the size of the input file.
3. Run gswin64.exe -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -sOutputFile=gs.pdf in.pdf
4. Run ocrmypdf -v1 --output-type pdf --max-image-mpixels 1000 --tesseract-downsample-above 3508 --force-ocr gs.pdf gs_ocr.pdf
5. Observe that this OCRed file is only slightly larger than in.pdf.
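The steps above can be scripted so the comparison is repeatable. A minimal sketch, assuming ocrmypdf and a Ghostscript console executable are on PATH; gswin64c is the name that appears in the log (on Linux or macOS it is typically gs). The helper names ocrmypdf_cmd, gs_rewrite_cmd and reproduce are hypothetical; the flags are exactly those used in the steps.

```python
import os
import subprocess

# Options used in steps 1 and 4 of the report.
OCR_OPTIONS = [
    "-v1",
    "--output-type", "pdf",
    "--max-image-mpixels", "1000",
    "--tesseract-downsample-above", "3508",
    "--force-ocr",
]


def ocrmypdf_cmd(src: str, dst: str) -> list[str]:
    """Command for steps 1 and 4: OCR src into dst."""
    return ["ocrmypdf", *OCR_OPTIONS, src, dst]


def gs_rewrite_cmd(src: str, dst: str, gs: str = "gswin64c") -> list[str]:
    """Command for step 3: rewrite src through Ghostscript's pdfwrite device.

    gswin64c is the console build named in the log; use "gs" on Linux/macOS.
    """
    return [gs, "-sDEVICE=pdfwrite", "-dBATCH", "-dNOPAUSE",
            f"-sOutputFile={dst}", src]


def reproduce(src: str = "in.pdf") -> dict[str, int]:
    """Run steps 1, 3 and 4, then return each file's size in bytes.

    Raises subprocess.CalledProcessError if any command exits non-zero.
    """
    for cmd in (
        ocrmypdf_cmd(src, "ocr.pdf"),
        gs_rewrite_cmd(src, "gs.pdf"),
        ocrmypdf_cmd("gs.pdf", "gs_ocr.pdf"),
    ):
        subprocess.run(cmd, check=True)
    return {name: os.path.getsize(name)
            for name in (src, "ocr.pdf", "gs.pdf", "gs_ocr.pdf")}
```

Calling reproduce() in the directory containing in.pdf runs steps 1, 3 and 4 and returns the four file sizes, so steps 2 and 5 become a comparison of the returned numbers.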

Files

in.pdf (same file as in #1361)

How did you download and install the software?

PyPI (pip, poetry, pipx, etc.)

OCRmyPDF version

16.4.3

Relevant log output

When run on the original file (in.pdf):

ocrmypdf 16.4.3                                                                                           __main__.py:59
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Found tesseract 5.3.4.20240503                                                                           __init__.py:343
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Running: ['C:\\Program Files\\gs\\gs10.03.1\\bin\\gswin64c.EXE', '--version']                            __init__.py:133
Found gs 10.3.1                                                                                          __init__.py:343
Running: ['C:\\Program Files\\gs\\gs10.03.1\\bin\\gswin64c.EXE', '--version']                            __init__.py:133
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--list-langs']                             __init__.py:133
stdout/stderr = List of available languages in "C:\Program Files\Tesseract-OCR/tessdata/" (2):            __init__.py:73
eng
osd

No language specified; assuming --language eng                                                         _validation.py:54
pikepdf mmap enabled                                                                                      helpers.py:328
Gathering info with 1 thread workers                                                                         info.py:800
pikepdf mmap enabled                                                                                      helpers.py:328
Scanning contents     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
Using Tesseract OpenMP thread limit 3                                                               tesseract_ocr.py:199
pikepdf mmap enabled                                                                                      helpers.py:328
    1 page already has text! - rasterizing text and running OCR anyway                                  _pipeline.py:318
    1 Rasterize with png16m, rotation 0                                                                 _pipeline.py:539
    1 Weighted average image DPI is 175.4, max DPI is 600.0. The discrepancy may indicate a high detail _pipeline.py:477
region on this page, but could also indicate a problem with the input PDF file. Page image will be
rendered at 400.0 DPI.
    1 Running: ['C:\\Program Files\\gs\\gs10.03.1\\bin\\gswin64c.EXE', '-dQUIET', '-dSAFER', '-dBATCH',  __init__.py:133
'-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=png16m', '-dFirstPage=1', '-dLastPage=1',
'-r400.000000x400.000000', '-dPDFSTOPONERROR', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None',
'-f', 'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.2083b0_b\\origin.pdf']
    1 Rotating output by 0                                                                            ghostscript.py:149
    1 resolution (399.9992, 399.9992)                                                                   _pipeline.py:618
    1 Resizing image to fit image dimensions limit                                                        imageops.py:56
    1 Rescaled image to (2479, 3508) pixels and (300, 300) dpi                                           imageops.py:151
    1 convert                                                                                           _pipeline.py:735
    1 PIL format = PNG                                                                                   img2pdf.py:1834
    1 imgformat = PNG                                                                                    img2pdf.py:1852
    1 input dpi = 400 x 400                                                                              img2pdf.py:1371
    1 rotation = 0°                                                                                      img2pdf.py:1421
    1 input colorspace = RGB                                                                             img2pdf.py:1455
    1 width x height = 3307px x 4678px                                                                   img2pdf.py:1508
    1 read_images() embeds a PNG                                                                         img2pdf.py:2050
    1 convert done                                                                                      _pipeline.py:745
    1 Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '-l', 'eng',                          __init__.py:133
'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.2083b0_b\\000001_ocr.png',
'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.2083b0_b\\000001_ocr_hocr', 'hocr', 'txt']
    1 pikepdf.Matrix(0.18, 0, 0, -0.18, 0, 631.44)                                                          _hocr.py:203
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 824, 179)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 43, 373)                                                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1954, 386)                                                                 _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 42, 530)                                                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 43, 656)                                                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 41, 734)                                                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999976, -0.00699983, 0.00699983, 0.999976, 40, 940)                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 39, 1019)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 34, 1099)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1300, 1155)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.99994, -0.0109993, 0.0109993, 0.99994, 33, 1342)                                     _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 36, 1420)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 36, 1500)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1298, 1613)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1300, 1695)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 35, 1905)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 35, 1985)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999982, 0.00599989, -0.00599989, 0.999982, 35, 2065)                                 _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1301, 2160)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1301, 2242)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 36, 2471)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 37, 2553)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999976, 0.00699983, -0.00699983, 0.999976, 37, 2640)                                 _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.99994, 0.0109993, -0.0109993, 0.99994, 34, 2722)                                     _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999916, 0.0129989, -0.0129989, 0.999916, 40, 3033)                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999988, 0.00499994, -0.00499994, 0.999988, 42, 3113)                                 _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 45, 3191)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 45, 3225)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 58, 3256)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 60, 3288)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 49, 3317)                                                                  _hocr.py:323
    1 Emplacement update                                                                                   _graft.py:123
    1 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0                     _graft.py:140
    1 Grafting                                                                                             _graft.py:251
    1 Grafting with ctm pikepdf.Matrix(1.33414, 0, 0, 1.33352, 0, -5.68434e-14)                            _graft.py:294
    1 Page rotation: (content, auto) -> page = (0, 0) -> 0                                                 _graft.py:165
OCR                   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
Postprocessing...                                                                                             ocr.py:144
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Linearizing           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 100/100 0:00:00
xref 11: treating as an optimization candidate                                                           optimize.py:282
Recursing into Form XObject /OCR-MguA7ICzwpsDknNobMnZig in page 0                                        optimize.py:265
XrefExt(xref=11, ext='.png')                                                                             optimize.py:347
Optimizable images: JPEGs: 0 PNGs: 1                                                                     optimize.py:352
Recompressing JPEGs   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
xref 11: treating as an optimization candidate                                                           optimize.py:282
Recursing into Form XObject /OCR-MguA7ICzwpsDknNobMnZig in page 0                                        optimize.py:265
Deflating JPEGs       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
xref 11: treating as an optimization candidate                                                           optimize.py:282
Recursing into Form XObject /OCR-MguA7ICzwpsDknNobMnZig in page 0                                        optimize.py:265
Optimizable images: JBIG2 groups: 0                                                                      optimize.py:363
JBIG2                 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
Image optimization did not improve the file - optimizations will not be used                             optimize.py:720
Running: ['C:\\jbig2enc-0.29\\jbig2.EXE', '--version']                                                   __init__.py:133
Running: ['C:\\pngquant\\pngquant.EXE', '--version']                                                     __init__.py:133
Image optimization ratio: 1.00 savings: -0.0%                                                           _pipeline.py:989
Total file size ratio: 0.18 savings: -444.1%                                                            _pipeline.py:992
C:\Users\User\AppData\Local\Temp\ocrmypdf.io.2083b0_b\optimize.pdf -> ocr.pdf                         _pipeline.py:1064
The output file size is 5.44× larger than the input file.                                             _validation.py:364
Possible reasons for this include:
--force-ocr was issued, causing transcoding.
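In the log above, Ghostscript renders the page at 400 DPI (3307 × 4678 px), and --tesseract-downsample-above 3508 then shrinks the copy sent to Tesseract to 2479 × 3508 px at 300 DPI. The arithmetic can be reproduced as below. This is a sketch, not OCRmyPDF's imageops code: downsample_to_limit is a hypothetical helper, and truncating the pixel sizes is an assumption that matches the logged 2479 (the exact value is about 2479.9).

```python
from fractions import Fraction


def downsample_to_limit(width: int, height: int, dpi: float,
                        max_side: int) -> tuple[int, int, float]:
    """Shrink an image so its longer side is at most max_side pixels.

    The scale factor is kept as an exact fraction, pixel sizes are truncated
    to integers, and the DPI is scaled by the same factor so the physical
    page size stays (almost) the same.
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height, dpi
    scale = Fraction(max_side, longest)
    return int(width * scale), int(height * scale), float(dpi * scale)


# Values from the log: 400 DPI render, --tesseract-downsample-above 3508.
w, h, dpi = downsample_to_limit(3307, 4678, 400.0, 3508)
print(w, h, round(dpi))  # 2479 3508 300
```

Note that img2pdf still reports 3307px x 4678px at 400 dpi after this step, in both runs, so the downsampling appears to apply only to the image passed to Tesseract, not to the image embedded in the output page.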

When run on the file re-written with GS (gs.pdf):

ocrmypdf 16.4.3                                                                                           __main__.py:59
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Found tesseract 5.3.4.20240503                                                                           __init__.py:343
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Running: ['C:\\Program Files\\gs\\gs10.03.1\\bin\\gswin64c.EXE', '--version']                            __init__.py:133
Found gs 10.3.1                                                                                          __init__.py:343
Running: ['C:\\Program Files\\gs\\gs10.03.1\\bin\\gswin64c.EXE', '--version']                            __init__.py:133
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--list-langs']                             __init__.py:133
stdout/stderr = List of available languages in "C:\Program Files\Tesseract-OCR/tessdata/" (2):            __init__.py:73
eng
osd

No language specified; assuming --language eng                                                         _validation.py:54
pikepdf mmap enabled                                                                                      helpers.py:328
Gathering info with 1 thread workers                                                                         info.py:800
pikepdf mmap enabled                                                                                      helpers.py:328
Scanning contents     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
Using Tesseract OpenMP thread limit 3                                                               tesseract_ocr.py:199
pikepdf mmap enabled                                                                                      helpers.py:328
    1 page already has text! - rasterizing text and running OCR anyway                                  _pipeline.py:318
    1 Rasterize with png16m, rotation 0                                                                 _pipeline.py:539
    1 Weighted average image DPI is 175.4, max DPI is 600.0. The discrepancy may indicate a high detail _pipeline.py:477
region on this page, but could also indicate a problem with the input PDF file. Page image will be
rendered at 400.0 DPI.
    1 Running: ['C:\\Program Files\\gs\\gs10.03.1\\bin\\gswin64c.EXE', '-dQUIET', '-dSAFER', '-dBATCH',  __init__.py:133
'-dNOPAUSE', '-dInterpolateControl=-1', '-sDEVICE=png16m', '-dFirstPage=1', '-dLastPage=1',
'-r400.000000x400.000000', '-dPDFSTOPONERROR', '-o', '-', '-sstdout=%stderr', '-dAutoRotatePages=/None',
'-f', 'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.dq3qh7v_\\origin.pdf']
    1 Rotating output by 0                                                                            ghostscript.py:149
    1 resolution (399.9992, 399.9992)                                                                   _pipeline.py:618
    1 Resizing image to fit image dimensions limit                                                        imageops.py:56
    1 Rescaled image to (2479, 3508) pixels and (300, 300) dpi                                           imageops.py:151
    1 convert                                                                                           _pipeline.py:735
    1 PIL format = JPEG                                                                                  img2pdf.py:1834
    1 imgformat = JPEG                                                                                   img2pdf.py:1852
    1 input dpi = 400 x 400                                                                              img2pdf.py:1371
    1 rotation = 0°                                                                                      img2pdf.py:1421
    1 input colorspace = RGB                                                                             img2pdf.py:1455
    1 width x height = 3307px x 4678px                                                                   img2pdf.py:1508
    1 read_images() embeds a JPEG                                                                        img2pdf.py:1868
    1 convert done                                                                                      _pipeline.py:745
    1 Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '-l', 'eng',                          __init__.py:133
'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.dq3qh7v_\\000001_ocr.png',
'C:\\Users\\User\\AppData\\Local\\Temp\\ocrmypdf.io.dq3qh7v_\\000001_ocr_hocr', 'hocr', 'txt']
    1 pikepdf.Matrix(0.18, 0, 0, -0.18, 0, 631.44)                                                          _hocr.py:203
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 824, 179)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 43, 373)                                                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1954, 386)                                                                 _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 42, 530)                                                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 43, 656)                                                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 41, 734)                                                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999976, -0.00699983, 0.00699983, 0.999976, 40, 940)                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 39, 1019)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 34, 1099)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1300, 1155)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.99994, -0.0109993, 0.0109993, 0.99994, 33, 1342)                                     _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 36, 1420)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 36, 1500)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1298, 1613)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1300, 1695)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 35, 1905)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 35, 1985)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999982, 0.00599989, -0.00599989, 0.999982, 35, 2065)                                 _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1301, 2160)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 1301, 2242)                                                                _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 36, 2471)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 37, 2553)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999976, 0.00699983, -0.00699983, 0.999976, 37, 2640)                                 _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.99994, 0.0109993, -0.0109993, 0.99994, 34, 2722)                                     _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999916, 0.0129989, -0.0129989, 0.999916, 40, 3033)                                   _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(0.999988, 0.00499994, -0.00499994, 0.999988, 42, 3113)                                 _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 45, 3191)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 45, 3225)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 58, 3256)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 60, 3288)                                                                  _hocr.py:323
    1 eng                                                                                                   _hocr.py:267
    1 pikepdf.Matrix(1, 0, 0, 1, 49, 3317)                                                                  _hocr.py:323
    1 Emplacement update                                                                                   _graft.py:123
    1 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0                     _graft.py:140
    1 Grafting                                                                                             _graft.py:251
    1 Grafting with ctm pikepdf.Matrix(1.33414, 0, 0, 1.33352, 0, -5.68434e-14)                            _graft.py:294
    1 Page rotation: (content, auto) -> page = (0, 0) -> 0                                                 _graft.py:165
OCR                   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
Postprocessing...                                                                                             ocr.py:144
Running: ['C:\\Program Files\\Tesseract-OCR\\tesseract.EXE', '--version']                                __init__.py:133
Linearizing           ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 100/100 0:00:00
Recursing into Form XObject /OCR-YmcWPT_SVQ8ykR5dYENc2w in page 0                                        optimize.py:265
xref 11: treating as an optimization candidate                                                           optimize.py:282
XrefExt(xref=11, ext='.png')                                                                             optimize.py:347
Optimizable images: JPEGs: 0 PNGs: 1                                                                     optimize.py:352
Recompressing JPEGs   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
Recursing into Form XObject /OCR-YmcWPT_SVQ8ykR5dYENc2w in page 0                                        optimize.py:265
xref 11: treating as an optimization candidate                                                           optimize.py:282
xref 11: marking this JPEG as deflatable                                                                 optimize.py:547
Deflating JPEGs       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 1/1 0:00:00
Recursing into Form XObject /OCR-YmcWPT_SVQ8ykR5dYENc2w in page 0                                        optimize.py:265
xref 11: treating as an optimization candidate                                                           optimize.py:282
xref 11: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization                  optimize.py:98
Optimizable images: JBIG2 groups: 0                                                                      optimize.py:363
JBIG2                 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% 0/0 -:--:--
Running: ['C:\\jbig2enc-0.29\\jbig2.EXE', '--version']                                                   __init__.py:133
Running: ['C:\\pngquant\\pngquant.EXE', '--version']                                                     __init__.py:133
Image optimization ratio: 1.21 savings: 17.4%                                                           _pipeline.py:989
Total file size ratio: 0.87 savings: -15.1%                                                             _pipeline.py:992
C:\Users\User\AppData\Local\Temp\ocrmypdf.io.dq3qh7v_\optimize.pdf -> gs_ocr.pdf                      _pipeline.py:1064
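The two logs differ at the convert step: for in.pdf img2pdf embeds a PNG (PIL format = PNG), while for gs.pdf it embeds a JPEG (PIL format = JPEG), and only the second run finds a JPEG for the optimizer to work on. A rough way to see how the images in each input are encoded is to scan the raw bytes for image XObject dictionaries and read their /Filter entries. This is a hypothetical stdlib-only heuristic, not part of OCRmyPDF or pikepdf (pikepdf, which OCRmyPDF uses, is the robust option). It assumes image dictionaries appear as plain bytes, which normally holds because streams cannot be stored inside object streams; inline images are not counted.

```python
import re
from collections import Counter

# "/Subtype /Image" marks an image XObject; \b excludes e.g. "/ImageMask".
SUBTYPE_IMAGE = re.compile(rb"/Subtype\s*/Image\b")
# A /Filter value is either a single name or an array of names.
FILTER = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
FILTER_NAME = re.compile(rb"/([A-Za-z0-9]+)")


def image_filters(pdf_bytes: bytes) -> Counter:
    """Count image XObjects by their /Filter chain (heuristic byte scan)."""
    counts = Counter()
    for match in SUBTYPE_IMAGE.finditer(pdf_bytes):
        # The dictionary sits between the object header and the stream keyword.
        start = pdf_bytes.rfind(b"obj", 0, match.start())
        end = pdf_bytes.find(b"stream", match.end())
        if start == -1 or end == -1:
            continue
        found = FILTER.search(pdf_bytes[start:end])
        if found is None:
            counts["(no /Filter)"] += 1
            continue
        names = FILTER_NAME.findall(found.group(1))
        counts[" ".join(n.decode("ascii") for n in names)] += 1
    return counts
```

Comparing image_filters(Path("in.pdf").read_bytes()) with the same call on gs.pdf (using pathlib.Path) would show whether the original page images are JPEG-encoded (DCTDecode) or use another filter.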

user1823 added the triage Issue needs triage label Aug 2, 2024

user1823 assigned jbarlow83 Aug 2, 2024
