perf(render): dirty-rect bound the overprint after-paint pass — 3.3× corpus aggregate, 7–21× on overprint-heavy PDFs, byte-identical by RayVR · Pull Request #736 · yfedoseev/pdf_oxide

RayVR · 2026-06-12T13:52:21Z

Description

apply_overprint_after_paint (the §11.7.4 CompatibleOverprint after-paint pass) snapshotted the full page pixmap (pixmap.data().to_vec()) and re-scanned every pixel after every paint operator with /OP or /op active. Print-targeted producers commonly set /OP true in their ExtGState defaults, so an ordinary text-heavy form page with thousands of glyph paints did tens of gigabytes of byte traffic. Profiling an 80-document real-world corpus (academic papers, manuals, invoices, government forms, leaflets, commercial artwork) measured the scan alone at 72% of aggregate render CPU, with the snapshot memcpy hiding in another ~6% of memmove time. Documents like a state tax form (NY IT-2104) spent 96% of their entire render inside this one function.

This PR bounds both the snapshot and the scan to a device rect that provably contains the painted geometry, with a full-page fallback whenever no bound can be proven. Output is byte-identical: 511/511 corpus pages render byte-for-byte the same as the parent commit. Corpus aggregate render time drops 109.7 s → 32.9 s (3.33×); the affected document class runs 7–21× faster.

Workload	User time, before → after	Speedup
NY IT-2104 tax form	8.88 s → 0.42 s	20.9×
spot-colour label artwork	25.68 s → 1.39 s	18.5×
pharma patient leaflet	11.27 s → 0.72 s	15.7×
DS-82 passport form (2 variants)	6.0 s → 0.49 s	12.1× / 12.3×
journal article reprint	9.28 s → 0.76 s	12.2×
corpus geomean (71 docs)	—	1.49× user / 1.45× wall

53 documents without overprint are flat — a regression probe pins that they do no scan work at all. The one nominal sub-1× entry (0.948×) did not reproduce under interleaved re-measurement (base 1.06–1.09 s vs head 1.06–1.08 s — noise).

samply before/after, apply_overprint_after_paint self time: 95.8% → 1.4% (IT-2104), 93.7% → 3.5% (leaflet), 92.6% → 2.5% (label artwork), 89.3% → 5.6% (DS-82), 85.8% → 3.4% (clinical leaflet).

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Performance improvement
Code refactoring
Tests
CI/CD changes

Related Issues

None tracked upstream — surfaced by a corpus-wide profiling sweep after the §11 transparency surface landed.

Changes Made

DeviceRect / RectSnapshot / PaintBounds (page_renderer.rs): a half-open clamped device rect; a row-packed pre-paint snapshot of a rect; and an accumulator for the device-space AABB of what a paint helper actually rasterised. All conversions degrade toward over-coverage (non-finite coordinates → full page; clamping happens only after expansion), never under.
Path fill/stroke arms (Fill, Stroke, the four fill+stroke combos): compute a pre-paint rect from the affine-mapped path bbox + a 2 px AA margin. Strokes expand the unclamped AABB (an off-page path can still paint in-page through a fat stroke) by half line width × the transform's Frobenius norm × √2 for square caps × the miter limit only when the path has joins and the join style is miter — single-segment rule lines, ubiquitous in forms, never pay the PDF-default miter limit of 10. Zero-width hairlines floor at 1 px. Snapshot and scan are both rect-sized.
Text arms (Tj, ', ", TJ): every glyph paint site in the text rasteriser accumulates the exact union of transformed glyph-path bounds into a PaintBounds threaded through render_text / render_tj_array / render_unicode_text / render_cid_direct / render_substituted_cjk / render_text_fallback; the after-paint scan is restricted to that union. The pre-paint snapshot for text stays full-page for now — glyph bounds are only knowable once outlines are built, and guessing from font metadata (FontBBox, ascent/descent) is not provable against broken descriptors.
Shadings (sh): paint the whole clip region — no provable bbox, full-page snapshot + scan retained (pinned by a test).
apply_overprint_after_paint / apply_overprint_after_paint_with_coverage take the rect snapshot plus an optional scan-narrowing rect; the per-pixel blend math is untouched.
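
The path-arm bound described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in page_renderer.rs: the `Affine` and `StrokeStyle` types, the `bound_paint` and `stroke_outset` function names, and the exact way the 1 px hairline floor is applied are all assumptions made for the example. Only the ingredients (corner mapping, 2 px AA margin, Frobenius norm, √2 for square caps, conditional miter limit, expand-then-clamp, non-finite → full page) come from the description.

```rust
/// Half-open device rect [x0, x1) x [y0, y1), already clamped to the page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DeviceRect {
    pub x0: u32,
    pub y0: u32,
    pub x1: u32,
    pub y1: u32,
}

impl DeviceRect {
    pub fn full_page(w: u32, h: u32) -> Self {
        Self { x0: 0, y0: 0, x1: w, y1: h }
    }
}

/// PDF-style matrix [a b c d e f]: x' = a*x + c*y + e, y' = b*x + d*y + f.
/// (Hypothetical type for this sketch.)
#[derive(Debug, Clone, Copy)]
pub struct Affine {
    pub a: f64,
    pub b: f64,
    pub c: f64,
    pub d: f64,
    pub e: f64,
    pub f: f64,
}

impl Affine {
    pub const IDENTITY: Affine = Affine { a: 1.0, b: 0.0, c: 0.0, d: 1.0, e: 0.0, f: 0.0 };

    fn apply(&self, x: f64, y: f64) -> (f64, f64) {
        (self.a * x + self.c * y + self.e, self.b * x + self.d * y + self.f)
    }

    /// Upper bound on how far the linear part can stretch a unit vector.
    fn frobenius_norm(&self) -> f64 {
        (self.a * self.a + self.b * self.b + self.c * self.c + self.d * self.d).sqrt()
    }
}

/// Hypothetical stroke parameters; field names are illustrative.
pub struct StrokeStyle {
    pub line_width: f64,
    pub square_caps: bool,
    pub miter_joins: bool,
    pub miter_limit: f64,
}

/// Device-space distance a stroke may reach beyond the path bbox.
pub fn stroke_outset(ctm: &Affine, style: &StrokeStyle, path_has_joins: bool) -> f64 {
    let mut outset = 0.5 * style.line_width * ctm.frobenius_norm();
    if style.square_caps {
        outset *= std::f64::consts::SQRT_2;
    }
    // Single-segment rule lines have no joins, so they skip the miter limit.
    if path_has_joins && style.miter_joins {
        outset *= style.miter_limit.max(1.0);
    }
    // f64::max would silently discard a NaN, so surface it as "unbounded".
    if outset.is_nan() {
        return f64::INFINITY;
    }
    outset.max(1.0) // assumed hairline floor of 1 px
}

/// Map the bbox corners, expand the *unclamped* AABB, then clamp to the page.
pub fn bound_paint(ctm: &Affine, bbox: [f64; 4], outset: f64, page_w: u32, page_h: u32) -> DeviceRect {
    const AA_MARGIN: f64 = 2.0;
    let [x0, y0, x1, y1] = bbox;
    let corners = [ctm.apply(x0, y0), ctm.apply(x1, y0), ctm.apply(x0, y1), ctm.apply(x1, y1)];
    let pad = outset + AA_MARGIN;
    let all_finite = pad.is_finite() && corners.iter().all(|(x, y)| x.is_finite() && y.is_finite());
    if !all_finite {
        return DeviceRect::full_page(page_w, page_h); // degrade toward over-coverage
    }
    let (mut min_x, mut min_y) = (f64::INFINITY, f64::INFINITY);
    let (mut max_x, mut max_y) = (f64::NEG_INFINITY, f64::NEG_INFINITY);
    for (x, y) in corners {
        min_x = min_x.min(x);
        min_y = min_y.min(y);
        max_x = max_x.max(x);
        max_y = max_y.max(y);
    }
    let (w, h) = (page_w as f64, page_h as f64);
    DeviceRect {
        x0: (min_x - pad).floor().clamp(0.0, w) as u32,
        y0: (min_y - pad).floor().clamp(0.0, h) as u32,
        x1: ((max_x + pad).floor() + 1.0).clamp(0.0, w) as u32,
        y1: ((max_y + pad).floor() + 1.0).clamp(0.0, h) as u32,
    }
}
```

In this sketch a path lying entirely left of the page still yields a non-empty rect when its stroke outset reaches past x = 0, because clamping happens after expansion. The finiteness check runs before any min/max so that a NaN corner cannot be dropped by `f64::min`.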

Byte equivalence is structural: a pixel outside a provable bound cannot differ from the snapshot, so the historical full-page diff skipped it anyway; restricting the walk visits exactly the pixels the diff could ever act on.
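
A small model of that argument, assuming a 4-byte-per-pixel row-major pixmap: a row-packed snapshot of a sub-rect finds exactly the same changed pixels as a full-page snapshot, provided the paint stayed inside the rect. The `RectSnapshot` layout, `capture`, `byte_len`, and `changed_pixels` below are hypothetical stand-ins and not the PR's actual API.

```rust
/// Assumed pixel size (RGBA8) for this sketch.
const BPP: usize = 4;

/// Row-packed copy of the pixels inside a half-open rect [x0, x1) x [y0, y1).
pub struct RectSnapshot {
    x0: usize,
    y0: usize,
    w: usize,
    h: usize,
    data: Vec<u8>,
}

impl RectSnapshot {
    /// Copy only the rows and columns inside the rect, one memcpy per row.
    pub fn capture(pixmap: &[u8], page_w: usize, x0: usize, y0: usize, x1: usize, y1: usize) -> Self {
        let (w, h) = (x1 - x0, y1 - y0);
        let mut data = Vec::with_capacity(w * h * BPP);
        for y in y0..y1 {
            let start = (y * page_w + x0) * BPP;
            data.extend_from_slice(&pixmap[start..start + w * BPP]);
        }
        Self { x0, y0, w, h, data }
    }

    pub fn byte_len(&self) -> usize {
        self.data.len()
    }

    /// Page coordinates of every pixel in the rect that differs from the snapshot.
    pub fn changed_pixels(&self, pixmap: &[u8], page_w: usize) -> Vec<(usize, usize)> {
        let mut out = Vec::new();
        for row in 0..self.h {
            for col in 0..self.w {
                let snap = (row * self.w + col) * BPP;
                let live = ((self.y0 + row) * page_w + self.x0 + col) * BPP;
                if self.data[snap..snap + BPP] != pixmap[live..live + BPP] {
                    out.push((self.x0 + col, self.y0 + row));
                }
            }
        }
        out
    }
}
```

On an 8 × 8 page, a 4 × 4 snapshot holds 64 bytes instead of 256, and after a paint that touches only a pixel inside the rect, both snapshots report the same changed set.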

Testing

I have added tests that prove my fix is effective or that my feature works
All new and existing tests pass locally
I have run cargo test --all-features
I have run cargo clippy -- -D warnings
I have run cargo fmt

Bounding contract (tests/test_overprint_dirty_rect.rs, gated on test-support): a PageRenderer::overprint_scanned_pixels counter pins five behaviours — small /OP true DeviceCMYK fill, stroke, and Tj paints must each scan ≤ 25% of the page (they scan ~5–10%; the pre-change behaviour was 100% and the tests were written first and watched fail at exactly scanned == total); sh must remain full-page; and a document without /OP must scan zero pixels. Counters, not wall-clock — exact and machine-independent.
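
The shape of that contract can be illustrated with a toy model. This is only a sketch of the counting idea: `ScanModel` and `Paint` are hypothetical, and the real tests render actual PDFs through PageRenderer rather than feeding rect sizes to a model.

```rust
/// What a paint operator can prove about its own extent (toy model).
#[derive(Debug, Clone, Copy)]
pub enum Paint {
    /// Fill, stroke, or text with a provable device rect of w x h pixels.
    Bounded { w: u64, h: u64 },
    /// Shading (sh): no provable bbox, so the scan stays full-page.
    Shading,
}

/// Counts scanned pixels instead of measuring wall-clock time.
pub struct ScanModel {
    page_w: u64,
    page_h: u64,
    pub overprint_scanned_pixels: u64,
}

impl ScanModel {
    pub fn new(page_w: u64, page_h: u64) -> Self {
        Self { page_w, page_h, overprint_scanned_pixels: 0 }
    }

    pub fn total(&self) -> u64 {
        self.page_w * self.page_h
    }

    /// Record the scan cost of one paint operator.
    pub fn paint(&mut self, paint: Paint, overprint_active: bool) {
        if !overprint_active {
            return; // no /OP: the after-paint pass must do no scan work
        }
        let scanned = match paint {
            Paint::Bounded { w, h } => w.min(self.page_w) * h.min(self.page_h),
            Paint::Shading => self.total(),
        };
        self.overprint_scanned_pixels += scanned;
    }
}
```

Because the assertions compare pixel counts, they give the same answer on any machine, which is the point of pinning the contract with a counter.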

Byte equivalence: 80-document corpus, pages 1–20 per document at 150 DPI, rendered with this branch and its parent commit (e12609e5, the only diff being this PR): 511/511 pages byte-identical (cmp on every PNG).

Semantics: the full suite — 8857 passed, 0 failed with rendering icc test-support — includes the §11.7.4 byte-exact CompatibleOverprint probes (OPM=0/1, DeviceGray/Separation sources, knockout-group interaction), which all pass unchanged.

Timing: medians of 10 warmed runs per (document, binary) via /usr/bin/time -p on an idle M2 Max; geomean and per-class numbers above.
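
For readers reproducing the summary statistics, the two aggregations can be computed as in the sketch below. The helper names `median` and `geomean` are illustrative and not part of the PR or its tooling.

```rust
/// Median of a set of timing samples; None for an empty set.
pub fn median(samples: &[f64]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let mut s = samples.to_vec();
    s.sort_by(|a, b| a.total_cmp(b));
    let mid = s.len() / 2;
    Some(if s.len() % 2 == 1 { s[mid] } else { 0.5 * (s[mid - 1] + s[mid]) })
}

/// Geometric mean of per-document speedup ratios; None if empty or any ratio is not positive.
pub fn geomean(ratios: &[f64]) -> Option<f64> {
    if ratios.is_empty() || ratios.iter().any(|r| !(*r > 0.0)) {
        return None;
    }
    let log_sum: f64 = ratios.iter().map(|r| r.ln()).sum();
    Some((log_sum / ratios.len() as f64).exp())
}
```

The aggregate figure (109.7 s / 32.9 s ≈ 3.33×) weights documents by their runtime, while the geomean weights every document equally, so a corpus where many documents are flat at about 1× yields a geomean well below the aggregate.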

Python Bindings (if applicable)

Python bindings updated (if needed)
Python tests pass
Python code formatted with ruff format
Python code linted with ruff check

No binding changes — pure-Rust rendering hot path; bindings inherit the speedup through render_page.

Documentation

I have updated the documentation (README, docs/, code comments)
I have added/updated examples (if applicable)
I have updated CHANGELOG.md

No user-facing docs change. The new types and the bounding invariants carry doc comments, including the obligation that every new glyph paint site must accumulate into PaintBounds (or the text scan under-covers).
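
A minimal sketch of such an accumulator, assuming it tracks a floating-point AABB and treats any non-finite input as "no provable bound". The enum shape and the `accumulate` method are illustrative only; the PR's PaintBounds may be structured differently.

```rust
/// Device-space union of everything a paint helper has rasterised so far.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PaintBounds {
    /// Nothing painted yet.
    Empty,
    /// Union of all glyph-path bounds seen so far.
    Rect { min_x: f64, min_y: f64, max_x: f64, max_y: f64 },
    /// A non-finite bound was seen; the caller must fall back to the full page.
    Unbounded,
}

impl PaintBounds {
    /// Fold one transformed glyph-path bbox into the running union.
    pub fn accumulate(&mut self, x0: f64, y0: f64, x1: f64, y1: f64) {
        if ![x0, y0, x1, y1].iter().all(|v| v.is_finite()) {
            *self = PaintBounds::Unbounded;
            return;
        }
        let (lo_x, hi_x) = (x0.min(x1), x0.max(x1));
        let (lo_y, hi_y) = (y0.min(y1), y0.max(y1));
        *self = match *self {
            PaintBounds::Empty => PaintBounds::Rect { min_x: lo_x, min_y: lo_y, max_x: hi_x, max_y: hi_y },
            PaintBounds::Rect { min_x, min_y, max_x, max_y } => PaintBounds::Rect {
                min_x: min_x.min(lo_x),
                min_y: min_y.min(lo_y),
                max_x: max_x.max(hi_x),
                max_y: max_y.max(hi_y),
            },
            // Once unbounded, stay unbounded: never narrow after losing the proof.
            PaintBounds::Unbounded => PaintBounds::Unbounded,
        };
    }
}
```

A glyph paint site that forgets to call the accumulator leaves the union too small, which is exactly the under-coverage the doc-comment obligation guards against.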

Checklist

My code follows the project's coding guidelines (see CONTRIBUTING.md)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings

Additional Notes

After this change, the remaining cost on overprint-heavy documents is memmove (20–78% self in the after-profiles): the surviving text-arm full-page snapshot (one per text-showing op) and the sibling snapshot families (smask_snapshot, cmyk_compose_snapshot, spot_paint_snapshot, cmyk_sidecar_snapshot*) which still clone the full page per gated paint. The DeviceRect/RectSnapshot/PaintBounds machinery introduced here is deliberately reusable for those passes — natural follow-ups, kept out of this PR to keep the blast radius reviewable. A provable pre-paint text bound (outline dry pass or an incrementally-synced shadow buffer) would eliminate the last full-page memcpy; both options have correctness subtleties (metadata-independent bounds, shadow desync on gate transitions) that deserve their own review.

… scan

apply_overprint_after_paint snapshotted the full page pixmap (pixmap.data().to_vec()) and re-scanned every pixel after every paint operator with /OP or /op active. Print-targeted producers set /OP true in their ExtGState defaults, so a text-heavy form page with thousands of glyph paints did tens of gigabytes of byte traffic: corpus profiling measured the scan alone at 72% of aggregate render CPU across 78 real-world documents, with the snapshot memcpy hiding in another 6% of memmove time.

The pass now snapshots and scans only a device rect that provably bounds the painted geometry:

- Path fills bound by the transformed path bbox (affine corner mapping) plus a 2px anti-aliasing margin. Strokes additionally expand by half line width x the transform's Frobenius norm, times sqrt(2) for square caps, times the miter limit only when the path has joins and the join style is miter - so single-segment rule lines never pay the PDF default miter limit of 10.
- Text paints accumulate the exact union of transformed glyph-path bounds (PaintBounds) at every fill site in the text rasteriser; the scan is restricted to that union. The pre-paint snapshot for text remains full-page (glyph bounds are only known once outlines are built), so text keeps one page-sized memcpy per operator for now.
- Shadings (sh) paint the whole clip region and have no provable bbox: they keep the historical full-page snapshot and scan, as does any geometry whose mapped coordinates are non-finite. The fallback is always toward over-coverage, never under.

Byte equivalence is structural: pixels outside a provable bound cannot differ from the snapshot, so the historical full-page diff skipped them anyway. The stroke outset expands the unclamped AABB because a path lying off-page can still paint in-page through a fat stroke.

A test-support counter (overprint_scanned_pixels) pins the bounding contract: small /OP-true fill, stroke, and text paints must scan a rect-bounded neighbourhood, sh must remain full-page, and documents without /OP must not scan at all.

RayVR · 2026-06-12T15:46:31Z

CI triage — the 11 red checks decompose into two groups, neither caused by this PR's change:

Advisory-driven (3): Security Audit, Dependency Check (cargo-deny), Security audit (bundler-audit + OSV-Scanner). Two new PyO3 advisories published against the existing lockfile — RUSTSEC-2026-0176 (BoundListIterator/BoundTupleIterator unchecked nth/nth_back index arithmetic) and RUSTSEC-2026-0177 (PyCFunction::new_closure missing Sync bound). These fail on main's own latest CI run as well (cargo-deny red there too) and will fail every open PR until a pyo3 bump lands on main. Happy to send that bump as a separate PR if useful.

Network flakes (8): Test (ubuntu-latest, stable) died fetching the base64 crate from crates.io (SSL_read: unexpected eof, job log) before compiling anything; the 5 PHP jobs and 2 wheel builds all failed in setup-stage downloads in the same window. A re-run of failed jobs should clear all eight (I can't trigger re-runs from a fork).

All 165 functional checks — tests on the other platforms, clippy, docs, lib builds, bindings — are green.

yfedoseev

Approving. Reviewed the change against ISO 32000-1 §8.6.7 / §11.7.4 and independently re-verified the byte-identity claim with a fresh render sweep.

Spec & design

The correctness argument rests entirely on the dirty rect being a provable superset of the painted pixels — and every margin here is a conservative upper bound: the 2px AA pad, the √2 square-cap outset, the Frobenius-norm scale bound (≥ the spectral norm, so genuinely an upper bound on the transform's stretch), the miter limit applied only when the path actually has miter joins, the 1px hairline floor, and the non-finite → full-page fallback. The §11.7.4 CompatibleOverprint blend math is untouched; this only narrows where the after-paint scan runs, and shadings/text correctly retain the full-page snapshot. The doc comments capture the invariants well.
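
The Frobenius-norm claim can be spelled out in one line. As a sketch, write the linear part of the transform as a 2×2 matrix M with singular values σ₁ ≥ σ₂ ≥ 0 (the symbols M, σ, and w are introduced here for illustration only):

```latex
\[
\|M\|_F \;=\; \sqrt{a^2 + b^2 + c^2 + d^2} \;=\; \sqrt{\sigma_1^2 + \sigma_2^2}
\;\ge\; \sigma_1 \;=\; \|M\|_2 \;=\; \max_{\|v\| = 1} \|M v\| ,
\qquad
\|M\|_F \;\le\; \sqrt{2}\,\|M\|_2 .
\]
```

So a stroke half-width w/2 in user space reaches at most (w/2)·‖M‖₂ ≤ (w/2)·‖M‖_F in device space, and the bound over-estimates the true stretch by a factor of at most √2.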

Independent render regression

Built base (e12609e5) and this branch, rendered 68 diverse PDFs (overprint / CMYK / knockout / shading / smask / tiling-pattern / clip / CJK / forms), pages 1–5 @150 DPI, pixel-diffed:

132/132 pages byte-identical — including the stroke-heavy CAD overprint plans and tiger-as-form-xobject (0 px differ).

Reproduces your 511/511 result on a fresh corpus.

CI heads-up

The red cargo-deny / Security Audit / OSV checks aren't from this change — they're the pyo3 RUSTSEC-2026-0176/0177 advisory-ignores that just landed in deny.toml / .cargo/audit.toml / osv-scanner.toml with v0.3.64, after this branch's base. Merging current main will clear them.

A 3.3× corpus-aggregate win on the render hot path with provably equivalent output is excellent — thank you for the careful work and the thorough write-up. 🙏

RayVR requested a review from yfedoseev as a code owner June 12, 2026 13:52

yfedoseev approved these changes Jun 12, 2026


Merge branch 'main' into perf/overprint-dirty-rect

4c0d510

RayVR wants to merge 2 commits into yfedoseev:main from RayVR:perf/overprint-dirty-rect
