perf(render): dirty-rect bound the overprint after-paint pass; 3.3× corpus aggregate, 7–21× on overprint-heavy PDFs, byte-identical #736
… scan apply_overprint_after_paint snapshotted the full page pixmap (pixmap.data().to_vec()) and re-scanned every pixel after every paint operator with /OP or /op active. Print-targeted producers set /OP true in their ExtGState defaults, so a text-heavy form page with thousands of glyph paints did tens of gigabytes of byte traffic: corpus profiling measured the scan alone at 72% of aggregate render CPU across 78 real-world documents, with the snapshot memcpy hiding in another 6% of memmove time.

The pass now snapshots and scans only a device rect that provably bounds the painted geometry:

- Path fills bound by the transformed path bbox (affine corner mapping) plus a 2px anti-aliasing margin. Strokes additionally expand by half the line width × the transform's Frobenius norm, times sqrt(2) for square caps, times the miter limit only when the path has joins and the join style is miter, so single-segment rule lines never pay the PDF default miter limit of 10.
- Text paints accumulate the exact union of transformed glyph-path bounds (PaintBounds) at every fill site in the text rasteriser; the scan is restricted to that union. The pre-paint snapshot for text remains full-page (glyph bounds are only known once outlines are built), so text keeps one page-sized memcpy per operator for now.
- Shadings (sh) paint the whole clip region and have no provable bbox: they keep the historical full-page snapshot and scan, as does any geometry whose mapped coordinates are non-finite. The fallback is always toward over-coverage, never under.

Byte equivalence is structural: pixels outside a provable bound cannot differ from the snapshot, so the historical full-page diff skipped them anyway. The stroke outset expands the unclamped AABB because a path lying off-page can still paint in-page through a fat stroke.
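The fill and stroke bounds described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `Affine`, `Cap`, `Join`, `mapped_bbox`, `stroke_outset`, and `expanded_bounds` are hypothetical names, and multiplying the square-cap and miter factors together is one conservative reading of the description.

```rust
/// Anti-aliasing pad in device pixels (the 2px margin from the description).
const AA_MARGIN_PX: f32 = 2.0;

/// Hypothetical affine transform: x' = a*x + c*y + e, y' = b*x + d*y + f.
#[derive(Clone, Copy)]
struct Affine { a: f32, b: f32, c: f32, d: f32, e: f32, f: f32 }

#[derive(Clone, Copy, PartialEq)]
enum Cap { Butt, Round, Square }

#[derive(Clone, Copy, PartialEq)]
enum Join { Miter, Round, Bevel }

/// (min_x, min_y, max_x, max_y) in device space, not yet clamped to the page.
type Aabb = (f32, f32, f32, f32);

impl Affine {
    fn map(&self, x: f32, y: f32) -> (f32, f32) {
        (self.a * x + self.c * y + self.e, self.b * x + self.d * y + self.f)
    }

    /// Frobenius norm of the linear part: an upper bound on vector stretch.
    fn frobenius(&self) -> f32 {
        (self.a * self.a + self.b * self.b + self.c * self.c + self.d * self.d).sqrt()
    }
}

/// Map the four corners of a user-space bbox and take their AABB.
/// Returns None on non-finite coordinates so the caller can fall back to full page.
fn mapped_bbox(t: &Affine, bbox: Aabb) -> Option<Aabb> {
    let (x0, y0, x1, y1) = bbox;
    let corners = [t.map(x0, y0), t.map(x1, y0), t.map(x0, y1), t.map(x1, y1)];
    let mut out = (f32::INFINITY, f32::INFINITY, f32::NEG_INFINITY, f32::NEG_INFINITY);
    for (x, y) in corners {
        if !x.is_finite() || !y.is_finite() {
            return None;
        }
        out = (out.0.min(x), out.1.min(y), out.2.max(x), out.3.max(y));
    }
    Some(out)
}

/// Conservative device-space stroke outset; the miter limit is paid only
/// when the path has joins and the join style is miter.
fn stroke_outset(t: &Affine, line_width: f32, cap: Cap, join: Join,
                 miter_limit: f32, has_joins: bool) -> f32 {
    let mut r = 0.5 * line_width * t.frobenius();
    if cap == Cap::Square {
        r *= std::f32::consts::SQRT_2;
    }
    if has_joins && join == Join::Miter {
        r *= miter_limit.max(1.0);
    }
    r
}

/// Expand the unclamped mapped bbox by the outset plus the AA margin.
/// Pass `outset = 0.0` for a fill.
fn expanded_bounds(t: &Affine, bbox: Aabb, outset: f32) -> Option<Aabb> {
    let (x0, y0, x1, y1) = mapped_bbox(t, bbox)?;
    let m = outset + AA_MARGIN_PX;
    if !m.is_finite() {
        return None;
    }
    Some((x0 - m, y0 - m, x1 + m, y1 + m))
}
```

Under the identity transform, a 2-unit-wide single-segment rule line gets an outset of about 1.41 px in this sketch, while the same width with miter joins at the default limit of 10 would get about 14.1 px, which is the cost the no-joins case avoids.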
A test-support counter (overprint_scanned_pixels) pins the bounding contract: small /OP-true fill, stroke, and text paints must scan a rect-bounded neighbourhood, sh must remain full-page, and documents without /OP must not scan at all.
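As a rough illustration of why a pixel counter pins this contract exactly, here is a toy model (not the renderer): `ToyRect`, `ToyPage`, and `after_paint` are hypothetical stand-ins, and the only behaviour modelled is how many pixels each after-paint scan visits.

```rust
/// Half-open pixel rect [x0, x1) x [y0, y1); hypothetical stand-in for a device rect.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ToyRect { x0: u32, y0: u32, x1: u32, y1: u32 }

impl ToyRect {
    fn area(&self) -> u64 {
        u64::from(self.x1.saturating_sub(self.x0)) * u64::from(self.y1.saturating_sub(self.y0))
    }
}

/// Toy page that only tracks how many pixels the after-paint scan visits.
struct ToyPage { width: u32, height: u32, overprint: bool, scanned_pixels: u64 }

impl ToyPage {
    fn new(width: u32, height: u32, overprint: bool) -> Self {
        Self { width, height, overprint, scanned_pixels: 0 }
    }

    fn total_pixels(&self) -> u64 {
        u64::from(self.width) * u64::from(self.height)
    }

    /// Called once per paint operator. `bound` is None when no bbox is
    /// provable (for example a shading), which forces the full-page scan.
    fn after_paint(&mut self, bound: Option<ToyRect>) {
        if !self.overprint {
            return; // no /OP or /op active: no scan work at all
        }
        let full = ToyRect { x0: 0, y0: 0, x1: self.width, y1: self.height };
        self.scanned_pixels += bound.unwrap_or(full).area();
    }
}
```

Because the counter is a pure function of the geometry, assertions such as "at most 25% of the page" or "exactly zero" do not depend on the machine or its load.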
CI triage: the 11 red checks decompose into two groups, neither caused by this PR's change:
- Advisory-driven (3)
- Network flakes (8)

All 165 functional checks (tests on the other platforms, clippy, docs, lib builds, bindings) are green.
yfedoseev
left a comment
Approving. Reviewed the change against ISO 32000-1 §8.6.7 / §11.7.4 and independently re-verified the byte-identity claim with a fresh render sweep.
Spec & design
The correctness argument rests entirely on the dirty rect being a provable superset of the painted pixels — and every margin here is a conservative upper bound: the 2px AA pad, the √2 square-cap outset, the Frobenius-norm scale bound (≥ the spectral norm, so genuinely an upper bound on the transform's stretch), the miter limit applied only when the path actually has miter joins, the 1px hairline floor, and the non-finite → full-page fallback. The §11.7.4 CompatibleOverprint blend math is untouched; this only narrows where the after-paint scan runs, and shadings/text correctly retain the full-page snapshot. The doc comments capture the invariants well.
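For reference, the standard inequality behind the Frobenius-norm remark, written for the 2×2 linear part of the transform (the symbols $A$, $\sigma_1$, $\sigma_2$, and $v$ are introduced here only for illustration):

```latex
A = \begin{pmatrix} a & c \\ b & d \end{pmatrix}, \qquad
\sigma_1 \ge \sigma_2 \ge 0 \ \text{the singular values of } A.

\|A\|_2 = \sigma_1 \;\le\; \sqrt{\sigma_1^2 + \sigma_2^2} = \|A\|_F = \sqrt{a^2 + b^2 + c^2 + d^2}

\|A v\| \;\le\; \|A\|_2 \,\|v\| \;\le\; \|A\|_F \,\|v\| \quad \text{for every vector } v.
```

For a uniform scale by $s$, $\|A\|_2 = s$ and $\|A\|_F = s\sqrt{2}$, so the bound over-covers by a factor of $\sqrt{2}$ and never under-covers.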
Independent render regression
Built base (e12609e5) and this branch, rendered 68 diverse PDFs (overprint / CMYK / knockout / shading / smask / tiling-pattern / clip / CJK / forms), pages 1–5 @150 DPI, pixel-diffed:
132/132 pages byte-identical, including the stroke-heavy CAD overprint plans and tiger-as-form-xobject (0 px differ).
Reproduces your 511/511 result on a fresh corpus.
CI heads-up
The red cargo-deny / Security Audit / OSV checks aren't from this change — they're the pyo3 RUSTSEC-2026-0176/0177 advisory-ignores that just landed in deny.toml / .cargo/audit.toml / osv-scanner.toml with v0.3.64, after this branch's base. Merging current main will clear them.
A 3.3× corpus-aggregate win on the render hot path with provably equivalent output is excellent — thank you for the careful work and the thorough write-up. 🙏
Description
apply_overprint_after_paint (the §11.7.4 CompatibleOverprint after-paint pass) snapshotted the full page pixmap (pixmap.data().to_vec()) and re-scanned every pixel after every paint operator with /OP or /op active. Print-targeted producers commonly set /OP true in their ExtGState defaults, so an ordinary text-heavy form page with thousands of glyph paints did tens of gigabytes of byte traffic. Profiling an 80-document real-world corpus (academic papers, manuals, invoices, government forms, leaflets, commercial artwork) measured the scan alone at 72% of aggregate render CPU, with the snapshot memcpy hiding in another ~6% of memmove time. Documents like a state tax form (NY IT-2104) spent 96% of their entire render inside this one function.

This PR bounds both the snapshot and the scan to a device rect that provably contains the painted geometry, with a full-page fallback whenever no bound can be proven. Output is byte-identical: 511/511 corpus pages render byte-for-byte the same as the parent commit. Corpus aggregate render time drops 109.7 s → 32.9 s (3.33×); the affected document class runs 7–21× faster.
53 documents without overprint are flat — a regression probe pins that they do no scan work at all. The one nominal sub-1× entry (0.948×) did not reproduce under interleaved re-measurement (base 1.06–1.09 s vs head 1.06–1.08 s — noise).
samply before/after, apply_overprint_after_paint self time: 95.8% → 1.4% (IT-2104), 93.7% → 3.5% (leaflet), 92.6% → 2.5% (label artwork), 89.3% → 5.6% (DS-82), 85.8% → 3.4% (clinical leaflet).
Type of Change
Related Issues
None tracked upstream — surfaced by a corpus-wide profiling sweep after the §11 transparency surface landed.
Changes Made
- DeviceRect / RectSnapshot / PaintBounds (page_renderer.rs): a half-open clamped device rect; a row-packed pre-paint snapshot of a rect; and an accumulator for the device-space AABB of what a paint helper actually rasterised. All conversions degrade toward over-coverage (non-finite coordinates → full page; clamping happens only after expansion), never under.
- Text: PaintBounds threaded through render_text / render_tj_array / render_unicode_text / render_cid_direct / render_substituted_cjk / render_text_fallback, accumulating the union of transformed glyph-path bounds; the after-paint scan is restricted to that union. The pre-paint snapshot for text stays full-page for now: glyph bounds are only knowable once outlines are built, and guessing from font metadata (FontBBox, ascent/descent) is not provable against broken descriptors.
- Shadings (sh): paint the whole clip region; no provable bbox, full-page snapshot + scan retained (pinned by a test).
- apply_overprint_after_paint / apply_overprint_after_paint_with_coverage take the rect snapshot plus an optional scan-narrowing rect; the per-pixel blend math is untouched.

Byte equivalence is structural: a pixel outside a provable bound cannot differ from the snapshot, so the historical full-page diff skipped it anyway; restricting the walk visits exactly the pixels the diff could ever act on.
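The rect, snapshot, and restricted diff can be sketched along these lines. This is an illustration under stated assumptions rather than the types in page_renderer.rs: `SketchRect`, `SketchSnapshot`, `from_bounds`, `capture`, and `changed_pixels` are hypothetical names, and the pixmap is assumed to be a flat RGBA8 buffer (4 bytes per pixel).

```rust
/// Bytes per pixel; RGBA8 is an assumption made for this sketch.
const BPP: usize = 4;

/// Half-open, page-clamped device rect [x0, x1) x [y0, y1).
#[derive(Clone, Copy, Debug, PartialEq)]
struct SketchRect { x0: usize, y0: usize, x1: usize, y1: usize }

impl SketchRect {
    fn full(w: usize, h: usize) -> Self {
        Self { x0: 0, y0: 0, x1: w, y1: h }
    }

    /// Expand first, clamp last; any non-finite input degrades to the full page.
    fn from_bounds(min_x: f32, min_y: f32, max_x: f32, max_y: f32,
                   margin: f32, w: usize, h: usize) -> Self {
        if [min_x, min_y, max_x, max_y, margin].iter().any(|c| !c.is_finite()) {
            return Self::full(w, h);
        }
        let clamp = |c: f32, hi: usize| c.clamp(0.0, hi as f32) as usize;
        let x0 = clamp((min_x - margin).floor(), w);
        let y0 = clamp((min_y - margin).floor(), h);
        // `.max` keeps the rect well-formed (possibly empty) for inverted input.
        let x1 = clamp((max_x + margin).ceil(), w).max(x0);
        let y1 = clamp((max_y + margin).ceil(), h).max(y0);
        Self { x0, y0, x1, y1 }
    }
}

/// Row-packed copy of one rect of the pixmap, taken before the paint.
struct SketchSnapshot { rect: SketchRect, bytes: Vec<u8> }

impl SketchSnapshot {
    fn capture(pixmap: &[u8], w: usize, rect: SketchRect) -> Self {
        let mut bytes =
            Vec::with_capacity((rect.x1 - rect.x0) * (rect.y1 - rect.y0) * BPP);
        for y in rect.y0..rect.y1 {
            let row = y * w * BPP;
            bytes.extend_from_slice(&pixmap[row + rect.x0 * BPP..row + rect.x1 * BPP]);
        }
        Self { rect, bytes }
    }

    /// Pixels inside the rect whose bytes differ from the snapshot.
    fn changed_pixels(&self, pixmap: &[u8], w: usize) -> Vec<(usize, usize)> {
        let r = self.rect;
        let rw = r.x1 - r.x0;
        let mut out = Vec::new();
        for y in r.y0..r.y1 {
            for x in r.x0..r.x1 {
                let live = (y * w + x) * BPP;
                let snap = ((y - r.y0) * rw + (x - r.x0)) * BPP;
                if pixmap[live..live + BPP] != self.bytes[snap..snap + BPP] {
                    out.push((x, y));
                }
            }
        }
        out
    }
}
```

When the rect really does contain every painted pixel, `changed_pixels` on the narrow snapshot returns the same set as on a full-page snapshot, which is the structural equivalence argued above; the narrow version simply skips pixels that could never have differed.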
Testing
- cargo test --all-features
- cargo clippy -- -D warnings
- cargo fmt

Bounding contract (tests/test_overprint_dirty_rect.rs, gated on test-support): a PageRenderer::overprint_scanned_pixels counter pins five behaviours: small /OP true DeviceCMYK fill, stroke, and Tj paints must each scan ≤ 25% of the page (they scan ~5–10%; the pre-change behaviour was 100%, and the tests were written first and watched fail at exactly scanned == total); sh must remain full-page; and a document without /OP must scan zero pixels. Counters, not wall-clock: exact and machine-independent.

Byte equivalence: 80-document corpus, pages 1–20 per document at 150 DPI, rendered with this branch and its parent commit (e12609e5, the only diff being this PR): 511/511 pages byte-identical (cmp on every PNG).

Semantics: the full suite (8857 passed, 0 failed with rendering icc test-support) includes the §11.7.4 byte-exact CompatibleOverprint probes (OPM=0/1, DeviceGray/Separation sources, knockout-group interaction), which all pass unchanged.

Timing: medians of 10 warmed runs per (document, binary) via /usr/bin/time -p on an idle M2 Max; geomean and per-class numbers above.

Python Bindings (if applicable)
- ruff format
- ruff check

No binding changes: this is a pure-Rust rendering hot path; bindings inherit the speedup through render_page.

Documentation
No user-facing docs change. The new types and the bounding invariants carry doc comments, including the obligation that every new glyph paint site must accumulate into PaintBounds (or the text scan under-covers).

Checklist
Additional Notes
After this change, the remaining cost on overprint-heavy documents is memmove (20–78% self in the after-profiles): the surviving text-arm full-page snapshot (one per text-showing op) and the sibling snapshot families (smask_snapshot, cmyk_compose_snapshot, spot_paint_snapshot, cmyk_sidecar_snapshot*) which still clone the full page per gated paint. The DeviceRect / RectSnapshot / PaintBounds machinery introduced here is deliberately reusable for those passes: they are natural follow-ups, kept out of this PR to keep the blast radius reviewable. A provable pre-paint text bound (outline dry pass or an incrementally-synced shadow buffer) would eliminate the last full-page memcpy; both options have correctness subtleties (metadata-independent bounds, shadow desync on gate transitions) that deserve their own review.