perf: cache FontMgr + system font enumeration thread-locally in SkiaLayerRenderer::new #1569
Merged
edwardkim merged 3 commits on Jun 27, 2026
…ayerRenderer::new

SkiaLayerRenderer::new() called FontMgr::default() + collect_system_families() (enumerating every installed font family) on every page. Measured ~8.4 ms/call on macOS: ~15-21% of a page's raster time, and ~3.6 s of pure repeated enumeration on a 426-page export.

The system font set is process/thread stable, so compute it once in a thread_local and clone (FontMgr = refcount bump, families = small HashSet clone, ~3 us) into each new(). Font-matching inputs are unchanged, so render output is byte-identical (verified: p13/p9 PNG md5 match baseline).

new() drops 8390 us -> 3.0 us. Native-skia render path only (no wasm/NIF impact). 48 skia tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
edwardkim added a commit that referenced this pull request on Jun 27, 2026
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
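Owner

Merged (merge commit 9c6f2e8, verified to be included in origin/devel). Thank you, @humdrum00001010!

The maintainer directly reproduced and verified the two key claims made in the PR:

1. Render output is byte-identical: the before (devel, pre-cache) and after (PR, post-cache) binaries were each …
2. ~8.4 ms/page performance improvement: wall-clock time for the same 30 pages went from a before average of 3.91 s to an after average of 3.64 s, a saving of ~9 ms/page, which is consistent with the claim.

The approach of computing once in a thread_local and cloning (FontMgr refcount, families HashSet) is clean, and because each thread has its own independent cache it is also safe for multi-threaded rendering. The thorough quantitative evidence made verification easy. Thank you for the nice optimization!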
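The compute-once-per-thread-and-clone pattern described in this PR can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `FontMgr` here is a hypothetical stand-in built on `Rc` (so that cloning is a refcount bump), and `enumerate_system_families`, `LayerRenderer`, and the `ENUMERATIONS` counter are names invented for the example.

```rust
use std::cell::Cell;
use std::collections::HashSet;
use std::rc::Rc;

// Hypothetical stand-in for the real font manager: a cheap, refcounted handle.
#[derive(Clone)]
struct FontMgr(Rc<()>);

type SystemFontFamilies = HashSet<String>;

thread_local! {
    // Counts how often this thread ran the (expensive) enumeration.
    static ENUMERATIONS: Cell<u32> = Cell::new(0);

    // Initialized lazily on first access in each thread, then reused.
    static FONT_CACHE: (FontMgr, SystemFontFamilies) = {
        ENUMERATIONS.with(|n| n.set(n.get() + 1));
        (FontMgr(Rc::new(())), enumerate_system_families())
    };
}

// Stand-in for enumerating every installed font family (assumed data).
fn enumerate_system_families() -> SystemFontFamilies {
    ["Noto Sans", "Noto Serif"].iter().map(|s| s.to_string()).collect()
}

// Illustrative renderer; the real type has more state than this.
struct LayerRenderer {
    font_mgr: FontMgr,
    families: SystemFontFamilies,
}

impl LayerRenderer {
    fn new() -> Self {
        // Clone out of the per-thread cache: a refcount bump plus a small set clone.
        let (font_mgr, families) = FONT_CACHE.with(|cache| cache.clone());
        LayerRenderer { font_mgr, families }
    }
}

// How many enumerations the calling thread has performed so far.
fn enumerations_on_this_thread() -> u32 {
    ENUMERATIONS.with(|n| n.get())
}
```

Calling `LayerRenderer::new()` repeatedly on one thread runs the enumeration only on the first call; a second thread builds its own cache on its first call, which is why no locking is needed in this sketch.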
Impact: rendering all pages of a ~400-page HWPX is ~3.4 s faster (~8.4 ms/page). This is a full-render saving, not a load saving: parsing/opening the file is unaffected (~130 ms). The win is banked while rasterizing pages: every page in a full export, or one page at a time under lazy rendering (8.4 ms per page actually drawn).

SkiaLayerRenderer::new() re-enumerated every installed system font on every page (~8.4 ms/call); caching it once per thread removes that per-page tax, so the saving scales with the number of pages rendered.

What / How

SkiaLayerRenderer::new() ran FontMgr::default() + collect_system_families() (enumerating every installed font family) on every page. Now a thread_local holds (FontMgr, SystemFontFamilies) and each new() clones it (FontMgr = refcount bump, families = small HashSet clone). Font-matching inputs are unchanged ⇒ render output is byte-identical.

Proof
End-to-end wall-clock: real full-document render (release, branched off devel)

426-page 행정업무운영 편람 (Administrative Work Operations Handbook) HWPX, rhwp export-png rendering all pages in one process (150 dpi, 432 rasterized pages), same machine, 2 runs each:

→ 3.63 s saved across 432 pages = 8.40 ms/page (~17 % faster end-to-end). For a clean 400-page document: 8.4 ms × 400 ≈ 3.4 s.

Microbench: per-call cost (release, single-thread, 100-iter loop)

SkiaLayerRenderer::new() == a fresh enumeration (assert_eq) → byte-identical matching

The 8.40 ms/page measured end-to-end matches the 8,436 µs/call microbench to within 0.5 %, so the per-call cost reproduces in real renders and is not an artifact of the loop.
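A per-call microbenchmark of this kind can be sketched as below. This is an assumed shape, not the harness used for the numbers above: `enumerate_families` is a hypothetical stand-in for the system font enumeration, and `mean_per_call` and `run_microbench` are names invented for the example. The `assert_eq!` mirrors the check that the cached set equals a fresh enumeration.

```rust
use std::collections::HashSet;
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical stand-in for the expensive system font enumeration.
fn enumerate_families() -> HashSet<String> {
    (0..500).map(|i| format!("Family {i}")).collect()
}

// Mean wall-clock time per call of `f` over `iters` iterations.
// Panics if `iters` is zero (division of a Duration by zero).
fn mean_per_call<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the result.
        black_box(f());
    }
    start.elapsed() / iters
}

// Returns (mean cost of a fresh enumeration, mean cost of cloning the cached set).
fn run_microbench(iters: u32) -> (Duration, Duration) {
    let cached = enumerate_families();
    // The cached value must equal a fresh enumeration, otherwise
    // font matching (and therefore render output) could differ.
    assert_eq!(cached, enumerate_families());

    let fresh_cost = mean_per_call(iters, enumerate_families);
    let cached_cost = mean_per_call(iters, || cached.clone());
    (fresh_cost, cached_cost)
}
```

Calling `run_microbench(100)` in a release build corresponds to the 100-iteration, single-thread loop described above; the absolute numbers depend on the machine and on what the stand-in enumeration does, so they are not comparable to the figures in this PR.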
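cargo test --release --features native-skia --lib → 1985 passed, 0 failed, 6 ignored
cargo clippy --features native-skia --lib -- -D warnings → clean

Note: synergy with lazy rendering

What this PR removes is not the load (parsing) cost but a fixed per-page rendering cost. SkiaLayerRenderer::new() used to enumerate every system font (~8.4 ms) on every page; that result is now cached once per thread, so font initialization drops out of the per-page cost and essentially only the actual raster time remains. Combined with lazy (on-demand) rendering, that is, rendering only the pages that are needed (lazy) plus removing the fixed cost on each of those pages (this PR), rhwp rendering can become very fast. When the whole document is rendered at once (e.g. a full export), the ~3.4 s per 400 pages measured above is saved in full.

🤖 Generated with Claude Code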