perf: cache FontMgr + system font enumeration thread-locally in SkiaLayerRenderer::new by humdrum00001010 · Pull Request #1569 · edwardkim/rhwp

humdrum00001010 · 2026-06-26T09:30:14Z

Impact — rendering all pages of a ~400-page HWPX is ~3.4 s faster (~8.4 ms/page). This is a full-render saving, not a load saving: parsing/opening the file is unaffected (~130 ms). The win is banked while rasterizing pages — every page in a full export, or one page at a time under lazy rendering (8.4 ms per page actually drawn).

SkiaLayerRenderer::new() re-enumerated every installed system font on every page (~8.4 ms/call); caching it once per thread removes that per-page tax, so the saving scales with the number of pages rendered.

What / How

SkiaLayerRenderer::new() ran FontMgr::default() + collect_system_families() (enumerating every installed font family) on every page. Now a thread_local holds (FontMgr, SystemFontFamilies) and each new() clones it (FontMgr = refcount bump, families = small HashSet clone). Font-matching inputs are unchanged ⇒ render output is byte-identical.
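The caching pattern described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `FontMgrHandle`, `enumerate_system_families`, `Renderer`, and `enumeration_count` are hypothetical stand-ins (an `Arc` models the refcounted `FontMgr`, and a fixed list models the system font scan), so the sketch compiles without Skia.

```rust
use std::cell::Cell;
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for Skia's FontMgr: a cheap, reference-counted handle.
#[derive(Clone)]
pub struct FontMgrHandle(Arc<()>);

impl FontMgrHandle {
    /// Number of live handles sharing the same underlying manager.
    pub fn handles(&self) -> usize {
        Arc::strong_count(&self.0)
    }
}

pub type SystemFontFamilies = HashSet<String>;

thread_local! {
    /// Counts how often the expensive enumeration ran on this thread.
    static ENUMERATIONS: Cell<u32> = Cell::new(0);

    /// Initialized lazily on first access, once per thread.
    static FONT_CACHE: (FontMgrHandle, SystemFontFamilies) = {
        ENUMERATIONS.with(|n| n.set(n.get() + 1));
        (FontMgrHandle(Arc::new(())), enumerate_system_families())
    };
}

/// Hypothetical placeholder for the expensive system font scan.
fn enumerate_system_families() -> SystemFontFamilies {
    ["Noto Sans KR", "Malgun Gothic"]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

pub struct Renderer {
    pub font_mgr: FontMgrHandle,
    pub families: SystemFontFamilies,
}

impl Renderer {
    /// Clones the cached pair instead of re-enumerating fonts.
    pub fn new() -> Self {
        let (font_mgr, families) = FONT_CACHE.with(|cache| cache.clone());
        Renderer { font_mgr, families }
    }
}

/// How many times the enumeration has run on the calling thread.
pub fn enumeration_count() -> u32 {
    ENUMERATIONS.with(|n| n.get())
}
```

In this sketch, constructing any number of `Renderer` values on one thread triggers the stand-in enumeration only once; each later `new()` pays for an `Arc` refcount bump and a `HashSet` clone.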

Proof

End-to-end wall-clock — real full-document render (release, branched off `devel`)

426-page 행정업무운영 편람 (Administrative Work Operations Handbook) HWPX, `rhwp export-png` rendering all pages in one process (150 dpi, 432 rasterized pages), same machine, 2 runs each:

| run | before (uncached) | after (cached) |
| --- | --- | --- |
| 1 | 21.01 s | 17.29 s |
| 2 | 20.94 s | 17.41 s |
| avg | 20.98 s | 17.35 s |

→ 3.63 s saved across 432 pages = 8.40 ms/page (~17 % faster end-to-end). For a clean 400-page document: 8.4 ms × 400 ≈ 3.4 s.
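The per-page figure can be re-derived from the four run times. The helper below is only an illustrative check (the function names are made up for this sketch); with unrounded averages it yields about 8.39 ms/page, which agrees with the 8.40 ms/page quoted above to within rounding.

```rust
/// Arithmetic mean of a non-empty slice of wall-clock times in seconds.
pub fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Milliseconds saved per page, given before/after run times (seconds)
/// and the number of pages rasterized in each run.
pub fn saved_ms_per_page(before_s: &[f64], after_s: &[f64], pages: u32) -> f64 {
    (mean(before_s) - mean(after_s)) * 1000.0 / pages as f64
}

/// Projected total saving in seconds for a document of `pages` pages.
pub fn projected_saving_s(ms_per_page: f64, pages: u32) -> f64 {
    ms_per_page * pages as f64 / 1000.0
}
```

For example, `saved_ms_per_page(&[21.01, 20.94], &[17.29, 17.41], 432)` reproduces the per-page saving, and `projected_saving_s(8.4, 400)` gives the roughly 3.4 s estimate for a 400-page document.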

Microbench — per-call cost (release, single-thread, 100-iter loop)

| metric | before (uncached) | after (cached) |
| --- | --- | --- |
| per `SkiaLayerRenderer::new()` | 8,436 µs | 4.4 µs (~1,900×) |
| font-set identity | — | cached set `==` a fresh enumeration (`assert_eq`) → byte-identical matching |

The 8.40 ms/page measured end-to-end closely matches the 8,436 µs/call microbench (within about 0.5 %): the per-call cost reproduces in real renders and is not an artifact of the loop.
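A per-call microbenchmark of this kind can be approximated with a plain timing loop. The harness below is a sketch using only the standard library, not the benchmark actually used in this PR; `mean_call_time` is a hypothetical helper name.

```rust
use std::time::{Duration, Instant};

/// Runs `f` `iters` times and returns the mean wall-clock time per call.
/// `black_box` keeps the optimizer from discarding the result of `f`.
pub fn mean_call_time<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        std::hint::black_box(f());
    }
    start.elapsed() / iters
}
```

Timing a constructor would then look like `mean_call_time(100, || SkiaLayerRenderer::new())`, assuming the constructor is callable from the benchmark. Because the cached path populates the thread-local on its first call, a warm-up call before timing gives a steadier per-call figure.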

`cargo test --release --features native-skia --lib` → 1985 passed, 0 failed, 6 ignored
`cargo clippy --features native-skia --lib -- -D warnings` → clean
Native-skia render path only; no wasm/NIF impact.

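Note: synergy with lazy rendering

What this PR eliminates is not the load (parsing) cost but a fixed per-page rendering cost. SkiaLayerRenderer::new() used to enumerate every system font on every page (~8.4 ms); caching that once per thread removes font initialization from the per-page cost, leaving essentially only the actual raster time. Combined with lazy (on-demand) rendering, that is, rendering only the pages that are needed (lazy) and removing the fixed cost on each of those pages (this PR), rhwp rendering can become very fast. When the whole document is rendered at once (for example, a full export), the ~3.4 s per 400 pages measured above is saved in full.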

🤖 Generated with Claude Code

…ayerRenderer::new SkiaLayerRenderer::new() called FontMgr::default() + collect_system_families() (enumerating every installed font family) on every page. Measured ~8.4 ms/call on macOS — ~15-21% of a page's raster time, and ~3.6 s of pure repeated enumeration on a 426-page export. The system font set is process/thread stable, so compute it once in a thread_local and clone (FontMgr = refcount bump, families = small HashSet clone, ~3 us) into each new(). Font-matching inputs are unchanged, so render output is byte-identical (verified: p13/p9 PNG md5 match baseline). new() drops 8390 us -> 3.0 us. Native-skia render path only (no wasm/NIF impact). 48 skia tests pass. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


edwardkim · 2026-06-27T06:14:21Z

Merged (merge commit 9c6f2e8, verified to be included in origin/devel). Thank you, @humdrum00001010!

The maintainer directly reproduced and verified the PR's two key claims:

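1. Render output is byte-identical: the before (devel, pre-cache) and after (PR, post-cache) binaries were each built with --release --features native-skia, and the same document (a 30-page report form) was rendered with export-png. The PNG md5 hashes matched for 30/30 pages (diff 0). This confirms at the byte level that the cache does not change font matching (consistent with the CI Canvas visual diff pass).

2. ~8.4 ms/page performance improvement: wall-clock time for the same 30 pages went from an average of 3.91 s before to an average of 3.64 s after, a saving of ~9 ms/page, consistent with the claim.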
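A byte-identity check like the one in item 1 can be sketched with the standard library alone. Note the swap: the review compared md5 hashes, whereas this sketch compares raw file bytes directly, which is an equivalent test of byte equality without a hashing dependency. The function name and the directory layout (one PNG per page, same file names in both directories) are assumptions for illustration.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the names of files in `before` whose bytes differ from, or are
/// missing in, `after`. An empty result means every page is byte-identical.
pub fn mismatched_pages(before: &Path, after: &Path) -> io::Result<Vec<String>> {
    let mut mismatches = Vec::new();
    for entry in fs::read_dir(before)? {
        let entry = entry?;
        if !entry.file_type()?.is_file() {
            continue; // skip subdirectories and other non-file entries
        }
        let name = entry.file_name();
        let same = match fs::read(after.join(&name)) {
            Ok(bytes) => bytes == fs::read(entry.path())?,
            Err(e) if e.kind() == io::ErrorKind::NotFound => false,
            Err(e) => return Err(e),
        };
        if !same {
            mismatches.push(name.to_string_lossy().into_owned());
        }
    }
    mismatches.sort();
    Ok(mismatches)
}
```

Pointing it at the two export directories and getting an empty list back corresponds to the "30/30 match, diff 0" result reported above.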

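The approach of computing once in a thread_local and cloning (FontMgr refcount, families HashSet) is clean, and because each thread has its own independent cache it is also safe for multi-threaded rendering. The thorough quantitative evidence made verification easy. Thank you for a good optimization!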
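The per-thread independence mentioned here can be illustrated with a small standalone sketch. The names (`SCANS`, `FAMILIES`, `render_pages`, `scans_per_worker`) and the one-entry font set are hypothetical; the point is only that each worker thread initializes its own copy once, with no locking or sharing between threads.

```rust
use std::cell::Cell;
use std::collections::HashSet;
use std::thread;

thread_local! {
    /// Number of font scans performed on this thread.
    static SCANS: Cell<u32> = Cell::new(0);

    /// Per-thread cached family set, built on first access.
    static FAMILIES: HashSet<String> = {
        SCANS.with(|n| n.set(n.get() + 1));
        HashSet::from(["Noto Sans KR".to_string()])
    };
}

/// Simulates rendering `pages` pages on the current thread and returns
/// how many scans that thread performed in total.
pub fn render_pages(pages: u32) -> u32 {
    for _ in 0..pages {
        // Each "page" clones the cached set instead of rescanning.
        let _families = FAMILIES.with(|f| f.clone());
    }
    SCANS.with(|n| n.get())
}

/// Spawns `workers` threads, each rendering `pages_each` pages, and
/// collects the scan count observed on every worker.
pub fn scans_per_worker(workers: usize, pages_each: u32) -> Vec<u32> {
    let handles: Vec<_> = (0..workers)
        .map(|_| thread::spawn(move || render_pages(pages_each)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

The trade-off of this design is that the scan cost is paid once per rendering thread rather than once per process, in exchange for needing no synchronization on the hot path.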

humdrum00001010 marked this pull request as ready for review June 26, 2026 09:33

Merge branch 'devel' into perf/skia-fontmgr-thread-local-cache-devel

cf398ec

humdrum00001010 mentioned this pull request Jun 26, 2026

perf: optimize native Skia raster replay #1577

Merged

Merge branch 'devel' into perf/skia-fontmgr-thread-local-cache-devel

5899bd7

edwardkim added a commit that referenced this pull request Jun 27, 2026

docs: PR #1569 review record (Skia font cache performance, byte-identical output directly verified)

be34d82

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

edwardkim merged commit 9c6f2e8 into edwardkim:devel Jun 27, 2026
10 checks passed

edwardkim merged 3 commits into edwardkim:devel from humdrum00001010:perf/skia-fontmgr-thread-local-cache-devel
