
perf: cache FontMgr + system font enumeration thread-locally in SkiaLayerRenderer::new #1569

Merged: edwardkim merged 3 commits into edwardkim:devel from humdrum00001010:perf/skia-fontmgr-thread-local-cache-devel on Jun 27, 2026
@humdrum00001010 (Contributor) commented Jun 26, 2026

Impact — rendering all pages of a ~400-page HWPX is ~3.4 s faster (~8.4 ms/page). This is a full-render saving, not a load saving: parsing/opening the file is unaffected (~130 ms). The win is banked while rasterizing pages — every page in a full export, or one page at a time under lazy rendering (8.4 ms per page actually drawn).

SkiaLayerRenderer::new() re-enumerated every installed system font on every page (~8.4 ms/call); caching it once per thread removes that per-page tax, so the saving scales with the number of pages rendered.

What / How

SkiaLayerRenderer::new() ran FontMgr::default() + collect_system_families() (enumerating every installed font family) on every page. Now a thread_local holds (FontMgr, SystemFontFamilies) and each new() clones it (FontMgr = refcount bump, families = small HashSet clone). Font-matching inputs are unchanged ⇒ render output is byte-identical.
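The caching pattern described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: `FontMgr` here is a hypothetical stand-in (an `Arc`-backed handle, so cloning is a refcount bump), `collect_system_families()` returns a fixed two-entry set instead of enumerating real fonts, and the `ENUMERATIONS` counter and `enumerations_on_this_thread()` helper exist only to make the "once per thread" behaviour observable.

```rust
use std::cell::Cell;
use std::collections::HashSet;
use std::sync::Arc;

// Hypothetical stand-in for Skia's FontMgr: a cheap, refcounted handle.
#[derive(Clone)]
struct FontMgr(Arc<()>);

type SystemFontFamilies = HashSet<String>;

thread_local! {
    // Illustration only: counts how often the expensive enumeration ran on this thread.
    static ENUMERATIONS: Cell<u32> = Cell::new(0);

    // Initialised lazily on first access, once per thread.
    static FONT_CACHE: (FontMgr, SystemFontFamilies) = {
        let mgr = FontMgr(Arc::new(()));
        let families = collect_system_families(&mgr);
        (mgr, families)
    };
}

// Stand-in for the expensive system font enumeration (assumed family names).
fn collect_system_families(_mgr: &FontMgr) -> SystemFontFamilies {
    ENUMERATIONS.with(|n| n.set(n.get() + 1));
    ["Noto Sans KR", "Helvetica"]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

struct SkiaLayerRenderer {
    font_mgr: FontMgr,
    families: SystemFontFamilies,
}

impl SkiaLayerRenderer {
    fn new() -> Self {
        // Clone out of the per-thread cache: refcount bump + small HashSet clone.
        let (font_mgr, families) = FONT_CACHE.with(|cache| cache.clone());
        Self { font_mgr, families }
    }
}

fn enumerations_on_this_thread() -> u32 {
    ENUMERATIONS.with(|n| n.get())
}
```

In this sketch, calling `SkiaLayerRenderer::new()` any number of times on one thread triggers the stand-in enumeration only on the first call; every later call pays only for the clone.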

Proof

End-to-end wall-clock — real full-document render (release, branched off devel)

426-page 행정업무운영 편람 HWPX, rhwp export-png rendering all pages in one process (150 dpi, 432 rasterized pages), same machine, 2 runs each:

| run | before (uncached) | after (cached) |
| --- | --- | --- |
| 1 | 21.01 s | 17.29 s |
| 2 | 20.94 s | 17.41 s |
| avg | 20.98 s | 17.35 s |

3.63 s saved across 432 pages = 8.40 ms/page (~17 % faster end-to-end). For a clean 400-page document: 8.4 ms × 400 ≈ 3.4 s.

Microbench — per-call cost (release, single-thread, 100-iter loop)

| metric | before (uncached) | after (cached) |
| --- | --- | --- |
| per SkiaLayerRenderer::new() | 8,436 µs | 4.4 µs (~1,900×) |

font-set identity: cached set == a fresh enumeration (assert_eq) → byte-identical matching

The 8.40 ms/page measured end-to-end agrees with the 8,436 µs/call microbench to within about 0.5 %: the per-call cost reproduces in real renders and is not an artifact of the loop.
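A per-call measurement of this kind can be taken with a plain timing loop. The helper below is a generic sketch, not the benchmark used in this PR; `mean_time_per_call` is a hypothetical name, and it relies only on `std::time::Instant` and `std::hint::black_box`.

```rust
use std::time::{Duration, Instant};

/// Runs `f` `iters` times and returns the mean wall-clock time per call.
fn mean_time_per_call<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the unused result.
        std::hint::black_box(f());
    }
    start.elapsed() / iters
}
```

With the renderer in scope, a 100-iteration run would look like `mean_time_per_call(100, || SkiaLayerRenderer::new())`. Note that with a thread-local cache the first iteration still pays the one-time enumeration, so a warm-up call before timing gives a cleaner "cached" figure.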

  • cargo test --release --features native-skia --lib → 1985 passed, 0 failed, 6 ignored
  • cargo clippy --features native-skia --lib -- -D warnings → clean
  • Native-skia render path only; no wasm/NIF impact.

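Note: synergy with lazy rendering

What this PR removes is not the load (parsing) cost but the fixed per-page cost of rendering. SkiaLayerRenderer::new() used to enumerate every system font on every page (~8.4 ms); that work is now cached once per thread, so font initialization drops out of the per-page cost and essentially only the actual raster time remains. Combined with lazy (on-demand) rendering, that is, rendering only the pages that are needed (lazy) and removing the fixed cost on each of those pages (this PR), rhwp rendering can become very fast. When the whole document is rendered at once (e.g. a full export), the ~3.4 s per 400 pages measured above is saved in full.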

🤖 Generated with Claude Code

…ayerRenderer::new

SkiaLayerRenderer::new() called FontMgr::default() + collect_system_families()
(enumerating every installed font family) on every page. Measured ~8.4 ms/call on
macOS — ~15-21% of a page's raster time, and ~3.6 s of pure repeated enumeration on
a 426-page export. The system font set is process/thread stable, so compute it once
in a thread_local and clone (FontMgr = refcount bump, families = small HashSet clone,
~3 us) into each new().

Font-matching inputs are unchanged, so render output is byte-identical (verified:
p13/p9 PNG md5 match baseline). new() drops 8390 us -> 3.0 us. Native-skia render
path only (no wasm/NIF impact). 48 skia tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@humdrum00001010 marked this pull request as ready for review June 26, 2026 09:33
edwardkim added a commit that referenced this pull request Jun 27, 2026
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@edwardkim edwardkim merged commit 9c6f2e8 into edwardkim:devel Jun 27, 2026
10 checks passed
@edwardkim (Owner) commented

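Merged (merge commit 9c6f2e8, verified to be included in origin/devel). Thank you, @humdrum00001010!

As maintainer, I reproduced and verified the PR's two core claims directly:

1. Byte-identical render output: I built the before (devel, pre-cache) and after (PR, post-cache) binaries with --release --features native-skia and rendered the same document (a 30-page report template) with export-png. The PNG md5 hashes matched on 30/30 pages (diff 0), confirming at the byte level that the cache does not change font matching (consistent with the CI Canvas visual diff pass).

2. ~8.4 ms/page performance improvement: wall-clock for the same 30 pages went from a before average of 3.91 s to an after average of 3.64 s, a saving of ~9 ms/page, consistent with the claim.

The approach of computing once in a thread_local and cloning (FontMgr refcount, families HashSet) is clean, and because each thread has its own independent cache it is also safe for multi-threaded rendering. The thorough quantitative evidence made verification easy. Thank you for a nice optimization!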
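The per-thread independence mentioned above can be illustrated with a small sketch. It is an assumption-laden toy, not rhwp code: `FAMILIES`, `enumerate_families()`, `render_pages()` and `enumerations_per_worker()` are hypothetical names, and the "enumeration" just builds a fixed set while bumping a per-thread counter.

```rust
use std::cell::Cell;
use std::collections::HashSet;
use std::thread;

thread_local! {
    // Per-thread count of how often the stand-in enumeration ran.
    static ENUMERATIONS: Cell<u32> = Cell::new(0);
    // Per-thread cache, initialised lazily on first access.
    static FAMILIES: HashSet<String> = enumerate_families();
}

// Stand-in for the expensive system font enumeration (assumed family name).
fn enumerate_families() -> HashSet<String> {
    ENUMERATIONS.with(|n| n.set(n.get() + 1));
    HashSet::from(["Helvetica".to_string()])
}

/// Simulates rendering `pages` pages on the current thread and returns
/// how many enumerations this thread performed.
fn render_pages(pages: u32) -> u32 {
    for _ in 0..pages {
        // Each "page" clones the cached set instead of re-enumerating.
        let _families = FAMILIES.with(|f| f.clone());
    }
    ENUMERATIONS.with(|n| n.get())
}

/// Spawns `workers` threads, each rendering `pages_each` pages, and collects
/// the per-thread enumeration counts.
fn enumerations_per_worker(workers: usize, pages_each: u32) -> Vec<u32> {
    let handles: Vec<_> = (0..workers)
        .map(|_| thread::spawn(move || render_pages(pages_each)))
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}
```

Each worker owns its own cache and enumerates at most once, no matter how many pages it renders, and no locking is involved because nothing is shared between threads. The trade-off is that the one-time cost is paid once per rendering thread rather than once per process.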
