
Optimize char deserialization with manual UTF-8 decoder #33

Open

tanmay4l wants to merge 3 commits into anza-xyz:master from tanmay4l:optimize-char-decode

Conversation

@tanmay4l (Contributor)

Addresses the TODO comment at lines 247-250, which noted: "Could implement a manual decoder that avoids UTF-8 validate + chars() and instead performs the UTF-8 validity checks and produces a char directly. Some quick micro-benchmarking revealed a roughly 2x speedup is possible."

Changes

Before:

let str = core::str::from_utf8(buf).map_err(invalid_utf8_encoding)?;
let c = str.chars().next().unwrap();

After:
- Manual UTF-8 decoding for 2-4 byte characters using bit masks
- Inline validation of continuation bytes (must be 10xxxxxx)
- Overlong encoding validation (3-byte: >= U+0800, 4-byte: >= U+10000)
- Surrogate validation (rejects U+D800..U+DFFF)
- Out of range validation (rejects > U+10FFFF)
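The validation steps above can be sketched as a standalone function. This is an illustrative reconstruction of the technique the PR describes, not the PR's actual code; the function name and error handling (returning `Option` rather than the crate's error type) are assumptions:

```rust
/// Decode the first Unicode scalar value from `buf`, performing UTF-8
/// validation by hand instead of calling `core::str::from_utf8`.
/// Illustrative sketch only; the PR's real implementation may differ.
fn decode_char(buf: &[u8]) -> Option<char> {
    let b0 = *buf.first()?;
    // Leading byte determines sequence length and contributes the high bits.
    // The 0xC2 lower bound already rules out overlong 2-byte encodings.
    let (len, init) = match b0 {
        0x00..=0x7F => return Some(b0 as char), // 1-byte ASCII fast path
        0xC2..=0xDF => (2, (b0 & 0x1F) as u32), // 2-byte sequence
        0xE0..=0xEF => (3, (b0 & 0x0F) as u32), // 3-byte sequence
        0xF0..=0xF4 => (4, (b0 & 0x07) as u32), // 4-byte sequence
        _ => return None,                       // invalid leading byte
    };
    if buf.len() < len {
        return None;
    }
    let mut cp = init;
    for &b in &buf[1..len] {
        // Continuation bytes must match 10xxxxxx.
        if b & 0xC0 != 0x80 {
            return None;
        }
        cp = (cp << 6) | (b & 0x3F) as u32;
    }
    // Reject overlong encodings (3-byte must be >= U+0800, 4-byte >= U+10000),
    // surrogates (U+D800..U+DFFF), and out-of-range values (> U+10FFFF).
    match (len, cp) {
        (2, 0x80..=0x7FF)
        | (3, 0x800..=0xD7FF)
        | (3, 0xE000..=0xFFFF)
        | (4, 0x1_0000..=0x10_FFFF) => char::from_u32(cp),
        _ => None,
    }
}
```

The speedup comes from doing a single pass over at most four bytes, rather than running full `str` validation and then constructing a `Chars` iterator just to pull out one scalar.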

@tanmay4l tanmay4l closed this Jan 22, 2026
@tanmay4l tanmay4l deleted the optimize-char-decode branch January 22, 2026 18:58
@tanmay4l tanmay4l restored the optimize-char-decode branch January 22, 2026 19:25
@tanmay4l tanmay4l reopened this Jan 22, 2026
