
feat: use reader.copy_into_slice when deserializing char #187

Open

kskalski wants to merge 1 commit into anza-xyz:master from kskalski:ks/char_fill_exact

Conversation

@kskalski
Contributor

Avoid calling fill_buf_exact during char deserialization, so that a Reader implementation isn't required to always provide a borrowed slice of the requested (2-4 byte) size.

Use a stack-allocated 4-byte array instead and fill it with the required (up to 4) bytes using copy_into_slice.

This removes the only call to fill_buf_exact in production code outside of the default / forwarding implementations of Reader.
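
For illustration, a minimal sketch of the described approach; the Reader trait, the read_u8 helper, and the error handling below are assumptions made for the sketch, not the crate's actual API:

use std::io::{self, Error, ErrorKind};

// Hypothetical stand-in for the crate's Reader trait, reduced to the two
// methods this sketch needs.
trait Reader {
    fn read_u8(&mut self) -> io::Result<u8>;
    // Copy exactly `dst.len()` bytes into `dst`, wherever the data currently lives.
    fn copy_into_slice(&mut self, dst: &mut [u8]) -> io::Result<()>;
}

fn invalid_char_lead(b0: u8) -> Error {
    Error::new(ErrorKind::InvalidData, format!("invalid char lead byte {b0:#x}"))
}

fn deserialize_char<R: Reader>(reader: &mut R) -> io::Result<char> {
    let b0 = reader.read_u8()?;
    let len = match b0 {
        0x00..=0x7F => return Ok(b0 as char), // ASCII fast path
        0xC2..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF4 => 4,
        _ => return Err(invalid_char_lead(b0)),
    };
    // Stack-allocated scratch buffer; only the first `len` bytes are used, and the
    // reader never has to hand out a contiguous borrowed slice of size `len`.
    let mut buf = [0u8; 4];
    buf[0] = b0;
    reader.copy_into_slice(&mut buf[1..len])?;
    let s = core::str::from_utf8(&buf[..len])
        .map_err(|e| Error::new(ErrorKind::InvalidData, e))?;
    Ok(s.chars().next().unwrap())
}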

0xF0..=0xF4 => 4,
_ => return Err(invalid_char_lead(b0)),
};
let mut buf = [0u8; 4];
Contributor Author


I think we could use [MaybeUninit<u8>; 4] and assume_init_ref with rust >= 1.93 (https://doc.rust-lang.org/stable/core/primitive.slice.html#method.assume_init_ref), but I doubt it would make any difference for a 4-byte array.

@kskalski kskalski marked this pull request as ready for review February 13, 2026 07:56
@kskalski kskalski requested a review from cpubot February 13, 2026 07:58
@cpubot
Contributor

cpubot commented Feb 13, 2026

For small reads like this it's totally fine to use fill_* methods rather than copying directly.

if buffered_len >= n_bytes {
// SAFETY: `filled` always points to an initialized portion of the buffer.
return Ok(unsafe { from_raw_parts(buf.as_ptr().cast::<u8>().add(filled.start), n_bytes) });
}

The proposed implementation will short-circuit if the buffer already contains enough bytes to fulfill the request.

fill_buf or fill_exact doesn't always entail IO -- since we accept an n_bytes argument, we can satisfy the request without necessarily filling.

Our Reader API necessarily entails some kind of underlying buffer (given definitions of fill_buf, fill_exact, peek). In general I think the optimal strategy should be "fill wherever possible unless reads may be large" -- this amortizes IO over fewer total syscalls. This strategy is reflected in current implementations. For small reads (integers, char, IpAddr, etc), we potentially trigger buffering such that subsequent small reads don't actually make IO calls, while larger sequences like Vec<T> where T: ZeroCopy, String take the direct copy path.
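
As a rough sketch of that split (the Reader trait and the method signatures here are assumptions for illustration, not the actual trait definition):

use std::io;

trait Reader {
    // Return at least `n_bytes` buffered bytes, filling from the source only if needed.
    fn fill_exact(&mut self, n_bytes: usize) -> io::Result<&[u8]>;
    // Mark `n` already-buffered bytes as consumed.
    unsafe fn consume_unchecked(&mut self, n: usize);
    // Copy `dst.len()` bytes directly into `dst`, bypassing the internal buffer.
    fn copy_into_slice(&mut self, dst: &mut [u8]) -> io::Result<()>;
}

// Small read: go through the buffer, so a run of small reads (integers, chars,
// IpAddr fields, ...) is amortized over a single underlying IO call.
fn read_u32<R: Reader>(reader: &mut R) -> io::Result<u32> {
    let bytes = reader.fill_exact(4)?;
    let value = u32::from_le_bytes(bytes[..4].try_into().unwrap());
    unsafe { reader.consume_unchecked(4) };
    Ok(value)
}

// Large read (e.g. the byte payload of Vec<T: ZeroCopy> or String): copy straight
// into the destination, skipping intermediate buffering.
fn read_payload<R: Reader>(reader: &mut R, dst: &mut [u8]) -> io::Result<()> {
    reader.copy_into_slice(dst)
}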

Consider a case where one is deserializing a struct containing all integers and chars. If those were to all use copy_into_slice, they could totally circumvent buffering depending on the underlying IO Reader implementation, in which case an IO call would be made for each individual deserialization.

The proposed no-grow implementation will still actually call fill_buf when reads are less than buffer capacity

if needed >= capacity {

and this was done specifically to avoid the case mentioned above. But this is an implementation detail of the proposal, not something necessarily guaranteed by the API. By using the fill_* methods, we guarantee by definition that this case can't happen.

@kskalski
Contributor Author

kskalski commented Feb 14, 2026

The practical problem is that even when operating on buffers, satisfying fill_exact and fill_array may be difficult and force the implementation to do a memmove or, worse, allocate. This is the case for the agave reader, which operates on chunks of a larger buffer and reads them in parallel. Generally, when reading a series of values you eventually reach the end of the current chunk: say you are at position pos = chunk.len() - 3 and are asked for a contiguous slice of memory of size 4. You can't just point to chunk[pos..pos + 4], since that goes past the current chunk. Even if the rest of the data is already in the next chunk, you need to do extra defragmentation:

  • move chunk[pos..] to chunk[0..chunk.len() - pos]
  • move data from the next chunk into the current one, after the chunk.len() - pos bytes moved in the previous step
  • the management of positions / consume after such an operation becomes more complicated (the data now lives in a different part of the buffer than usual, etc.)

I would argue that copy_into_slice is an "always cheaper" version of that, especially when the user is going to copy out of the returned slice anyway; they might as well copy from wherever the data already sits straight into the destination memory. copy_into_slice is still efficient for a buffered reader, since it can easily avoid unnecessary or short IOs. The crucial part is that it can populate the target slice in two steps: first from the data already available, then, possibly after doing IO or switching to the next chunk, the rest (repeating the process several times if needed).
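
A minimal sketch of such a two-step copy for a chunked reader (the struct layout and names below are illustrative, not the agave reader's actual code):

struct ChunkedReader {
    chunks: Vec<Vec<u8>>, // illustrative layout; the real reader works on chunks of a larger buffer
    chunk_idx: usize,
    pos: usize, // read position within the current chunk
}

impl ChunkedReader {
    fn copy_into_slice(&mut self, dst: &mut [u8]) -> Result<(), ()> {
        let mut written = 0;
        while written < dst.len() {
            let chunk = self.chunks.get(self.chunk_idx).ok_or(())?; // out of data
            let avail = chunk.len() - self.pos;
            if avail == 0 {
                // Step 2: switch to the next chunk (in a real reader this is where
                // IO or chunk management would happen) and keep copying.
                self.chunk_idx += 1;
                self.pos = 0;
                continue;
            }
            // Step 1: copy whatever is contiguous right now, straight into the
            // destination -- no defragmentation of the source buffer needed.
            let n = avail.min(dst.len() - written);
            dst[written..written + n].copy_from_slice(&chunk[self.pos..self.pos + n]);
            self.pos += n;
            written += n;
        }
        Ok(())
    }
}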

It's fine to have a fill_buf(n) API where the returned slice may be shorter than n. However, I think we should avoid APIs (or at least uses of them) where the expectation is a slice of at least n bytes.

Arguably the defragmentation cost mentioned above is not that bad if the size of the returned slice can be bounded to n <= predictable_max, so fortunately getting rid of fill_exact is not a blocker. In the case of this PR, I think populating a small piece of uninitialized stack is better than forcing the reader to give access to contiguous memory (even with your reference impl we can avoid an unnecessary memmove this way). I will do some comparisons of the assembly involved in a real case, so we also know the perf implications for the current out-of-mem readers.

@cpubot
Contributor

cpubot commented Feb 14, 2026

I would argue that copy_into_slice is an "always cheaper" version of that, especially when the user is going to copy out of the returned slice anyway.

The issue with a memcpy for small values is that it can prohibit scalarization, because it is an opaque "copy some bytes" intrinsic. This is similar in motivation to #64. Avoiding the memcpy gives the compiler more opportunity and more visibility to load rather than copy. That being said, since we inline so heavily, the compiler is likely to be able to produce equivalent assembly, but we have seen explicit cases where it does not do so reliably (and this can be especially problematic on bpf targets).

I do agree with one of your points in #188, that because we employ this pattern so often:

let bytes: [u8; N] = *reader.fill_array();
unsafe { reader.consume_unchecked(N) };

There is likely room for an additional helper like your proposed read_to_array (or my suggested name, get_array).

I would certainly prefer this over using copy_into_slice everywhere on small loads for the reason mentioned above. Slice impls can implement get_array() with a dereference of fill_array() and avoid the memcpy (which I propose should be the default impl in the Reader trait), while still giving flexibility to other Reader implementations that want to do something different.
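
A sketch of what that could look like, following the pattern quoted above (the signatures here are assumptions, not the crate's real trait):

trait Reader {
    // Borrow the next N bytes from the reader's buffer without consuming them.
    fn fill_array<const N: usize>(&mut self) -> &[u8; N];
    // Mark `n` already-buffered bytes as consumed.
    unsafe fn consume_unchecked(&mut self, n: usize);

    // Default impl: a plain dereference of `fill_array`, so slice-backed readers
    // get a simple load with no memcpy; other Reader implementations can override.
    fn get_array<const N: usize>(&mut self) -> [u8; N] {
        let bytes = *self.fill_array::<N>();
        unsafe { self.consume_unchecked(N) };
        bytes
    }
}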
