
RFC: Impl BufReader for std::io::Read #53

Draft
cpubot wants to merge 1 commit into master from buf-reader

Conversation

@cpubot
Contributor

@cpubot cpubot commented Dec 12, 2025

wincode currently doesn't provide direct support for interfacing with file-io (e.g., via Read) due to its specific trait semantics. In particular,

  • wincode has a concept of "trusted" readers that can elide bounds checking
  • in general prefers a BufRead-style reader, because SchemaRead implementations need to deal with variable-sized type encodings (e.g., variable-length encodings like ShortU16). For example, an implementation may need to inspect the next 3 bytes but only consume a subset of them. Such implementations could get by with incremental reads as bytes are inspected, but that would be terribly inefficient without buffering, so it's preferable for SchemaRead implementations in particular to work against a BufRead-style interface.

We can provide our own BufReader over any std::io::Read that implements the appropriate wincode semantics. The implementation is fairly straightforward until we need to deal with as_trusted_for on the Reader trait. Put simply, calling as_trusted_for means "I know for a fact that I will need n bytes for all subsequent reads on the returned Reader". This is trivial and obvious implementation-wise when the underlying source is in-memory, but the "right" way to handle this when dealing with what could be file or network IO is more nuanced.

Perhaps the most straightforward solution is to simply attempt to fill the BufReader's buffer up to capacity.min(n_bytes_requested), and then proceed as usual. This is great for cases where n_bytes_requested is less than capacity. It can become suboptimal however if n_bytes_requested is significantly larger than the buffer capacity. In particular, we incur additional syscall overhead for subsequent reads past capacity that could otherwise be avoided if the buffer were larger. If we were to instead grow the buffer to support the requested number of bytes, we could conceivably read most if not all required bytes in a single syscall.

So, to summarize the above options for dealing with as_trusted_for:

  1. Fill buffer capacity as much as possible without growth, capacity.min(n_bytes_requested), then proceed as usual.
  2. Grow the buffer to accommodate all future reads on the returned trusted reader and simply return a TrustedSliceWriter over those bytes.

Option 1:
Simpler conceptually, more predictable memory usage (no growth), and thus probably closer semantically to what users expect from std::io::BufReader. We can't really provide a "Trusted" variant of the BufReader in this case (in the way that we can with in-memory sources), because we still have to deal with bounds checking and potential syscalls.

Option 2:
Less predictable memory usage (may grow), but will reduce syscall overhead for large trusted windows. Additionally benefits from the performance of TrustedSliceWriter for all reads in that trusted window (no bounds checking + no syscalls -- high likelihood of vectorized reads)
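
To make the trade-off concrete, here's a toy sketch of the two fill policies. All names here (SketchBufReader, fill_capped, fill_grow) are hypothetical illustrations for this comparison, not wincode's actual API:

```rust
use std::io::Read;

// Illustrative only: hypothetical names, not wincode's API.
struct SketchBufReader<R> {
    inner: R,
    buf: Vec<u8>,
    cap: usize, // nominal capacity for the no-grow policy
}

impl<R: Read> SketchBufReader<R> {
    fn with_capacity(cap: usize, inner: R) -> Self {
        Self { inner, buf: Vec::with_capacity(cap), cap }
    }

    // Option 1: fill up to `capacity.min(n_bytes)`; never grow.
    fn fill_capped(&mut self, n_bytes: usize) -> std::io::Result<&[u8]> {
        let target = self.cap.min(n_bytes);
        if self.buf.len() < target {
            let want = (target - self.buf.len()) as u64;
            // `take` caps the read; EOF simply yields fewer bytes.
            (&mut self.inner).take(want).read_to_end(&mut self.buf)?;
        }
        Ok(&self.buf)
    }

    // Option 2: grow the buffer so the whole request fits, ideally
    // pulling the bytes in one or two large reads.
    fn fill_grow(&mut self, n_bytes: usize) -> std::io::Result<&[u8]> {
        self.cap = self.cap.max(n_bytes);
        self.buf.reserve(self.cap.saturating_sub(self.buf.len()));
        self.fill_capped(n_bytes)
    }
}

fn main() {
    let data = vec![7u8; 100];
    let mut capped = SketchBufReader::with_capacity(16, std::io::Cursor::new(data.clone()));
    // Request 64 bytes: the no-grow policy stops at capacity (16).
    assert_eq!(capped.fill_capped(64).unwrap().len(), 16);

    let mut growable = SketchBufReader::with_capacity(16, std::io::Cursor::new(data));
    // The growing policy enlarges the buffer and buffers all 64 bytes.
    assert_eq!(growable.fill_grow(64).unwrap().len(), 64);
    println!("ok");
}
```

In this sketch, Option 2 amounts to bumping the nominal capacity before filling, so a large trusted window can be satisfied with far fewer reads of the underlying source.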

I'd like to have a discussion on which route is better for wincode, so I'm providing both implementations in two draft PRs.

This PR includes an implementation of Option 2 (growable BufReader).

See #54 for Option 1 implementation.

Note the code in these implementations shouldn't be considered "final" per se, but rather working proofs of concept that are close to finalization.

@kskalski
Contributor

I think it would be useful to first start with implementation for BufRead (this way we could also plug in any other implementation of BufRead, not necessarily rely on BufReader or implement it specifically).

In agave a similar problem was handled by adding a trait and impl that has an overflow buffer (https://github.com/anza-xyz/agave/blob/4e031c8025f0c14ecc2b8652f1a75cf5051e00f3/fs/src/buffered_reader.rs#L87), my impression is that we could use a similar approach here, but I will also read some of the code here and see if it or something else looks better.

@cpubot
Contributor Author

cpubot commented Dec 13, 2025

I think it would be useful to first start with implementation for BufRead (this way we could also plug in any other implementation of BufRead, not necessarily rely on BufReader or implement it specifically).

This PR actually introduces our own BufReader, which works over any Read, so we don't rely on std::io::BufReader at all

@kskalski
Contributor

We could use traits for increasingly efficient implementations though:

  • make best-effort (i.e. not use trusted reader when obtained buffer is too small) implementation for BufRead - I think this could be useful if user already have some buffering and little control over it
  • define a trait deriving from BufRead that adds extra functionality of obtaining at least n_bytes buffer, like fill_buf_preferred_len
    • this way user can plug-in other implementations (i.e. in agave we use io-uring reader that can use kernel registered buffers with read-ahead)
    • that kind of API will allow implementation to decide when it will grow and when not (fill_buf_preferred_len can simply provide smaller buf and we fallback to slower impl - though we could consider having API that distinguish returning short buf from EOF)
    • wincode can provide BufReader with this functionality that we can advertise to use instead of default BufReader
  • the way to use wincode for io::Read should be through some of the above readers
  • wincode buf reader could include parameters for initial buffer size and max growth

@cpubot
Contributor Author

cpubot commented Dec 13, 2025

  • make best-effort (i.e. not use trusted reader when obtained buffer is too small) implementation for BufRead - I think this could be useful if user already have some buffering and little control over it

This would be a nice option, though it would entail some indirection via Box<dyn Reader> or some enum / composite struct in the TrustedReader associated type that is aware of whether it needs to perform additional IO or can act as TrustedSliceReader, so there is some performance downside due to a vtable or additional branching in the implementation.

  • define a trait deriving from BufRead that adds extra functionality of obtaining at least n_bytes buffer, like fill_buf_preferred_len

wincode's Reader actually already provides similar APIs with either:

  • fill_buf(n_bytes)
    • make best attempt to ensure n_bytes are available in the buffer, or return whatever is buffered up to n_bytes at EOF
  • or fill_exact(n_bytes)
    • errors if cannot buffer n_bytes

Additionally, I don't think std::io::BufRead is necessarily suitable for providing an implementation of Reader. In particular, fill_exact(n_bytes) and fill_buf(n_bytes) aren't implementable with BufRead's API -- BufRead provides no minimum fill guarantee and doesn't do any additional buffering unless it's empty, which means our implementation over BufRead would actually have to use BufRead's underlying Read implementation (in which case there was no point in using BufRead, and we end up with two buffers).
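
The no-minimum-fill behavior is easy to observe with std's own types: fill_buf only hits the source when the internal buffer is empty, so leftover bytes are returned as-is even when the request could be topped up.

```rust
use std::io::{BufRead, BufReader, Cursor};

fn main() {
    let data = vec![1u8; 32];
    let mut r = BufReader::with_capacity(8, Cursor::new(data));
    // First fill: buffer is empty, so it reads up to capacity (8 bytes).
    assert_eq!(r.fill_buf().unwrap().len(), 8);
    // Consume 5 of them; 3 remain buffered.
    r.consume(5);
    // Second fill does NOT top the buffer back up: it returns the 3
    // leftover bytes as-is, even though the source has plenty more.
    assert_eq!(r.fill_buf().unwrap().len(), 3);
    println!("ok");
}
```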

  • this way user can plug-in other implementations (i.e. in agave we use io-uring reader that can use kernel registered buffers with read-ahead)
  • that kind of API will allow implementation to decide when it will grow and when not (fill_buf_preferred_len can simply provide smaller buf and we fallback to slower impl - though we could consider having API that distinguish returning short buf from EOF)

This is technically already doable given Reader's API. agave could provide its own Reader implementation with the specific semantics around the io_uring or regular file IO path and handle its own buffering. As far as I can tell, agave's implementation already does its own buffering with the additional functionality you already mentioned, which should make it fairly straightforward to provide an implementation for Reader without going through wincode's BufReader.

  • wincode buf reader could include parameters for initial buffer size and max growth

This might be the best route -- giving users a choice as to whether they want a growable buffer. We could provide two implementations (e.g., a BufReader and GrowableBufReader) that the user could pick. Or, parameterize BufReader like struct BufReader<const GROWABLE: bool> over which we could provide two implementations with differing Trusted associated types based on whether its growable.
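
For illustration, the const-generic shape could look roughly like this. Reader, TrustedRead, and the unit structs below are stand-ins sketched for this comment, not wincode's real trait definitions:

```rust
// Hypothetical stand-ins for wincode's traits, for shape only.
trait TrustedRead {
    // Whether reads through this trusted reader still bounds-check.
    const BOUNDS_CHECKED: bool;
}

struct TrustedSliceReader; // whole window in memory: checks elided
impl TrustedRead for TrustedSliceReader {
    const BOUNDS_CHECKED: bool = false;
}

struct CheckedReader; // may still need IO: checks remain
impl TrustedRead for CheckedReader {
    const BOUNDS_CHECKED: bool = true;
}

trait Reader {
    type Trusted: TrustedRead;
    fn as_trusted_for(&mut self, n_bytes: usize) -> Self::Trusted;
}

struct BufReader<R, const GROWABLE: bool> {
    inner: R,
}

// Growable buffer: the trusted window always fits, so the associated
// type can be the bounds-check-free slice reader.
impl<R> Reader for BufReader<R, true> {
    type Trusted = TrustedSliceReader;
    fn as_trusted_for(&mut self, _n_bytes: usize) -> TrustedSliceReader {
        TrustedSliceReader
    }
}

// Fixed-size buffer: the window may exceed capacity, so reads through
// the trusted reader stay checked.
impl<R> Reader for BufReader<R, false> {
    type Trusted = CheckedReader;
    fn as_trusted_for(&mut self, _n_bytes: usize) -> CheckedReader {
        CheckedReader
    }
}

fn main() {
    assert!(!<<BufReader<(), true> as Reader>::Trusted as TrustedRead>::BOUNDS_CHECKED);
    assert!(<<BufReader<(), false> as Reader>::Trusted as TrustedRead>::BOUNDS_CHECKED);
    println!("ok");
}
```

The point of the const parameter is that the two impls can expose different Trusted associated types, which a runtime growth-limit check could not.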

// Caller guarantees n_bytes is greater than the number of bytes already buffered.
let needed = unsafe { n_bytes.unchecked_sub(buffered_len) };
// SAFETY: we maintain the invariant that `filled.end` is always less than `capacity`.
let edge_capacity = unsafe { buf.capacity().unchecked_sub(filled.end) };
Contributor


does unchecked_sub generate more efficient code than wrapping_sub?

@kskalski
Contributor

Additionally, I don't think std::io::BufRead is necessarily suitable for providing an implementation of Reader. In particular, fill_exact(n_bytes) and fill_buf(n_bytes) aren't implementable with BufRead's

Ok, that's right, I forgot that Reader trait requires buffers of exact size, not just for trusted reader.
Agreed that Reader should be directly implemented by other buffered readers, since it requires communicating the need to obtain larger buffer and benefits from direct mutable access to the buffer.

  • wincode buf reader could include parameters for initial buffer size and max growth

This might be the best route -- giving users a choice as to whether they want a growable buffer. We could provide two implementations (e.g., a BufReader and GrowableBufReader) that the user could pick. Or, parameterize BufReader like struct BufReader<const GROWABLE: bool> over which we could provide two implementations with differing Trusted associated types based on whether its growable.

Not sure how trusted readers relate to the growth capacity, but even for regular Reader::fill_buf we should probably have the variants that are able to fulfill growth (and maybe this should have thresholds) or signal error / fallback to less efficient operation.
The most flexible API would allow specifying initial buffer capacity and growth limit. Do you think a constant "growable" boolean specialization would be noticeably more efficient than checking growth limit?

I suppose we have cases where a large number of bytes is requested based on runtime sizes (e.g. deserializing to &[u8] needs to point to a whole slice, so all of the bytes need to be available), but sometimes those are best effort? What are the cases here?

  • minimum required buffer size to deserialize necessary types --> user should probably just set initial buffer capacity to at least that
  • minimum required size to deserialize some rarely occurring cases -> need to grow to that size depending on input, otherwise error
  • ideal buffer size to efficiently deserialize necessary types -> is reallocation or less efficient deserialization better?

@cpubot
Contributor Author

cpubot commented Dec 15, 2025

Not sure how trusted readers relate to the growth capacity, but even for regular Reader::fill_buf we should probably have the variants that are able to fulfill growth (and maybe this should have thresholds) or signal error / fallback to less efficient operation. The most flexible API would allow specifying initial buffer capacity and growth limit.

as_trusted_for happens to be the most likely case for growth. It's unlikely that typical calls to fill_buf outside of as_trusted_for won't fit into even a very modestly sized buffer. Most calls to fill_buf aren't going to be greater than 16 bytes (e.g., the 128-bit integer types). The exceptional case would be String, which currently calls fill_buf with the encoded length. But this should likely be adjusted to use copy_into_slice so we can avoid growing the buffer in cases where the string bytes exceed the capacity and instead write directly to the underlying Vec<u8> (this PR's implementation of copy_into_slice for BufReader will read directly into destinations once reads exceed buffer capacity).

Here's a prototypical example of a call to fill_buf that may entail growing (via as_trusted_for)

let mut ptr = vec.as_mut_ptr().cast::<MaybeUninit<T::Dst>>();
#[allow(clippy::arithmetic_side_effects)]
// SAFETY: `T::TYPE_META` specifies a static size, so `len` reads of `T::Dst`
// will consume `size * len` bytes, fully consuming the trusted window.
let mut reader = unsafe { reader.as_trusted_for(size * len) }?;
for i in 0..len {
    T::read(&mut reader, unsafe { &mut *ptr })?;
    unsafe {
        ptr = ptr.add(1);
        #[allow(clippy::arithmetic_side_effects)]
        // i <= len
        vec.set_len(i + 1);
    }
}

Sequences with statically sized elements that don't satisfy zero_copy: true will typically use a version of this branch. This could be a sequence of structs comprised of statically sized members that don't have a consistent repr (e.g., no repr(C) or repr(transparent)). as_trusted_for in this case means "I am about to read n bytes with many calls to T::read".

The zero_copy: true branch here doesn't need to grow, as it will call copy_into_slice_t, which will avoid buffering if size is greater than the buffer capacity.

let spare_capacity = vec.spare_capacity_mut();
// SAFETY: T::Dst is zero-copy eligible (no invalid bit patterns, no layout requirements, no endianness checks, etc.).
unsafe { reader.copy_into_slice_t(spare_capacity)? };
// SAFETY: `copy_into_slice_t` fills the entire spare capacity or errors.
unsafe { vec.set_len(len) };

The TypeMeta::Dynamic case is also unlikely to grow, since the number of bytes needed is not known up front and will likely contain multiple small fill_buf calls (and doesn't call as_trusted_for).

let mut ptr = vec.as_mut_ptr().cast::<MaybeUninit<T::Dst>>();
for i in 0..len {
    T::read(reader, unsafe { &mut *ptr })?;
    unsafe {
        ptr = ptr.add(1);
        #[allow(clippy::arithmetic_side_effects)]
        // i <= len
        vec.set_len(i + 1);
    }
}

Do you think a constant "growable" boolean specialization would be noticeably more efficient than checking growth limit?

It would because we can use different Trusted associated types. The growable specialization can use TrustedSliceReader, which means that all reads in trusted windows can avoid bounds checking and is likely to generate vectorized copy instructions.

I suppose we have cases where a large number of bytes is requested based on runtime sizes (e.g. deserializing to &[u8] needs to point to a whole slice, so all of the bytes need to be available), but sometimes those are best effort? What are the cases here?

&[u8] (zero-copy slice) in particular will never be possible with BufReader, since zero-copy deserialization requires that the lifetime of &[u8] is bound to the lifetime of the input buffer -- so this is only possible when deserializing directly from byte slices (e.g., &[u8] or &[MaybeUninit<u8>]).

For large, owned byte sequences, like Vec<u8>, those are likely to use copy_into_slice_t, which as shown above can circumvent the buffer if the required read is larger than the buffer size. And as mentioned above, we can adjust String to take this path as well rather than calling fill_buf.

@kskalski
Contributor

kskalski commented Feb 6, 2026

Coming back to this with some more comments as I'm also considering implementation of Reader for io-uring reader in agave:

  • I think the as_trusted_for API is currently too demanding to work well for buffered / chunked inputs - it has "optimize all or error" approach, but for many use-cases it seems more reasonable to have "optimize most or fallback to slow". Say you want to efficiently read and deserialize 1 billion-len Vec<StaticStruct>, it's nice to avoid the bound checks for read calls deserializing each individual struct, but you don't really need 1B*size_of::<StaticStruct> bytes of input immediately and fully available to fill the destination collection - you could as well do that in chunks of 1 million elements at a time, re-request a new trusted reader for next chunk of input, still a 1billion -> 1k bound checks reduction. Requiring callers to handle trusted readers below requested capacity is breaking and maybe sometimes a bit inconvenient to implement, but seems reasonable (implementations could also fallback to slow path and for properly sized in-mem input it would never happen).
  • your no-grow implementation is close to what I imagined for BufRead default implementation - of course there is the issue of fill_buf that even if operating on large buffer can return small slice when read position is at the boundary. For me the best approach would be to introduce a trait in wincode that would add fill_buf_hint_at_least to hint/force the reader to defragment / up-fill the buffer. Maybe there isn't that much more in the impl Reader besides fill_buf_n, but there is still some unsafe / wincode specific code, trusted reader, copying slices, etc.

@cpubot
Contributor Author

cpubot commented Feb 6, 2026

  • I think the as_trusted_for API is currently too demanding to work well for buffered / chunked inputs - it has "optimize all or error" approach, but for many use-cases it seems more reasonable to have "optimize most or fallback to slow". Say you want to efficiently read and deserialize 1 billion-len Vec<StaticStruct>, it's nice to avoid the bound checks for read calls deserializing each individual struct, but you don't really need 1B*size_of::<StaticStruct> bytes of input immediately and fully available to fill the destination collection - you could as well do that in chunks of 1 million elements at a time, re-request a new trusted reader for next chunk of input, still a 1billion -> 1k bound checks reduction. Requiring callers to handle trusted readers below requested capacity is breaking and maybe sometimes a bit inconvenient to implement, but seems reasonable (implementations could also fallback to slow path and for properly sized in-mem input it would never happen).

Yeah, I think you're probably right. Additionally, rather than implicitly growing, it's likely better to allow the user to specify the buffer size since they'll have information about the typical payload size they're dealing with and can tune accordingly.

  • your no-grow implementation is close to what I imagined for BufRead default implementation - of course there is the issue of fill_buf that even if operating on large buffer can return small slice when read position is at the boundary. For me the best approach would be to introduce a trait in wincode that would add fill_buf_hint_at_least to hint/force the reader to defragment / up-fill the buffer. Maybe there isn't that much more in the impl Reader besides fill_buf_n, but there is still some unsafe / wincode specific code, trusted reader, copying slices, etc.

If I'm understanding your concern correctly, I believe that the fill_buf implementation in the no-grow impl will handle the case you're describing. fill_buf in wincode does actually take an n_bytes argument, and the impl in the no-grow branch will defragment when n_bytes would push the buffer over the boundary (or return an OOM error if n_bytes exceeds capacity).

Say you want to efficiently read and deserialize 1 billion-len Vec<StaticStruct>, it's nice to avoid the bound checks for read calls deserializing each individual struct, but you don't really need 1B*size_of::<StaticStruct> bytes of input immediately and fully available to fill the destination collection - you could as well do that in chunks of 1 million elements at a time

On this point, assuming we go the no-grow route, we could achieve this by storing the total number of requested trusted bytes in the trusted reader, and all calls to fill_buf and related calls could maximally fill the buffer relative to that trusted total (decrementing with each fill). That way we do the minimum number of IO syscalls relative to the size of the buffer when trusted readers are used.
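
A minimal sketch of that bookkeeping, with hypothetical names (BestEffortTrusted and refill are illustrative, not the draft's actual types):

```rust
use std::io::Read;

// The trusted reader stores the total number of trusted bytes still
// owed, and each refill pulls `min(remaining, capacity)` so the number
// of read syscalls is minimal relative to the buffer size.
struct BestEffortTrusted<R> {
    inner: R,
    buf: Vec<u8>,
    cap: usize,
    remaining: usize, // trusted bytes not yet pulled from `inner`
}

impl<R: Read> BestEffortTrusted<R> {
    fn new(cap: usize, trusted_total: usize, inner: R) -> Self {
        Self { inner, buf: Vec::with_capacity(cap), cap, remaining: trusted_total }
    }

    // Maximally refill the buffer relative to the trusted total,
    // decrementing the total with each fill.
    fn refill(&mut self) -> std::io::Result<&[u8]> {
        self.buf.clear();
        let want = self.remaining.min(self.cap) as u64;
        (&mut self.inner).take(want).read_to_end(&mut self.buf)?;
        self.remaining -= self.buf.len();
        Ok(&self.buf)
    }
}

fn main() {
    // 40 trusted bytes through a 16-byte buffer: 3 refills, not 40 reads.
    let mut r = BestEffortTrusted::new(16, 40, std::io::Cursor::new(vec![0u8; 100]));
    assert_eq!(r.refill().unwrap().len(), 16);
    assert_eq!(r.refill().unwrap().len(), 16);
    assert_eq!(r.refill().unwrap().len(), 8);
    assert_eq!(r.remaining, 0);
    println!("ok");
}
```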

@kskalski
Contributor

kskalski commented Feb 9, 2026

introduce a trait in wincode that would add fill_buf_hint_at_least to hint/force the reader to defragment / up-fill the buffer. Maybe there isn't that much more in the impl Reader besides fill_buf_n, but there is still some unsafe / wincode specific code, trusted reader, copying slices, etc.

If I'm understanding your concern correctly, I believe that the fill_buf implementation in the no-grow impl will handle the case you're describing. fill_buf in wincode does actually take an n_bytes argument, and the impl in the no-grow branch will defragment when n_bytes would push the buffer over the boundary (or return an OOM error if n_bytes exceeds capacity).

Right, the impl in the draft PR does that, I'm thinking of making things more useful by:

  • making that function pluggable by exposing a trait that user can implement on their own BufRead implementations, so for example:
impl BufReadWithSizeHint for MyBufReader {
    fn fill_buf_hint_at_least(&mut self, desired_len: usize) -> io::Result<&[u8]> { .. }
}
..
let wincode_reader = wincode::io::IoReader::from_refillable_buf_read(MyBufReader::new());

could be used and all the rest of the implementation in this / the other PR would be put into wincode::io::IoReader that only uses functions of BufRead + BufReadWithSizeHint

  • the current implementation of fill_buf(n_bytes) could also be exposed by wincode crate to ease implementation of BufReadWithSizeHint, I guess such function would take mutable buffer and Read to defragment + refill available buffer from "source" reader (obviously it should be a different reader than self ;-))

  • I suppose we would still need to provide our own buf reader implementation, hopefully with the above two functionalities extracted, it would actually be cleaner

On this point, assuming we go the no-grow route, we could achieve this by storing the total number of requested trusted bytes in the trusted reader, and all calls to fill_buf and related calls could maximally fill the buffer relative to that trusted total (decrementing with each fill). That way we do the minimum number of IO syscalls relative to the size of the buffer when trusted readers are used.

In this mode the trusted reader would make its own decision whether to call the source reader vs just provide existing data? Checking all the relevant conditions might defeat the purpose of a trusted reader, which should avoid bounds checks...

There is another thing to consider here - the implementations we discussed above will try to fulfil the desired len hint / n_bytes by defragmenting data within the current buffer and adding extra. This isn't always the best thing to do: as_trusted_for is used when the caller wants to make accessing the next n bytes cheaper, but moving the bytes around the internal buffer might actually be more expensive than the bounds checks being avoided... Maybe we need to distinguish cases of:

  • caller must get >= n_bytes -> supposedly this is just for zero copy cases and not applicable to buffered reader, just that overall API must support this case
  • caller would prefer to get >= n_bytes because it would allow them to do batched / vectorized operation on them in one go (e.g. single memcopy for all), but can handle incremental operation -> typically std::io APIs just give "as many as possible" bytes
  • caller wants to suspend bounds checks for the next n_bytes - it doesn't necessarily mean all those bytes need to be immediately available (I suppose this is what you proposed in the comment above)

The breaking change is that 2. and 3., as opposed to 1., would now need to handle getting fewer than the requested number of bytes. We wouldn't defragment the buffer in those cases (or it could be implementation specific, e.g. if available bytes are <1/8 of the buffer capacity or < some fixed size, it does consider move + refill).

@kskalski
Contributor

kskalski commented Feb 12, 2026

* I think the `as_trusted_for` API is currently too demanding to work well for buffered  / chunked inputs - it has "optimize all or error" approach, but for many use-cases it seems more reasonable to have "optimize most or fallback to slow". Say you want to efficiently read and deserialize 1 billion-len 

Ok, actually the APIs might be usable without much change, though maybe the cleanest way would be to make as_trusted_for's return value more explicit about the "OK / request too large / error" distinction. For now I hacked a POC that (ab)uses the io::ReadError a bit:
#185

Benchmark run suggests it won't harm us.

@cpubot
Contributor Author

cpubot commented Feb 13, 2026

I think perhaps it's useful to categorize what we're discussing here as two distinct as_trusted_for and fill_buf policies -- strict and best effort.

  • strict: error when requested number of bytes isn't possible with the given buffer size
  • best effort: fill to maximal capacity and continue filling on each fill request in accordance with requested trusted window size, minimizing io

For fill_buf specifically, I think strict is the correct policy. If it is never possible to fill the buffer with the requested size, we should error. The expectation with fill_buf is that it may return fewer bytes than requested at EOF, but returning fewer because the buffer isn't large enough is a different condition -- so I think best-effort here would be a violation of expected semantics.

For as_trusted_for, I lean towards best-effort. There are cases, as you previously highlighted, where actually filling the buffer for extremely large payloads is definitely the wrong decision. Best-effort has a nice balance here -- it's still tune-able by increasing the buffer size, but doesn't try to load ridiculously large amounts of bytes into the buffer. It also reduces the number of total syscalls by maximally filling on each fill call relative to the total trusted window size.

I'm not sure I understand how the proposed BufReadWithSizeHint would differ from functionality we already provide. We have fill_buf(n_bytes: usize), which in the proposed no-grow impl, will attempt to fill to requested size, defragment if needed to fulfill request, or error if requested bytes would exceed buffer capacity (strict policy). We also have fill_exact(n_bytes: usize), which differs from fill_buf in that it will error even on eof if the source doesn't have enough bytes remaining.

For these cases:

  • caller must get >= n_bytes -> supposedly this is just for zero copy cases and not applicable to buffered reader, just that overall API must support this case
  • caller would prefer to get >= n_bytes because it would allow them to do batched / vectorized operation on them in one go (e.g. single memcopy for all), but can handle incremental operation -> typically std::io APIs just give "as many as possible" bytes

I don't quite follow why an implementation would want >= n_bytes. In pretty much all SchemaRead cases I can think of, we know either exactly the number of bytes we want, or we're dealing with a variable-length encoding (e.g., ShortU16 or VarInt) where we want <= n_bytes. Curious to know if you're thinking of a use-case I haven't thought of here.

For this case:

caller must get >= n_bytes -> supposedly this is just for zero copy cases and not applicable to buffered reader, just that overall API must support this case

For cases where a caller must get a precise number of bytes, we have fill_exact which returns either exactly n_bytes or errors if that request cannot be fulfilled.

For this case:

caller would prefer to get >= n_bytes because it would allow them to do batched / vectorized operation on them in one go (e.g. single memcopy for all), but can handle incremental operation -> typically std::io APIs just give "as many as possible" bytes

For cases where the number of bytes desired is known and large, and the goal is something like a direct memcpy, we have copy_into_slice. The implementation in the draft will drain the buffer into the destination and perform the rest of the read directly into the given destination.

For this case:

caller wants to suspend bounds checks for the next n_bytes - it doesn't necessarily mean all those bytes need to be immediately available (I suppose this is what you proposed in the comment above)

My sense is that suspending bounds checks is only possible if we enforce as_trusted_for to be strict. The best-effort policy would still need to bounds check, but could reduce IO considerably by doing maximal fills relative to the requested window size. There is another potential route -- the trusted impl could be an enum. If the trusted window fits within the buffer size, it can proceed as usual without bounds checks. Otherwise, it can fall back to the best-effort policy.
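
The enum fallback might look something like this (a hypothetical Trusted type sketched for this comment, not wincode's actual machinery):

```rust
// Windows that fit in the buffer get an unchecked in-memory reader;
// larger windows fall back to the checked best-effort path.
enum Trusted<'a> {
    // Whole window already buffered: reads can elide bounds checks.
    Slice(&'a [u8]),
    // Window exceeds the buffer: reads stay checked and refill as needed.
    Checked { remaining: usize },
}

fn as_trusted_for(buffered: &[u8], n_bytes: usize) -> Trusted<'_> {
    if n_bytes <= buffered.len() {
        Trusted::Slice(&buffered[..n_bytes])
    } else {
        Trusted::Checked { remaining: n_bytes }
    }
}

fn main() {
    let buf = [0u8; 64];
    // 32-byte window fits in a 64-byte buffer: unchecked path.
    assert!(matches!(as_trusted_for(&buf, 32), Trusted::Slice(s) if s.len() == 32));
    // 128-byte window does not fit: checked fallback.
    assert!(matches!(as_trusted_for(&buf, 128), Trusted::Checked { remaining: 128 }));
    println!("ok");
}
```

The cost is one branch (or match) per read through the trusted reader, which is the indirection mentioned earlier in the thread.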

@kskalski
Contributor

kskalski commented Feb 13, 2026

Let's list cases where getting n_bytes size &[u8] is:

  1. required for arbitrary n_bytes -> it's impossible to fulfill with buffered reader without arbitrary re-allocs
  2. required for user-known limit n_max -> it's possible to implement with Reader(R: BufRead, vec(n_max)) by using the extra pre-allocated vec whenever BufRead::fill_buf returns fewer than n_bytes, or with custom buf reader that uses single n_max buffer and defragments / re-fills
  3. best-effort, meaning getting n_bytes gets better perf, but slow-path exists -> we can implement directly over BufRead (or make it trivial to implement by user for their own reader)

If I'm not mistaken zero-copy (especially deserializing &Struct) requires 1. I'm not sure if 2. would work, which brings my side-question. Is support for zero-copy with buffered reader simply impossible due to how Reader's lifetime / API is designed or it's just a matter of ensuring that requested part of the data is in buffer? I suspect it's the former, meaning borrow_exact lifetimes force impl to always have / own the relevant part of the memory, not only for the duration of single call / restricted piece of data.

If we exclude the zero-copy case (ideally we would hard-enforce buffered reader not to be used for it at all), I wonder if we can make things work with best-effort (3.). I looked at cases where fill_buf or fill_buf_exact is called and I think the reader can prevent those (i.e. re-implementing fill_array, copy_into_slice_t without requiring >= n_bytes, replacing some calls to fill_buf(n) with fill_array - there is a couple that are used with small N like int or char parsing).
This just leaves as_trusted_for - and I think we can force callers to handle the inability to get trusted reader for n_bytes (as in my #185 or by using other enum as you suggested).

I guess one important use-case I have in mind here that would work quite well with a best-effort trusted reader is Vec<Vec<StaticSizeStruct>>: the size of each element of the outer vector is arbitrary (derived from its len and element size), so handling each of them with a trusted read that occasionally falls back to the slower impl (which iterates over inner-vec elements doing bounds checks) would come at almost no perf loss.

Given above, maybe the BufReadWithSizeHint doesn't make sense (its purpose was to introduce a smaller trait that could be put on top of BufRead to "help" the implementer handle case 2. such that they can use single existing buffer and de-fragment / refill it on demand). I would say if 2. doesn't bring us much compared to 1. (i.e. both won't support zero-copy), then we can drop the idea.

@kskalski
Contributor

kskalski commented Feb 13, 2026

Ok, I noticed now that Reader's fill_buf parameter n_bytes caps the returned slice at max n_bytes, not min n_bytes, so actually it can be trivially implemented on top of BufRead::fill_buf. The only calls we need to worry about are fill_buf_exact and as_trusted_for:

  • the only call to fill_buf_exact outside of the other functions in Reader (whose default impls can be replaced for BufRead easily) is in char deserialization; I will send a PR changing it to fill_array or something.
  • your implementation skips the borrow_* functions, so they get the default "error for zero-copy" behavior; I assume this is how we make the reader unusable for zero-copy, and we don't worry about that
  • if we change the as_trusted_for contract so that callers need to fall back to the slow impl, we can also implement it on top of BufRead

If I'm not missing anything else, we could then provide

pub struct BufReader<R: std::io::BufRead>(R);
impl<'a, R: std::io::BufRead> Reader<'a> for BufReader<R> { .. }

We could also have an impl that makes a bit more effort by having access to the internal buffer (moving memory and re-filling at times we identify as worth it), but having the above either in code, or a list of functions to re-implement in the docs, would provide a contract for how to implement Reader in the user's own buf reader impl.
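A minimal sketch of what that wrapper's fill_buf could look like, assuming Reader::fill_buf(n) returns *up to* n bytes rather than guaranteeing n. Under that contract it maps directly onto BufRead::fill_buf plus a cap; the method names here are illustrative, not wincode's actual trait surface:

```rust
use std::io::{BufRead, Result};

// Wrapper over any std::io::BufRead source (hypothetical shape).
pub struct BufReader<R: BufRead>(pub R);

impl<R: BufRead> BufReader<R> {
    /// Return whatever is currently buffered, capped at `n_bytes`.
    /// May return fewer bytes; callers needing exactly n must handle that.
    pub fn fill_buf(&mut self, n_bytes: usize) -> Result<&[u8]> {
        let buf = self.0.fill_buf()?;
        let n = buf.len().min(n_bytes);
        Ok(&buf[..n])
    }

    /// Mark `amt` bytes as consumed, mirroring BufRead::consume.
    pub fn consume(&mut self, amt: usize) {
        self.0.consume(amt);
    }
}
```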

@kskalski (Contributor)

if we change the as_trusted_for contract so that callers need to fall back to the slow impl, we can also implement it on top of BufRead

Alternatively, if we don't want to do that, we could indeed make the buffered reader return a "trusted" reader that cheats and still does bounds checks, at least one that will re-fill on demand. The trusted reader just fulfills the Reader trait, so there is no other extra guarantee of having contiguous memory of the requested size...

@cpubot (Contributor, Author) commented Feb 13, 2026

Is support for zero-copy with a buffered reader simply impossible due to how Reader's lifetime / API is designed, or is it just a matter of ensuring that the requested part of the data is in the buffer? I suspect it's the former, meaning borrow_exact lifetimes force the impl to always have / own the relevant part of the memory, not only for the duration of a single call / a restricted piece of data.

Correct, zero-copy is only possible where the lifetime of the reader's source outlives the reader, which is only true when passing in a byte slice. So zero-copy is never possible for std::io sources.
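This lifetime constraint can be seen in a small sketch (illustrative types, not wincode's): a slice-backed reader can hand out borrows tied to the *source* lifetime 'a, so they remain valid even after the reader is dropped. A reader over a std::io source owns its internal buffer, so any borrow could only live as long as the reader itself, which rules out zero-copy.

```rust
// Slice-backed reader: borrows are tied to the source slice's lifetime 'a,
// not to the reader, so they can outlive the reader.
struct SliceReader<'a> {
    data: &'a [u8],
    pos: usize,
}

impl<'a> SliceReader<'a> {
    fn borrow_exact(&mut self, n: usize) -> Option<&'a [u8]> {
        let end = self.pos.checked_add(n)?;
        let s = self.data.get(self.pos..end)?;
        self.pos = end;
        Some(s)
    }
}
```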

your implementation skips the borrow_* functions, so they get the default "error for zero-copy" behavior; I assume this is how we make the reader unusable for zero-copy, and we don't worry about that

Exactly, correct

If I'm not missing anything else, we could then provide

pub struct BufReader<R: std::io::BufRead>(R);
impl<'a, R: std::io::BufRead> Reader<'a> for BufReader<R> { .. }

We could also have an impl that makes a bit more effort by having access to the internal buffer (moving memory and re-filling at times we identify as worth it), but having the above either in code, or a list of functions to re-implement in the docs, would provide a contract for how to implement Reader in the user's own buf reader impl.

Reader is set up in such a way that it should generally not be possible to implement it for an arbitrary std::io source without a buffer. This is intentional, as there are virtually no cases where not buffering will be efficient for deserialization. In particular, deserialization is typically "small read heavy" -- i.e., there are typically many small reads with occasional opportunities for large single memcpys. The latter is not the typical case, as such reads are only available when the type satisfies TypeMeta::Static { zero_copy: true }, which covers only a small subset of all possible types.

This is why the reference implementation here and in #54 implements (our) BufReader for any std::io::Read -- the assumption is that you must buffer, so our provided impl does exactly that over any std::io::Read.

the only call to fill_buf_exact outside of the other functions in Reader (whose default impls can be replaced for BufRead easily) is in char deserialization; I will send a PR changing it to fill_array or something.

I still think these are fine. fill_exact and fill_array should generally only be used for small values, and as mentioned above, we want to encourage buffering rather than direct reads for small values, as we want to ensure small direct reads don't inadvertently prevent buffering and trigger more IO syscalls (see this comment for further explanation).


Given all of the above, I don't think we need any changes to the Reader API to implement this. This PR and #54 are close to a solid, working foundation.

I think the only thing that needs to be decided here is what to do when as_trusted_for is called for an n_bytes window that exceeds the buffer size. I think the best-effort policy is the right one: implement the Trusted reader as an enum, where one variant represents the condition where the full window fits within the buffer, and the other basically proceeds as usual, filling maximally up to the requested n_bytes.
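A hedged sketch of that enum shape (names and the selection policy are illustrative, not wincode's API): one variant covers the case where the full requested window fits in the buffer, so bounds checks within it can be elided; the other falls back to checked, refilling reads for the remainder.

```rust
// Hypothetical best-effort trusted reader returned by as_trusted_for.
enum Trusted<'r> {
    /// Entire n_bytes window is buffered; reads within it are pre-validated.
    InBuffer { window: &'r [u8] },
    /// Window exceeds what the buffer can hold; proceed with checked,
    /// refilling reads for the remaining bytes.
    Checked { remaining: usize },
}

// Choose the fast or fallback variant based on what is already buffered.
fn as_trusted_for(buffered: &[u8], n_bytes: usize) -> Trusted<'_> {
    if n_bytes <= buffered.len() {
        Trusted::InBuffer { window: &buffered[..n_bytes] }
    } else {
        Trusted::Checked { remaining: n_bytes }
    }
}
```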

I think fill_buf, fill_exact, etc should always obey the strict policy -- if we cannot ever satisfy n_bytes, hard error.
