Conversation
|
I think it would be useful to first start with implementation for In agave similar problem was handled by adding trait and impl that has overflow buffer (https://github.com/anza-xyz/agave/blob/4e031c8025f0c14ecc2b8652f1a75cf5051e00f3/fs/src/buffered_reader.rs#L87), my impression is that here we could use similar approach, but I will also read some of the code here and see if it or something else look better. |
This PR actually introduces our own |
|
We could use traits for increasingly efficient implementations though:
|
This would be a nice option, though it would entail some indirection via
wincode's
Additionally, I don't think
This is technically already doable given
This might be the best route -- giving users a choice as to whether they want a growable buffer. We could provide two implementations (e.g., a |
| // Caller guarantees n_bytes is greater than the number of bytes already buffered. | ||
| let needed = unsafe { n_bytes.unchecked_sub(buffered_len) }; | ||
| // SAFETY: we maintain the invariant that `filled.end` is always less than `capacity`. | ||
| let edge_capacity = unsafe { buf.capacity().unchecked_sub(filled.end) }; |
There was a problem hiding this comment.
does unchecked_sub generate more efficient code than wrapping_sub?
Ok, that's right, I forgot that
Not sure how trusted readers relate to the growth capacity, but even for regular I suppose we have cases where large number of bytes are requested based on runtime sizes (e.g. deserializing to
|
Here's a prototypical example of a call to wincode/wincode/src/schema/containers.rs Lines 342 to 356 in 53bc687 Sequences with statically sized elements that don't satisfy The wincode/wincode/src/schema/containers.rs Lines 332 to 336 in 53bc687 The wincode/wincode/src/schema/containers.rs Lines 358 to 367 in 53bc687
It would because we can use different
For large, owned byte sequences, like |
|
Coming back to this with some more comments as I'm also considering implementation of
|
Yeah, I think you're probably right. Additionally, rather than implicitly growing, it's likely better to allow the user to specify the buffer size since they'll have information about the typical payload size they're dealing with and can tune accordingly.
If I'm understanding your concern correctly, I believe that the
On this point, assuming we go the no-grow route, we could achieve this by storing the total number of requested trusted bytes in the trusted reader, and all calls to |
Right, the impl in the draft PR does that, I'm thinking of making things more useful by:
impl BufReadWithSizeHint for MyBufReader {
fn fill_buf_hint_at_least(&mut self, desired_len: usize) -> io::Result<&[u8]> { .. }
}
..
let wincode_reader = wincode::io::IoReader::from_refillable_buf_read(MyBufReader::new())could be used and all the rest of the implementation in this / the other PR would be put into
In this mode the trusted reader would make its own decision whether to call source reader vs just provide existing data? Checking all the relevant conditions might destroy the purpose of trusted reader that should avoid bound checks... There is another thing to consider here - the implementations we discussed above will try to fulfil the desired len hint /
The breaking change is that 2. and 3. as opposed to 1. would now need to handle getting less than requested number of bytes. We wouldn't de-fragment buffer in those cases (or it could be implementation specific, e.g. if available bytes are <1/8 of the buffer capacity or < some fixed size, it does consider move + refill). |
Ok, actually the APIs might be usable without much change, though maybe the cleanest way would be to make Benchmark run suggests it won't harm us. |
|
I think perhaps it's useful to categorize what we're discussing here as two distinct
For For I'm not sure I understand how the proposed For these cases:
I don't quite follow why an implementation would want For this case:
For cases where a caller must get a precise number of bytes, we have For this case:
For cases where the number of bytes desired is known and large, and the goal is something like a direct For this case:
My sense is that suspending bounds checks is only possible if we enforce |
|
Let's list cases where getting
If I'm not mistaken zero-copy (especially deserializing If we exclude the zero-copy case (ideally we would hard-enforce buffered reader not to be used for it at all), I wonder if we can make things work with best-effort (3.). I looked at cases where I guess one important use-case I have in mind here that would work quite well with best-effort trusted reader is Given above, maybe the |
|
Ok, I noticed now that
If I'm not missing anything else, we could then provide pub struct BufReader<R: std::io::BufRead>(R);
impl<'a, R: std::io::BufRead> Reader<'a> for BufReader<R> { ..}We could also have the impl that makes a bit more effort by having access to internal buffer (move mem and re-fill at times we identify as worth it), but having above either in code or listing functions to re-implement in the doc, would provide contract on how to implement the |
Alternatively if we don't want to do that, we could indeed make buffered reader return a "trusted" reader that cheats and still does bound checks, at least the one that will make it re-fill on demand. Trusted reader just fulfills |
Correct, zero-copy is only possible where the lifetime of the reader's source outlives the reader, which is only true when passing in a byte slice. So zero-copy is never possible for std::io sources.
Exactly, correct
This is why the reference implementation here and in #54 implements (our)
I still think these are fine. Given all of the above, I don't think we need any changes to the I think the only thing that needs to be decided here is what to do when I think |
wincode currently doesn't provide direct support for interfacing with file-io (e.g., via
Read) due to its specific trait semantics. In particular,BufReadstyle reader due the fact thatSchemaReadimplementations need to deal with variable sized type encodings (e.g., variable length encodings likeShortU16). For example, an implementation may need to inspect the next 3 bytes, but only use a subset of those bytes. Of course those implementations could get by with incremental reading as bytes are inspected, but that would be terribly inefficient without buffering, and so it's preferable forSchemaReadimplementations in particular to deal with aBufReadstyle interface.We can provide our own
BufReaderover anystd::io::Readthat implements the appropriate wincode semantics. The implementation is fairly straightforward until we need to deal withas_trusted_foron theReadertrait. Put simply, callingas_trusted_formeans "I know for a fact that I will neednbytes for all subsequent reads on the returnedReader". This is trivial and obvious implementation-wise when the underlying source is in-memory, but the "right" way to handle this when dealing with what could be file or network IO is more nuanced.Perhaps the most straightforward solution is to simply attempt to fill the
BufReader's buffer up tocapacity.min(n_bytes_requested), and then proceed as usual. This is great for cases wheren_bytes_requestedis less thancapacity. It can become suboptimal however ifn_bytes_requestedis significantly larger than the buffer capacity. In particular, we incur additional syscall overhead for subsequent reads past capacity that could otherwise be avoided if the buffer were larger. If we were to instead grow the buffer to support the requested number of bytes, we could conceivably read most if not all required bytes in a single syscall.So, to summarize the above options for dealing with
as_trusted_for:capacity.min(n_bytes_requested), then proceed as usual.TrustedSliceWriterover those bytes.Option 1:
Simpler conceptually, more predictable memory usage (no growth), and thus probably closer semantically to what users expect from
std::io::BufReader. We can't really provide a "Trusted" variant of theBufReaderin this case (in the way that we can with in-memory sources), because we still have to deal with bounds checking and potential syscalls.Option 2:
Less predictable memory usage (may grow), but will reduce syscall overhead for large trusted windows. Additionally benefits from the performance of
TrustedSliceWriterfor all reads in that trusted window (no bounds checking + no syscalls -- high likelihood of vectorized reads)I'd like to have a discussion on which route is better for wincode, so I'm providing both implementations in two draft PRs.
This PR includes an implementation of Option 2 (growable
BufReader).See #54 for Option 1 implementation.
Note the code in these implementations shouldn't be considered "final" per se, but working proof of concepts that are close to finalization.