Conversation
|
| Branch | rscbc32-23550438882-154-1 |
| Testbed | ubuntu-latest |
Click to view all benchmark results
| Benchmark | Latency | Benchmark Result microseconds (µs) (Result Δ%) | Upper Boundary microseconds (µs) (Limit %) |
|---|---|---|---|
| query | 📈 view plot 🚷 view threshold | 449.26 µs(+6.64%)Baseline: 421.28 µs | 484.47 µs (92.73%) |
| upsert_and_get | 📈 view plot 🚷 view threshold | 445.11 µs(+11.22%)Baseline: 400.19 µs | 460.22 µs (96.72%) |
|
| Branch | rscbc32-23551112822-155-1 |
| Testbed | ubuntu-latest |
Click to view all benchmark results
| Benchmark | Latency | Benchmark Result microseconds (µs) (Result Δ%) | Upper Boundary microseconds (µs) (Limit %) |
|---|---|---|---|
| query | 📈 view plot 🚷 view threshold | 417.05 µs(-1.00%)Baseline: 421.28 µs | 484.47 µs (86.08%) |
| upsert_and_get | 📈 view plot 🚷 view threshold | 407.96 µs(+1.94%)Baseline: 400.19 µs | 460.22 µs (88.64%) |
There was a problem hiding this comment.
Pull request overview
This PR reduces allocations on key KV hot paths by pushing zero-copy buffers (bytes::Bytes) through the core → SDK results boundary, avoiding eager allocations for retry bookkeeping, and simplifying compression buffering behavior.
Changes:
- Replace
Vec<u8>document bodies in KV results withbytes::Bytesto avoid memcpy/clones on gets. - Make retry reason tracking lazy (
Option<HashSet<_>>) to avoid allocating for non-retried operations. - Update compression to return
Cow<[u8]>and add unit tests around compression behavior/manager.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| sdk/couchbase/src/results/kv_results.rs | Switch SDK GetResult content storage to Bytes and propagate from core results. |
| sdk/couchbase/src/mutation_state.rs | Change bucket name storage to Arc<str> to reduce cloning; update (de)serialization. |
| sdk/couchbase/src/clients/couchbase_core_kv_client.rs | Store bucket name as Arc<str> and return projection content as Bytes. |
| sdk/couchbase-core/tests/allocations.rs | Adjust expected allocation counts and tighten allocation assertion. |
| sdk/couchbase-core/tests/agent_ops.rs | Update one assertion to work with Bytes-backed get results. |
| sdk/couchbase-core/src/retry.rs | Lazily allocate retry reasons and adjust accessor/Display formatting. |
| sdk/couchbase-core/src/results/kv.rs | Replace KV result value buffers from Vec<u8> to Bytes. |
| sdk/couchbase-core/src/crudcomponent.rs | Avoid extra copies by propagating Bytes from responses; pass compressed data as borrowed slice. |
| sdk/couchbase-core/src/compressionmanager.rs | Change compressor API to return Cow<[u8]>, remove internal heap buffer, and add unit tests. |
Comments suppressed due to low confidence (1)
sdk/couchbase-core/tests/allocations.rs:239
- This assertion switched to
assert_eq!(exact match) but the failure message still says "Expected max {} allocations". Either update the message to reflect exact equality, or revert to the previous<= expected_allocssemantics if some variance is expected across environments/toolchains.
dhat::assert_eq!(
total_allocs,
expected_allocs,
"Expected max {} allocations, was {}",
expected_allocs,
total_allocs
);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for (vbid, (seqno, vbuuid)) in vbuckets { | ||
| let key = MutationStateKey { | ||
| bucket_name: bucket_name.clone(), | ||
| bucket_name: Arc::from(bucket_name.as_str()), | ||
| vbid: vbid.parse().map_err(de::Error::custom)?, |
There was a problem hiding this comment.
In MutationState deserialization, bucket_name: Arc::from(bucket_name.as_str()) is executed once per vBucket entry, which allocates and copies the bucket name repeatedly (defeating the goal of using Arc<str> to avoid clones). Consider converting the bucket_name: String to a single Arc<str> once per outer loop iteration (e.g., let bucket = Arc::<str>::from(bucket_name);) and then cloning the Arc for each MutationStateKey.
| datatype: DataTypeFlag, | ||
| input: &'a [u8], | ||
| ) -> error::Result<(&'a [u8], u8)> { | ||
| ) -> error::Result<(Cow<'a, [u8]>, u8)> { |
There was a problem hiding this comment.
Since you're considering leveraging Bytes elsewhere, why not use it here as well? One issue with using Cow when you use Bytes elsewhere is that you do not get the ability to maintain the same Bytes object through a function like this without resorting to some hackery.
There was a problem hiding this comment.
The reason for Cow here is that we return actually either a borrow or an owned value. In the case of Bytes i think we'd need to allocate in the borrow case.
| #[derive(Clone, Debug, Eq, PartialEq, Hash)] | ||
| pub struct GetResult { | ||
| pub value: Vec<u8>, | ||
| pub value: Bytes, |
There was a problem hiding this comment.
This would be a serious breaking change. I'm not sure you can change this without a v2 release?
There was a problem hiding this comment.
It's in core, which is basically labelled as "do not use" so should be very low risk.
| #[derive(Clone, Debug, Eq, PartialEq, Ord, PartialOrd, Hash)] | ||
| struct MutationStateKey { | ||
| bucket_name: String, | ||
| bucket_name: Arc<str>, |
There was a problem hiding this comment.
I believe this should be Arc still.
There was a problem hiding this comment.
It hid what I wrote >_<. Arc of String***
There was a problem hiding this comment.
What makes you say it should he string?
| let mut buckets: HashMap<String, SparseScanVectors> = HashMap::default(); | ||
| for (key, token) in value.tokens { | ||
| let bucket = buckets.entry(key.bucket_name.clone()).or_default(); | ||
| let bucket = buckets.entry(key.bucket_name.to_string()).or_default(); |
There was a problem hiding this comment.
Using to_string() here means you need to convert the key.bucket_name value from a &str to a String (which involves copying the data out). I think that the intention here is probably that we want to access the bucket using &str comparisons.
There was a problem hiding this comment.
I think we have to use String as we need to take ownership of the value
There was a problem hiding this comment.
While the HashMap itself has to be a String for storage purposes, when looking up values in the map you shouldn't need to construct a String just to do that, you should be able to pass a &str to avoid allocating.
There was a problem hiding this comment.
As it's calling entry it does require string. It needs ownership in case an insert is called against the entry (which it in fact is here) .
Changes: - Get result values use Bytes instead of Vec<u8>, eliminating a full document-body memcpy on every get/get_and_lock/get_and_touch/get_meta by propagating zero-copy Bytes from the codec through to the SDK layer - RetryRequest.retry_reasons is now Option<HashSet>, only allocating on the first retry attempt instead of every operation - Compressor.compress returns Cow<[u8]> instead of borrowing from self, removing heap state from StdCompressor entirely - MutationToken bucket_name uses Arc<str> instead of String, replacing a per-mutation String::clone with an atomic ref-count bump - Add unit tests for compression manager
|
| Branch | rscbc32-23590194426-156-1 |
| Testbed | ubuntu-latest |
Click to view all benchmark results
| Benchmark | Latency | Benchmark Result microseconds (µs) (Result Δ%) | Upper Boundary microseconds (µs) (Limit %) |
|---|---|---|---|
| query | 📈 view plot 🚷 view threshold | 439.87 µs(+4.40%)Baseline: 421.33 µs | 484.53 µs (90.78%) |
| upsert_and_get | 📈 view plot 🚷 view threshold | 409.02 µs(+1.94%)Baseline: 401.23 µs | 461.41 µs (88.65%) |
Changes: