feat: make SyncNotes return multiple blocks#1843
Conversation
6961007 to
6e0dd69
Compare
bobbinth
left a comment
There was a problem hiding this comment.
Looks good! Thank you! I left a few small comments inline.
| /// Estimated byte size of a single [`NoteSyncRecord`]. | ||
| /// | ||
| /// Note ID (~38 bytes) + index + metadata (~26 bytes) + sparse merkle path with 16 | ||
| /// siblings (~608 bytes). | ||
| const NOTE_RECORD_BYTES: usize = 700; |
There was a problem hiding this comment.
This is fine for now, but most likely will be a significant overestimate because sparse Merkle paths get compressed, and in most cases shouldn't be more than a couple hundred bytes. But the compression depends on how many paths there are (the more paths, the worse the compression) - so, taking the worst case is fine for now.
igamigo
left a comment
There was a problem hiding this comment.
Looks great! I think there are a couple of edge cases that might need addressing though.
proto/proto/rpc.proto
Outdated
| // `block_to` matches the request's `block_range.block_to`, or the chain tip if it was | ||
| // not specified. |
There was a problem hiding this comment.
nit: This is not always true, right? The text below mentions that the block_to may be smaller than what the user requested (and the chain tip as well)
There was a problem hiding this comment.
I removed the text below; as mentioned in this discussion the block_to will always return either the requested block or the chain tip.
crates/store/src/state/sync_state.rs
Outdated
| /// - `note_tags`: The tags the client is interested in, resulting notes are restricted to the | ||
| /// first block containing a matching note. | ||
| /// - `block_range`: The range of blocks from which to synchronize notes. | ||
| /// Returns as many blocks with matching notes as fit within the response payload |
There was a problem hiding this comment.
nit: You can reference the max payload size constant here
crates/store/src/state/sync_state.rs
Outdated
| block_range: RangeInclusive<BlockNumber>, | ||
| ) -> Result<(NoteSyncUpdate, MmrProof, BlockNumber), NoteSyncError> { | ||
| ) -> Result<Vec<(NoteSyncUpdate, MmrProof)>, NoteSyncError> { | ||
| let inner = self.inner.read().await; |
There was a problem hiding this comment.
I think we want to take the lock in the loop to avoid keeping it for the whole duration
There was a problem hiding this comment.
(and it can be moved right next to the open_at() call)
proto/proto/rpc.proto
Outdated
| // Merkle path to verify the block's inclusion in the MMR at the returned | ||
| // `block_range.block_to`. | ||
| // | ||
| // An MMR proof can be constructed for the leaf of index `block_header.block_num` of | ||
| // an MMR of forest `block_range.block_to` with this path. | ||
| primitives.MerklePath mmr_path = 2; |
There was a problem hiding this comment.
Is the comment on line 480-481 correct here? I thought the MMR path is done against the requested block_range.block_to rather than the returned one.
crates/store/src/server/rpc_api.rs
Outdated
| let block_from = block_range.start(); | ||
| // Clamp block_to to the chain tip to avoid erroring when opening the MMR proof | ||
| let block_to = block_range.end().min(&chain_tip); | ||
| let clamped_block_range = *block_from..=*block_to; |
There was a problem hiding this comment.
I think things like this should be enforced in BlockRange::into_inclusive_range(). But also, I'm not sure we want to clamp here. Maybe returning an error is a better approach?
crates/store/src/server/rpc_api.rs
Outdated
| block_num: last_block_included.as_u32(), | ||
| block_range: Some(proto::rpc::BlockRange { | ||
| block_from: block_from.as_u32(), | ||
| block_to: Some(block_to.as_u32()), |
There was a problem hiding this comment.
block_to here is probably not what we want, as it is just the same number that the user might have requested. Instead, what we need (AFAIK) is to return the largest block num in the results (or have it returned separately on sync_notes). Or, if no valid sync notes response was found, then we reached the user's block_to
There was a problem hiding this comment.
I changed this so the block_to returned marks the last block included in the response. This makes sense for the proposed flow where the client calls SyncChainMmr and then uses that chain tip as parameter for the sync notes. But if a user made a sync notes request without specifying the chain tip, it would have no way of finding out which is the chain tip used for the MMR paths included in the response. It might not be the last included block if the response is truncated.
There was a problem hiding this comment.
A simple solution would be to add a chain_tip field to the response
Edit: actually, using PaginationInfo would be a better fit.
There was a problem hiding this comment.
Agree, that should work well
crates/store/src/state/sync_state.rs
Outdated
| // Use block_end + 1 as the MMR checkpoint so that block_end itself can be proven. | ||
| let mmr_checkpoint = block_end + 1; |
There was a problem hiding this comment.
nit: I'd move this down closer to the open_at call
proto/proto/rpc.proto
Outdated
| // If `response.block_range.block_to` is less than the requested range end, make another | ||
| // request starting from `response.block_range.block_to + 1` to continue syncing. |
There was a problem hiding this comment.
I think the underlying query is committed_at > block_from, so I think prompting for + 1 is not correct here.
bobbinth
left a comment
There was a problem hiding this comment.
Looks good! Thank you. I left some more comments inline - mostly about making sure we treat block ranges consistently.
| pub(crate) fn select_notes_since_block_by_tag_and_sender( | ||
| conn: &mut SqliteConnection, | ||
| account_ids: &[AccountId], | ||
| note_tags: &[u32], | ||
| block_range: RangeInclusive<BlockNumber>, | ||
| ) -> Result<(Vec<NoteSyncRecord>, BlockNumber), DatabaseError> { | ||
| ) -> Result<Vec<NoteSyncRecord>, DatabaseError> { |
There was a problem hiding this comment.
Not for this PR, but it seems like this function is never invoked with account_ids supplied. If so, we should remove the account_ids parameter and simplify the associated SQL. Let's create an issue for this.
| if !results.is_empty() && accumulated_size > MAX_RESPONSE_PAYLOAD_BYTES { | ||
| break; | ||
| } |
There was a problem hiding this comment.
Why do we have !results.is_empty() as a condition here? Basically, if the result is empty, why do we need to fall through to getting the MMR proof?
There was a problem hiding this comment.
This was added after this dicussion #1843 (comment) to guarantee that at least one update is included in the response. If the db.get_note_sync returns None we do break .
There was a problem hiding this comment.
I see. Basically, the idea here is that if somehow accumulated_size is over the limit on the first iteration (i.e., when esults.is_empty()) we still want to send the response. Practically, this shouldn't happen (unless we are off in accumulated_size computations), but it's better to be on the safe side.
igamigo
left a comment
There was a problem hiding this comment.
LGTM! Left some minor, non-blocking comments
| #[instrument(level = "debug", target = COMPONENT, skip_all, ret(level = "debug"), err)] | ||
| pub async fn sync_notes( | ||
| &self, | ||
| note_tags: Vec<u32>, | ||
| block_range: RangeInclusive<BlockNumber>, | ||
| ) -> Result<(NoteSyncUpdate, MmrProof, BlockNumber), NoteSyncError> { | ||
| let inner = self.inner.read().await; | ||
|
|
||
| let (note_sync, last_included_block) = | ||
| self.db.get_note_sync(block_range, note_tags).await?; | ||
| ) -> Result<(Vec<(NoteSyncUpdate, MmrProof)>, BlockNumber), NoteSyncError> { |
There was a problem hiding this comment.
With some of these changes we might need to update the store's README.md
| /// -- filter the block's notes and return only the ones matching the requested tags or senders | ||
| /// (tag IN (?1) OR sender IN (?2)) | ||
| /// ``` | ||
| pub(crate) fn select_notes_since_block_by_tag_and_sender( |
There was a problem hiding this comment.
Could this be modified to return multiple responses at once? I think this was the original suggestion (although we should do it in a separate issue/PR)
| // if results is empty, return `block_end` since the sync is complete. | ||
| let last_block_checked = | ||
| results.last().map_or(block_end, |(update, _)| update.block_header.block_num()); |
There was a problem hiding this comment.
There is a small optimization here where we return the block number we dropped, minus one. For example, if we match against blocks 1, 5, 10 and 15 but block 15 does not make it into the response, instead of returning 10 as the last included block, we return 14 (so the next query avoids blocks 10-14 because the user will start at 15 already). I doubt this is significant due to how the tables are probably indexed, so I would not bother doing it unless it's really trivial, but maybe it can be done in a follow up PR or issue
bobbinth
left a comment
There was a problem hiding this comment.
All looks good! Thank you!
Closes #1809
This PR updates the RPC component to batch multiple blocks into a single
SyncNotesresponse.Changes
sync_noteshandler now loops over store calls, accumulating blocks until the 4MB budget is exceeded or the range is exhaustedSyncNotesResponseproto from single-block (header + mmr_path + notes) to multi-block (repeatedNoteSyncBlockblocks +BlockRange). AddedStoreSyncNotesResponsefor the store-internal single-block response, usingBlockRangeinstead ofPaginationInfo.sync_notesusesopen_at(block_num, checkpoint)instead ofopen(block_num)so MMR proofs are relative toblock_toFollow ups: