Optimize ScyllaDB usage #3985

Merged · merged 1 commit into main on Jun 2, 2025
Conversation

Contributor

@ndr-ds commented May 22, 2025

Motivation

There are some optimizations we can make in how we're using ScyllaDB.

Proposal

This PR removes the use of:

  • Non-token-aware queries
  • Unpaged queries
  • Most usages of ALLOW FILTERING (the ones remaining should be fine)
  • Non-prepared statements

(A sketch of the resulting query pattern follows the lists below.)

Also:

  • Tuned page creation and other settings a bit
  • Some refactoring/code cleanup
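To make the direction concrete, here is a minimal sketch of the query pattern these changes move toward, using the scylla Rust driver. It is an illustration, not code from this diff; `kv.example`, `root_key`, and `k` are placeholder names, and driver imports are elided because module paths vary across driver versions.

```rust
// Prepare once and reuse: a prepared statement skips server-side re-parsing
// and gives the token-aware load balancer the partition key it needs to
// route the request directly to a replica that owns the partition.
let read_value = session
    .prepare("SELECT v FROM kv.example WHERE root_key = ? AND k = ?")
    .await?;

// Paged execution: fetch a single page instead of issuing an unpaged query
// that materializes the entire result set at once.
let (page, _paging_state) = session
    .execute_single_page(&read_value, (root_key, key), PagingState::start())
    .await?;
```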

Test Plan

CI + deployed a network, benchmarked it, and saw an over 10x decrease in ScyllaDB's write latency.

Release Plan

  • Nothing to do / These changes follow the usual release cycle.

Contributor Author

ndr-ds commented May 22, 2025

This stack of pull requests is managed by Graphite. Learn more about stacking.

@ndr-ds force-pushed the 05-22-optimize_scylladb_usage branch 8 times, most recently from 903558c to 6ceaa35 on May 26, 2025 at 17:26
}

impl ScyllaDbClient {
async fn get_multi_stmt(
&self,
selects_value: bool,
Contributor

nit: Can we remove this parameter and have two separate methods get_multi_key_values_statement and get_multi_keys_statement instead?
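For concreteness, the suggested split might look roughly like this; the shared helper `get_multi_statement` is hypothetical, not something in the diff:

```rust
// Hypothetical sketch of the suggested two-method API.
async fn get_multi_key_values_statement(
    &self,
    num_markers: usize,
) -> Result<PreparedStatement, ScyllaDbStoreInternalError> {
    // Selects keys and values.
    self.get_multi_statement("SELECT k,v", num_markers).await
}

async fn get_multi_keys_statement(
    &self,
    num_markers: usize,
) -> Result<PreparedStatement, ScyllaDbStoreInternalError> {
    // Selects keys only.
    self.get_multi_statement("SELECT k", num_markers).await
}
```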

find_keys_by_prefix_unbounded: PreparedStatement,
find_keys_by_prefix_bounded: PreparedStatement,
find_key_values_by_prefix_unbounded: PreparedStatement,
find_key_values_by_prefix_bounded: PreparedStatement,
multi_kv_ps: DashMap<usize, PreparedStatement>,
Contributor

multi_key_values and multi_keys

Contributor

How large are PreparedStatement in memory by the way?

Contributor Author

Should be pretty small; it's just some metadata. Not sure it justifies an LRU cache or something here, tbh 🤔

Comment on lines 905 to 906
// The schema appears too complicated for non-trivial reasons.
// See TODO(#1069).
Contributor

#1069 is closed. Let's remove this mysterious comment.

let default_profile = ExecutionProfile::builder()
.load_balancing_policy(lbp)
.retry_policy(Arc::new(DefaultRetryPolicy::new()))
.consistency(Consistency::LocalOne)
Contributor

This consistency level may be insufficient in the future for dynamic shard migrations.

Contributor

@ma2bd May 26, 2025

And more importantly, it could also be a problem for blob storage and certificates, because we occasionally store a blob and then notify another worker (or, indirectly, the proxy) to do something with it.

Contributor Author

I see 🤔 makes sense. Let me see how it performs with LOCAL_QUORUM instead. Maybe after we redo the partitioning it'll be good enough.

Contributor Author

The performance hit is pretty substantial, btw, but hopefully we can make up for it by optimizing things after partitioning the data more.

@ndr-ds force-pushed the 05-22-optimize_scylladb_usage branch 2 times, most recently from 218d2ee to 8493ce6 on May 26, 2025 at 19:40
@ndr-ds marked this pull request as ready for review on May 26, 2025 at 19:59
@ndr-ds force-pushed the 05-22-optimize_scylladb_usage branch from 8493ce6 to d681ce9 on May 27, 2025 at 03:32
Contributor

@afck left a comment

(Nit: The PR description suggests it removes the use of refactoring and tuned page creation.)

async fn read_multi_values_internal(
&self,
root_key: &[u8],
fn get_occurences_map(
Contributor

Suggested change:
- fn get_occurences_map(
+ fn get_occurrences_map(

keys: Vec<Vec<u8>>,
) -> Result<Vec<Option<Vec<u8>>>, ScyllaDbStoreInternalError> {
let mut values = vec![None; keys.len()];
let map = Self::get_occurences_map(keys)?;
Contributor

Suggested change:
- let map = Self::get_occurences_map(keys)?;
+ let map = Self::get_occurrences_map(keys)?;

while let Some(row) = rows.next().await {
let (key,) = row?;
for i_key in map.get(&key).unwrap().clone() {
*values.get_mut(i_key).expect("an entry in values") = true;
for i_key in map.get(&key).expect("key is supposed to be in map") {
Contributor

(Or even just map[&key]?)

let query3 = &self.write_batch_deletion;
try_join_all(futures).await?;

let mut futures: Vec<BoxFuture<'_, Result<(), ScyllaDbStoreInternalError>>> = Vec::new();
Contributor

I guess the deletions of individual keys could even happen concurrently with the prefix deletions?

Comment on lines 899 to 901
'class' : 'NetworkTopologyStrategy', \
'replication_factor' : {} \
}}",
Contributor

Suggested change
'class' : 'NetworkTopologyStrategy', \
'replication_factor' : {} \
}}",
'class' : 'NetworkTopologyStrategy', \
'replication_factor' : {} \
}}",

Contributor Author

Good catch, missed that 😅

Comment on lines 912 to 926
root_key blob, \
k blob, \
v blob, \
PRIMARY KEY (root_key, k) \
) \
WITH compaction = {{ \
'class' : 'SizeTieredCompactionStrategy', \
'min_sstable_size' : 268435456, \
'bucket_low' : 0.5, \
'bucket_high' : 1.5, \
'min_threshold' : 4, \
'max_threshold' : 32 \
}} \
AND compression = {{ 'sstable_compression': 'LZ4Compressor', 'chunk_length_kb':'8' }} \
AND caching = {{ 'enabled': 'true' }}",
Contributor

Suggested change
root_key blob, \
k blob, \
v blob, \
PRIMARY KEY (root_key, k) \
) \
WITH compaction = {{ \
'class' : 'SizeTieredCompactionStrategy', \
'min_sstable_size' : 268435456, \
'bucket_low' : 0.5, \
'bucket_high' : 1.5, \
'min_threshold' : 4, \
'max_threshold' : 32 \
}} \
AND compression = {{ 'sstable_compression': 'LZ4Compressor', 'chunk_length_kb':'8' }} \
AND caching = {{ 'enabled': 'true' }}",
root_key blob, \
k blob, \
v blob, \
PRIMARY KEY (root_key, k) \
) \
WITH compaction = {{ \
'class' : 'SizeTieredCompactionStrategy', \
'min_sstable_size' : 268435456, \
'bucket_low' : 0.5, \
'bucket_high' : 1.5, \
'min_threshold' : 4, \
'max_threshold' : 32 \
}} \
AND compression = {{ 'sstable_compression': 'LZ4Compressor', 'chunk_length_kb':'8' }} \
AND caching = {{ 'enabled': 'true' }}",

@ndr-ds force-pushed the 05-22-optimize_scylladb_usage branch from d681ce9 to 2fccd92 on May 27, 2025 at 18:05
@ndr-ds force-pushed the 05-22-optimize_scylladb_usage branch from 2fccd92 to 8702072 on May 29, 2025 at 18:12
@ndr-ds requested a review from afck on May 29, 2025 at 18:12
Contributor

@MathieuDutSik left a comment

There are some changes I can approve and some I cannot; by the latter I mean the use of futures for writing the batches.

@@ -51,15 +57,15 @@ use crate::{
/// The limit is in reality 100. But we need one entry for the root key.
const MAX_MULTI_KEYS: usize = 99;
Contributor

Maybe put 100 - 1 here, in the spirit of other entries in this file.
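That is, something like:

```rust
/// The limit is in reality 100, but we need one entry for the root key.
const MAX_MULTI_KEYS: usize = 100 - 1;
```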

Comment on lines 122 to 134
if let Some(entry) = self.multi_key_values.get(&num_markers) {
return Ok(entry.clone());
}
let markers = std::iter::repeat_n("?", num_markers)
.collect::<Vec<_>>()
.join(",");
let query = format!(
"SELECT k,v FROM kv.{} WHERE root_key = ? AND k IN ({})",
self.namespace, markers
);
let prepared_statement = self.session.prepare(query).await?;
self.multi_key_values
.insert(num_markers, prepared_statement.clone());
Contributor

A small problem with this approach is that it does two map operations:

  • first the multi_key_values.get,
  • then the multi_key_values.insert.

This is slightly suboptimal.

With BTreeMap we can get an entry and fill it in later. This seems possible with DashMap as well: https://docs.rs/dashmap/6.1.0/dashmap/mapref/entry/enum.Entry.html
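A sketch of what using the entry API could look like here. One caveat (an assumption on my part, not raised in the thread): holding a DashMap entry guard across an `.await` can block other tasks that hash to the same shard, so the prepare itself is best kept outside the map lock:

```rust
async fn get_multi_key_values_statement(
    &self,
    num_markers: usize,
) -> Result<PreparedStatement, ScyllaDbStoreInternalError> {
    // Fast path: a single lookup.
    if let Some(entry) = self.multi_key_values.get(&num_markers) {
        return Ok(entry.clone());
    }
    // Prepare outside of any map lock.
    let markers = vec!["?"; num_markers].join(",");
    let query = format!(
        "SELECT k,v FROM kv.{} WHERE root_key = ? AND k IN ({})",
        self.namespace, markers
    );
    let prepared = self.session.prepare(query).await?;
    // entry().or_insert() folds the racy "check then insert" into a single
    // map operation; if a concurrent caller won the race, its statement is
    // kept and reused.
    Ok(self
        .multi_key_values
        .entry(num_markers)
        .or_insert(prepared)
        .clone())
}
```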

Comment on lines -101 to +112
write_batch_delete_prefix_unbounded: BatchStatement,
write_batch_delete_prefix_bounded: BatchStatement,
write_batch_deletion: BatchStatement,
write_batch_insertion: BatchStatement,
write_batch_delete_prefix_unbounded: PreparedStatement,
write_batch_delete_prefix_bounded: PreparedStatement,
write_batch_deletion: PreparedStatement,
write_batch_insertion: PreparedStatement,
Contributor

The switch from BatchStatement to PreparedStatement is done just by removing the .into()?

Nice find.

Comment on lines 138 to 129
let query = format!(
"SELECT v FROM kv.{} WHERE root_key = ? AND k = ? ALLOW FILTERING",
namespace
);
let read_value = session.prepare(query).await?;
let read_value = session
.prepare(format!(
"SELECT v FROM kv.{} WHERE root_key = ? AND k = ?",
namespace
))
.await?;
Contributor

I prefer the old style, where we first build the query and then the read_value, but that's only a stylistic preference.

async fn read_multi_values_internal(
&self,
root_key: &[u8],
fn get_occurrences_map(
Contributor

Good idea.

Comment on lines -310 to +389
for i_key in map.get(&key).unwrap().clone() {
*values.get_mut(i_key).expect("an entry in values") = true;
for i_key in &map[&key] {
values[*i_key] = true;
Contributor

Interesting rewrite.

}
let query4 = &self.write_batch_insertion;

try_join_all(futures).await?;
Contributor

I have several issues with the design using try_join_all:

  • Is it actually faster to use many futures instead of a single batch? I think not.
  • What we want is for all the operations to succeed, or none at all. That is what the batch should guarantee.

So I am not sure this rewrite is a good one.

Contributor Author

@ndr-ds May 30, 2025

Yeah, you're right, this one is wrong. I wrote it before realizing that we need atomicity for these operations. In the PR altering the partitions, we'll either need to use LOGGED batches (and take a huge performance hit), or make it clear that we only guarantee atomicity within the same partition, and group the batches by partition key.
Anyway, I'll mostly revert this part for now. Thanks for the feedback!

Contributor Author

I ended up just writing the batching properly here; it seemed to make more sense than doing it in the next PR.
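For reference, a rough sketch of what "batching properly" can look like with the scylla driver: an UNLOGGED batch whose statements all target the same partition key, so per-partition atomicity holds without the overhead of LOGGED batches. Variable names here are assumed, not taken from the final diff:

```rust
use scylla::batch::{Batch, BatchType};

// All rows share `root_key`, i.e. the same partition, so this unlogged
// batch is applied atomically on that partition.
let mut batch = Batch::new(BatchType::Unlogged);
let mut values = Vec::new();
for (key, value) in insertions {
    batch.append_statement(self.write_batch_insertion.clone());
    values.push((root_key.to_vec(), key, value));
}
self.session.batch(&batch, values).await?;
```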

Comment on lines 430 to 438
let prepared_statement = &self.write_batch_insertion;
let values = (root_key.clone(), key, value);
futures.push(Box::pin(async move {
session
.execute_single_page(prepared_statement, values, PagingState::start())
.await
.map(|_| ())
.map_err(Into::into)
}));
Contributor

So n keys being inserted lead to n futures? How could that work well?

Contributor Author

@ndr-ds May 30, 2025

We were trading the overhead of extra network calls for the fact that we were sending prepared statements directly to the correct shards (which is more performant). But as I mentioned in the other comment, this approach will be rewritten overall, since we lose the atomicity of batches.

Comment on lines 512 to 708
async fn build_session(uri: &str) -> Result<Session, ScyllaDbStoreInternalError> {
let policy = DefaultPolicy::builder().token_aware(true).build();
let default_profile = ExecutionProfile::builder()
.load_balancing_policy(policy)
.retry_policy(Arc::new(DefaultRetryPolicy::new()))
.consistency(Consistency::LocalQuorum)
.build();
let handle = default_profile.into_handle();
SessionBuilder::new()
.known_node(uri)
.default_execution_profile_handle(handle)
.compression(Some(Compression::Lz4))
.build()
.boxed()
.await
.map_err(Into::into)
}
Contributor

There are a number of choices being made here, and so it would be nice to have some documentation.
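For instance, a commented version of the same builder; the rationale in the comments is a reading of this thread, not documentation taken from the PR:

```rust
async fn build_session(uri: &str) -> Result<Session, ScyllaDbStoreInternalError> {
    // Token-aware balancing routes each request to a replica owning the
    // partition, avoiding an extra coordinator hop.
    let policy = DefaultPolicy::builder().token_aware(true).build();
    let default_profile = ExecutionProfile::builder()
        .load_balancing_policy(policy)
        // The driver's default retry policy.
        .retry_policy(Arc::new(DefaultRetryPolicy::new()))
        // LOCAL_QUORUM rather than LOCAL_ONE: see the consistency discussion
        // above (a blob may be read by another worker right after it is
        // written).
        .consistency(Consistency::LocalQuorum)
        .build();
    let handle = default_profile.into_handle();
    SessionBuilder::new()
        .known_node(uri)
        .default_execution_profile_handle(handle)
        // Compress client/server traffic with LZ4.
        .compression(Some(Compression::Lz4))
        .build()
        .boxed()
        .await
        .map_err(Into::into)
}
```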

Comment on lines 912 to 934
"CREATE TABLE kv.{} (\
root_key blob, \
k blob, \
v blob, \
PRIMARY KEY (root_key, k) \
) \
WITH compaction = {{ \
'class' : 'SizeTieredCompactionStrategy', \
'min_sstable_size' : 52428800, \
'bucket_low' : 0.8, \
'bucket_high' : 1.25, \
'min_threshold' : 4, \
'max_threshold' : 8 \
}} \
AND compression = {{ \
'sstable_compression': 'LZ4Compressor', \
'chunk_length_in_kb':'4' \
}} \
AND caching = {{ \
'enabled': 'true' \
}}",
Contributor

Some documentation would be nice here as well: why were those parameters chosen that way?

@ndr-ds force-pushed the 05-22-optimize_scylladb_usage branch 2 times, most recently from d91638d to e8c62d5 on May 31, 2025 at 02:47
@ndr-ds requested a review from MathieuDutSik on May 31, 2025 at 02:48
@ndr-ds force-pushed the 05-22-optimize_scylladb_usage branch 5 times, most recently from ce533ed to ef194de on May 31, 2025 at 14:36
Comment on lines 478 to 481
// Returns a batch query with a sticky shard policy, that always tries to route to the same
// ScyllaDB shard.
// Should be used only on batches where all statements are to the same partition key.
async fn get_sticky_batch_query(
Contributor

What's happening here? Do we need this? This looks pretty non-trivial and the PR summary doesn't mention it.

Contributor

@ma2bd left a comment

Explain why the default load balancing policies are not enough (if that's the case):
https://java-driver.docs.scylladb.com/stable/manual/core/load_balancing/#built-in-policies

Contributor Author
ndr-ds commented Jun 2, 2025

Talked offline; will do the write batch rewrite in a different PR.

@ndr-ds force-pushed the 05-22-optimize_scylladb_usage branch from ef194de to 4acb372 on June 2, 2025 at 14:16
@ndr-ds requested a review from ma2bd on June 2, 2025 at 15:10
@ndr-ds force-pushed the 05-22-optimize_scylladb_usage branch from 4acb372 to 6106db7 on June 2, 2025 at 19:05
Contributor Author
ndr-ds commented Jun 2, 2025

Merge activity

  • Jun 2, 11:21 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Jun 2, 11:22 PM UTC: @ndr-ds merged this pull request with Graphite.

@ndr-ds merged commit af51607 into main on Jun 2, 2025
27 checks passed
@ndr-ds deleted the 05-22-optimize_scylladb_usage branch on June 2, 2025 at 23:22