base: develop
Feat/marf compression #6654
Conversation
- …ue if it's 0, store a bitmap and non-empty trie ptrs if the list is sparse, and store node patches atop full nodes (and read them back)
- …then record the original node from which it was copied so that a TrieNodePatch can be calculated and stored instead
- …oid repeating nodes across tries
- … each node and see if we can instead patch an existing node instead of storing a (mostly-unchanged) copy
        .with_compression(true),
    MARFOpenOpts::new(TrieHashCalculationMode::Immediate, "noop", true)
        .with_compression(true),
    */
TODO: revert this comment block
    + rusqlite::types::ToSql
    + rusqlite::types::FromSql
    + stacks_common::codec::StacksMessageCodec
    + crate::codec::StacksMessageCodec
TODO: use stacks_common not crate
    #[cfg(test)]
    use stacks_common::types::chainstate::BlockHeaderHash;
    use stacks_common::types::chainstate::{
    use crate::types::chainstate::BlockHeaderHash;
TODO: use stacks_common not crate
federico-stacks left a comment
I’ve provided a number of inline remarks, mostly nits, suggestions, and minor improvements.
Here are a few more in-depth points:
- Bitmap flag (aka 0xff): Could we use a bit of the node ID to indicate whether a bitmap is present? This could save one byte per node and rely on the node format described by the node ID itself.
- Compression at target: The current code correctly handles both compressed and uncompressed MARF scenarios. Once compression becomes the standard, could we plan on simplifying the code to follow a single “linear” path, keeping only the compression-related logic?
- Node ID as a type: Currently, TriePtr id is represented as u8 (not introduced by this PR). Since we frequently interact with it, checking flags and node types, it might be useful to introduce a dedicated type. This could encapsulate convenient behaviors and avoid repeated conversions and manual flag interpretation. (In that case, this change could be addressed in a separate PR; a rough sketch of such a type follows this list.)
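For illustration only, a minimal sketch of what such a dedicated type could look like. The name `NodeId` and the bit layout below are hypothetical, not the actual MARF encoding; the point is simply that flag checks and node-type extraction get centralized behind one type instead of raw `u8` masking at every call site.

```rust
// Illustrative sketch: the flag value and methods are assumptions, not the
// real MARF node-ID format.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct NodeId(u8);

impl NodeId {
    // Hypothetical flag bit indicating a back-block pointer.
    const BACK_BLOCK_FLAG: u8 = 0x80;

    pub fn new(raw: u8) -> Self {
        NodeId(raw)
    }

    // Flag check lives in one place instead of being re-derived everywhere.
    pub fn is_back_block(self) -> bool {
        self.0 & Self::BACK_BLOCK_FLAG != 0
    }

    // Node type with the flag bit masked off.
    pub fn node_type(self) -> u8 {
        self.0 & !Self::BACK_BLOCK_FLAG
    }

    pub fn as_u8(self) -> u8 {
        self.0
    }
}

fn main() {
    let id = NodeId::new(0x85);
    assert!(id.is_back_block());
    assert_eq!(id.node_type(), 0x05);
    assert_eq!(id.as_u8(), 0x85);
}
```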
    use crate::chainstate::stacks::index::storage::TrieStorageConnection;
    use crate::chainstate::stacks::index::{trie_sql, Error, MarfTrieId};
    use crate::types::chainstate::TrieHash;
    use crate::types::sqlite::NO_PARAMS;
In stackslib/src/chainstate/stacks/index/file.rs this was changed to use rusqlite::params; it is probably better to use the same import in both cases.
This would potentially touch a lot of lines. Let's save refactoring until the end, once there's sufficient test coverage.
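For reference, a small standalone sketch of the two styles being discussed. The `NO_PARAMS` constant below is a stand-in for the crate-local one quoted above, and the query is arbitrary; either form binds zero parameters.

```rust
use rusqlite::{params, Connection};

// Stand-in for the crate-local NO_PARAMS constant imported above.
const NO_PARAMS: &[&dyn rusqlite::types::ToSql] = &[];

fn count_objects(conn: &Connection) -> rusqlite::Result<i64> {
    // Style 1: rusqlite's params![] macro with no bound parameters.
    let a: i64 = conn.query_row("SELECT COUNT(*) FROM sqlite_master", params![], |row| {
        row.get(0)
    })?;
    // Style 2: an explicit empty-parameter constant, as in the quoted import.
    let b: i64 = conn.query_row("SELECT COUNT(*) FROM sqlite_master", NO_PARAMS, |row| {
        row.get(0)
    })?;
    assert_eq!(a, b);
    Ok(a)
}

fn main() -> rusqlite::Result<()> {
    let conn = Connection::open_in_memory()?;
    println!("{} objects in sqlite_master", count_objects(&conn)?);
    Ok(())
}
```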
    let ptrs_start_disk_ptr = r
        .seek(SeekFrom::Current(0))
        .inspect_err(|e| error!("Failed to ftell the read handle: {e:?}"))?;
Could we use the more idiomatic stream_position() here?
Suggested change:

    - let ptrs_start_disk_ptr = r
    -     .seek(SeekFrom::Current(0))
    -     .inspect_err(|e| error!("Failed to ftell the read handle: {e:?}"))?;
    + let ptrs_start_disk_ptr = r
    +     .stream_position()
    +     .inspect_err(|e| error!("Failed to ftell the read handle: {e:?}"))?;
Yes
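For completeness, a tiny standalone check of the equivalence, using an in-memory `Cursor` rather than the real file handle; both calls come from `std::io::Seek`.

```rust
use std::io::{Cursor, Seek, SeekFrom, Write};

fn main() -> std::io::Result<()> {
    let mut r = Cursor::new(Vec::new());
    r.write_all(b"trie node bytes")?;

    // The two calls report the same offset; stream_position() simply reads it
    // without expressing it as a zero-length relative seek.
    let via_seek = r.seek(SeekFrom::Current(0))?;
    let via_stream_position = r.stream_position()?;
    assert_eq!(via_seek, via_stream_position);
    Ok(())
}
```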
    assert_eq!(node_data_order.len(), offsets.len());

    // write parent block ptr
    f.seek(SeekFrom::Start(0))?;
Could we use the idiomatic rewind here?
Suggested change:

    - f.seek(SeekFrom::Start(0))?;
    + f.rewind()?;
I also noticed two other occurrences of this pattern in storage.rs that aren’t touched by this PR. We might want to update those as well.
Yes
    let buffer = if self.compress {
        let mut compressed_buffer = Cursor::new(Vec::new());
        trie_ram.dump_compressed(self, &mut compressed_buffer, &bhh)?;
        compressed_buffer.into_inner()
    } else {
        let mut buffer = Cursor::new(Vec::new());
        trie_ram.dump(self, &mut buffer, &bhh)?;
        buffer.into_inner()
    };
There’s some minor duplication here. Maybe we could let the conditional select the appropriate method (dump_compressed vs. dump) and keep the rest of the logic shared?
Sure
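For what it's worth, a stand-in sketch of that deduplication. `TrieRam` and the dump signatures here are simplified placeholders, not the real ones; the idea is just that the conditional picks the method while buffer setup and teardown stay shared.

```rust
use std::io::Cursor;

// Simplified stand-ins; the real dump()/dump_compressed() take more arguments.
struct TrieRam;

impl TrieRam {
    fn dump(&mut self, buf: &mut Cursor<Vec<u8>>) {
        buf.get_mut().push(0);
    }
    fn dump_compressed(&mut self, buf: &mut Cursor<Vec<u8>>) {
        buf.get_mut().push(1);
    }
}

fn serialize(trie_ram: &mut TrieRam, compress: bool) -> Vec<u8> {
    // The conditional selects the method; everything else is shared.
    let dump_fn = if compress {
        TrieRam::dump_compressed
    } else {
        TrieRam::dump
    };
    let mut buffer = Cursor::new(Vec::new());
    dump_fn(trie_ram, &mut buffer);
    buffer.into_inner()
}

fn main() {
    let mut trie_ram = TrieRam;
    assert_eq!(serialize(&mut trie_ram, true), vec![1]);
    assert_eq!(serialize(&mut trie_ram, false), vec![0]);
}
```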
    for (ix, indirect) in node_data_order.iter().enumerate() {
        if let Some((hash_bytes, patch)) = indirect.hash_and_patch() {
            let f_pos_before = f.seek(SeekFrom::Current(0))?;
Could we use the more idiomatic stream_position() here?
Yes
        Error::CorruptionError(format!("Failed to serialize patch: {e:?}"))
    })?;

    let f_pos_after = f.seek(SeekFrom::Current(0))?;
Could we use the more idiomatic stream_position() here?
Yes
The need for the sparse …
I don't think so. We have no way to compel existing node operators to re-compress their chainstate, and they could continue to use uncompressed chainstate even across hard forks.
That's fine in principle, but that kind of refactoring would change potentially many lines. Let's first make sure the functional test coverage is sufficient for us to feel comfortable shipping this, and then we can worry about the refactoring in a follow-up PR.
…ead due to a mismatch between cur_block and cur_block_id borne out of retargeting a trie
Alright, all CI tests pass @federico-stacks
    if real_bhh != &bhh {
        // note: this was moved from the block_retarget function
        // to avoid stepping on the borrow checker.
    -   debug!("Retarget block {} to {}", bhh, real_bhh);
    +   debug!(
    +       "Retarget block {} to {}. Current block ID is {:?}",
    +       bhh, real_bhh, &self.data.cur_block_id
    +   );
        // switch over state
        self.data.retarget_block(real_bhh.clone());
    }
    - self.with_trie_blobs(|db, blobs| match blobs {
    + let new_block_id = self.with_trie_blobs(|db, blobs| match blobs {
        Some(blobs) => blobs.store_trie_blob(db, real_bhh, &buffer),
        None => {
            test_debug!("Stored trie blob {} to db", real_bhh);
            trie_sql::write_trie_blob(db, real_bhh, &buffer)
        }
    - })?
    + })?;
    + self.data.set_block(real_bhh.clone(), Some(new_block_id));
    + new_block_id
With this change, where all block data are always set at once via set_block, it seems to supersede the retarget management, so that logic could likely be removed. I wrote some local unit tests to validate this behavior, and also checked it by running integration tests.
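A toy model of the state update being described, for illustration only. Field and method names follow the quoted diff; the real TrieStorageConnection logic is more involved, and the assumption that retarget_block leaves the block id unresolved is this sketch's, not the codebase's.

```rust
// Toy model only: not the real storage code.
#[derive(Default, Debug)]
struct CursorState {
    cur_block: String,
    cur_block_id: Option<u32>,
}

impl CursorState {
    // Assumed behavior for this sketch: switch the block hash, leave the id unset.
    fn retarget_block(&mut self, bhh: String) {
        self.cur_block = bhh;
        self.cur_block_id = None;
    }

    // set_block() assigns both fields at once, which is why the separate
    // retarget step may no longer be needed.
    fn set_block(&mut self, bhh: String, id: Option<u32>) {
        self.cur_block = bhh;
        self.cur_block_id = id;
    }
}

fn main() {
    let mut with_retarget = CursorState::default();
    with_retarget.retarget_block("real_bhh".into());
    with_retarget.set_block("real_bhh".into(), Some(42));

    let mut without_retarget = CursorState::default();
    without_retarget.set_block("real_bhh".into(), Some(42));

    // In this toy model the final state is identical either way.
    assert_eq!(with_retarget.cur_block, without_retarget.cur_block);
    assert_eq!(with_retarget.cur_block_id, without_retarget.cur_block_id);
}
```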
        self.unconfirmed()
    );

    let (saved_block_hash, saved_block_id) = self.get_cur_block_and_id();
Managing the block restore appears unnecessary, since none of the downstream code (including the invoked methods) modifies the currently opened block.
It’s harmless to keep the restore in place, but it doesn’t have any observable effect, so we could safely remove it to simplify the logic.
I also ran the consensus tests and some integration tests without the block-restore logic, and they all passed.
This PR implements work discovered in #6593. Specifically, it makes the following changes to the on-disk representation of the MARF:
- If a TriePtr is not a back-block pointer, then the back_block field is not stored, since it is always 0.
- If a node's children TriePtr list is sparse, then instead of storing TriePtr::default() for empty pointers, the system now stores a bitmap of which pointers are non-empty, followed by only the non-empty TriePtrs. It only does this if it actually saves space over storing the entire list (storing the entire list is cheaper if the list is nearly full). A rough sketch of this encoding follows this list.
- Instead of copying a trie node from one trie to the next as part of a copy-on-write, the system will only store a patch from the old node to the new node. It will create a list of up to 16 patches across 16 tries before storing a full copy of the node.
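A rough sketch of the sparse-children idea from the second bullet, with simplified stand-in types: a "pointer" here is just 6 opaque bytes, and the real on-disk format also needs a marker (such as the 0xff flag discussed in the review above) so the decoder knows which form was used. The thresholds and layout in the PR will differ.

```rust
// Illustrative encoding only: real TriePtr serialization is more involved.
fn encode_children(children: &[Option<[u8; 6]>]) -> Vec<u8> {
    let n_present = children.iter().filter(|c| c.is_some()).count();
    let full_len = children.len() * 6;
    let sparse_len = (children.len() + 7) / 8 + n_present * 6;

    let mut out = Vec::new();
    if sparse_len < full_len {
        // Sparse form: bitmap of occupied slots, then only the occupied pointers.
        let mut bitmap = vec![0u8; (children.len() + 7) / 8];
        for (i, c) in children.iter().enumerate() {
            if c.is_some() {
                bitmap[i / 8] |= 1 << (i % 8);
            }
        }
        out.extend_from_slice(&bitmap);
        for c in children.iter().flatten() {
            out.extend_from_slice(c);
        }
    } else {
        // Dense form: cheaper when the list is nearly full.
        for c in children {
            out.extend_from_slice(&c.unwrap_or([0u8; 6]));
        }
    }
    out
}

fn main() {
    // A 16-slot node with only two live children: 2 bytes of bitmap plus
    // 12 bytes of pointers instead of 96 bytes for the full list.
    let mut children = vec![None; 16];
    children[1] = Some([1u8; 6]);
    children[9] = Some([9u8; 6]);
    assert_eq!(encode_children(&children).len(), 2 + 12);
}
```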
In my (very small scale) benchmarks, this saves over 50% of space.
This PR is still a draft and will likely remain so for some time. It needs a lot more unit tests, and it would benefit significantly from fuzz and property testing against the current implementation. In addition, it will need a lot of performance tuning, since the act of reading a list of patch nodes will slow down reads.