@damiramirez damiramirez commented Aug 20, 2025

This PR adds an Index that allows storing only new or modified nodes while referencing unchanged nodes by their file offsets.

To use it, first insert nodes into the InMemoryTrieDB. Once all inserts are done, commit the root node to the DB, then commit the trie, which converts the saved nodes into NodeRef::Hash. On the next iteration, newly inserted nodes are NodeRef::Node, while unchanged nodes remain NodeRef::Hash. These two variants tell us whether a node must be serialized or can simply be referenced at its existing offset in the DB.
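The Node/Hash decision described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `NodeRef` and `Index` shapes, the `write_node` helper, the 32-byte hash type, and the use of a `Vec<u8>` as a stand-in for the DB file are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical 32-byte node hash; the real hash type may differ.
type NodeHash = [u8; 32];

/// Simplified stand-in for `NodeRef`: a node is either held in memory
/// (new or modified) or known only by its hash (already committed).
enum NodeRef {
    /// Serialized payload plus its hash (assumed shape).
    Node(Vec<u8>, NodeHash),
    Hash(NodeHash),
}

/// Hypothetical index: node hash -> file offset of the stored node.
#[derive(Default)]
struct Index {
    offsets: HashMap<NodeHash, u64>,
}

impl Index {
    fn get(&self, hash: &NodeHash) -> Option<u64> {
        self.offsets.get(hash).copied()
    }

    fn insert(&mut self, hash: NodeHash, offset: u64) {
        self.offsets.insert(hash, offset);
    }
}

/// Returns the offset at which the node lives in `file`.
/// A `Hash` is looked up in the index and nothing is written;
/// a `Node` is appended to the file and recorded in the index.
fn write_node(node: &NodeRef, index: &mut Index, file: &mut Vec<u8>) -> Option<u64> {
    match node {
        NodeRef::Hash(hash) => index.get(hash),
        NodeRef::Node(bytes, hash) => {
            let offset = file.len() as u64;
            file.extend_from_slice(bytes);
            index.insert(*hash, offset);
            Some(offset)
        }
    }
}
```

In this sketch, writing the same node a second time as `NodeRef::Hash` returns the original offset and leaves the file length unchanged, which is the deduplication the index is meant to provide.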

  • New Index Module (src/index.rs)

  • Refactor in serialization.rs

    • Before: the entire tree was serialized on every commit.
    • Now: the index is checked before serializing any node.
      • If it's a NodeRef::Hash, use the index to refer to the existing node offset.
      • If it's a NodeRef::Node, serialize it and save it in the DB.
  • New format in db (file):

    [header: 8 bytes]      -> Latest root offset
    [prev_root: 8 bytes]   -> Link to previous commit
    [root_node_data]       -> Root (may reference old nodes)
    [new_child_nodes...]   -> Only NEW/modified nodes
    
  • Updated db_benchmark.rs. (For now I replaced the old code with the new logic, but we could keep both.)

    • Run make bench

      FINAL COMPARISON
      =================
      Scale     EthrexDB Write    LibMDBX Write    EthrexDB Read    LibMDBX Read    Keys Read
      ------    -------------    -------------    -------------    ------------    ---------
      10k                87ms             57ms              0ms            13ms         1000
      100k             1176ms           1426ms              1ms            18ms         1000
      500k             8882ms          18173ms              6ms           110ms         5000
      1M              20949ms          51326ms             13ms           239ms        10000
      

Closes #8 (Deduplicated multiversion support)
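For reviewers, the file format listed in the description (8-byte header, then per-commit `prev_root` link, root data, and new child nodes) can be sketched like this. It is only an illustration under stated assumptions: little-endian encoding, a header value of 0 meaning "no commits yet", the header pointing at the start of the commit record (its `prev_root` field), and the helper names `new_db`, `append_commit`, and `commit_history` are all hypothetical and not taken from the PR.

```rust
/// Size of the header that stores the latest root offset.
const HEADER_SIZE: usize = 8;

/// Creates an empty file image. A header of 0 means "no commits yet"
/// (an assumption of this sketch).
fn new_db() -> Vec<u8> {
    vec![0u8; HEADER_SIZE]
}

/// Reads a little-endian u64 at byte position `at` (endianness assumed).
fn read_u64(buf: &[u8], at: usize) -> u64 {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&buf[at..at + 8]);
    u64::from_le_bytes(bytes)
}

/// Appends one commit record:
///   [prev_root: 8 bytes][root_node_data][new_child_nodes...]
/// and updates the header to point at it. Returns the record's offset.
fn append_commit(db: &mut Vec<u8>, root: &[u8], new_nodes: &[&[u8]]) -> u64 {
    let prev_root = read_u64(db, 0);
    let offset = db.len() as u64;
    db.extend_from_slice(&prev_root.to_le_bytes());
    db.extend_from_slice(root);
    for node in new_nodes {
        db.extend_from_slice(node);
    }
    db[..HEADER_SIZE].copy_from_slice(&offset.to_le_bytes());
    offset
}

/// Follows the prev_root links from the latest commit back to the first,
/// returning commit offsets newest first.
fn commit_history(db: &[u8]) -> Vec<u64> {
    let mut history = Vec::new();
    let mut current = read_u64(db, 0);
    while current != 0 {
        history.push(current);
        current = read_u64(db, current as usize);
    }
    history
}
```

Because each commit only appends and older records are never rewritten, earlier roots stay reachable through the `prev_root` chain, which is what makes the multiversion lookup possible in this sketch.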

Base automatically changed from feat/db to main August 21, 2025 21:16
@damiramirez damiramirez self-assigned this Aug 25, 2025
@damiramirez damiramirez marked this pull request as ready for review August 25, 2025 14:30
Lines of code report

Total lines added: 225
Total lines removed: 0
Total lines changed: 225

Detailed view
+------------------+-------+------+
| File             | Lines | Diff |
+------------------+-------+------+
| db.rs            | 379   | +123 |
+------------------+-------+------+
| file_manager.rs  | 119   | +7   |
+------------------+-------+------+
| index.rs         | 70    | +70  |
+------------------+-------+------+
| lib.rs           | 7     | +1   |
+------------------+-------+------+
| serialization.rs | 607   | +24  |
+------------------+-------+------+
