mpt: Added delayed pruning for state, due to issue #3828 #3829

Open: wants to merge 4 commits into base: master

VladChernenko

Hi folk x2 👋

It's been a while since I opened issue #3828

As it turned out in the Discord discussion, the current MPT implementation does not support delayed pruning of the state. Thus, all old state modifications remain in storage.

The only way to remove them (as @jochem-brouwer stated) is to walk the tree and purge nodes that are not associated with the current root. However, this is a very slow process that would not work well with a large storage.

Problem statement

We need a mechanism for delayed state pruning that:

  1. Initially stores all state changes, so that we can roll back all the way to the very first states (so useNodePruning=false)
  2. Over time, deletes some of the old states so that they do not take up space in storage

For example, if we have a sequence of state changes:

S1 -> S2 -> S3 -> S4 -> S5

Then, we need functionality that:

  1. Allows a rollback from S5 all the way back to S1
  2. Over time, allows the oldest states, say S1 and S2, to be deleted. At the same time, the network retains the ability to roll back from S5 to S4 and S3, BUT NOT TO S1 and S2

What was added/changed in the pull request

  1. I changed the put and del functions so that they can track the operations needed for a later delete batch. Simply pass an additional parameter trackPruningOps=true to collect the same pruning operations that would have been applied with useNodePruning=true
  2. These functions now return an array of BatchDBOp[] like:
[
  {
    type: 'del',
    key: toBuffer('0x7faec532fb97575824d052080273b9f09cb97c3956e7f48d513ee8f3e9ea496e')
  },
  {
    type: 'del',
    key: toBuffer('0x2ea8c6d75a907d9739c7b4d60c49648850cf8ff54bb8517eda3cf6549674a110')
  },
  {
    type: 'del',
    key: toBuffer('0x637e2f8ae1872b5a7886c90d52f14645afd703609ea3de6436139b447c22fc81')
  },
  {
    type: 'del',
    key: toBuffer('0x35c1813fe71a6d4d8d37ebeed4d1b5ba8b158d026e6f2e4304cc41aa82a598bf')
  },
  {
    type: 'del',
    key: toBuffer('0xd62fbfc121e09c1d25abcc9846d61b3981e363b43bf92428d1490c4376a76afd')
  },
  {
    type: 'del',
    key: toBuffer('0xcc24fb875de96399250d53aa8213bbf69a4ae3290138f8945140b8543415d041')
  },
  {
    type: 'del',
    key: toBuffer('0xd0e39f62614a374c9c3ca82304ff9fb186e5e00ddd0a9239734ea00334c75a2f')
  }
]

You can save these keys somewhere (in some 3rd party KV database) and use them later for delayed pruning.

The del() and put() function signatures now look like this:

async del(key: Uint8Array, skipKeyTransform: boolean = false, trackPruningOps: boolean = false): Promise<BatchDBOp[]>;

async put(key: Uint8Array, value: Uint8Array | null, skipKeyTransform: boolean = false, trackPruningOps: boolean = false): Promise<BatchDBOp[]>;
  3. Added the function
async delPrevStatesData(ops: BatchDBOp[]): Promise<void>

which takes an array of delete operations (the keys to delete) as input. This function can be used to prune the database.
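
To illustrate the "save these keys somewhere and use them later" step, here is a minimal sketch of how the returned ops could be hex-encoded for an external KV store and rebuilt afterwards. Everything in it is an assumption made for illustration: `DelOp` is a local stand-in for the 'del' variant of BatchDBOp, the `Map` stands in for the 3rd party database, and `recordOps`, `loadOps` and the transition id are hypothetical names that are not part of the PR.

```typescript
// Local stand-in for the 'del' variant of BatchDBOp; not imported from the library.
type DelOp = { type: "del"; key: Uint8Array };

// Hypothetical external store: transition id -> hex-encoded keys to delete later.
const pendingPrunes = new Map<string, string[]>();

// Encode raw bytes as a 0x-prefixed hex string.
const toHex = (bytes: Uint8Array): string =>
  "0x" + Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");

// Decode a hex string (with or without 0x prefix, even length) back to bytes.
const fromHex = (hex: string): Uint8Array => {
  const clean = hex.startsWith("0x") ? hex.slice(2) : hex;
  const out = new Uint8Array(clean.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(clean.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
};

// Record the ops returned by one put()/del() call under a caller-chosen id.
function recordOps(transitionId: string, ops: DelOp[]): void {
  pendingPrunes.set(transitionId, ops.map((op) => toHex(op.key)));
}

// Rebuild an op list in the { type: 'del', key } shape shown above.
function loadOps(transitionId: string): DelOp[] {
  const keys = pendingPrunes.get(transitionId) ?? [];
  return keys.map((k): DelOp => ({ type: "del", key: fromHex(k) }));
}
```

The list returned by `loadOps` has the same shape as the array shown earlier, so under these assumptions it could be passed on to the pruning function when the time comes.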

Demo

Let's see how it works:

💡 Attention: I used an older version, with @ethereumjs/trie instead of @ethereumjs/mpt

1. Adding accounts for Alice, Bob, Charlie

import { Trie } from "@ethereumjs/trie";
import { bufferToHex, toBuffer } from "@ethereumjs/util";
import { LevelDB } from "./LevelDB.js";
import { Level } from "level";

class MPTWrapper {
  constructor() {
    this.db = new LevelDB(new Level("STORAGE"));
    this.trie = new Trie({ db: this.db });
  }

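  async insert(key, value) {
    // Note: despite the name, this is the hex string of the UTF-8 key, not a Buffer
    const keyBuffer = Buffer.from(key, "utf8").toString("hex");
    const valueBuffer = toBuffer(value);

    // With this PR, put() resolves to the ops that can be used later for pruning
    const ops = await this.trie.put(keyBuffer, valueBuffer);
    return ops;
  }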

  async get(key) {

    const keyBuffer = Buffer.from(key, "utf8").toString("hex");
    const valueBuffer = await this.trie.get(keyBuffer);

    if (!valueBuffer) {
      return null;
    }

    return valueBuffer.toString("hex");
  }

  async getStateRootHash() {
    const root = await this.trie.root();
    return bufferToHex(root);
  }

  async rollbackToState(stateRoot) {
    const rootBuffer = toBuffer(stateRoot);
    await this.trie.root(rootBuffer);
  }
}

const mptWrapper = new MPTWrapper();

console.log(
  "State root hash before inserting Alice => ",
  await mptWrapper.getStateRootHash()
);

let ops1 = await mptWrapper.insert(
  "Alice",
  "0xbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
);

console.log(`Ops for delayed pruning => `,ops1)


console.log(
  "State root hash after inserting Alice => ",
  await mptWrapper.getStateRootHash()
);

let ops2 = await mptWrapper.insert(
  "Bob",
  "0xabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
);

console.log(`Ops for delayed pruning => `,ops2)

console.log(
  "State root hash after inserting Bob => ",
  await mptWrapper.getStateRootHash()
);

let ops3 = await mptWrapper.insert(
  "Charlie",
  "0xaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
);

console.log(`Ops for delayed pruning => `,ops3)

console.log(
  "State root hash after inserting Charlie => ",
  await mptWrapper.getStateRootHash()
);

let ops4 = await mptWrapper.insert(
  "Charlie",
  "0xccbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
);

console.log(`Ops for delayed pruning => `,ops4)

console.log(
  "State root hash after updating Charlie => ",
  await mptWrapper.getStateRootHash()
);

console.log("Alice data => ", await mptWrapper.get("Alice"));
console.log("Bob data => ", await mptWrapper.get("Bob"));
console.log("Charlie data => ", await mptWrapper.get("Charlie"));

Output:

State root hash before inserting Alice =>  0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421
Ops for delayed pruning =>  []
State root hash after inserting Alice =>  0x7faec532fb97575824d052080273b9f09cb97c3956e7f48d513ee8f3e9ea496e
Ops for delayed pruning =>  [
  {
    type: 'del',
    key: '0x7faec532fb97575824d052080273b9f09cb97c3956e7f48d513ee8f3e9ea496e'
  }
]
State root hash after inserting Bob =>  0x2bb2b680afef9301f80681b3426e90d41ff041c1f10e9defe4342bab7da7bca1
Ops for delayed pruning =>  [
  {
    type: 'del',
    key: '0x2bb2b680afef9301f80681b3426e90d41ff041c1f10e9defe4342bab7da7bca1'
  },
  {
    type: 'del',
    key: '0x50ded985cebcb774febe4e19cde15ea5780f88c119712cdbb49881f1087d3fbb'
  }
]
State root hash after inserting Charlie =>  0xf59cef53dd5a17eda03c2ca0b6573a5ccaac9f81c39b5471b93336a9ff57ef75
Ops for delayed pruning =>  [
  {
    type: 'del',
    key: '0xf59cef53dd5a17eda03c2ca0b6573a5ccaac9f81c39b5471b93336a9ff57ef75'
  },
  {
    type: 'del',
    key: '0xa61e32a6bd60e5008a1219bf14b518ab3d172a63c3067fb9ccb00eb4d6b87057'
  },
  {
    type: 'del',
    key: '0x3809146f47fc79bc0b345d6167e68b78f599cc0c8f42a4857f44aee33e19df5b'
  }
]
State root hash after updating Charlie =>  0xe846b6131176a110dffd78c943dcfaf7b7d9fc1f6909b87d692d3d3ae0195b17
Alice data =>  bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
Bob data =>  abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
Charlie data =>  0ccbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

2. Check the ops for pruning (to use it later)

As shown above, put() now returns delete operations. Currently our DB (with useNodePruning=false) contains:

Total keys: 11

If we set useNodePruning=true, then:

Total keys: 5

Now look at the output: the ops lists contain 6 keys in total (0 + 1 + 2 + 3), so 11 - 6 = 5. Everything works!

3. Prune

Let's delete them using the delPrevStatesData() function:

console.log(
  "State root hash after updating Charlie => ",
  await mptWrapper.getStateRootHash()
);

let allOps = ops1.concat(ops2, ops3, ops4).map(x=>({type:'del',key:toBuffer(x.key)}));

console.log(allOps);

await mptWrapper.trie.delPrevStatesData(allOps);

console.log("Alice data => ", await mptWrapper.get("Alice"));
console.log("Bob data => ", await mptWrapper.get("Bob"));
console.log("Charlie data => ", await mptWrapper.get("Charlie"));


console.log(
    "State root hash after everything => ",
    await mptWrapper.getStateRootHash()
  );

Output:

State root hash after updating Charlie =>  0xe846b6131176a110dffd78c943dcfaf7b7d9fc1f6909b87d692d3d3ae0195b17
[
  {
    type: 'del',
    key: <Buffer 7f ae c5 32 fb 97 57 58 24 d0 52 08 02 73 b9 f0 9c b9 7c 39 56 e7 f4 8d 51 3e e8 f3 e9 ea 49 6e>
  },
  {
    type: 'del',
    key: <Buffer 2b b2 b6 80 af ef 93 01 f8 06 81 b3 42 6e 90 d4 1f f0 41 c1 f1 0e 9d ef e4 34 2b ab 7d a7 bc a1>
  },
  {
    type: 'del',
    key: <Buffer 50 de d9 85 ce bc b7 74 fe be 4e 19 cd e1 5e a5 78 0f 88 c1 19 71 2c db b4 98 81 f1 08 7d 3f bb>
  },
  {
    type: 'del',
    key: <Buffer f5 9c ef 53 dd 5a 17 ed a0 3c 2c a0 b6 57 3a 5c ca ac 9f 81 c3 9b 54 71 b9 33 36 a9 ff 57 ef 75>
  },
  {
    type: 'del',
    key: <Buffer a6 1e 32 a6 bd 60 e5 00 8a 12 19 bf 14 b5 18 ab 3d 17 2a 63 c3 06 7f b9 cc b0 0e b4 d6 b8 70 57>
  },
  {
    type: 'del',
    key: <Buffer 38 09 14 6f 47 fc 79 bc 0b 34 5d 61 67 e6 8b 78 f5 99 cc 0c 8f 42 a4 85 7f 44 ae e3 3e 19 df 5b>
  }
]
Alice data =>  bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
Bob data =>  abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
Charlie data =>  0ccbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
State root hash after everything =>  0xe846b6131176a110dffd78c943dcfaf7b7d9fc1f6909b87d692d3d3ae0195b17

Also, if we check the total keys in state we'll see:

Total keys: 5

Summary

  1. As you can see, we can still get the latest version of the state (the accounts of Alice, Bob and Charlie are reachable after pruning)
  2. The state root is the same before and after pruning

4. What about access to previous states, not just the latest?

  1. In the example above, I showed full pruning. That is, only the latest state is accessible
  2. Now let's try partial pruning

In our example, let's try deleting the data that belongs to the states up to the point when the storage only contained data about Alice and Bob.

That is:

  1. S1 - empty db
  2. S2 - there is Alice's account
  3. S3 - there is Bob's account
  4. S4 - there is Charlie's account
  5. S5 - Charlie's account has been updated

Let's delete S1, S2, S3. This way, we can still roll back to state S4 and get Charlie's old data.

// ...(previous code)
console.log(
  "S5 State root hash after updating Charlie => ",
  await mptWrapper.getStateRootHash()
);

let allOpsWithoutLast2States = ops1.concat(ops2, ops3).map(x=>({type:'del',key:toBuffer(x.key)}));

console.log('\n Ops to delete S1, S2, S3 => ',allOpsWithoutLast2States)

await mptWrapper.trie.delPrevStatesData(allOpsWithoutLast2States)

console.log("S5 Alice data => ", await mptWrapper.get("Alice"));
console.log("S5 Bob data => ", await mptWrapper.get("Bob"));
console.log("S5 Charlie data => ", await mptWrapper.get("Charlie"));
 

console.log(
    "State root hash after everything => ",
    await mptWrapper.getStateRootHash()
  );



// Rollback to state with Alice, Bob and Charlie (Charlie has old state)
await mptWrapper.rollbackToState('0xf59cef53dd5a17eda03c2ca0b6573a5ccaac9f81c39b5471b93336a9ff57ef75');

console.log("S4 Alice data after rollback => ", await mptWrapper.get("Alice"));
console.log("S4 Bob data after rollback => ", await mptWrapper.get("Bob"));
console.log("S4 Charlie data after rollback => ", await mptWrapper.get("Charlie"));

Output:

S1 State root hash before inserting Alice =>  0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421
S2 State root hash after inserting Alice =>  0x7faec532fb97575824d052080273b9f09cb97c3956e7f48d513ee8f3e9ea496e
S3 State root hash after inserting Bob =>  0x2bb2b680afef9301f80681b3426e90d41ff041c1f10e9defe4342bab7da7bca1
S4 State root hash after inserting Charlie =>  0xf59cef53dd5a17eda03c2ca0b6573a5ccaac9f81c39b5471b93336a9ff57ef75
S5 State root hash after updating Charlie =>  0xe846b6131176a110dffd78c943dcfaf7b7d9fc1f6909b87d692d3d3ae0195b17

 Ops to delete S1, S2, S3 =>  [
  {
    type: 'del',
    key: <Buffer 7f ae c5 32 fb 97 57 58 24 d0 52 08 02 73 b9 f0 9c b9 7c 39 56 e7 f4 8d 51 3e e8 f3 e9 ea 49 6e>
  },
  {
    type: 'del',
    key: <Buffer 2b b2 b6 80 af ef 93 01 f8 06 81 b3 42 6e 90 d4 1f f0 41 c1 f1 0e 9d ef e4 34 2b ab 7d a7 bc a1>
  },
  {
    type: 'del',
    key: <Buffer 50 de d9 85 ce bc b7 74 fe be 4e 19 cd e1 5e a5 78 0f 88 c1 19 71 2c db b4 98 81 f1 08 7d 3f bb>
  }
]
S5 Alice data =>  bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
S5 Bob data =>  abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
S5 Charlie data =>  0ccbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
State root hash after everything =>  0xe846b6131176a110dffd78c943dcfaf7b7d9fc1f6909b87d692d3d3ae0195b17
S4 Alice data after rollback =>  bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
S4 Bob data after rollback =>  abbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
S4 Charlie data after rollback =>  0aabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

As you can see, we still have access to states S5 and S4, but not to states S1-S3

If we check the total number of keys in the db:

Total keys: 8

This is expected, because we deleted only 3 keys (11 - 3 = 8). If we also want to delete the data related to S4, we can delete the 3 extra keys and get 8 - 3 = 5 total keys, as expected.

packages/mpt/src/mpt.ts (outdated)
Contributor

@acolytec3 acolytec3 left a comment

Cool idea, though we should think through whether we are able to split the delete ops list into a set of deletes for each state transition rather than just one single list. Otherwise, we are only able to roll back to one previous state root (the one before the delete operations began).

@VladChernenko
Author

@acolytec3 Thank you!

Yes, sure. We can split the list. As I demonstrated in the last example, we can delete only some of the keys, not the whole list. This gives us the ability to roll back up to N steps, but not more. It is useful for gradually getting rid of very old state.
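
A rough sketch of what "splitting the list" could look like: keep one batch of delete ops per state transition and only release the batches that fall outside a retention window. This is an illustration under stated assumptions, not code from the PR. `PruneOp`, `TransitionBatch` and `PruneWindow` are hypothetical names, and the sketch assumes each batch is safe to delete on its own (as in the put-only demo above).

```typescript
// Local stand-in for a 'del' batch operation; not the library's BatchDBOp type.
type PruneOp = { type: "del"; key: Uint8Array };

// One entry per state transition: the ops returned by that put()/del() call.
interface TransitionBatch {
  label: string; // caller-chosen label for the transition, e.g. "S3 -> S4"
  ops: PruneOp[];
}

class PruneWindow {
  private batches: TransitionBatch[] = [];
  private keep: number;

  // keep = number of most recent transitions whose ops are NOT released yet
  constructor(keep: number) {
    this.keep = keep;
  }

  // Call after each put()/del() with the ops it returned.
  push(label: string, ops: PruneOp[]): void {
    this.batches.push({ label, ops });
  }

  // Remove and return the ops of every batch older than the newest `keep`.
  drainExpired(): PruneOp[] {
    const cut = Math.max(0, this.batches.length - this.keep);
    const expired = this.batches.splice(0, cut);
    const ops: PruneOp[] = [];
    for (const batch of expired) {
      for (const op of batch.ops) ops.push(op);
    }
    return ops;
  }
}
```

With `keep = 1` and the four batches from the demo (sizes 0, 1, 2 and 3), `drainExpired()` would release the first three batches, i.e. the same 3 keys that were deleted in the partial pruning example, while the last batch stays behind so that S4 remains reachable.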

@@ -218,9 +219,10 @@ export class MerklePatriciaTrie {

     // If value is empty, delete
     if (value === null || value.length === 0) {
-      return this.del(key)
+      await this.del(key)
Member

This initially returned from this method, but now it does not, so I suspect this will yield some bugs (?)

@VladChernenko
Author

VladChernenko commented Jan 4, 2025

UPD: Fixed the problem with deletion of necessary nodes

Thanks @acolytec3 @jochem-brouwer

So guys, after a series of tests I did run into the problem that, with naive deletion, we can accidentally delete nodes that are still needed by the latest states.

Here's why it went unnoticed:

  1. Previously, we only tried adding nodes via put() operations, never deleting
  2. When deleting via the del() method, the root changes again, and when moving between states a needed node can indeed be deleted by accident

Solution

So, after spending more time on research, I found 2 useful articles that helped with this:

  1. Article by Vitalik Buterin
  2. Article by Nethermind

TLDR - using the reference counting method

In simple terms: by modifying the methods put(), del() and saveStack(), we track every change to the tree, whether it adds new nodes or deletes existing ones.

What was changed in the commit

  1. Added an internal array _nodesOps which contains the tree operations that were performed
  2. Added a method nodesOps() which returns this array
  3. Added a method getKeysToPrune() which takes as input an array obtained from nodesOps() and returns the node identifiers whose reference counter is <= 0.

Keys returned by this method (counter <= 0) can be safely deleted, because they WILL NOT BE NEEDED BY THE LAST STATE.

  1. This list of nodesOps can be saved separately (let's say in some 3rd party database).
  2. When you need it - just prune
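
As a hedged illustration of the reference counting idea described above, the sketch below folds a log of node operations into per-key counters and returns the keys whose counter is <= 0. The `NodeOp` shape and the function names are assumptions made for this example; the actual format of _nodesOps in the PR may differ.

```typescript
// Hypothetical shape of one logged node operation; the PR's real format may differ.
type NodeOp = { type: "put" | "del"; key: string };

// Fold the log into counters: +1 for every put of a node key, -1 for every del.
function countReferences(log: NodeOp[]): Map<string, number> {
  const counters = new Map<string, number>();
  for (const op of log) {
    const delta = op.type === "put" ? 1 : -1;
    counters.set(op.key, (counters.get(op.key) ?? 0) + delta);
  }
  return counters;
}

// Keys with a counter <= 0 are the candidates for pruning.
function prunableKeys(log: NodeOp[]): string[] {
  const out: string[] = [];
  countReferences(log).forEach((count, key) => {
    if (count <= 0) out.push(key);
  });
  return out;
}
```

For example, a log of `put a`, `put b`, `del a` leaves `a` at 0 and `b` at 1, so only `a` is reported as prunable.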

Contributor

@acolytec3 acolytec3 left a comment

One thing I want to clarify. Is the idea that this allows you to only roll back to one previous state root (i.e. whatever state root existed before you call delPrevStatesData)? In other words, it's only allowing you to roll back to one previous state root?

@VladChernenko
Author

@acolytec3

Look, I'll add a step-by-step explanation:

  1. I took into account the comments that naive raw deletion of keys can delete nodes that are still needed. I totally agree, because I tested it myself.

  2. Therefore, based on the 2 articles (referenced in the PR) and a deeper analysis, I proposed a solution based on reference counting.

  3. In this form, it allows you to perform pruning in the following sequence of steps:

     a. The trie works without pruning and changes states, let's say S0 -> S1 -> S2
     b. At S2, we stop the work and start pruning, from S0 to S2
     c. After pruning, we only have access to the S2 state (only the latest state)

  4. In the future, this reference counting mechanism can also be improved by making it possible to access not only the latest state, but also earlier ones up to a certain depth (let's say clean up the excess in the interval S0 -> S1, but keep the ability to roll back from S2 to S1 and back).

However, at the moment only full pruning works here, which leaves only the latest state (S2)

(that's enough for the needs I mentioned)


Here are the details:

This solution allows you to do it like this:

  1. First, we start the work without pruning (useNodePruning=false). This gives us the ability to move between historical states
  2. When inserting into the trie, each node is assigned a counter. Each insert increments the counter (+1), each delete decrements it (-1)
  3. Insert/delete operations on the database can be easily tracked by hooking the saveStack() function

Let's imagine that the server/virtual machine has been working for some time and during this time it has gone through several states, say:

S0 -> S1 -> S2 -> S3 -> S4 -> S5 -> ...

Now we have:

  1. A database without pruning - the latest state of the trie and its history are stored here
  2. A separate database (let's call it KeysReferences) where the counter values for the keys were written. This is a key-value store of the form:
key1 -> 0
key2 -> 1
key3 -> 0
...
keyN -> 1

Now let's imagine that we want to prune, deleting all past states and keeping access only to the latest one.

  1. We iterate over the keys in the KeysReferences database. If the counter is <= 0, the node can be safely removed from the main database, as it is no longer needed by the latest state
  2. After pruning, we can completely delete the KeysReferences database, as it is no longer needed

After this step, we have:

  1. A database with the trie after pruning, where only the state S5 is stored
  2. An empty KeysReferences. Now we can continue writing reference counters here, but for the following trie changes: S5 -> S6 -> S7 ... -> Sn
  3. Then we can repeat the process and prune the interval S5 -> Sn. And so on.
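
The cycle described above can be sketched with two in-memory maps. This is only an illustration under stated assumptions: `NodeDb` and `KeysReferences` are hypothetical `Map`-based stand-ins for the main node database and the counter store, and `pruneCycle` is not a function from the PR.

```typescript
// In-memory stand-ins for the two stores described above (both hypothetical).
type NodeDb = Map<string, Uint8Array>; // main store: node key -> encoded node
type KeysReferences = Map<string, number>; // node key -> reference counter

// Delete every node whose counter is <= 0, then reset the counters so the
// next interval (e.g. S5 -> Sn) starts from an empty KeysReferences.
function pruneCycle(db: NodeDb, refs: KeysReferences): number {
  let removed = 0;
  refs.forEach((count, key) => {
    // Map.delete returns true only if the key was actually present
    if (count <= 0 && db.delete(key)) removed++;
  });
  refs.clear();
  return removed;
}
```

Calling `pruneCycle` once per interval mirrors the "prune, then start counting again" loop: nodes with a positive counter stay in the main store, and the counter store is empty again when the next interval begins.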

}

// Filter keys with counter <= 0
let opsWithKeysToDelete = Object.keys(counters).filter(key => counters[key] <= 0);
Member

How does this distinguish between first a put, then a del, or first a del, then a put? 🤔

Author

In this particular example, the input is an array of changes in sequential order. The earlier positions contain operations that were performed in earlier states, and the last elements of the array are operations that were performed later.

All keys that have a counter <= 0 are not needed for the latest state. If they were needed, the counter would be >= 1, which means that they were inserted into the trie.
