Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

silkworm: wrong state root after reorg at chain tip #2585

Closed
canepat opened this issue Dec 11, 2024 · 1 comment · Fixed by #2580
Closed

silkworm: wrong state root after reorg at chain tip #2585

canepat opened this issue Dec 11, 2024 · 1 comment · Fixed by #2580
Assignees
Labels
bug Something isn't working

Comments

@canepat
Copy link
Member

canepat commented Dec 11, 2024

Running standalone Silkworm at commit 5059a5f an unexpected wrong state root error occurred after a reorg while following the chain tip:

...
  INFO [10-16|02:01:49.003 UTC] PoSSync: fork_choice_update head_block_hash=51f1d8a7bc22601b9d786ac133ddde8c385be6c2c8bcf953b9ce46eadedb704f safe_block_hash=ef99240dd43ca80c4983bbe190074798590dc15eefcaece582b51e921718f3db finalized_block_hash=64c5fd09b493bf85b5b706c3a9d3f9487b1adc9da4435e7ab231969313b16799
  INFO [10-16|02:01:49.004 UTC] ExecutionEngine                          verifying chain 51f1d8a7bc22601b9d786ac133ddde8c385be6c2c8bcf953b9ce46eadedb704f
  INFO [10-16|02:01:49.004 UTC] ExecutionEngine                          updating fork choice to 51f1d8a7bc22601b9d786ac133ddde8c385be6c2c8bcf953b9ce46eadedb704f
  INFO [10-16|02:01:49.138 UTC] PoSSync: fork_choice_update OK latest_valid_hash=51f1d8a7bc22601b9d786ac133ddde8c385be6c2c8bcf953b9ce46eadedb704f current_head=51f1d8a7bc22601b9d786ac133ddde8c385be6c2c8bcf953b9ce46eadedb704f current_height=
  INFO [10-16|02:01:54.825 UTC] sync::SentryClient                       Peer 4b9e8ecc53879de9c21baadb92031399f6b7ab9f disconnected, active 40, info: -info-not-found-
  INFO [10-16|02:02:00.617 UTC] ExecutionEngine                          verifying chain ae28293b66207940a0b99aa2db0b1de2d16b57755c3439078df653fbf4dbd8e9
  INFO [10-16|02:02:00.617 UTC] ExecutionPipeline                        Forward start
  INFO [10-16|02:02:00.617 UTC] [1/12 Headers]                           op=kForward from=20975006 to=20975007 span=1 
  INFO [10-16|02:02:00.617 UTC] [2/12 BlockHashes]                       op=kForward from=20975006 to=20975007 span=1 
  INFO [10-16|02:02:00.617 UTC] [3/12 Bodies]                            op=kForward from=20975006 to=20975007 span=1 
  INFO [10-16|02:02:00.617 UTC] [4/12 Senders]                           op=kForward from=20975006 to=20975007 span=1 max_batch_size=139810 
  INFO [10-16|02:02:00.623 UTC] [5/12 Execution]                         op=kForward from=20975006 to=20975007 span=1 
  INFO [10-16|02:02:00.652 UTC] Flushed state                            size=61.74 KB in=2.434ms 
  INFO [10-16|02:02:00.652 UTC] [5/12 Execution] commit                  batch time= 
  INFO [10-16|02:02:00.652 UTC] [5/12 Execution]                         op=Forward done=28.296ms 
  INFO [10-16|02:02:00.652 UTC] [6/12 HashState]                         op=kForward from=20975006 to=20975007 span=1 
  INFO [10-16|02:02:00.660 UTC] [7/12 IntermediateHashes] begin          op=kForward from=20975006 to=20975007 span=1 
  INFO [10-16|02:02:00.690 UTC] [7/12 IntermediateHashes]                op=Forward done=30.353ms 
  INFO [10-16|02:02:00.690 UTC] [8/12 HistoryIndex]                      op=kForward from=20975006 to=20975007 span=1 
  INFO [10-16|02:02:00.701 UTC] [8/12 HistoryIndex]                      op=Forward done=10.829ms 
  INFO [10-16|02:02:00.701 UTC] [9/12 LogIndex]                          op=kForward from=20975006 to=20975007 span=1 
  INFO [10-16|02:02:00.707 UTC] [10/12 CallTraces]                       op=kForward from=20975006 to=20975007 span=1 
  INFO [10-16|02:02:00.713 UTC] [11/12 TxLookup]                         op=kForward from=20975006 to=20975007 span=1 
  INFO [10-16|02:02:00.714 UTC] [12/12 Finish]                           op=kForward from=20975006 to=20975007 span=1 
  INFO [10-16|02:02:00.714 UTC] ExecutionPipeline                        Forward done
  INFO [10-16|02:02:00.714 UTC] MainChain::verify_chain forward_result=kSuccess
  INFO [10-16|02:02:00.714 UTC] PoSSync: new_payload VALID current_head=ae28293b66207940a0b99aa2db0b1de2d16b57755c3439078df653fbf4dbd8e9
  INFO [10-16|02:02:00.738 UTC] PoSSync: fork_choice_update head_block_hash=ae28293b66207940a0b99aa2db0b1de2d16b57755c3439078df653fbf4dbd8e9 safe_block_hash=ef99240dd43ca80c4983bbe190074798590dc15eefcaece582b51e921718f3db finalized_block_hash=64c5fd09b493bf85b5b706c3a9d3f9487b1adc9da4435e7ab231969313b16799
  INFO [10-16|02:02:00.739 UTC] ExecutionEngine                          verifying chain ae28293b66207940a0b99aa2db0b1de2d16b57755c3439078df653fbf4dbd8e9
  INFO [10-16|02:02:00.739 UTC] ExecutionEngine                          updating fork choice to ae28293b66207940a0b99aa2db0b1de2d16b57755c3439078df653fbf4dbd8e9
  INFO [10-16|02:02:00.804 UTC] PoSSync: fork_choice_update OK latest_valid_hash=ae28293b66207940a0b99aa2db0b1de2d16b57755c3439078df653fbf4dbd8e9 current_head=ae28293b66207940a0b99aa2db0b1de2d16b57755c3439078df653fbf4dbd8e9 current_height=
  INFO [10-16|02:02:14.971 UTC] ExecutionEngine                          verifying chain 4dca2395b6d4d14a39ae0d9bff5ee253ad5dc56199fa5cf3eb6e371db8ed3b98
  INFO [10-16|02:02:14.971 UTC] ExecutionPipeline                        Forward start
  INFO [10-16|02:02:14.971 UTC] [1/12 Headers]                           op=kForward from=20975007 to=20975008 span=1 
  INFO [10-16|02:02:14.971 UTC] [2/12 BlockHashes]                       op=kForward from=20975007 to=20975008 span=1 
  INFO [10-16|02:02:14.971 UTC] [3/12 Bodies]                            op=kForward from=20975007 to=20975008 span=1 
  INFO [10-16|02:02:14.971 UTC] [4/12 Senders]                           op=kForward from=20975007 to=20975008 span=1 max_batch_size=139810 
  INFO [10-16|02:02:14.977 UTC] [5/12 Execution]                         op=kForward from=20975007 to=20975008 span=1 
  INFO [10-16|02:02:15.040 UTC] Flushed state                            size=32.65 KB in=2.160ms 
  INFO [10-16|02:02:15.040 UTC] [5/12 Execution] commit                  batch time= 
  INFO [10-16|02:02:15.041 UTC] [5/12 Execution]                         op=Forward done=63.228ms 
  INFO [10-16|02:02:15.041 UTC] [6/12 HashState]                         op=kForward from=20975007 to=20975008 span=1 
  INFO [10-16|02:02:15.049 UTC] [7/12 IntermediateHashes] begin          op=kForward from=20975007 to=20975008 span=1 
  INFO [10-16|02:02:15.074 UTC] [7/12 IntermediateHashes]                op=Forward done=25.118ms 
  INFO [10-16|02:02:15.074 UTC] [8/12 HistoryIndex]                      op=kForward from=20975007 to=20975008 span=1 
  INFO [10-16|02:02:15.085 UTC] [8/12 HistoryIndex]                      op=Forward done=11.014ms 
  INFO [10-16|02:02:15.085 UTC] [9/12 LogIndex]                          op=kForward from=20975007 to=20975008 span=1 
  INFO [10-16|02:02:15.090 UTC] [10/12 CallTraces]                       op=kForward from=20975007 to=20975008 span=1 
  INFO [10-16|02:02:15.097 UTC] [11/12 TxLookup]                         op=kForward from=20975007 to=20975008 span=1 
  INFO [10-16|02:02:15.098 UTC] [12/12 Finish]                           op=kForward from=20975007 to=20975008 span=1 
  INFO [10-16|02:02:15.098 UTC] ExecutionPipeline                        Forward done
  INFO [10-16|02:02:15.098 UTC] MainChain::verify_chain forward_result=kSuccess
  INFO [10-16|02:02:15.098 UTC] PoSSync: new_payload VALID current_head=4dca2395b6d4d14a39ae0d9bff5ee253ad5dc56199fa5cf3eb6e371db8ed3b98
  INFO [10-16|02:02:15.122 UTC] PoSSync: fork_choice_update head_block_hash=4dca2395b6d4d14a39ae0d9bff5ee253ad5dc56199fa5cf3eb6e371db8ed3b98 safe_block_hash=ef99240dd43ca80c4983bbe190074798590dc15eefcaece582b51e921718f3db finalized_block_hash=64c5fd09b493bf85b5b706c3a9d3f9487b1adc9da4435e7ab231969313b16799
  INFO [10-16|02:02:15.122 UTC] ExecutionEngine                          verifying chain 4dca2395b6d4d14a39ae0d9bff5ee253ad5dc56199fa5cf3eb6e371db8ed3b98
  INFO [10-16|02:02:15.122 UTC] ExecutionEngine                          updating fork choice to 4dca2395b6d4d14a39ae0d9bff5ee253ad5dc56199fa5cf3eb6e371db8ed3b98
  INFO [10-16|02:02:15.193 UTC] PoSSync: fork_choice_update OK latest_valid_hash=4dca2395b6d4d14a39ae0d9bff5ee253ad5dc56199fa5cf3eb6e371db8ed3b98 current_head=4dca2395b6d4d14a39ae0d9bff5ee253ad5dc56199fa5cf3eb6e371db8ed3b98 current_height=
  INFO [10-16|02:02:15.223 UTC] Resource usage                           mem=91.53 GB chain=2.37 TB temp=0.00 B uptime=3d 20h 55m 2s 
  INFO [10-16|02:02:16.775 UTC] sync::SentryClient                       Peer fb63c612c2c4df34267b2690398f493b9577593d connected, active 41, info: client_id=Nethermind/v1.26.0-rc+0380355b/linux-x64/dotnet8.0.8 / enode_url=enode://fb63c612c2c4df34267b2690398f493b9577593d5737b515cb3d3b6bf7b9970a46929134c837575cf9061f4c982adb34bf79cf43c8cc617b278dde3006f859c2@51.154.49.132:30303
  INFO [10-16|02:02:20.775 UTC] sync::SentryClient                       Peer 43bc2ef1a9df6506e064592d6755d74675dde3cf connected, active 42, info: client_id=Nodecrawler/v1.14.0-unstable/linux-arm64/go1.22.5 / enode_url=enode://43bc2ef1a9df6506e064592d6755d74675dde3cf98b7f0a3878b468e530e2d7abfe728bf7a7087c1bb475bfa742511b4a5e5a22c1ac2bf401e0821b562fbe1ae@49.13.11.197:30310
  INFO [10-16|02:02:25.038 UTC] ExecutionEngine                          verifying chain f29c2e72646e5b038ed693222620a3a8de8c002eb1014e4ac61e37a12dbeca72
  INFO [10-16|02:02:25.038 UTC] ExecutionPipeline                        Unwind start
  INFO [10-16|02:02:25.038 UTC] [1/12 Finish]                            op=kUnwind from=20975008 to=20975007 span=1 
  INFO [10-16|02:02:25.038 UTC] [2/12 TxLookup]                          op=kUnwind from=20975008 to=20975007 span=1 
  INFO [10-16|02:02:25.038 UTC] [3/12 CallTraces]                        op=kUnwind from=20975008 to=20975007 span=1 
  INFO [10-16|02:02:25.041 UTC] [4/12 LogIndex]                          op=kUnwind from=20975008 to=20975007 span=1 
  INFO [10-16|02:02:25.043 UTC] [5/12 HistoryIndex]                      op=kUnwind from=20975008 to=20975007 span=1 
  INFO [10-16|02:02:25.046 UTC] [6/12 HashState]                         op=kUnwind from=20975008 to=20975007 span=1 
  INFO [10-16|02:02:25.050 UTC] [7/12 IntermediateHashes] begin          op=kUnwind from=20975008 to=20975007 span=1 
 ERROR [10-16|02:02:25.060 UTC] Wrong trie root                          expected=0xc94a868df97206dd11c822a9f6fedcc346f3ba1524e6f60ac0533d389dbf3ea9 got=0x88453e5acd83a936411efda43cba31e936059c534ddc4065f6880312975a66d6 
 ERROR [10-16|02:02:25.061 UTC] [7/12 IntermediateHashes]                function=unwind exception=kWrongStateRoot 
 ERROR [10-16|02:02:25.061 UTC] [7/12 IntermediateHashes]                op=Unwind returned=kWrongStateRoot 
 ERROR [10-16|02:02:25.061 UTC] ExecPipeline                             Unwind interrupted due to stage IntermediateHashes failure
 ERROR [10-16|02:02:25.061 UTC] PoSSync: unexpected error processing payload: kWrongStateRoot
@canepat canepat added the bug Something isn't working label Dec 11, 2024
@canepat canepat self-assigned this Dec 11, 2024
@canepat
Copy link
Member Author

canepat commented Dec 12, 2024

Using the debug_unwind and reset_to_download commands in staged_pipeline utility, you can simulate any reorg over the block snapshots by repeatedly advancing the execution pipeline to some target block (--height) and then going some steps behind (--step), e.g.

% ./cmd/dev/staged_pipeline --exclusive --datadir ~/Silkworm debug_unwind --height 10000000 --stop_before_stage HistoryIndex --step 1 --force

Even though the blocks in snapshots are canonical by definition and hence such reorgs cannot happen in practice, this approach is still extremely useful to debug the unwind procedure.

The first block where you encounter the dreaded wrong state root error is 694'128 (very early in the chain):

% ./cmd/dev/staged_pipeline --exclusive --datadir /Users/tullio/Library/Silkworm debug_unwind --height 694128 --stop_before_stage HistoryIndex --step 1 --force
  INFO [12-08|19:12:56.504 UTC] Debug unwind datadir: /Users/tullio/Library/Silkworm
  INFO [12-08|19:13:05.167 UTC] Reopen snapshot repository folder: /Users/tullio/Library/Silkworm/snapshots
  INFO [12-08|19:13:08.651 UTC] Total reopened bundles: 72 max block available: 20'474'999
  INFO [12-08|19:13:08.651 UTC] Debug unwind: forward to=694'128 START
...
  INFO [12-08|19:13:20.829 UTC] Debug unwind: forward to=694'128 END
  INFO [12-08|19:13:20.829 UTC] Debug unwind: unwind down to block=694'127 START
  INFO [12-08|19:13:20.829 UTC] ExecutionPipeline                        Unwind start
  INFO [12-08|19:13:20.829 UTC] [7/13 HashState]                         op=kUnwind from=694128 to=694127 span=1 
  INFO [12-08|19:13:20.829 UTC] [8/13 IntermediateHashes] begin          op=kUnwind from=694128 to=694127 span=1 
  INFO [12-08|19:13:20.829 UTC] [8/13 IntermediateHashes]                clearing=TrieAccount 
  INFO [12-08|19:13:20.829 UTC] [8/13 IntermediateHashes]                clearing=TrieStorage 
 ERROR [12-08|19:13:20.992 UTC] [8/13 IntermediateHashes]                function=regenerate_intermediate_hashes exception=expected 0xa1d37be94ad1f837c5c07eb22e37fc0fdd3148c68e8a7ff713bb0cb3a3ee945d got 0x1669df5b4a59509cf11e4f8dcd0133c2738dc8151061e0d76c1c4002f2c83e64 
 ERROR [12-08|19:13:20.992 UTC] [8/13 IntermediateHashes]                function=unwind exception=kWrongStateRoot 
 ERROR [12-08|19:13:20.992 UTC] [8/13 IntermediateHashes]                op=Unwind returned=kWrongStateRoot 
 ERROR [12-08|19:13:20.992 UTC] ExecutionPipeline                        Unwind interrupted due to stage IntermediateHashes failure
Error: unwind failed: kWrongStateRoot

@canepat canepat linked a pull request Dec 12, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant