fix: continue pruning if version is not found #49
Description
We found a case on an Osmosis node where a root key points to a node that doesn't exist. This hangs the pruning process, because fetching the root key fails with ErrVersionDoesNotExist.
There is already code to clean up the dangling reference node, but it is never reached because the function returns ErrVersionDoesNotExist early.
As a result, pruning gets stuck and that version of the store can never be pruned. This PR moves on to the next version in the store if pruning returns a not-found error.
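The control flow described above can be sketched as follows. This is a minimal, self-contained illustration and not the code from this PR: the `store` type, `deleteVersion`, and `pruneRange` are hypothetical stand-ins, and `ErrVersionDoesNotExist` is redefined locally only so the example compiles on its own.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrVersionDoesNotExist is a local stand-in for the error named in the
// description; it is not the library's own definition.
var ErrVersionDoesNotExist = errors.New("version does not exist")

// store is a hypothetical in-memory model of a versioned store.
// roots[v] == true means the root for version v can be loaded.
type store struct {
	roots  map[int64]bool
	pruned []int64
}

// deleteVersion prunes one version, failing with ErrVersionDoesNotExist
// when the root for that version cannot be loaded.
func (s *store) deleteVersion(v int64) error {
	if !s.roots[v] {
		return fmt.Errorf("version %d: %w", v, ErrVersionDoesNotExist)
	}
	delete(s.roots, v)
	s.pruned = append(s.pruned, v)
	return nil
}

// pruneRange prunes versions from..to inclusive. A not-found error is
// recorded and skipped so later versions are still pruned; any other
// error stops the loop and is returned.
func pruneRange(s *store, from, to int64) (skipped []int64, err error) {
	for v := from; v <= to; v++ {
		if err := s.deleteVersion(v); err != nil {
			if errors.Is(err, ErrVersionDoesNotExist) {
				skipped = append(skipped, v)
				continue
			}
			return skipped, err
		}
	}
	return skipped, nil
}

func main() {
	// Version 2 models the broken state: its root cannot be loaded.
	s := &store{roots: map[int64]bool{1: true, 2: false, 3: true}}
	skipped, err := pruneRange(s, 1, 3)
	fmt.Println(s.pruned, skipped, err)
}
```

In this sketch, versions 1 and 3 are pruned and version 2 is reported as skipped, whereas an early return on the first not-found error would leave version 3 unpruned.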
Notes about legacy nodes
first to legacyLatestVersion+1
see:
Downloading state
https://snapshots.testnet.osmosis.zone/
Checking broken stores
Use this PR and run:
osmosis-labs/cosmprund#2
Pruning broken stores
Use this PR and run:
osmosis-labs/cosmprund#2
State will then be fixed
Things we don't know
Why are there states deleted outside of pruning? Why does this become more apparent with async pruning?
Another version of the fix
cosmos#1048
That fix works the same way: it continues after a version-not-found error, which moves past both checks (version and version+1).
Why this is needed
Currently, if pruning breaks with this error, the chain state starts to grow quickly.
What the fix will look like
Osmosis mainnet with broken state:
Before this fix, pruning would have hung here and the state would bloat.
Osmosis testnet with broken state:
This represents a large backlog as pruning is on
7203887
27208281