fix: continue pruning if version is not found #1063
Walkthrough
The changes enhance the …
Sequence Diagram(s)
sequenceDiagram
    participant NodeDB
    participant Cache as Cache.getRootKey
    participant OrphanTraversal as traverseOrphansWithRootkeyCache
    NodeDB->>Cache: getRootKey(version)
    alt Version does not exist
        Cache-->>NodeDB: ErrVersionDoesNotExist
        NodeDB->>NodeDB: Log error and continue
    else Valid rootKey returned
        Cache-->>NodeDB: rootKey
        NodeDB->>OrphanTraversal: traverseOrphansWithRootkeyCache(rootKey)
        alt traverse error is ErrVersionDoesNotExist
            OrphanTraversal-->>NodeDB: Ignored error
        else Other error occurs
            OrphanTraversal-->>NodeDB: Return error
        end
    end
Actionable comments posted: 0
🧹 Nitpick comments (1)
nodedb.go (1)
466-468: Fix typo in error log message. There's a typo in the error message: "moving on the the next version" (duplicate "the").
- ndb.logger.Error("Error while pruning, moving on the the next version in the store", "version missing", version, "next version", version+1, "err", err)
+ ndb.logger.Error("Error while pruning, moving on to the next version in the store", "version missing", version, "next version", version+1, "err", err)
📜 Review details
📒 Files selected for processing (1)
nodedb.go (2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
nodedb.go (3)
mutable_tree.go (1): ErrVersionDoesNotExist (18-18)
node.go (1): Node (59-75)
cache/cache.go (1): Node (10-12)
🔇 Additional comments (3)
nodedb.go (3)
462-464: Improved error handling for ErrVersionDoesNotExist. This change modifies the error handling to specifically check for ErrVersionDoesNotExist and continue execution in that case, rather than immediately returning the error. This aligns with the PR's objective to allow pruning to continue when versions are missing.
470-493: Added conditional traversal and ErrVersionDoesNotExist handling. This change adds a nil check on rootKey before traversing orphans, which prevents potential nil pointer dereferences. It also modifies the error handling to ignore ErrVersionDoesNotExist errors during traversal, consistent with the other changes in this PR.
506-508: Consistent error handling for next version root key. This change applies the same improved error handling pattern for the next version's root key check, maintaining consistency with the earlier changes.
nodedb.go
@@ -497,7 +503,7 @@ func (ndb *nodeDB) deleteVersion(version int64, cache *rootkeyCache) error {
 
 	// check if the version is referred by the next version
 	nextRootKey, err := cache.getRootKey(ndb, version+1)
-	if err != nil {
+	if err != nil && err != ErrVersionDoesNotExist {
Could nextRootKey be nil above if ErrVersionDoesNotExist?
Yeah, it can if both the current version and the next version are missing.
We don't have a check for it being nil, right?
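The concern in this thread can be illustrated with a small sketch. It assumes the "referred by the next version" check compares the two root keys with `bytes.Equal`; the hunk above does not show that comparison, so this is an assumption, and `referredByNext` and `referredByNextNaive` are hypothetical helpers. Because `bytes.Equal` treats a nil slice as empty, two nil root keys (both versions missing) compare as equal, so an explicit nil guard is needed.

```go
package main

import (
	"bytes"
	"fmt"
)

// referredByNextNaive compares the two root keys directly. bytes.Equal treats
// nil as an empty slice, so two missing roots compare as equal.
func referredByNextNaive(rootKey, nextRootKey []byte) bool {
	return bytes.Equal(rootKey, nextRootKey)
}

// referredByNext adds the nil guard discussed in the review: a missing root
// (nil after ErrVersionDoesNotExist) can never be shared with another version.
func referredByNext(rootKey, nextRootKey []byte) bool {
	if rootKey == nil || nextRootKey == nil {
		return false
	}
	return bytes.Equal(rootKey, nextRootKey)
}

func main() {
	// Both versions missing: getRootKey returned nil for each.
	fmt.Println(referredByNextNaive(nil, nil))            // true (misleading)
	fmt.Println(referredByNext(nil, nil))                 // false
	fmt.Println(referredByNext([]byte("k"), []byte("k"))) // true
}
```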
@Mergifyio backport release/v1.2.x
@Mergifyio backport release/v1.3.x
✅ Backports have been created
(cherry picked from commit 8a2e2fe)
LGTM
Co-authored-by: PaddyMc <[email protected]>
Description
We found a case on an Osmosis node where a root key points to a node that doesn't exist. This hangs the pruning process because it fails at getRootKey, which returns ErrVersionDoesNotExist.
There is already code to clean up the dangling reference node, but execution never reaches it because the function returns ErrVersionDoesNotExist early.
As a result, pruning gets stuck and cannot prune that version of the store. This PR moves on to the next version in the store if pruning returns a not-found error.
Notes about legacy nodes
first to legacyLatestVersion+1
see:
Downloading state
https://snapshots.testnet.osmosis.zone/
or (right now, the Polkachu snapshots have an issue with bank and concentratedliquidity):
https://polkachu.com/tendermint_snapshots/osmosis
I ran this PR against this state on Osmosis mainnet and it fixed the issue; see osmosis-labs/osmosis#9333.
Checking broken stores
Use this PR and run:
osmosis-labs/cosmprund#2
Pruning broken stores
Use this PR and run:
osmosis-labs/cosmprund#2
The state will then be fixed.
Things we don't know
Why is state being deleted outside of pruning? Why does this become more apparent with async pruning?
Another version of the fix
#1048
This fix works in the same way and simply continues after a version-not-found error; it moves past both checks, for version and version+1.
Why this is needed
Currently, if pruning breaks with this error, the chain state starts to grow quickly.
What the fix will look like
[Screenshot: Osmosis mainnet with broken state]
Before this fix, the state would bloat.
[Screenshot: Osmosis testnet with broken state]
This represents a large backlog, as pruning is on.
7203887
27208281
Summary by CodeRabbit