-
Notifications
You must be signed in to change notification settings - Fork 412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add prune wasm codes proposal #1455
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1455 +/- ##
==========================================
- Coverage 57.81% 57.79% -0.02%
==========================================
Files 62 62
Lines 8308 8416 +108
==========================================
+ Hits 4803 4864 +61
- Misses 3099 3143 +44
- Partials 406 409 +3
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice work. 🏄
Unfortunately the task has some more complexity with duplicate code uploads and instances. This was not well documented in the issue. I have added some notes in the review.
I have some feeling that the list of codeIDs is very hard to generate and validate by voters off chain. How about replacing them with a single max codeID that is ensured to be < current lastCodeId
sequence. We can then iterate through all codeIDs up to this number and build the list of unused codeIs on proposal execution. This is going to be an expensive operation on Gas but it is executed in an end-blocker.
It would be good to collect some numbers how much gas this would cost and how long this takes for some realistic data like Juno mainnet. But this can be a seconds task
proto/cosmwasm/wasm/v1/tx.proto
Outdated
string authority = 1 [ (cosmos_proto.scalar) = "cosmos.AddressString" ]; | ||
// CodeIDs references the WASM codes | ||
repeated uint64 code_ids = 2 [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can be a large number. The proposal submitter needs to provide this list upfront, which can be a lot of work.
It would be nice if there would be a way that this list is generated automatically on execution.
expErr: true, | ||
}, | ||
} | ||
for name, spec := range specs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good test cases.
We need to ensure also that no codes are pruned where an instance exists.
k.wasmVM.RemoveCode(info.CodeHash) | ||
k.deleteCodeInfo(ctx, c) | ||
} | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you would also need to ensure that no instance exists for a codeID.
x/wasm/keeper/keeper.go
Outdated
|
||
info := k.GetCodeInfo(ctx, c) | ||
if info == nil { | ||
return types.ErrNoSuchCodeFn(c).Wrapf("code id %d", c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's not abort the operation but skip the number. There is not much we can do in this case.
|
||
// then | ||
if spec.expErr { | ||
require.Error(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just add a return or the code after the else block is executed as well. The else can then be dropped and NoError is on the main path
x/wasm/keeper/keeper.go
Outdated
if info == nil { | ||
return types.ErrNoSuchCodeFn(c).Wrapf("code id %d", c) | ||
} | ||
k.wasmVM.RemoveCode(info.CodeHash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, we are removing code by checksum and not codeID. It is very likely that multiple codeIDs for the same hash exist. We need to ensure that we only delete by a checksum when no instance for any of codeID exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙌
proto/cosmwasm/wasm/v1/tx.proto
Outdated
string authority = 1 [ (cosmos_proto.scalar) = "cosmos.AddressString" ]; | ||
// LatestCodeID references the upper limit until unpinned | ||
// code ids will be pruned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you be explicit here if there upper limit is included or not? "latest code ID" sounds lit it is included but the implementation seems to be <.
I think <= is a bit simpler to reason about.
x/wasm/keeper/keeper.go
Outdated
pinnedCodesHash := make(map[string]struct{}, 0) | ||
unpinnedCodes := make([]uint64, 0) | ||
|
||
for c := uint64(1); c < codeID; c++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also add a minimum here? This can be optional. But then you can clean
First proposal: ID <= 1000
Second proposal 1001 <= ID <= 3000
Third proposal 3001 <= ID <= 5000
Fourth proposal 2001 <= ID <= 4000 (let's say some old codes got unpinned or migrated aways from)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, looking at the other problem regarding deletion by hash I am not sure if this suggestion makes sense. We need to look through all code IDs anyways to know which checksum is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 we would aways iterate through all contract-infos and pined codes but when we store the max-code-id of the last run, we could abort early when reached. No need to pass it with the proposal IMHO.
This would be an optimization, let's do it in a second PR
x/wasm/keeper/keeper.go
Outdated
} | ||
|
||
k.deleteCodeInfo(ctx, code) | ||
_ = k.wasmVM.RemoveCode(info.CodeHash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want to ignore those errors? It typically means a file could not be removed and indicates invalid state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is either abort the whole operation or not. IMHO there is value in removing the code-info data from the chain state even in case of an error here. There may be individual reasons on single machines, like file permissions that must not affect consensus. On the next state sync, the file would not be transferred to new nodes.
Logging the error makes sense IMHO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates! There is some non optimal code when looping over the code-id range and using single get-pinned calls. It may not be very intuitive to work with iterators but they are essential for bulk operations. Please see my code example for inspiration:
// iteratePinnedCodes iterates over all pinned code ids in ascending order
func (k Keeper) iteratePinnedCodes(ctx sdk.Context, cb func(codeID uint64) bool) {
store := prefix.NewStore(ctx.KVStore(k.storeKey), types.PinnedCodeIndexPrefix)
iter := store.Iterator(nil, nil)
defer iter.Close()
l := len(types.PinnedCodeIndexPrefix)
for ; iter.Valid(); iter.Next() {
codeID := sdk.BigEndianToUint64(iter.Key()[l:])
if cb(codeID) {
return
}
}
}
// PruneWasmCodes iterates over all contracts and code-infos to remove unused codes that are smaller than given code id.
func (k Keeper) PruneWasmCodes(ctx sdk.Context, maxCodeID uint64) error {
usedCodeIDs := make(map[uint64]struct{})
// used by contracts
k.IterateContractInfo(ctx, func(_ sdk.AccAddress, info types.ContractInfo) bool {
if info.CodeID < maxCodeID {
usedCodeIDs[info.CodeID] = struct{}{}
}
return false
})
// pinned to cache by gov
k.iteratePinnedCodes(ctx, func(codeID uint64) bool {
usedCodeIDs[codeID] = struct{}{} // may add maxCodeID but does not matter as we check later
return codeID >= maxCodeID // abort early
})
// execute prune
done := make(map[string]struct{})
var deleteCounter int
k.IterateCodeInfos(ctx, func(codeID uint64, info types.CodeInfo) bool {
if codeID >= maxCodeID {
return true // abort early
}
checksum := string(info.CodeHash)
if _, ok := usedCodeIDs[codeID]; ok {
done[checksum] = struct{}{}
return false
}
k.deleteCodeInfo(ctx, codeID) // clean up state
if _, ok := done[checksum]; ok { // do not delete twice on disk
return false
}
done[checksum] = struct{}{}
if err := k.wasmVM.RemoveCode(info.CodeHash); err != nil {
k.Logger(ctx).Error("failed to delete wasm file on disk", "checksum", info.CodeHash)
} else {
deleteCounter++
}
return false
})
k.Logger(ctx).Info("deleted unused wasm files", "total", deleteCounter)
return nil
}
x/wasm/keeper/keeper.go
Outdated
pinnedCodesHash := make(map[string]struct{}, 0) | ||
unpinnedCodes := make([]uint64, 0) | ||
|
||
for c := uint64(1); c < codeID; c++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 we would aways iterate through all contract-infos and pined codes but when we store the max-code-id of the last run, we could abort early when reached. No need to pass it with the proposal IMHO.
This would be an optimization, let's do it in a second PR
x/wasm/keeper/keeper.go
Outdated
} | ||
|
||
k.deleteCodeInfo(ctx, code) | ||
_ = k.wasmVM.RemoveCode(info.CodeHash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is either abort the whole operation or not. IMHO there is value in removing the code-info data from the chain state even in case of an error here. There may be individual reasons on single machines, like file permissions that must not affect consensus. On the next state sync, the file would not be transferred to new nodes.
Logging the error makes sense IMHO
x/wasm/keeper/keeper.go
Outdated
// PruneWasmCodes deletes code info for unpinned codes <= than latestCodeID | ||
func (k Keeper) PruneWasmCodes(ctx sdk.Context, latestCodeID uint64) error { | ||
usedCodeIds := make(map[uint64]struct{}, 0) | ||
usedChecksums := make(map[string]bool, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also remove the size parameter here and the line above. We know already that it will be >0
. Let's go with stdlib defaults instead
x/wasm/keeper/keeper.go
Outdated
|
||
// check if instances are used only by unpinned code ids <= latestCodeID | ||
k.IterateCodeInfos(ctx, func(codeID uint64, info types.CodeInfo) bool { | ||
if _, ok := usedCodeIds[codeID]; !ok && codeID <= latestCodeID { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The codeID <= latestCodeID
is not correct. Just remove this part that we iterate over the whole set of code infos
For example, there can be code ID 1 and 10 for the same checksum. If only code ID 10 has an instance then a prune with max code id 2 would miss the instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I iterate over the whole set of code infos. I need this condition to mark as "false" only the checksums with unpinned code ids <= latestCodeID. This helps because we cannot delete checksums with unpinned code ids > latestCodeID.
These will be marked as "true", so as usedChecksums.. same as if they were pinned.
This also answer to the other comment why we need map[string]bool and it is not enough to have map[string]struct{}
x/wasm/keeper/keeper.go
Outdated
// check if instances are used only by unpinned code ids <= latestCodeID | ||
k.IterateCodeInfos(ctx, func(codeID uint64, info types.CodeInfo) bool { | ||
if _, ok := usedCodeIds[codeID]; !ok && codeID <= latestCodeID { | ||
if _, found := usedChecksums[string(info.CodeHash)]; !found { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting solution to store state with the checksum. But is it needed? If you don't set anything here, the result would be the same for if !usedChecksums[string(info.CodeHash)]
. Or do I miss something?
x/wasm/keeper/keeper.go
Outdated
// delete all unpinned code ids <= latestCodeID and all the | ||
// instances used only by unpinned code ids <= latestCodeID | ||
k.IterateCodeInfos(ctx, func(codeID uint64, info types.CodeInfo) bool { | ||
if _, ok := usedCodeIds[codeID]; !ok && codeID <= latestCodeID { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conditions make sense here. I would prefer some better naming for latestCodeID
. Maybe maxCodeID
? Latest is a moving target for me as there can be new code-infos stored while the gov proposal is in-flight
x/wasm/keeper/keeper.go
Outdated
} | ||
return false | ||
} | ||
usedChecksums[string(info.CodeHash)] = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A struct{}
would be a more memory efficient value in usedChecksums
I checked logs and it fails for the issue we already raised: empty validator set. |
@@ -167,7 +167,7 @@ func (m *MockWasmer) Unpin(checksum wasmvm.Checksum) error { | |||
} | |||
|
|||
func (m *MockWasmer) RemoveCode(checksum wasmvm.Checksum) error { | |||
if m.UnpinFn == nil { | |||
if m.RemoveCodeFn == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@@ -100,3 +102,26 @@ func BenchmarkCompilation(b *testing.B) { | |||
}) | |||
} | |||
} | |||
|
|||
// Calculate the time it takes to prune about 100k wasm codes | |||
func BenchmarkPruningWasmCodes(b *testing.B) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good start to get some numbers.
This runs with a mem db, tiny contracts and no duplicate instances. I can imagine disk IO can have some impact here but let's add another DB to compare them
52de6a2
to
4dfc1f4
Compare
Closing because performance are very bad (See bench tests). |
Resolves #999