fix: DKG goroutine and instance lifecycle (v3.1.0) #239
Draft
olegshmuelov wants to merge 3 commits into main
Summary
Fixes v3.1.0 QA 2.6, "goroutine cleanup - memory leaks still persist." Threads a lifecycle context through `LocalOwner` + `instWrapper`, replaces kyber's `TimePhaser` with a cancel-aware version, and adds a background reaper so expired instances release heap pressure under sparse traffic.

What's fixed
- `<-o.done` deadlock on the success path: nothing ever sent to `o.done`, so every successful reshare leaked its WaitEnd goroutine. A `sync.Once`-guarded close is now used across `PostDKG`, `PostReshare`, `broadcastError`.
- `bchan` senders on timeout: the broadcast closure exits via `instanceCtx.Done()`.
- `runWaitEnd` races `WaitEnd()` against `ctx.Done()` with a 5s grace for late completions.
- `TimePhaser` residue (~30s): replaced with `cancellablePhaser` in `pkgs/wire/phaser.go`; kyber exits within one phase signal.
- `Instance.Close()` plumbed into `ProcessMessage` timeout, `cleanInstances`, `validateInstances`; `ProcessMessages` honors the lifecycle ctx.
- `Switch.StartReaper` sweeps expired entries every 30s.

Scope
In: goroutine lifecycle, kyber cancellation, instance eviction, heap-retention reaper.
Deferred (future releases): error-code unification (1.4 / 1.5 / 2.3), Pong version (4.1), body-limit tightening (2.1), atomic `MaxInstances` cap, CLI, Docker/CI, streaming SSZ.

Known limitations
- `MaxInstances=1024` may briefly overshoot by the number of in-flight admission goroutines.

Test plan
- `go test -race -timeout 300s ./pkgs/...`: all pass
- `golangci-lint run ./pkgs/...`: 0 issues
- `pkgs/wire/phaser_test.go`, `pkgs/dkg/lifecycle_test.go`, `pkgs/operator/lifecycle_test.go`; goleak verifies kyber residue clears on cancel via `TestDKGCancelReleasesKyberGoroutines`