Prevent HTTP handler goroutines from blocking forever (F-ssv-dkg-007)#229
Prevent HTTP handler goroutines from blocking forever (F-ssv-dkg-007)#229julienh-ssv merged 7 commits intomainfrom
Conversation
olegshmuelov
left a comment
There was a problem hiding this comment.
The timeout covers channel reads in ProcessMessages and CreateInstance, but there is another blocking point in the same HTTP handler path: processDKG() writes to unbuffered board channels (o.board.DealC <- *b, ResponseC, JustificationC). If the DKG protocol goroutine has exited or is stuck, this write blocks forever before the select timeout fires — same attack vector.
Worth addressing in this PR since it is the same code path and the same finding. A buffered channel (capacity 1) or a select with timeout on the write side would close the gap.
Separately, the DKG protocol goroutines (StartDKG, StartReshareDKG*) that wait on p.WaitEnd() will also leak if the protocol stalls — not a handler-blocking issue but a resource leak. Worth a separate follow-up finding.
olegshmuelov
left a comment
There was a problem hiding this comment.
All concerns addressed, board channel write timeouts, identity-guarded cleanup, reflect simplification. Clean implementation.
One follow-up worth tracking separately: the DKG protocol goroutines (StartDKG, StartReshareDKG*) that wait on p.WaitEnd() will leak if the protocol stalls. Not a handler-blocking issue, but a resource leak under the same attack model. Can you open a follow-up for that in Github and the board pls?
# Conflicts: # pkgs/dkg/drand.go
Fix for:
Blocking Channel Reads with No Timeout
ssv-dkg · Security · Assigned: dev4 · Effort: —
ProcessMessagesblocks indefinitely on<-iw.respChan.CreateInstanceblocks on<-bchan. If the DKG protocol stalls (lost message, deadlock), the HTTP handler goroutine is permanently blocked.Impact: Attacker can exhaust all server goroutines by triggering protocol stalls, making the operator unresponsive to all requests.
Recommendation: Use
selectwithcontext.WithTimeoutfor all channel reads. A reasonable timeout is theMaxInstanceTime(currently 1 minute).