ChannelClosedException when server closes connection right after releasing the last HTTP/2 stream #6258
Motivation and Context
1. The AWS service returns the final part of the response.
2. ResponseHandler.finalizeResponse is invoked to complete processing of the response.
3. The client sends an RST_STREAM frame to acknowledge the completion.
4. It then calls release on the channel pool. Since the channel pool consists of multiple layers, this invocation eventually reaches HttpOrHttp2ChannelPool.release.
5. HttpOrHttp2ChannelPool.release needs to access the protocolImpl field, which may only be accessed safely from the pool's event loop. To ensure thread safety, it submits a task to perform the release on that event loop.
6. Meanwhile, the server receives the reset frame and immediately closes the connection (why it does so is unclear; that is a question for the service developers).
7. This triggers channelInactive on the Http2ConnectionHandler.
8. As a result, MultiplexedChannelRecord.closeAndExecuteOnChildChannels is called. It detects that there are still unreleased child channels, because the task submitted in step 5 has not yet been executed. This leads to a ClosedChannelException being thrown and logged as an error.
9. The release task finally runs on the pool's event loop, but a little too late.
It may also be the root cause of #2914.
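To make the race more concrete, here is a minimal, hypothetical sketch of the hand-off described in steps 4–9. The class and member names (`RacySketch`, `unreleasedChildren`, `childAcquired`, `closeAndFailUnreleasedChildren`, `protocolResolved`) are illustrative only and do not match the SDK's actual `HttpOrHttp2ChannelPool` / `MultiplexedChannelRecord` code; the point is the ordering between the task submitted in step 5 and the close path in step 8.

```java
import java.nio.channels.ClosedChannelException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import io.netty.channel.Channel;
import io.netty.channel.EventLoop;
import io.netty.channel.pool.ChannelPool;
import io.netty.util.concurrent.Future;
import io.netty.util.concurrent.Promise;

// Illustrative only -- not the SDK's real implementation.
final class RacySketch {

    private final EventLoop poolEventLoop;
    private ChannelPool protocolImpl;  // before the fix: confined to poolEventLoop, not volatile
    private final Set<Channel> unreleasedChildren = ConcurrentHashMap.newKeySet();

    RacySketch(EventLoop poolEventLoop) {
        this.poolEventLoop = poolEventLoop;
    }

    // Called on poolEventLoop once protocol negotiation picks the HTTP/2 pool.
    void protocolResolved(ChannelPool resolvedPool) {
        this.protocolImpl = resolvedPool;
    }

    // Called when a child stream is handed out to a request.
    void childAcquired(Channel childChannel) {
        unreleasedChildren.add(childChannel);
    }

    // Step 5: the release is bounced onto the pool's event loop so that
    // protocolImpl is only ever touched from that thread.
    Future<Void> release(Channel childChannel) {
        Promise<Void> promise = poolEventLoop.newPromise();
        poolEventLoop.execute(() -> {
            unreleasedChildren.remove(childChannel);
            protocolImpl.release(childChannel, promise);
        });
        return promise;
    }

    // Steps 7-8: the parent HTTP/2 connection went inactive. If this runs before
    // the task submitted in release(), the child stream released in step 4 still
    // looks "in use" here, so it is failed with ClosedChannelException and the
    // failure is logged as an error.
    void closeAndFailUnreleasedChildren() {
        for (Channel child : unreleasedChildren) {
            child.pipeline().fireExceptionCaught(new ClosedChannelException());
            child.close();
        }
    }
}
```

With this shape, whether the error shows up depends entirely on whether the event loop happens to run the release task before or after channelInactive is processed, which matches the intermittent nature of the issue.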
Modifications
By declaring protocolImpl as volatile, we ensure that its assignment within the pool's event loop is visible to other threads, so the field can be read safely without first hopping onto that event loop. The underlying BetterFixedChannelPool is thread-safe: its mutable state is managed exclusively within its dedicated event loop, and for release operations the state updates are performed via a future listener after the underlying pool completes the release. This avoids the concurrency issue observed with HttpOrHttp2ChannelPool.
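As a rough sketch of what the volatile field enables (again using hypothetical names, not the PR's actual diff), the release path can read the already-assigned delegate from any thread and hand off immediately, instead of first scheduling a task on the pool's event loop:

```java
import io.netty.channel.Channel;
import io.netty.channel.pool.ChannelPool;
import io.netty.util.concurrent.Future;
import io.netty.util.concurrent.Promise;

// Illustrative only -- not the PR's exact code.
final class FixedSketch {

    private volatile ChannelPool protocolImpl;  // written once on the pool's event loop

    Future<Void> release(Channel childChannel, Promise<Void> promise) {
        ChannelPool delegate = protocolImpl;    // volatile read: safe from any thread once assigned
        if (delegate == null) {
            // Protocol not negotiated yet; placeholder handling for this sketch only.
            childChannel.close();
            return promise.setFailure(new IllegalStateException("Protocol not yet resolved"));
        }
        // The delegate (BetterFixedChannelPool in the HTTP/2 case) is thread-safe:
        // it updates its own state via a future listener after the release completes.
        return delegate.release(childChannel, promise);
    }
}
```

This removes the extra scheduling hop that opened the window described in step 8, since the release is handed to the thread-safe delegate on the calling thread instead of waiting behind the pool's event loop.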
Tests
Reproducing the issue consistently in a test environment is challenging, as it relies on the precise timing and order of event handler invocations.
We’ve been running the fix without the volatile keyword on protocolImpl in one of our production services for some time. It has been working well, as evidenced by a noticeable drop in error logs:

The deployment with the volatile keyword added is now live. I’ll share an update on the results shortly.