Description
Recently, the main build failed. An affected tests (EventsTest.matchingErrorEventsMustRunImmediately
) failed because it assumes that an error event handler that is added to a provider in an error state will be called immediately and exactly once. However, this handler was called twice.
On researching that, it turned out that the test fails only under certain circumstances. In my case this was when I executed the whole test suite; running only the test mentioned did not result in an error.
I believe the cause to be a race condition in the following parts of the sdk:
final TriConsumer<EventProvider, ProviderEvent, ProviderEventDetails> localOnEmit = this.onEmit;
if (localOnEmit != null) {
emitterExecutor.submit(() -> localOnEmit.accept(this, event, details));
}
This code snippet shows parts of how the EventProvider
emits events. Note that the localOnEmit
is called only inside a lambda that is executed by the emitterExecutor
. When this is changed to be invoked to on the current thread, the test does not fail.
The failing test adds a new error event listener to a client with a provider already in an error state.
I believe that in the time the emitterExecuter
takes to start executing the lambda, the handler is added to the collection of handlers, and is therefore executed a second time by the emitterExecutor
. This might be why the test does not fail when instead the current thread calls localOnEmit.accept(this, event, details)
.
Just moving the execution of this call to the current thread might not be enough to fix this issue, however, as we do not know the interleaving of threads in a general case.
Instead, it might be required to guard additions of new handlers and running of existing ones with a lock.