Skip to content

[deleted] #158

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 10, 2025
Merged

[deleted] #158

merged 1 commit into from
Feb 10, 2025

Conversation

ryanofsky
Copy link
Collaborator

@ryanofsky ryanofsky commented Feb 10, 2025

[deleted]

Currently, there are two cases where code may attempt to lock
`EventLoop::mutex` after the `EventLoop::loop()` method has finished running.
This is not a problem by itself, but is a problem if there is external code
which deletes the `EventLoop` object immediately after the method returns,
which is the case in Bitcoin Core. Both cases of locking after the loop method
returns happen due to uses of the `Unlock()` utility function which unlocks a
mutex temporarily, runs a callback function and relocks it.

The first case happens in `EventLoop::removeClient` when the
`EventLoop::done()` condition is reached and it calls `Unlock()` in order to be
able to call `write(post_fd, ...)` without blocking and wake the EventLoop
thread. In this case, since `EventLoop` execution is done there is not really
any point to using `Unlock()` and relocking, so the code is changed to use a
simple `lock.unlock()` call and not try to relock.

The second case happens in `EventLoop::post` where `Unlock()` is used in a
similar way, and depending on thread scheduling (if the thread is delayed for a
long time before relocking) can result in failing to relock
`EventLoop::m_mutex` after calling `write()`. In this case, since relocking the
mutex is actually necessary for the code that follows, the fix is different:
new `addClient`/`removeClient` calls are added to this code, so the
`EventLoop::loop()` function will never exit while the `post()` function is
waiting.

These two changes are being labeled as a bugfix even though there is not
technically a bug here in the libmultiprocess code or API. The `EventLoop`
object itself was safe before these changes as long as outside code waited for
`EventLoop` methods to finish executing before deleting the `EventLoop` object.
Originally the multiprocess code added in
bitcoin/bitcoin#10102 did this, but later as more
features were added for binding and connecting to unix sockets, and unit tests
were added which would immediately delete the `EventLoop` object after
`EventLoop::loop()` returned, it meant this code could now start failing to
relock after unlocking.

A previous attempt was made to fix this bug in
bitcoin/bitcoin#31815 outside of libmultiprocess code.
But it only addressed the first case of a failing `Unlock()` in
`EventLoop::removeClient()`, not the second case in `EventLoop::post()`.

This first case described above was not observed but is theoretically possible.
The second case happens intermittently on macos and was reported
bitcoin-core#154
@ryanofsky ryanofsky merged commit 61bf945 into bitcoin-core:master Feb 10, 2025
@ryanofsky ryanofsky changed the title bugfix: Do not lock EventLoop::mutex after EventLoop is done [deleted] Feb 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant