Fix PeerManager mutex bottleneck #875

Draft. Wants to merge 13 commits into base: main.

Contributor

@gammazero gammazero commented Mar 7, 2025

Do not try to acquire a MessageQueue mutex while holding the PeerManager mutex. A MessageQueue mutex may already be held for message processing, and waiting for it delays the release of the global PeerManager mutex. Instead, enqueue the calls to the MessageQueue and let an asynchronous goroutine execute them, so that the PeerManager lock can be released sooner, without waiting for the MessageQueue operations.

In short, this decouples the PeerManager from the per-peer MessageQueue mutexes.
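
For readers who want to see the shape of the idea, here is a minimal sketch of the pattern described above. It is not the code in this PR: `messageQueue`, `peerManager`, `broadcastWant`, and `demo` are hypothetical names, and a buffered channel stands in for the per-peer queue (a send on it would block if the buffer filled up, which an unbounded queue would avoid).

```go
package main

import (
	"fmt"
	"sync"
)

// messageQueue is a hypothetical stand-in for a per-peer MessageQueue.
type messageQueue struct {
	mu    sync.Mutex    // per-peer mutex; may be held for a while during message processing
	wants []string      // state protected by mu
	calls chan func()   // queued operations (assumption: buffered channel as the queue)
	done  chan struct{} // closed when the worker goroutine has drained calls
}

func newMessageQueue() *messageQueue {
	mq := &messageQueue{
		calls: make(chan func(), 64),
		done:  make(chan struct{}),
	}
	// One worker per queue runs the queued calls in FIFO order.
	go func() {
		defer close(mq.done)
		for call := range mq.calls {
			call()
		}
	}()
	return mq
}

// addWant takes the per-peer mutex, so it can block behind message processing.
func (mq *messageQueue) addWant(key string) {
	mq.mu.Lock()
	defer mq.mu.Unlock()
	mq.wants = append(mq.wants, key)
}

// close stops accepting calls and waits for the queued ones to finish.
func (mq *messageQueue) close() {
	close(mq.calls)
	<-mq.done
}

// peerManager is a hypothetical stand-in for the PeerManager.
type peerManager struct {
	mu     sync.Mutex // global lock
	queues map[string]*messageQueue
}

// broadcastWant holds the global lock only long enough to enqueue the calls.
// It never acquires a per-peer mutex itself.
func (pm *peerManager) broadcastWant(key string) {
	pm.mu.Lock()
	defer pm.mu.Unlock()
	for _, mq := range pm.queues {
		mq := mq // per-iteration copy for Go versions before 1.22
		mq.calls <- func() { mq.addWant(key) }
	}
}

// demo holds the per-peer mutex while the manager broadcasts, to show that
// the manager does not wait for it.
func demo() []string {
	mq := newMessageQueue()
	pm := &peerManager{queues: map[string]*messageQueue{"peerA": mq}}

	mq.mu.Lock()             // simulate long-running message processing
	pm.broadcastWant("cid1") // returns even though mq.mu is held
	pm.broadcastWant("cid2")
	mq.mu.Unlock()

	mq.close() // wait for the worker to apply the queued calls
	return mq.wants
}

func main() {
	fmt.Println(demo())
}
```

In `demo`, both `broadcastWant` calls return while the per-peer mutex is still held; the worker goroutine applies the queued updates once the mutex is free. The cost is the extra goroutine per queue, which is the trade-off raised later in this conversation.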

Kubo PR: ipfs/kubo#10749

@gammazero gammazero force-pushed the fix-pm-mutex-bottleneck branch 2 times, most recently from 7c0ca2e to f9be3f2 Compare March 7, 2025 07:01
@gammazero gammazero marked this pull request as ready for review March 7, 2025 19:02
@gammazero gammazero requested a review from a team as a code owner March 7, 2025 19:02
@gammazero gammazero force-pushed the fix-pm-mutex-bottleneck branch from 8bb481e to 7b97171 Compare March 11, 2025 06:26
Contributor

@guillaumemichel guillaumemichel left a comment

Do we have a way to quantify performance gains of linearizing the execution using chanqueue?

@gammazero gammazero marked this pull request as draft March 11, 2025 15:10
Do not try to acquire a MessageQueue mutex while holding the PeerManager mutex. A MessageQueue mutex may already be held for message processing, and waiting for it delays the release of the global PeerManager mutex. Instead, start goroutines to update the MessageQueue asynchronously so that the PeerManager lock can be released sooner.
@gammazero gammazero force-pushed the fix-pm-mutex-bottleneck branch from 7b97171 to bc94cda Compare March 12, 2025 02:13
Contributor Author

gammazero commented Mar 12, 2025

> Do we have a way to quantify performance gains of linearizing the execution using chanqueue?

Watch staging metrics for now.

Nothing is being linearized here: the code passed to the chanqueue was already serialized by the MessageQueue wllock mutex, and there is a separate chanqueue for each peer (one per MessageQueue).

My bigger concern is that there are now 2 more goroutines per peer, so if this does not reduce the number of waiting goroutines, it can actually make things worse.
