Description
On my self-hosted setup, colibri endpoint allocation occasionally takes over 5 seconds, causing jicofo to time out and drop the bridge. This leads to sporadic failures when starting a conference -- starting it a second or third time sometimes fixes the issue if the colibri endpoint is allocated quickly enough.
Current behavior
Relevant part of the jicofo log
Jicofo 2024-12-04 12:17:18.808 SEVERE: [130] [[email protected] meeting_id=d4e34611-0d36-4f51-94cb-09478a70abd4] ColibriV2SessionManager.allocate#397: Failed to allocate a colibri2 endpoint for d01221d8: Timeout
Jicofo 2024-12-04 12:17:20.188 INFO: [131] [[email protected] meeting_id=d4e34611-0d36-4f51-94cb-09478a70abd4] ColibriV2SessionManager.allocate#378: Ignoring response for a session that's no longer active ([email protected]/157354f3-2348-498b-bc9b-e0bd05258b5b)
Jicofo 2024-12-04 12:17:20.214 SEVERE: [131] [[email protected] meeting_id=d4e34611-0d36-4f51-94cb-09478a70abd4 participant=6a63fefa] ParticipantInviteRunnable.doRun#230: Failed to allocate colibri channels
org.jitsi.jicofo.bridge.colibri.ColibriAllocationFailedException: Session no longer active ([email protected]/157354f3-2348-498b-bc9b-e0bd05258b5b)
    at org.jitsi.jicofo.bridge.colibri.ColibriV2SessionManager.allocate(ColibriV2SessionManager.kt:379)
    at org.jitsi.jicofo.conference.ParticipantInviteRunnable.doRun(ParticipantInviteRunnable.java:214)
    at org.jitsi.jicofo.conference.ParticipantInviteRunnable.run(ParticipantInviteRunnable.java:153)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Jicofo 2024-12-04 12:17:20.803 INFO: [128] [[email protected] meeting_id=d4e34611-0d36-4f51-94cb-09478a70abd4] JitsiMeetConferenceImpl$ColibriSessionManagerListener.bridgeRemoved#2513: Bridge Bridge[[email protected]/157354f3-2348-498b-bc9b-e0bd05258b5b, version=2.3.168-g28674f78, relayId=null, region=null, stress=0.02] was removed from the conference. Re-inviting its participants: [d01221d8, 6a63fefa]
Jicofo 2024-12-04 12:17:20.823 SEVERE: [130] [[email protected] meeting_id=d4e34611-0d36-4f51-94cb-09478a70abd4 participant=d01221d8] ParticipantInviteRunnable.doRun#230: Failed to allocate colibri channels
org.jitsi.jicofo.bridge.colibri.ColibriAllocationFailedException: Timeout
    at org.jitsi.jicofo.bridge.colibri.ColibriV2SessionManager.handleResponse(ColibriV2SessionManager.kt:436)
    at org.jitsi.jicofo.bridge.colibri.ColibriV2SessionManager.allocate(ColibriV2SessionManager.kt:392)
    at org.jitsi.jicofo.conference.ParticipantInviteRunnable.doRun(ParticipantInviteRunnable.java:214)
    at org.jitsi.jicofo.conference.ParticipantInviteRunnable.run(ParticipantInviteRunnable.java:153)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at java.base/java.lang.Thread.run(Thread.java:840)
Jicofo 2024-12-04 12:17:21.452 INFO: [131] [[email protected] meeting_id=d4e34611-0d36-4f51-94cb-09478a70abd4] ColibriV2SessionManager.allocate#281: Allocating for 6a63fefa
Jicofo 2024-12-04 12:17:21.453 INFO: [130] [[email protected] meeting_id=d4e34611-0d36-4f51-94cb-09478a70abd4] ColibriV2SessionManager.allocate#281: Allocating for d01221d8
Jicofo 2024-12-04 12:17:21.507 WARNING: [131] BridgeSelector.selectBridge#182: There are no operational bridges.
Jicofo 2024-12-04 12:17:21.538 WARNING: [130] BridgeSelector.selectBridge#182: There are no operational bridges.
Jicofo 2024-12-04 12:17:21.539 SEVERE: [131] [[email protected] meeting_id=d4e34611-0d36-4f51-94cb-09478a70abd4 participant=6a63fefa] ParticipantInviteRunnable.doRun#218: Can not invite participant, no bridge available.
Jicofo 2024-12-04 12:17:21.576 SEVERE: [130] [[email protected] meeting_id=d4e34611-0d36-4f51-94cb-09478a70abd4 participant=d01221d8] ParticipantInviteRunnable.doRun#218: Can not invite participant, no bridge available.
(room name, domain, and external IP address redacted to roomname, redacted.domain and 123.123.123.123 respectively)
Corresponding part of the JVB log
(room name, domain, and external IP address redacted to roomname, redacted.domain and 123.123.123.123 respectively)
As you can see, colibri endpoint allocation takes too long, so jicofo gives up waiting for it. After that, it removes the bridge from the meeting, and because this is the only bridge that I have, the meeting dies and has to be restarted.
Expected Behavior
I would like to adjust the timeout in jicofo so that it doesn't give up waiting for the colibri endpoint too early.
Possible Solution
The part that waits for the allocation request is here:
jicofo/jicofo-selector/src/main/kotlin/org/jitsi/jicofo/bridge/colibri/ColibriV2SessionManager.kt
Line 368 in dd2d778
stanzaCollector gets created earlier:
jicofo/jicofo-selector/src/main/kotlin/org/jitsi/jicofo/bridge/colibri/ColibriV2SessionManager.kt
Line 338 in dd2d778
Here is the function that does that:
jicofo/jicofo-selector/src/main/kotlin/org/jitsi/jicofo/bridge/colibri/Colibri2Session.kt
Lines 89 to 117 in dd2d778
The nextResult function is documented here. It uses the default connection (AbstractXMPPConnection) timeout that is initialized through SmackConfiguration to be 5 seconds. A timeout of 5 seconds is roughly consistent with what I am observing in the logs.
The way to change it is to either call nextResult with a timeout parameter or to setReplyTimeout on the xmppConnection object, neither of which jicofo currently does.
There seems to be a config option for it, but apparently it is not used in this case:
jicofo/jicofo-selector/src/main/resources/reference.conf
Lines 405 to 406 in fb29dc8
// How long to wait for a response to a stanza before giving up.
reply-timeout = 15 seconds
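To illustrate why the length of the reply wait matters, here is a minimal stand-alone Java sketch. It does not use Smack or any jicofo code: slowBridgeReply and awaitReply are hypothetical stand-ins for the bridge's answer and for the collector wait, and the millisecond values are made up so the demo runs quickly (they stand in for an allocation that takes longer than the short timeout but less than the long one).

```java
import java.time.Duration;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ReplyTimeoutSketch {
    /** Hypothetical stand-in for the bridge: the reply becomes available only after the given delay. */
    static CompletableFuture<String> slowBridgeReply(Duration delay) {
        return CompletableFuture.supplyAsync(
                () -> "colibri2 endpoint allocated",
                CompletableFuture.delayedExecutor(delay.toMillis(), TimeUnit.MILLISECONDS));
    }

    /**
     * Hypothetical stand-in for the collector wait: blocks for at most replyTimeout.
     * An empty result means the caller gave up, which corresponds to the "Timeout" failure in the log.
     */
    static Optional<String> awaitReply(CompletableFuture<String> reply, Duration replyTimeout) {
        try {
            return Optional.of(reply.get(replyTimeout.toMillis(), TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // The reply may still arrive later, but nobody is waiting for it any more.
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the reply", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("reply failed", e);
        }
    }

    public static void main(String[] args) {
        Duration allocationTime = Duration.ofMillis(400); // slow allocation (illustrative value)
        Duration shortTimeout = Duration.ofMillis(100);   // plays the role of the 5 second default
        Duration longTimeout = Duration.ofMillis(2000);   // plays the role of a larger configured value

        boolean gotReplyShort = awaitReply(slowBridgeReply(allocationTime), shortTimeout).isPresent();
        boolean gotReplyLong = awaitReply(slowBridgeReply(allocationTime), longTimeout).isPresent();

        System.out.println("reply received with short timeout: " + gotReplyShort);
        System.out.println("reply received with long timeout:  " + gotReplyLong);
    }
}
```

With these values the first wait should give up and the second should receive the reply. In jicofo the analogous change would presumably be to pass the configured value to nextResult or to setReplyTimeout, as suggested in this section; that part is untested here.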
I may have missed something in the analysis above, in which case I apologize -- I have only taken a cursory look and I don't use Java/Kotlin much.
The best solution to me appears to be to setReplyTimeout on xmppConnection in ColibriV2SessionManager with the timeout value from the config, but maybe I am missing some system design details here.
Steps to reproduce
I honestly don't know what causes the endpoint allocation to be slow -- I noticed that the colibri reply lists all of the server's IP addresses, so it may be because my server has multiple of them (I'm doing VPN shenanigans), or just because it hits some sort of IO bottleneck (I'm running off of an HDD). If you want to reproduce it, adding sleep(5000) in the part of JVB (?) that creates the colibri endpoint should probably do it. I don't know where that part of the code is.
Environment details
I followed the Self-Hosting Guide - Debian/Ubuntu server on Debian Bookworm.