[1.x] Janus server stalls at rooms_mutex lock in janus_videoroom.c #3478

Closed
ramprakash110109 opened this issue Nov 15, 2024 · 8 comments
Labels
multistream Related to Janus 1.x

Comments

@ramprakash110109

What version of Janus is this happening on?
v1.2.1

Have you tested a more recent version of Janus too?
No, since it is a rare issue and I have been unable to reproduce it.

Was this working before?
This is the second time I have faced this issue in the past 6 months.

Is there a gdb or libasan trace of the issue?
https://pastebin.com/eVLkvU3H

Additional context
Sessions were running smoothly for more than 3 months.
Then, suddenly, threads seemed to get locked and no HTTP or RabbitMQ requests were processed by the media server.
When I debugged it, I found that the rooms_mutex lock had not been released by one of the previous requests, but I was unable to find the exact request that caused this.

I have pasted the debug_lock output captured when I sent a new HTTP request after the media server had already been locked by some previous request.

I suspect one of the stop_rtp_forward requests could have caused this, but I am not sure.

Could you please advise on this?

@ramprakash110109 ramprakash110109 added the multistream Related to Janus 1.x label Nov 15, 2024
@lminiero
Member

Please test master, there have been a ton of fixes recently on potential deadlocks.

@ramprakash110109
Author

The videoroom.c code is almost the same in my version too. Anyway, we will try the latest master as well. Thanks.

@lminiero
Member

@ramprakash110109 any update?

@ramprakash110109
Author

ramprakash110109 commented Dec 12, 2024 via email

@Dev2Trailblazer

We have been running janus-gateway v1.3.0 for ~1 month, and we occasionally get similar behavior to this one. Randomly, one of our instances stops responding to all HTTP requests, for example:

  • POST /janus/{sessionID} (attach janus.plugin.videoroom)
  • POST /janus/{sessionID} (keepalive)

Instances keep failing from time to time, so it's possible that we can reproduce the issue. I'm not sure this is the same issue as this ticket. Do you have any suggestions for us on debugging it or providing more data?

Thank you!

@atoppi
Member

atoppi commented Jan 16, 2025

The usual steps for similar issues:

  • try to reproduce with AddressSanitizer
  • enable lock debugging
    Expect a downgrade in performance and an increase in disk usage if you are saving the logs.

@Dev2Trailblazer

Hello, it's me again!

We are hesitant to enable lock debugging because of its performance impact and the cost of storing the logs. However, we recompiled janus-gateway with -O3 -g3 -ggdb3 -fno-omit-frame-pointer. We are not ready to drop -O3 just yet.

We managed to connect GDB to the running process and extracted a stack trace with symbols. You can find them here: https://gist.github.com/Dev2Trailblazer/5dc846843713364a9a2abb41eba11f8b

I see that some of the threads are stuck in janus_mutex_lock, which makes me suspect this may be the same problem as this issue.

As a reminder, we are running version 1.3.0 (dfd86e3).

@atoppi
Member

atoppi commented Jan 28, 2025

This thread is holding the participant->streams_mutex, while waiting for handle->mutex:

Thread 6 (LWP 4193):
#0  0x00007abe10e1a88d in syscall () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007abe115a0abc in ?? () from target:/lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x000059efdb13f589 in janus_ice_relay_rtcp (packet=0x7abe0dfcc1a0, handle=0x7abdf4020960) at ice.c:5100
#3  janus_ice_relay_rtcp (handle=0x7abdf4020960, packet=0x7abe0dfcc1a0) at ice.c:5096
#4  0x000059efdb13f998 in janus_ice_send_pli_stream (handle=0x7abdf4020960, mindex=1) at ice.c:5160
#5  0x00007abe0dfe348e in janus_videoroom_reqpli (ps=0x7abdfd0f3610, reason=0x7abe0e041f8d "Keyframe request") at /usr/local/src/janus-gateway/src/janus_videoroom.c:2982
#6  0x00007abe0e03da4e in janus_videoroom_handler (data=<optimized out>) at /usr/local/src/janus-gateway/src/janus_videoroom.c:10914

As a consequence, two threads are blocked on the participant->streams_mutex:

Thread 11 (LWP 200239):
#0  0x00007abe10e1a88d in syscall () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007abe115a0abc in ?? () from target:/lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007abe0dfed556 in janus_videoroom_incoming_rtp_internal (session=0x7abdf42c8390, participant=0x7abdfc3654e0, pkt=0x7abd53715480) at /usr/local/src/janus-gateway/src/janus_videoroom.c:8668
#3  0x000059efdb13d386 in janus_ice_cb_nice_recv (agent=<optimized out>, stream_id=<optimized out>, component_id=<optimized out>, len=<optimized out>, buf=0x7abdf04bca00 "<redacted>", ice=0x7abdf026d5a0) at ice.c:2916
#4  0x00007abe1188ee43 in nice_component_emit_io_callback () from target:/lib/x86_64-linux-gnu/libnice.so.10
#5  0x00007abe11889fdf in component_io_cb () from target:/lib/x86_64-linux-gnu/libnice.so.10

Thread 5 (LWP 4192):
#0  0x00007abe10e1a88d in syscall () from target:/lib/x86_64-linux-gnu/libc.so.6
#1  0x00007abe115a0abc in ?? () from target:/lib/x86_64-linux-gnu/libglib-2.0.so.0
#2  0x00007abe0dfe3f0d in janus_videoroom_query_session (handle=<optimized out>) at /usr/local/src/janus-gateway/src/janus_videoroom.c:4597
#3  janus_videoroom_query_session (handle=<optimized out>) at /usr/local/src/janus-gateway/src/janus_videoroom.c:4559
#4  0x000059efdb153198 in janus_process_incoming_admin_request (request=0x7abdf849b450) at janus.c:3047
#5  0x000059efdb15615c in janus_transport_requests (data=<optimized out>) at janus.c:3592

This is all that we can gather from your gdb stack trace. Unfortunately, we need further data.
I'd start with AddressSanitizer and, if nothing shows up, try mutex lock debugging.

We should continue the discussion in a new issue. Please open one that references this issue and includes the most relevant information.


4 participants