srvd resilience can fail due to lack of file handles #1821

Open
cconstab opened this issue Mar 23, 2025 · 5 comments
Labels
bug Something isn't working

Comments

@cconstab
Member

Describe the bug

If srvd cannot get a file handle for a new socket, it still grabs the mutex.

Steps to reproduce

  1. Run srvd.
  2. Hammer srvd with half-open connections.
  3. Eventually it runs out of file handles (the default limit on prod is 1024).
  4. srvd still grabs the mutex, then fails to get a socket handle.
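
To confirm which file handle limit applies on a given host before reproducing, the per-process limit can be read directly. This is a minimal sketch in Python using the standard `resource` module (Unix only); `open_file_limits` is a hypothetical helper name and is not part of srvd.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    # RLIMIT_NOFILE is the limit that produces errno 24 (EMFILE) when exhausted.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft={soft} hard={hard}")
```

The soft limit is the one a process actually hits; on a host configured like the one described above it would be expected to report 1024.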

Expected behavior

Get the socket first and then the mutex (if possible), or provide some way to release the mutex (nasty).

Screenshots

SHOUT|2025-03-19 17:20:54.887268| srvd |😎 Will handle request from @cconstab; acquired mutex 9b8070c1-f4cd-411b-807b-4798fe73d5c7.session_mutexes.sshrvd@rv_am
Unhandled exception:
SocketException: Failed to create server socket (OS Error: Too many open files, errno = 24), address = 0.0.0.0, port = 0
#0      _NativeSocket.bind (dart:io-patch/socket_patch.dart:1218)
<asynchronous suspension>
#1      _RawServerSocket.bind.<anonymous closure> (dart:io-patch/socket_patch.dart:2157)
<asynchronous suspension>
#2      _ServerSocket.bind.<anonymous closure> (dart:io-patch/socket_patch.dart:2513)
<asynchronous suspension>
#3      SocketConnector.serverToServer (package:socket_connector/src/socket_connector.dart:320)
<asynchronous suspension>
#4      socketConnector (package:noports_core/src/srvd/socket_connector.dart:94)
<asynchronous suspension>

Smartphones

  • srvd 5.9.0

Were you using an atApplication when the bug was found?

srvd

Additional context

No response

@cconstab cconstab added the bug Something isn't working label Mar 23, 2025
@gkc
Contributor

gkc commented Mar 23, 2025

It is the spawned isolate which binds the ports and then sends the ports back to the main isolate. The resolution would be for each relay to first spawn its isolate, and only try to acquire the mutex when it has heard back from the spawned isolate. If it acquires the mutex, great, respond to the noports client and all is well. If it does not acquire the mutex then it should kill the isolate.
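
The ordering described here (bind first, try the mutex second, clean up if the mutex is not acquired) can be sketched as follows. This is an illustration in Python and not the srvd code: a `threading.Lock` stands in for the session mutex, closing the sockets stands in for killing the isolate, and `bind_port_pair` and `handle_request` are hypothetical names.

```python
import socket
import threading


def bind_port_pair(host="127.0.0.1"):
    """Bind two listening sockets on OS-assigned ports; raises OSError on failure."""
    socks = []
    try:
        for _ in range(2):
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            socks.append(s)
            s.bind((host, 0))  # port 0 lets the OS pick a free port
            s.listen()
        return socks[0], socks[1]
    except OSError:
        # Do not leak a half-built pair when the second bind fails.
        for s in socks:
            s.close()
        raise


def handle_request(session_mutex, binder=bind_port_pair):
    """Bind first, then try the mutex. Returns the socket pair, or None."""
    try:
        pair = binder()
    except OSError:
        # No sockets available (e.g. errno 24), so the mutex is never touched.
        return None
    if not session_mutex.acquire(blocking=False):
        # Another relay owns this session: release what was bound.
        for s in pair:
            s.close()
        return None
    return pair
```

With this ordering, a bind failure such as "Too many open files" leaves the mutex free for another relay to take.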

@gkc
Contributor

gkc commented Mar 24, 2025

@cconstab I've got a fix ready to go for this; the relay now spawns its isolate but has a timeout when awaiting the port pair info. If it doesn't hear back from the spawned isolate within one second, it logs an exception and returns without trying to grab the mutex. One second should be more than enough to avoid false negatives, where the isolate spawns successfully but the main isolate times out while waiting for the spawned isolate to send the port pair info.
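
The timeout behaviour can be sketched like this. It is an illustration in Python with `asyncio` and not the actual fix: an awaitable stands in for the spawned isolate, and `await_port_pair`, `relay`, `fast_worker`, `stuck_worker`, and the port numbers are all hypothetical.

```python
import asyncio


async def await_port_pair(worker, timeout_seconds=1.0):
    """Wait for the worker to report its port pair; return None on timeout."""
    try:
        # wait_for cancels the worker if it does not finish in time.
        return await asyncio.wait_for(worker, timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return None


async def relay(worker, acquire_mutex, timeout_seconds=1.0):
    """Only try the mutex once the worker has reported its ports."""
    ports = await await_port_pair(worker, timeout_seconds)
    if ports is None:
        return None  # timed out: return without trying to grab the mutex
    if not acquire_mutex():
        return None  # another relay won the session
    return ports


async def fast_worker():
    """Stand-in for a worker that binds and reports quickly."""
    await asyncio.sleep(0.015)
    return (40001, 40002)


async def stuck_worker():
    """Stand-in for a worker that never reports back in time."""
    await asyncio.sleep(60)
```

Calling `asyncio.run(relay(stuck_worker(), acquire_mutex))` returns `None` without ever invoking `acquire_mutex`, which is the property the fix relies on.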

@gkc
Contributor

gkc commented Mar 24, 2025

@cconstab FYI in my testing it typically takes about 15ms to spawn the isolate, bind the two ports, send the info to the main isolate, and have the main isolate receive it.

@gkc
Contributor

gkc commented Mar 25, 2025

@cconstab #1825 fixes this. Please retest when you have time - thank you

@gkc
Contributor

gkc commented Mar 31, 2025

I believe this is fixed, but will leave it for @cconstab to verify and close.
