Unknown socket error #64305
Tagging subscribers to this area: @dotnet/ncl
|
Any chance you can run it under |
This issue has been marked |
The error code we get isn't helpful.
From the stack trace we see the exception occurs during connect to If you let .NET generate a trace for the app, the The other exception is a different problem:
This exception can happen when multiple |
Are you talking about running
Yeah, I brought this part up because I have strong suspicions that this may be what is causing all of the problems. This part of the issue has existed since the inception of this application. If Kestrel is being allocated sockets in a way that is not copacetic to the way |
I mean dotnet-trace.
Something causes the file descriptor to be registered twice. It could be your code or a 3rd party library. If you're not doing anything special with handles, you should try to figure out which 3rd party library is causing the issue. I doubt it is .NET or ASP.NET Core, because there would be reports about it. |
This is the exception from the
That doesn't look like anything to do with the file system unless that's under the hood. |
Yes, this stacktrace doesn't tell us much. We're interested to find out who registered the first Socket for this fd. |
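On Linux, you can see what a descriptor number currently refers to by reading the `/proc/self/fd/<n>` symlink. A socket shows up as `socket:[<inode>]`. If fd 0 resolves to a socket rather than a terminal, a pipe or `/dev/null`, then stdin was closed at some point and its number was handed to a socket.

The sketch below is minimal and assumes a Linux container with `/proc` mounted. `describe_fd` and `fd_is_socket` are hypothetical helper names. The helpers only tell you *what* currently holds a descriptor, not *who* closed it or registered it first; strace, as suggested in this thread, is still needed for that.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-only: copies the target of /proc/self/fd/<fd> into buf, for example
 * "socket:[12345]", "/dev/null" or "pipe:[678]".
 * Returns 0 on success, -1 if the descriptor is not open or the lookup fails. */
int describe_fd(int fd, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    char path[64];
    snprintf(path, sizeof path, "/proc/self/fd/%d", fd);
    ssize_t n = readlink(path, buf, len - 1); /* readlink() does not NUL-terminate */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return 0;
}

/* True if fd is open and refers to a socket. Seeing this for fd 0 means
 * stdin was closed earlier and its number was handed to a socket. */
bool fd_is_socket(int fd)
{
    char target[256];
    return describe_fd(fd, target, sizeof target) == 0 &&
           strncmp(target, "socket:[", 8) == 0;
}
```

Calling `fd_is_socket(0)` at startup and again when a socket error is logged would show whether fd 0 has been taken over in between.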
About There is a related issue (#56750, which can probably be closed). A blog post gets mentioned In that case, the issue disappeared when removing a 3rd party library: https://zblesk.net/blog/aspnetcore-identity-litedb-breaks-on-ubuntu/. About Maybe you can make a small reproducer by making |
Having a repro for this would be extremely valuable; this is the 4th report of #56750 if I'm counting correctly. |
@antonfirsov it doesn't happen often, but when it does, it is a pain to debug. And when I see this, I wonder: can there still be a bug in Thinking out loud. We could add some envvar which causes the Or we extend the event source logging so it contains the |
Do you know, @tmds, whether our tracing logs the actual OS error or only the translated one? This is the reason I suggested `strace`, as that gives the raw value from the kernel. |
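strace is useful because it shows the raw `errno` the kernel returned for each syscall, before any translation. In native code, the equivalent is to capture `errno` immediately after the failing call and log the number next to its text.

The sketch below is minimal and uses a Unix-domain socket for illustration; the traffic in this issue is TCP. `connect_unix_raw_errno` is a hypothetical name. It shows two cases:
- A `connect()` on a descriptor that has already been closed fails with `EBADF`.
- A valid socket pointed at a missing path fails with `ENOENT`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Calls connect() on fd to the Unix-domain socket at path. Returns 0 on
 * success, otherwise the raw errno reported by the kernel, which is also
 * logged to stderr as both a number and its strerror() text. */
int connect_unix_raw_errno(int fd, const char *path)
{
    assert(path != NULL);
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    snprintf(addr.sun_path, sizeof addr.sun_path, "%s", path);

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0)
        return 0;
    int err = errno; /* capture before any other call can overwrite it */
    fprintf(stderr, "connect(fd=%d) failed: errno=%d (%s)\n", fd, err, strerror(err));
    return err;
}
```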
I don't think so. The |
For name resolution the tracing should be sufficient AFAIK. |
Triage: This is not the first time we have seen problems like these. However, we need more info (logs) to make it actionable. |
This issue has been marked |
So... in our testing, the When I shelled into one of the containers and tried running So I scoured our code again from top to bottom, left to right, and made sure every call was using the Next, I dug into New Relic's .NET Agent source code. We are using So I commented out every call to the NewRelic Agent, every tl;dr: New Relic's .NET Agent is using So what's left here that deserves further scrutiny?
Providing a full repro that doesn't expose custom logic/credentials may be hard for me to get figured out. We are doing heavy interaction with AWS as mentioned. A Kestrel app, with an Sorry this is so long. Let me know what I can do further. |
may be related to #61798. @MihaZupan was looking into that. I don't see a direct correlation, but they certainly do many unusual things. |
Tagging subscribers to this area: @dotnet/area-system-runtime
|
Also, is there a use case for SafeFileHandle(IntPtr.Zero, true)? You established there's one for SafeFileHandle(IntPtr.Zero, false). Given how much the questionable documentation has spread around, it would be great to have a code solution. I'd be happy to write something, but it does sound like based on this conversation that it would just be rejected. |
If it is caused by
In practice, I don't think so.
You mean a better example for the The |
I mean in the constructor for SafeFileHandle: if the first parameter is 0 and the second parameter is true, throw an exception. Or can you emit warnings like that (based on parameter values)? Looking through private repositories I have access to, I see this in 1 other unrelated project, which is not great. I agree with you that the documentation for IDisposable shouldn't mention SafeFileHandle at all. Very dangerous function calls have no place anywhere near a 'best practices for implementing IDisposable' tutorial. I see it copied to other pages around the internet that explain how to implement the IDisposable pattern. They spread a kind of knowledge infection, teaching instead how to ruin your program on Linux. The irony level for a code example teaching best practices is through the roof. I'm going around to a few of these places and pointing it out. The 2nd example doesn't even contain a disposable object; it's just the code needed to dispose one if one did exist. So I don't know that you need any representation of a disposable object at all, rather than a comment saying that this is where you'd place one. Do you have a preference? Is it standard to include a representative example line in docs like this, or just use a comment (or nothing)? |
Now that we've identified |
The SafeFileHandle stuff is not my issue, no. We are using a third-party document processing engine that does use unmanaged code, and I do need to verify 100% that it is not actually doing this. Until then, we need to consider that my issue is caused by something else, and I still need to do a thorough investigation. I haven't been able to get this high enough in the priority stack at work to get to it just yet. I apologize. Can we please leave this issue open a bit longer? |
Yes, we'll keep it open until you have had some time to investigate and check whether the root cause is also the IntPtr.Zero SafeFileHandle. |
The problem remains in .NET 6 |
@SpringHgui is it possible that your code or a library you rely on instantiates a |
The problem remains in .NET 8 |
Do you have a repro you can share, @SpringHgui? The linked repo above has way too much stuff. Alternatively, can you do the |
I encountered this error because I used the |
I encountered this socket error when working on a MAUI app. For me the error only happened on some old Android devices, and not consistently. What seemed to fix it for me was to explicitly close my socket instead of relying on Dispose being called for me. Broken code
Working code
|
Interesting. How many destinations are you trying to scan, @icefire1? I'm wondering whether you are running out of file descriptors (hitting the open-file limit) in this case... |
@wfurt It's difficult to say exactly, since it happened inconsistently, but I would guess somewhere around 10-40 scans over a duration of a few seconds to a few minutes. Perhaps it's because too many sockets were opened at once. I would add, however, that I cannot replicate it by simply opening a lot of sockets like so:
The above code would indeed give me an exception, but with ErrorCode 23 instead, and with a message |
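This follows up on the file-descriptor-limit question. When a process hits its per-process limit (`RLIMIT_NOFILE`), the call that tries to allocate a new descriptor fails with `EMFILE` ("Too many open files"). Descriptors that are already open are left untouched. That fits the observation above that simply opening many sockets produces a different error.

The sketch below is minimal and reproduces this by temporarily lowering the soft limit. `exhaust_descriptors` and `MAX_TRACKED` are hypothetical names, and the limit of 32 used in the usage note is arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_TRACKED 1024

/* Temporarily lowers the soft RLIMIT_NOFILE to `limit`, opens sockets until
 * socket() fails, then closes them and restores the old limit.
 * Stores the number of sockets opened in *opened and returns the errno of
 * whichever call failed first (0 if MAX_TRACKED sockets opened fine). */
int exhaust_descriptors(rlim_t limit, int *opened)
{
    assert(opened != NULL);
    *opened = 0;

    struct rlimit old;
    if (getrlimit(RLIMIT_NOFILE, &old) != 0)
        return errno;
    struct rlimit low = old;
    low.rlim_cur = limit;
    if (setrlimit(RLIMIT_NOFILE, &low) != 0)
        return errno;

    int fds[MAX_TRACKED];
    int n = 0;
    int err = 0;
    while (n < MAX_TRACKED) {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0) {
            err = errno; /* expected: EMFILE once the limit is reached */
            break;
        }
        fds[n++] = fd;
    }

    for (int i = 0; i < n; i++)
        close(fds[i]);
    setrlimit(RLIMIT_NOFILE, &old); /* restore the original limit */
    *opened = n;
    return err;
}
```

Consider a process that starts with only stdin, stdout and stderr open. `exhaust_descriptors(32, &opened)` returns `EMFILE` after opening 29 sockets (fds 3 to 31).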
Makes me wonder if it is related to the |
Hey guys. I am running into this exact issue. I have an app that I migrated from Xamarin to .NET MAUI. I am currently on .NET 9.0.200 with MAUI 9.0.40, and I have tried .NET 8 as well. My app uses HttpClient quite a bit via the HttpClientFactory. I also use the Preferences API and read and write some JSON files. Something in the app is closing FD 0 (stdin). The next time something opens a file descriptor (HttpClient, File API, Preferences API, SQLite, etc.), it picks up FD 0. When that descriptor is later closed, the app container's OS watchdog sends a SIGKILL to the process, leaving me with no information to troubleshoot. If HttpClient with SocketsHttpHandler is what picks it up, it manifests as the Unknown socket error, but it also sometimes ends directly in a SIGKILL. Because this is compiled and run on iOS and Android as a signed package, it is even harder to troubleshoot. Can you provide any guidance on how to run this down? I have spent months trying to figure this out, and I am at a loss as to what is closing FD 0. It is almost always on the finalizer thread. Here are some examples of what I see: HttpClient with SocketsHttpHandler
Randomly I see this behavior where iOS kills the app pid due to:
|
After reading through some of the comments above, I searched my codebase for SafeHandle. I found one instance in a low-level base class for our implementation of PropertyChangedBase. Our Dispose methods were creating a SafeHandle with IntPtr.Zero; it looks like it came straight out of the documentation from here: This caused a lot of really bad problems. I wonder how many people have copied that out of the documentation and added their specifics without removing the SafeHandle(IntPtr.Zero, true) code.
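The failure mode found here can be replayed at the syscall level. As established in this thread, on Linux `SafeFileHandle(IntPtr.Zero, true)` treats descriptor 0 as a handle it owns. Releasing it (via Dispose or the finalizer) therefore closes whatever fd 0 is at that moment. The sequence then runs as follows:
1. Linux hands out the lowest free descriptor number, so the next socket is given fd 0.
2. The next bogus release closes that socket out from under its real owner.
3. A later socket can then be given fd 0 again.

That last step is consistent with the "Handle is already used by another Socket" reports in this issue.

This is a minimal C simulation, not .NET code. `release_bogus_handle` stands in for releasing such a handle, and all names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Stand-in for releasing a handle that wrongly "owns" descriptor 0:
 * it closes whatever fd 0 refers to right now. */
static void release_bogus_handle(void)
{
    close(0);
}

/* True if fd currently names an open descriptor. */
static bool fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Replays the sequence. *socket_fd receives the number the new socket was
 * given; returns true if that socket was closed by the second bogus release. */
bool replay_fd0_hijack(int *socket_fd)
{
    assert(socket_fd != NULL);
    release_bogus_handle();                  /* 1st release: stdin (fd 0) is closed */
    int s = socket(AF_UNIX, SOCK_STREAM, 0); /* lowest free number is now 0 */
    *socket_fd = s;
    if (s < 0 || !fd_is_open(s))
        return false;
    release_bogus_handle();                  /* 2nd release: closes the socket itself */
    return !fd_is_open(s);
}
```

In the .NET case, the socket's owner would be a managed Socket that still believes it holds fd 0. This may explain why the error surfaces far away from the Dispose call that caused it.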
Description
We have an application based on the dotnet new webapi project template that adds an IHostedService to do background processing. Kestrel is used to answer health checks from Kubernetes and to feed new requests into the service to be queued. Our background processing logic involves heavy interaction with AWS resources (SQS, S3, STS), and we are encountering SocketExceptions that have been very difficult to pin down.
The canonical guidance to use a single static HttpClient instance application-wide does not work well with the AWS clients in Amazon's SDK packages. The various AWS client constructors, and the config objects you can hand in, do not accept an HttpClient instance. Instead, you have two options:
- rely on their built-in caching of the HttpClient instance(s) they create internally, or
- provide an implementation that derives from their HttpClientFactory.

The second option is fairly straightforward if you use IHttpClientFactory in .NET: the factory method you override simply calls CreateClient() on an IHttpClientFactory instance provided via DI.
However, even when this HttpClientFactory is provided to every AWS client instantiated anywhere in this service, it does not solve the SocketException problem. I will provide the stack trace below, but the properties of the actual SocketException object are extremely unhelpful:
Unknown socket error; ErrorCode: -131074; SocketErrorCode: SocketError; NativeErrorCode: -131074
The twist here is that these errors have only been encountered on Linux, in containers. Long-running load tests on Windows cannot reproduce the problem.
The other twist is that our Kestrel request handling is producing
System.InvalidOperationException: Handle is already used by another Socket.
errors (logged at Warning level) in Kestrel's handling pipeline. It would appear that Kestrel and the handler pool used underneath IHttpClientFactory are stomping on each other. If I am mischaracterizing the issue and there is a different cause, please let me know.
The fact that the runtime is throwing a SocketException with an error code that isn't even in the spec for TCP errors (nice article here) is the reason I am bringing it to this group. The runtime is burping up what would appear to be the equivalent of the default case in a switch statement.
Any help or guidance here is appreciated.
Reproduction Steps
The basic structure is a Web API project created with dotnet new webapi, with an IHostedService doing background processing that interacts heavily with AWS resources. I can provide application code privately, as needed, so as not to share proprietary/private logic and access keys publicly.
Expected behavior
Interaction with AWS resources from a web application should not cause SocketExceptions.
Actual behavior
We are encountering SocketExceptions that have error codes that aren't even part of the TCP specs:
Unknown socket error; ErrorCode: -131074; SocketErrorCode: SocketError; NativeErrorCode: -131074
Full stack trace:
Regression?
The application in its current form is a .NET 6 solution. In a prior version, it was .NET 5 and the SocketExceptions were very rare, if not non-existent.
However, even in the .NET 5 solution, Kestrel was throwing
System.InvalidOperationException: Handle is already used by another Socket.
warnings when handling basic health check calls.
Known Workarounds
We have implemented heavy retry policies using Polly, which let the application move past the errors. However, it quite often takes a near-complete restart of the background processing logic to clear up the SocketExceptions, so that the application can restart a job and create new connections to the AWS resources being used.
Configuration
The errors only appear running in containers on Linux.
Other information
N/A