Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change SocketException to ServerNotAvailable + misc test fixes/logging #5354

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

noahfalk
Copy link
Member

@noahfalk noahfalk commented Mar 18, 2025

This started out as a bit of test logging. Then the unrelated GetProcessInfo test failed in CI so the change grew into a small product fix + more test improvements.

Product changes:

  • Connecting to the IPC channel might fail when the runtime is starting up, shutting down, or is overloaded. If so the socket.Connect() call is going to throw a SocketException. In order to avoid having callers to DiagnosticClient APIs understand all the implrementation details and have to use catch clauses for a wide variety of different exceptions we now catch this exception and rethrow it wrapped as a ServerNotAvailableException.

Test changes:

  • Add logging to the DuplicateMetricNames test to improve diagnostics when it hangs.
  • Add retries to the GetProcessInfo requests that are running while the target process is starting up. There is no guarantee the runtime is listening to the socket at this point and connection requests may be refused. This may resolve one cause of flakiness noted in BasicProcessInfoNoSuspendTest and BasicProcessInfoNoSuspendTestAsync are flaky #2760. The issue also shows a 'ReadPastEndOfStream' problem that is not addressed by this fix.
  • Add verification that the ExePath returned by DebuggeeCompiler actually exists on disk. This catches some build/config issues in the test setup more eagerly.
  • Add verification that the BuildProjectFramework and TargetConfiguration used by SdkPrebuiltDebuggeeCompiler are populated. This catches some specific test configuration errors more eagerly.
  • Tweaked some out-of-date comments

Trying to diagnose why the test isn't completing and the issue isn't reproing locally.
@noahfalk noahfalk requested a review from a team as a code owner March 18, 2025 07:19
Product changes:
- Connecting to the IPC channel might fail when the runtime is starting up, shutting down, or is overloaded. If so the socket.Connect() call is going to throw a SocketException. In order to avoid having callers to DiagnosticClient APIs understand all the implrementation details and have to use catch clauses for a wide variety of different exceptions we now catch this exception and rethrow it wrapped as a ServerNotAvailableException.

Test changes:
- Add retries to the GetProcessInfo requests that are running while the target process is starting up. There is no guarantee the runtime is listening to the socket at this point and connection requests may be refused. This may resolve one cause of flakiness noted in dotnet#2760. The issue also shows a 'ReadPastEndOfStream' problem that is not addressed by this fix.
- Add verification that the ExePath returned by DebuggeeCompiler actually exists on disk. This catches some build/config issues in the test setup more eagerly.
- Add verification that the BuildProjectFramework and TargetConfiguration used by SdkPrebuiltDebuggeeCompiler are populated. This catches some specific test configuration errors more eagerly.
- Tweaked some out-of-date comments
@noahfalk noahfalk changed the title Add some logging the DuplicateMetricName test Change SocketException to ServerNotAvailable + misc test fixes/logging Mar 19, 2025
@noahfalk
Copy link
Member Author

PTAL @dotnet/dotnet-diag

}
catch (ServerNotAvailableException) when (i < retryCount-1)
{
_output.WriteLine($"Failed to connect to the IPC channel as the process is starting up. Attempt {i} of {retryCount}. Waiting 0.1 seconds, then retrying.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the last log would have i and retryCount off by 2, suggesting 2 more retries.

Suggested change
_output.WriteLine($"Failed to connect to the IPC channel as the process is starting up. Attempt {i} of {retryCount}. Waiting 0.1 seconds, then retrying.");
_output.WriteLine($"Failed to connect to the IPC channel as the process is starting up. Attempt {i+1} of {retryCount}. Waiting 0.1 seconds, then retrying.");

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants