[✨ Triage] dotnet/runtime#126251 by mus65 - Runtime logs to stderr if libgssapi_krb5 is not available #176

@MihuBot

Triage for dotnet/runtime#126251.
Repo filter: All networking issues.
MihuBot version: 246635.
Ping MihaZupan for any issues.

This is a test triage report generated by AI, aimed at helping the triage team quickly identify past issues/PRs that may be related.
Take any conclusions with a large grain of salt.

Tool logs
dotnet/runtime#126251: Runtime logs to stderr if libgssapi_krb5 is not available by mus65
Extracted 5 search queries:

  • Cannot load library libgssapi_krb5.so.2 Error: libgssapi_krb5.so.2: cannot open shared object file: No such file or directory
  • dotnet runtime writes native library load errors to stderr when libgssapi_krb5 is missing
  • pal_gssapi.c System.Net.Security.Native logs libgssapi_krb5 load failure to stderr
  • Npgsql GSSAPI fallback causes spurious stderr messages when libgssapi_krb5 not installed
  • TypeInitializationException from missing libgssapi_krb5 but runtime still prints native load error to stderr

Found 21 candidate issues

Below are the most relevant prior PRs / issues I found and a short summary of each (what was discussed and any conclusions that matter for the new report about pal_gssapi.c writing to stderr when libgssapi_krb5 is missing).

  • PR #55037 (July 2021) - "Shim gss api on Linux to delay loading libgssapi_krb5.so"

    • Summary: Introduced a native shim + on-demand dlopen() so the runtime no longer has a static link-time dependency on libgssapi_krb5. The goal was to tolerate containers that don't have krb5 installed and to avoid single-file regressions where an app would fail at process start even if it never used GSSAPI. The PR used a managed-side static constructor pattern (so initialization is delayed until the managed API is touched) and discussed concurrency/initialization approaches. This is the main change that removed the requirement for the library to be present at process startup; it is directly relevant because it’s the code-path that now does dynamic load attempts when the API is used.
    • Conclusion relevant to new issue: the runtime moved to lazy loading, so a missing libgssapi_krb5 is expected in many container scenarios. However, the PR focused on avoiding the startup failure; its discussion did not address the diagnostic output that the native shim prints to stderr when dlopen fails.
  • PR #59526 (Sept 2021) - "Fix krb5 library SO name in the gss api shim"

    • Summary: Fixed the library name probed by the shim to use the runtime SONAME (libgssapi_krb5.so.2) rather than the versionless build-time .so. The discussion covered probe order and distro differences (MIT vs Heimdal), and trade-offs of probing the versionless .so.
    • Conclusion relevant to new issue: changed which filenames the shim attempts, which affects when/why dlopen fails and thus what error text appears. The change was merged.
  • Issue #45720 (Dec 2020) - "Publishing release as single file does not include all libraries (libgssapi_krb5)"

    • Summary: Users saw single-file apps failing at start with "libgssapi_krb5.so.2: cannot open shared object file" because single-file superhost linked native shims statically and pulled in dependencies eagerly. Discussion: single-file behavior vs. non-single-file; the resolution path was to delay-load such native dependencies. PR #55037 was later referenced as the fix for this class of single-file problem.
    • Conclusion relevant to new issue: the single-file/startup failure was addressed by the lazy-loading shim; but the underlying dlopen failure message (or stderr output) remained a point of UX friction.
  • Issue #45682 (Dec 2020) - "Is it possible to remove unused native dll references from the published executable?"

    • Summary: Same root cause area — native shims statically linked into single-file executables caused unexpected native dependencies (libgssapi_krb5). Discussion suggested delaying native initialization or making managed code trigger native init (static constructor pattern used in other shims).
    • Conclusion relevant to new issue: team had the same intent (delay init) — PR #55037 implemented that for GSSAPI — but the logging/noise behavior when dlopen fails was not explicitly removed here.
  • Issue #11891 (Jan 2019) - "Better error message when not loading native shared library"

    • Summary: Long discussion about native load diagnostics on Unix/macOS and how to improve error messages (use dlerror, recommend LD_DEBUG / DYLD_PRINT_LIBRARIES). The runtime was changed so that, on the Unix path, the managed DllNotFoundException message incorporates dlerror output and suggests diagnostic settings such as LD_DEBUG.
    • Conclusion relevant to new issue: the runtime already improved the managed exception messages to include OS loader diagnostics. That helps callers, but it does not directly address native code printing to stderr (fprintf) — using dlerror in exception messages is a different mechanism than suppressing native stderr output.
  • Issue #82945 (Mar 2023) - "Alpine System.Net.Security.Tests failing because of 'Cannot load library libgssapi_krb5.so.2'"

    • Summary: CI failure on Alpine where tests/logs contained "Cannot load library libgssapi_krb5.so.2". The test/CI noise was caused by missing libs in the test image; the team determined a prereq image update fixed the problem. This shows that the same stderr message (or similar) surfaces in CI logs and can be confusing/noisy.
    • Conclusion relevant to new issue: there is precedent for the environment producing that exact message in logs and that it can cause CI/test noise/confusion.
  • Issue #109236 (Oct 2024) - "The type initializer for 'NetSecurityNative' throws meaningless exception rendering corrective action impossible."

    • Summary: User hit a TypeInitializationException coming from NetSecurityNative/GssInitializer on Unix; the stack showed an invalid state, but the message lacked actionable guidance. The discussion pointed to platform-specific causes and noted that GSS is OS-provided, so troubleshooting depends on the distro/package. The issue highlights the developer confusion that results when the managed exception is generic and the native diagnostics are sparse or ambiguous.
    • Conclusion relevant to new issue: emphasizes that noisy stderr output + an unhelpful TypeInitializationException is a poor DX. It supports the request in the new issue to avoid printing confusing stderr messages when the absence of the library is an expected fallback condition.
  • PR #68253 (Apr 2022) - "Delete libkrb5-dev from NativeAOT prereqs"

    • Summary: Because the runtime moved to dynamic loading of libgssapi_krb5 (PR #55037), libkrb5-dev was removed from the NativeAOT build prerequisites. This confirms that dynamic loading is meant to reduce hard build-time and runtime dependencies.
    • Conclusion relevant to new issue: reinforces that the runtime now expects the library to possibly be missing on many systems, strengthening the case that noisily printing to stderr about it being missing is undesirable.
  • PR #70723 (June 2022) - "Fix compilation without HAVE_GSS_KRB5_CRED_NO_CI_FLAGS_X"

    • Summary: Native build fixes / defensive code for platforms lacking a particular GSS/KRB5 feature macro. The PR and subsequent discussion also surfaced some platform-specific crashes in gss/ntlm stacks on some distros (not directly about logging).
    • Conclusion relevant to new issue: shows ongoing platform-specific GSSAPI fragility and that native side can be sensitive to distro-specific behavior — another reason to avoid emitting confusing diagnostic noise in common scenarios.
  • EFCore issue / discussion: microsoft/efcore#33271 (Mar 2024) - "Running on kubernetes: Cannot load library libgssapi_krb5.so.2 ..."

    • Summary: User saw the same "Cannot load library libgssapi_krb5.so.2" log lines in a Kubernetes pod which led to confusion while debugging their app; the root cause was unrelated (appsettings/cascading config or integrated-security setting). The issue was closed after the reporter clarified root cause. This is an example of how those stderr messages can mislead users during diagnosis.
    • Conclusion relevant to new issue: stderr noise from GSSAPI can mislead users — another data point for suppressing or demoting such messages.

Overall findings and takeaways from the above:

  • The runtime intentionally moved to lazy dlopen() for libgssapi_krb5 (PR #55037), so a missing GSSAPI library is now a normal, expected condition on many minimal/container systems rather than a hard startup failure. This is already in place.
  • The shim probes the versioned SONAME (libgssapi_krb5.so.2) (PR #59526), so dlopen failures commonly report that exact filename.
  • The runtime improved managed-side error messages to include dlerror output and to suggest diagnostic env vars (LD_DEBUG / DYLD_PRINT_LIBRARIES) when a native load fails (issue #11891), but that does not suppress native code printing to stderr.
  • Multiple user reports / CI hits show the exact stderr output ("Cannot load library libgssapi_krb5.so.2\nError: ... cannot open shared object file: No such file or directory") appears in logs and causes confusion (issues #45720, #82945, EFCore #33271).
  • I did not find an issue/PR that specifically removed or changed the native shim's fprintf-to-stderr behavior in pal_gssapi.c. The prior discussions focused on lazy loading, filename probing, and improving managed exception text; none explicitly resolved the "noisy stderr print on dlopen fail" question.
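
As a rough illustration of the lazy-load-and-probe pattern summarized above, here is a minimal Python sketch. It is not the actual shim, which is C code in pal_gssapi.c; it uses ctypes, which calls dlopen() on Unix. The class name LazyLibrary and the single-entry probe list are assumptions made for this example. The point is only that a failed load can be recorded and queried later instead of being printed.

```python
import ctypes


class LazyLibrary:
    """Hypothetical helper: loads a shared library on first use, not at process start."""

    def __init__(self, candidate_names):
        self.candidate_names = tuple(candidate_names)
        self.handle = None
        self.errors = []          # loader messages are kept here, not printed
        self._attempted = False

    def ensure_loaded(self):
        """Return True if a candidate loaded. Never writes to stderr."""
        if self._attempted:
            # Probe only once; later calls reuse the first outcome.
            return self.handle is not None
        self._attempted = True
        for name in self.candidate_names:
            try:
                # On Unix, ctypes.CDLL goes through dlopen().
                self.handle = ctypes.CDLL(name)
                return True
            except OSError as exc:
                # The OSError text carries the loader's own message.
                self.errors.append(str(exc))
        return False


# Constructing the object does not touch the library; the versioned SONAME
# is only probed when ensure_loaded() is first called.
gss = LazyLibrary(["libgssapi_krb5.so.2"])
```

A caller would check `gss.ensure_loaded()` right before the first GSSAPI use and, on failure, decide what to do with `gss.errors` (for example, fall back to another authentication mechanism).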

If you want next steps for triage:

  • The new issue points to pal_gssapi.c line(s) that print to stderr on dlopen failure — since the team already expects dlopen to commonly fail, it’s reasonable to change the native shim to avoid unconditional fprintf(stderr). Two obvious alternatives discussed in the historical threads are:
    • Do not write to stderr from the native shim; instead capture dlerror() and surface it in the managed exception (like other DllNotFound changes did), or
    • Make the fprintf a debug-only log (only when a debug/tracing flag is set) so normal containers aren't polluted.
  • The lazy-loading PRs (PR #55037 and PR #59526) are the best reference for how the shim is structured; use them to locate where the stderr writes happen and suppress them while still preserving useful diagnostics for developers (e.g., include dlerror in the managed exception text).
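
The two alternatives above can be sketched side by side. This is a Python illustration of the idea only, assuming ctypes as a stand-in for the C shim's dlopen() call. The environment variable name EXAMPLE_GSS_SHIM_DEBUG, the exception type, and both function names are hypothetical and do not correspond to any existing runtime setting or API.

```python
import ctypes
import os
import sys

# Hypothetical switch name invented for this sketch; not a real runtime setting.
DEBUG_ENV_VAR = "EXAMPLE_GSS_SHIM_DEBUG"


class NativeLibraryUnavailable(Exception):
    """Hypothetical exception that carries the loader's error text to the caller."""


def load_or_raise(name):
    """Alternative 1: no stderr output; the loader message travels in the exception."""
    try:
        return ctypes.CDLL(name)
    except OSError as exc:
        raise NativeLibraryUnavailable(f"Cannot load library {name}: {exc}") from exc


def load_or_none(name, env=None):
    """Alternative 2: return None on failure; log to stderr only when the debug flag is set."""
    env = os.environ if env is None else env
    try:
        return ctypes.CDLL(name)
    except OSError as exc:
        if env.get(DEBUG_ENV_VAR) == "1":
            print(f"Cannot load library {name}: {exc}", file=sys.stderr)
        return None
```

With the first variant, a caller that treats the missing library as an expected fallback can catch the exception silently, while a caller that needs the library still sees the loader's message. With the second, normal container runs stay quiet and the message is available on request.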

If you'd like, I can:

  • point to the exact lines in pal_gssapi.c that call fprintf / write to stderr (the issue referenced line ~125 already) and draft a minimal patch that uses dlerror() and returns an error string to managed code instead of printing, or
  • open a short follow-up PR suggestion referencing the above prior work (PR #55037 / PR #59526 / issue #11891) recommending removing the stderr writes and propagating dlerror into the managed exception.
