fix(csharp): emit db.error.kind tag on Status=Error spans#58
Merged
CurtHagenlocher merged 2 commits intoMay 29, 2026
Merged
Conversation
Adds three RED tests for adbc-drivers/databricks#481 covering the classifications that are tractable to mock through the in-process HiveServer2 test server: - OperationError_TagsServerError: GetOperationStatus reports ERROR_STATE -> assert "server_error". - OpenSessionError_TagsServerError: OpenSession returns ERROR_STATUS -> assert "server_error". - Connect_RefusedTcp_TagsNetwork: connect to a closed loopback port -> assert "network". Tests fail on origin/main (no db.error.kind tag exists) and will turn GREEN once the catch sites are wired up in the follow-up commit.
Adds OTel-compatible db.error.kind taxonomy on failure spans so dashboards can group/filter by error category instead of relying solely on exception.type (which is ambiguous between, e.g., TaskCanceledException for timeouts vs user cancels). Implemented classifications: - server_error: Thrift TStatus != SUCCESS, polled operation in ERROR_STATE, OpenSession failure paths, or direct-result error display messages. - network: transport-level (HttpRequestException without status, SocketException, IOException, TTransportException without a more specific inner exception). - auth_failed: HTTP 401 / 403 (typed StatusCode on net5+, message scan on netstandard2.0 / net472). - protocol_error: TProtocolException (Thrift framing/encoding). - query_timeout: cancellation in scopes where the only CTS in scope had a CancelAfter timer (OpenAsync, the metadata-query paths in HiveServer2Connection). NOT applied at the Statement-level catches because the parent driver can also call Statement.Cancel(), and disambiguating user_cancel from query_timeout there requires cross-layer state. Deferred: user_cancel classification — left for a follow-up that also touches the parent driver. Statement-level cancellations remain UNTAGGED for now rather than risk mislabeling. The tag is set at the originating catch/throw sites (innermost wins, via a guard in ErrorKindClassifier.Tag), so the OTel Status=Error span where the failure actually occurred carries the classification. Outer wrapper spans inherit Status=Error from System.Diagnostics but do not duplicate the kind. Refs adbc-drivers/databricks#481
CurtHagenlocher
approved these changes
May 28, 2026
| @@ -0,0 +1,184 @@ | |||
| /* | |||
| * Copyright (c) 2025 ADBC Drivers Contributors | |||
Collaborator
There was a problem hiding this comment.
Suggested change
| * Copyright (c) 2025 ADBC Drivers Contributors | |
| * Copyright (c) 2026 ADBC Drivers Contributors |
Collaborator
There was a problem hiding this comment.
(This has been throwing me off, too; I have to make an effort to remember.)
| @@ -0,0 +1,182 @@ | |||
| /* | |||
| * Copyright (c) 2025 ADBC Drivers Contributors | |||
Collaborator
There was a problem hiding this comment.
Suggested change
| * Copyright (c) 2025 ADBC Drivers Contributors | |
| * Copyright (c) 2026 ADBC Drivers Contributors |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
Fixes adbc-drivers/databricks#481. When a query fails today the only error classification on the OTel span is
exception.type, which is too ambiguous for automated dashboards / alerts to group on:TaskCanceledException: client-side timeout? user-cancel viaCancel()? mid-flight network drop?TTransportException: server-side cancel? network drop? Thrift framing error?The driver already knows which of these it is internally (it set the timeout, it called Cancel, it received the Thrift error status, etc.); this PR surfaces that knowledge as a
db.error.kindtag on the originating Activity.What changed
A new internal helper
ErrorKindClassifier(csharp/src/AdbcDrivers.HiveServer2/ErrorKindClassifier.cs) maps a caught exception to one of the OTel taxonomy values inActivityKeys.Db.ErrorKindValues:server_errorTStatus != SUCCESS(viaHandleThriftResponse->ThrowErrorResponse), polled operation inERROR_STATE(inPollForResponseAsync),OpenSessionfailure paths, or direct-result error display messages fromExecuteStatementAsync.networkHttpRequestExceptionwithout a typed status code,SocketException,IOException, or aTTransportExceptionwith no more-specific inner exception.auth_failedStatusCodeon net5+, message scan fallback on netstandard2.0 / net472 (the same fallback the driver already uses inIsUnauthorized).protocol_errorTProtocolException(Thrift framing/encoding).query_timeoutCancelAftertimer (HiveServer2Connection.OpenAsync,TryOpenSessionWithFallbackAsync, the metadata-query catch sites).The classifier is invoked at the catch/throw sites in
HiveServer2ConnectionandHiveServer2Statementso the tag lives on the Activity where the failure actually originated.ErrorKindClassifier.Tagis idempotent — if an inner catch site already tagged a kind, an outer one will not overwrite — so the innermost classification wins.Deferred:
user_cancelPer the parent issue this PR intentionally does NOT emit
user_cancel. At the Statement-level catch sites (HiveServer2Statement.ExecuteQueryAsyncetc.) the local CTS has BOTH aCancelAftertimer (fromQueryTimeoutSeconds) AND a manualCancel()path (the parent driver'sStatement.Cancel()). Disambiguating those two requires cross-layer state (a flag set byCancel()that the catch site can read) and is its own design question. Rather than risk mislabeling a user-cancel as a timeout, Statement-level cancellations are left UNTAGGED here, anduser_cancelis left for a follow-up that touches the parent driver. The Connection-level cancel paths only have a timer, so they ARE tagged asquery_timeout.Tests
Adds
MockServerErrorKindTests(csharp/test/AdbcDrivers.Tests.HiveServer2/Hive2/MockServer/MockServerErrorKindTests.cs) following theActivityListener-scoped-to-source pattern PR #57 established. Three failure modes are mockable through the in-process HiveServer2 server:OperationError_TagsServerError—GetOperationStatusreportsERROR_STATE-> assertdb.error.kind = "server_error"on the originating Activity.OpenSessionError_TagsServerError—OpenSessionreturnsERROR_STATUS-> same.Connect_RefusedTcp_TagsNetwork— connect to a closed loopback port -> assertdb.error.kind = "network".Behavior was verified RED-then-GREEN: all three fail on the test-only commit and pass after the implementation commit. All 119 MockServer tests still pass; remaining failures in the full suite are the pre-existing integration tests gated on
{HIVE,IMPALA,SPARK}_TEST_CONFIG_FILEenv vars (same caveat as PR #57).auth_failedandprotocol_errorare exercised by the classifier code paths but not by a mock test in this PR — mocking an HTTP 401 / Thrift framing error from the in-process test server is a non-trivial change to the test infrastructure. A parent-repo E2E test against a real server can validate those once the parent bumps the submodule pointer.Refs adbc-drivers/databricks#481
This pull request and its description were written by Isaac.