diff --git a/csharp/doc/sprint-plan-PECO-3022-sea-telemetry-2026-05-14.md b/csharp/doc/sprint-plan-PECO-3022-sea-telemetry-2026-05-14.md new file mode 100644 index 00000000..1ddf7bbc --- /dev/null +++ b/csharp/doc/sprint-plan-PECO-3022-sea-telemetry-2026-05-14.md @@ -0,0 +1,226 @@ +# Sprint Plan — PECO-3022 SEA Telemetry Integration + +**Sprint window:** 2026-05-14 → 2026-05-28 (2 weeks) +**Implementer:** Jade Wang (sole) +**Design doc:** [`docs/designs/PECO-3022-sea-telemetry-integration-design.md`](../../docs/designs/PECO-3022-sea-telemetry-integration-design.md) +**Design PR:** https://github.com/adbc-drivers/databricks/pull/455 +**Jira:** [PECO-3022](https://databricks.atlassian.net/browse/PECO-3022) + +--- + +## Sprint Goal + +Ship end-to-end SEA client telemetry to production parity with Thrift — connection session events, per-statement operation events, error events, chunk metrics — verified against a real SQL warehouse. Includes the mechanical refactor of `DatabricksStatement` to consume the new observer interface so both transports use the same telemetry seam. + +### Success criteria + +- A statement executed via `adbc.databricks.protocol=rest` emits a `OssSqlDriverTelemetryLog` carrying `driver_connection_params.mode = DRIVER_MODE_SEA`, populated session id, statement id, operation latency, result format, poll count, first-batch and consumed latencies. +- Error in a SEA statement produces an `error_info` record with `error_name` populated. +- Thrift telemetry output is byte-identical to current main (regression-tested). +- SEA telemetry visible in `eng_lumberjack.prod_frontend_log_sql_driver_log` after sprint deploys. + +### Dependency + +- The gap-fix workstream's `CloudFetchDownloader → ChunkMetrics → CloudFetchReader.GetChunkMetrics()` plumbing. If it lands in-sprint, wire it in. If not, ship with `ChunkMetrics.Empty` and backfill in a follow-up. + +--- + +## Task Breakdown (7 tasks, ~11.5 person-days) + +### T1 — Refactor `ConnectionTelemetry.Create` signature (1 day) + +Replace `TSessionHandle? sessionHandle` with `string sessionId`. Add `DriverMode.Types.Type mode` parameter. Remove the hardcoded `DriverMode.Types.Type.Thrift` at `ConnectionTelemetry.cs:458` and `:642`. + +**Files touched:** +- `csharp/src/Telemetry/ConnectionTelemetry.cs` +- `csharp/src/DatabricksConnection.cs` (single Thrift call site, convert `sessionHandle.SessionId.Guid.ToString()` at boundary) + +**Acceptance criteria:** +- All existing telemetry unit tests pass unchanged. +- Thrift integration test emits `driver_connection_params.mode = DRIVER_MODE_THRIFT` (regression check). +- New unit test: `Create_AcceptsStringSessionId`, `Create_ThriftMode_SetsDriverModeThrift`, `Create_SeaMode_SetsDriverModeSea`. + +**Risks:** Low. Mechanical refactor with one Thrift caller to update. + +--- + +### T2 — Introduce `IStatementOperationObserver` + impls (2 days) + +Create the interface and three implementations per design §5.1 and §12. + +**New files:** +- `csharp/src/Telemetry/IStatementOperationObserver.cs` +- `csharp/src/Telemetry/TelemetryObserver.cs` (uses `Safe(Action)` helper pattern from design §12) +- `csharp/src/Telemetry/NullObserver.cs` (singleton) +- `csharp/src/Telemetry/SafeObserver.cs` (decorator) + +**Acceptance criteria:** +- Interface contract documented: methods MUST NOT throw, thread-safe, `OnFinalized` is terminal and idempotent. +- `TelemetryObserver` writes into a `StatementTelemetryContext`; on `OnFinalized` builds `OssSqlDriverTelemetryLog` and enqueues via `IConnectionTelemetry`. +- Unit tests per design §15: + - `NullObserver_AllMethods_AreNoOps` / `NullObserver_IsSingleton` + - `TelemetryObserver_OnExecuteStarted_PopulatesContext` + - `TelemetryObserver_OnExecuteSucceeded_RecordsStatementId` + - `TelemetryObserver_OnFinalized_EnqueuesExactlyOnce` + - `TelemetryObserver_OnFinalized_CalledTwice_EnqueuesOnce` + - `TelemetryObserver_OnError_RecordsErrorAndFinalizes` + - `TelemetryObserver_AllMethods_NeverThrow_WhenContextCorrupted` + - `TelemetryObserver_OnChunksDownloaded_MergesIntoChunkDetails` + - `SafeObserver_PropagatesNormalCallsToInner` + - `SafeObserver_SwallowsExceptionsFromInner_LogsAtTrace` + +**Risks:** Low. New code, no existing callers yet. + +--- + +### T3 — Add `SeaResultFormatMapper` (1 day) + +Static helper that maps wire `disposition` × manifest state → proto `ExecutionResult.Format`. Per design §8. + +**New files:** +- `csharp/src/StatementExecution/SeaResultFormatMapper.cs` + +**Acceptance criteria:** +- Unit tests covering all four cells in §8 table: + - `Map_InlineDisposition_ReturnsInlineArrow` + - `Map_ExternalLinksDisposition_ReturnsExternalLinks` + - `Map_AutoDisposition_WithInlineResult_ReturnsInlineArrow` + - `Map_AutoDisposition_WithExternalLinks_ReturnsExternalLinks` + +**Risks:** Low. Isolated pure-function helper. + +--- + +### T4 — Refactor `DatabricksStatement` to use observer (1 day) + +Mechanical refactor: replace the private telemetry methods (`CreateTelemetryContext`, `CreateMetadataTelemetryContext`, `RecordSuccess`, `RecordError`, `EmitTelemetry`) with `_observer: IStatementOperationObserver` field calls. Behavior unchanged. + +**Files touched:** +- `csharp/src/DatabricksStatement.cs` + +**Acceptance criteria:** +- All existing Thrift telemetry unit tests pass unchanged. +- Manual diff check: byte-equivalent `OssSqlDriverTelemetryLog` for a known statement before/after the refactor. +- `((DatabricksConnection)Connection).TelemetrySession` cast eliminated; observer is injected at statement construction from `DatabricksConnection.CreateStatement()`. + +**Risks:** Medium. The refactor is mechanical but the existing Thrift test suite is the safety net. Allocate buffer time for any subtle behavior differences (e.g. `PendingTelemetryContext` exposure used by external callers). + +**Depends on:** T1 (Create signature), T2 (observer types). + +--- + +### T5 — Wire telemetry into `StatementExecutionConnection` (1.5 days) — **DONE 2026-05-14** + +Mirror the Thrift pattern at `DatabricksConnection.cs:594-724`. Add `_telemetry: IConnectionTelemetry` field. Call `ConnectionTelemetry.Create(...)` in `OpenAsync` after `CreateSessionAsync` succeeds, emit `CREATE_SESSION` event, then on `Dispose` emit `DELETE_SESSION` and run `DisposeAsync` with 5-second timeout. + +**Files touched:** +- `csharp/src/StatementExecution/StatementExecutionConnection.cs` (modified) +- `csharp/test/Unit/StatementExecution/StatementExecutionConnectionTelemetryTests.cs` (new) + +**Acceptance criteria:** +- ✅ `OpenAsync` succeeds even if telemetry initialization throws (telemetry is fail-open; falls back to `NoOpConnectionTelemetry`). +- ✅ `Dispose` completes within 5 seconds even if exporter is wedged (test `Dispose_FlushHangs_CompletesWithin5Seconds`). +- ⏭ Observer creation in `CreateStatement()` deferred to **T6** (this task only exposes `TelemetrySession` accessor so the SEA statement can pull the session for observer construction). +- 🟡 Manual test (lumberjack verification): pending sprint demo against real warehouse. + +**Implementation notes:** +- `IConnectionTelemetry.DisposeAsync` returns `Task` (not `ValueTask`), so the call is `_telemetry.DisposeAsync().Wait(TimeSpan.FromSeconds(5))`. The sprint description's `.AsTask().Wait(...)` was based on a different return type assumption — the simpler form is equivalent. +- `CreateHttpClient` was switched from `HttpHandlerFactory.CreateHandlers` to `CreateHandlersWithTokenProvider` so the OAuth `_oauthTokenProvider` can be captured and threaded into `ConnectionTelemetry.Create` (matches the Thrift path's token-caching behavior). +- Added test seam `TelemetryForTesting` (settable property) mirroring `DatabricksConnection.TelemetryForTesting` so unit tests can inject a `RecordingTelemetry` fake without driving a real CreateSession RPC. Direct unit testing of `OpenAsync` end-to-end requires a fake `IStatementExecutionClient`, which is out-of-scope for T5; the public wiring contract is verified by exercising `EmitCreateSessionTelemetry` / `EmitDeleteSessionTelemetry` directly, identical to the Thrift `DriverTelemetryWiringTests` approach. + +**Risks:** Medium. New telemetry surface on a class that has never had it. Watch for null-handling around `_telemetry` and `Session`. + +**Depends on:** T1 (Create signature). + +--- + +### T6 — Wire telemetry into `StatementExecutionStatement` (3 days) + +The meatiest task. Add `_observer: IStatementOperationObserver` field (defaults to `NullObserver.Instance`, set by `StatementExecutionConnection.CreateStatement`). Call observer methods at all 7 hookpoints per design §6: + +1. `OnExecuteStarted` — `ExecuteQueryInternalAsync` before `_client.ExecuteStatementAsync` (line 345) +2. `OnExecuteSucceeded` — after response received, using `SeaResultFormatMapper` +3. `OnPollCompleted` — in `PollUntilCompleteAsync` (line 453), accumulate count/ms across the loop, emit once on terminal state +4. `OnFirstBatchReady` — at `CreateCloudFetchReader` (line 542) and `InlineArrowStreamReader` construction (nested at line 900) +5. `OnConsumed` + `OnChunksDownloaded` — at reader Dispose +6. `OnError` — `ExecuteQueryInternalAsync` catch block +7. `OnFinalized` — `Dispose` (line 817) + +**Files touched:** +- `csharp/src/StatementExecution/StatementExecutionStatement.cs` + +**Acceptance criteria:** +- Manual test: execute a SELECT via REST, verify a telemetry record arrives with `statement_id`, `result_format`, `operation_latency_ms`, `poll_count`, `result_set_ready_latency_millis`, `result_set_consumption_latency_millis` populated. +- Manual test: execute a bad SQL via REST, verify `error_info.error_name` populated. +- `OnFinalized` exactly-once even when both error path and dispose path fire. +- `ChunkMetrics`: wire to `OnChunksDownloaded` if gap-fix plumbing is available, else pass `ChunkMetrics.Empty`. + +**Risks:** Medium-high. Largest scope; touches Execute, Poll loop, both reader construction paths, Dispose, and error catch. Highest chance of edge-case regressions. + +**Depends on:** T1, T2, T3. + +--- + +### T7 — SEA integration tests against real SQL warehouse (2 days) + +Mirror the Thrift integration test set per design §15. + +**New files:** +- `csharp/test/E2E/Telemetry/SeaTelemetryIntegrationTests.cs` (or similar) + +**Test cases:** +- `Sea_ExecuteQuery_EmitsTelemetryWithStatementId` +- `Sea_ExecuteQuery_WithSyntaxError_EmitsErrorTelemetry` +- `Sea_ExecuteQuery_CloudFetch_RecordsChunkMetrics` (skipped if gap-fix plumbing not present) +- `Sea_ExecuteQuery_InlineResults_RecordsInlineFormat` +- `Sea_OpenConnection_EmitsCreateSession` +- `Sea_CloseConnection_EmitsDeleteSessionAndFlushes` +- `Sea_TelemetryDisabledByFeatureFlag_EmitsZeroEvents` +- `Sea_TelemetryDisabledByProperty_EmitsZeroEvents` +- `Sea_TelemetryExporterFails_DoesNotAffectQueryExecution` +- `Sea_TelemetryRecord_HasDriverModeSea` +- `Sea_ConcurrentStatements_EachEmitsExactlyOneRecord` + +**Acceptance criteria:** +- All tests pass against a dev/staging Databricks SQL warehouse. +- Test infrastructure verifies records via either a local capture exporter or by querying `eng_lumberjack.prod_frontend_log_sql_driver_log` after a settling delay. + +**Risks:** Medium. Real-warehouse tests are slow and flaky; allocate time for retry/stabilization. + +**Depends on:** T5, T6. + +--- + +## Sequencing + +``` +Week 1 (Mon-Fri): T1 → T2 → T3 → T4 + (T2 and T3 parallelizable if context allows) + +Week 2 (Mon-Fri): T5 → T6 → T7 + (T5 in parallel with start of T6 if discipline holds) +``` + +**Critical path:** T1 → T6 → T7 (≈6 days). +**Slack:** ~1.5 days for review iteration, unexpected edge cases, gap-fix integration if it lands. + +--- + +## Definition of Done + +- All 7 tasks merged to `main`. +- Design PR (#455) approved and merged. +- SEA telemetry records visible in `eng_lumberjack.prod_frontend_log_sql_driver_log` via the [client-telemetry-query](https://databricks.atlassian.net/) skill. +- Thrift telemetry regression test green. +- Sprint demo: show side-by-side Thrift vs SEA telemetry records for the same query. + +--- + +## Risks and Mitigations + +| Risk | Likelihood | Mitigation | +|---|---|---| +| Gap-fix `ChunkMetrics` plumbing slips | Medium | Ship with `ChunkMetrics.Empty`; backfill in follow-up sprint | +| `DatabricksStatement` refactor (T4) hits subtle regression | Medium | Cross-transport byte-identical regression test in T1, dry-run in T4 | +| SEA integration tests flaky in CI | Medium | Tag as `[Trait("Category", "Integration")]`; run on-demand initially | +| Sprint overflow (11.5d est in 10d capacity) | High | T7 can slip to follow-up sprint if T5/T6 take longer than estimated; foundation is the priority | diff --git a/csharp/src/DatabricksConnection.cs b/csharp/src/DatabricksConnection.cs index 5ec2ecbf..d353c2d0 100644 --- a/csharp/src/DatabricksConnection.cs +++ b/csharp/src/DatabricksConnection.cs @@ -533,8 +533,16 @@ internal override IArrowArrayStream NewReader(T statement, Schema schema, IRe public override AdbcStatement CreateStatement() { - DatabricksStatement statement = new DatabricksStatement(this); - return statement; + // Inject the per-statement observer at construction so DatabricksStatement is + // not coupled to ((DatabricksConnection)Connection).TelemetrySession at runtime. + // When telemetry is disabled (or the session has no live client), we hand the + // statement a NullObserver so every observer hook call is a no-op without + // requiring null-checks at the callsite. + TelemetrySessionContext? session = TelemetrySession; + IStatementOperationObserver observer = session?.TelemetryClient != null + ? (IStatementOperationObserver)new TelemetryObserver(session) + : NullObserver.Instance; + return new DatabricksStatement(this, observer); } protected override TOpenSessionReq CreateSessionRequest() @@ -727,12 +735,36 @@ internal IConnectionTelemetry TelemetryForTesting /// private void InitializeTelemetry(Activity? activity = null) { + // Convert TSessionHandle -> string at the transport boundary so + // ConnectionTelemetry.Create stays transport-agnostic. SEA will pass its + // server-assigned session id string directly. + // + // Wrap the byte[] -> Guid conversion locally: `new Guid(byte[])` throws + // ArgumentException on a wrong-length array, and that would propagate to + // connection-open. Pre-refactor the same conversion was inside + // ConnectionTelemetry.Create's outer try/catch, so a malformed session GUID + // degraded to NoOp telemetry — preserve that fail-open contract. + string sessionId = string.Empty; + try + { + if (SessionHandle?.SessionId?.Guid != null) + { + sessionId = new Guid(SessionHandle.SessionId.Guid).ToString(); + } + } + catch + { + // Intentionally swallowed: malformed session GUID disables telemetry, + // not the connection. + } + _telemetry = Telemetry.ConnectionTelemetry.Create( properties: Properties, host: GetHost(), assemblyVersion: s_assemblyVersion, oauthTokenProvider: _oauthTokenProvider, - sessionHandle: SessionHandle, + sessionId: sessionId, + mode: Telemetry.Proto.DriverMode.Types.Type.Thrift, enableDirectResults: _enableDirectResults, useDescTableExtended: _useDescTableExtended, connectTimeoutMilliseconds: ConnectTimeoutMilliseconds, diff --git a/csharp/src/DatabricksStatement.cs b/csharp/src/DatabricksStatement.cs index a5313de8..1850f2e2 100644 --- a/csharp/src/DatabricksStatement.cs +++ b/csharp/src/DatabricksStatement.cs @@ -31,10 +31,9 @@ using AdbcDrivers.Databricks.Reader.CloudFetch; using AdbcDrivers.Databricks.Result; using AdbcDrivers.Databricks.Telemetry; -using AdbcDrivers.Databricks.Telemetry.Models; -using AdbcDrivers.Databricks.Telemetry.Proto; using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; using Apache.Arrow; using Apache.Arrow.Adbc; using AdbcDrivers.HiveServer2; @@ -71,18 +70,47 @@ internal class DatabricksStatement : SparkStatement, IHiveServer2Statement internal bool IsInternalCall { get; set; } // Marks if this is a driver-internal operation (e.g., USE SCHEMA) /// - /// Telemetry context for the current statement execution, pending emission on Dispose. - /// Set before calling base.ExecuteQueryAsync()/ExecuteQuery() so that - /// can forward it to the - /// composite reader and operation status poller. The poller increments - /// and accumulates - /// on each - /// GetOperationStatus call so the emitted telemetry log carries the - /// n_operation_status_calls and operation_status_latency_millis - /// fields (PECO-2992). After a successful execute the same instance remains - /// here until is invoked from Dispose. + /// Observer for this statement's operational lifecycle. Injected at construction + /// from — a + /// bound to the connection's + /// when telemetry is enabled, otherwise + /// . All hookpoints in this class flow through + /// the observer interface so the SEA path can reuse the exact same statement code + /// with a different observer implementation. /// - internal StatementTelemetryContext? PendingTelemetryContext { get; private set; } + private readonly IStatementOperationObserver _observer; + + /// + /// Tracks whether has fired. Gates the + /// terminal call in + /// : when a statement is disposed without ever being + /// executed, the observer must not enqueue an empty execute-statement log. + /// This preserves byte-identical behavior with the pre-refactor implementation, + /// which only emitted execute telemetry when CreateTelemetryContext had + /// produced a non-null PendingTelemetryContext at execute time. + /// + private bool _executeStarted; + + /// + /// Tracks whether the execute path has already finalized telemetry for this + /// statement (true after the error path drained it). Lets + /// avoid re-invoking the success-path inspection logic in that case. + /// is itself idempotent, so + /// this flag is only an optimization for the chunk-metrics / reader-inspection + /// path; it is not required for correctness. + /// + private bool _executeFinalized; + + /// + /// Telemetry context for the current statement execution. Exposes the observer's + /// underlying when one exists so the + /// reader and operation status poller can mutate poll-count and poll-latency + /// fields directly (PECO-2992). Returns null when telemetry is disabled + /// () so the reader code can skip its telemetry + /// branches without a separate feature flag check. + /// + internal StatementTelemetryContext? PendingTelemetryContext => + (_observer as TelemetryObserver)?.Context; // Stopwatch covering the lifetime of this statement, started at construction. // Used to scope CANCEL_STATEMENT timing within the statement's lifetime. @@ -98,8 +126,14 @@ internal class DatabricksStatement : SparkStatement, IHiveServer2Statement public override long BatchSize { get; protected set; } = DatabricksBatchSizeDefault; public DatabricksStatement(DatabricksConnection connection) + : this(connection, ResolveDefaultObserver(connection)) + { + } + + internal DatabricksStatement(DatabricksConnection connection, IStatementOperationObserver observer) : base(connection) { + _observer = observer ?? NullObserver.Instance; // set the catalog name for legacy compatibility // TODO: use catalog and schema fields in hiveserver2 connection instead of DefaultNamespace so we don't need to cast var defaultNamespace = ((DatabricksConnection)Connection).DefaultNamespace; @@ -129,21 +163,23 @@ public DatabricksStatement(DatabricksConnection connection) } } - private StatementTelemetryContext? CreateTelemetryContext(Telemetry.Proto.Statement.Types.Type statementType) + /// + /// Resolves the default for a statement + /// constructed without an explicit observer (e.g. the connection-side + /// / ApplyServerSidePropertiesAsync + /// helpers that use new DatabricksStatement(this) directly). Returns a + /// bound to the connection's session when telemetry + /// is enabled and the session has a live telemetry client, otherwise + /// . This is the same decision + /// makes when it injects an + /// observer explicitly, so direct-construction callers get identical behavior. + /// + private static IStatementOperationObserver ResolveDefaultObserver(DatabricksConnection connection) { - var session = ((DatabricksConnection)Connection).TelemetrySession; - if (session?.TelemetryClient == null) return null; - - var ctx = new StatementTelemetryContext(session); - ctx.OperationType = OperationType.ExecuteStatement; - ctx.StatementType = statementType; - // IsCompressed and ResultFormat are populated from the actual result stream in - // EmitTelemetry, not the connection-level capability flags. Defaults here cover error - // and unconsumed paths (PECO-2988, PECO-2978). - ctx.IsCompressed = false; - ctx.ResultFormat = ExecutionResultFormat.InlineArrow; - ctx.IsInternalCall = IsInternalCall; - return ctx; + TelemetrySessionContext? session = connection.TelemetrySession; + return session?.TelemetryClient != null + ? (IStatementOperationObserver)new TelemetryObserver(session) + : NullObserver.Instance; } /// @@ -165,37 +201,70 @@ public DatabricksStatement(DatabricksConnection connection) }; } - private StatementTelemetryContext? CreateMetadataTelemetryContext() + /// + /// Begins an execute-time observer hook. Replaces the old + /// CreateTelemetryContext / CreateMetadataTelemetryContext helpers: + /// signals the observer with + /// and stamps the per-statement with the + /// defaults the legacy helpers used (InlineArrow result format, IsInternalCall). + /// The reader inspection at still overrides + /// IsCompressed and ResultFormat based on the active reader (PECO-2988, PECO-2978). + /// + private void BeginExecuteTelemetry(StatementType statementType, OperationType operationType) { - var session = ((DatabricksConnection)Connection).TelemetrySession; - if (session?.TelemetryClient == null) return null; - - var operationType = GetMetadataOperationType(SqlQuery) ?? OperationType.Unspecified; - - var ctx = new StatementTelemetryContext(session); - ctx.OperationType = operationType; - ctx.StatementType = Telemetry.Proto.Statement.Types.Type.Metadata; - ctx.ResultFormat = ExecutionResultFormat.InlineArrow; - ctx.IsCompressed = false; - ctx.IsInternalCall = IsInternalCall; - return ctx; + _executeStarted = true; + // Placeholder isCompressed=false matches the old CreateTelemetryContext default; + // the actual value is sourced from the reader at finalize time. + _observer.OnExecuteStarted(statementType, operationType, isCompressed: false); + StatementTelemetryContext? ctx = PendingTelemetryContext; + if (ctx != null) + { + ctx.ResultFormat = ExecutionResultFormat.InlineArrow; + ctx.IsInternalCall = IsInternalCall; + } } - private void RecordSuccess(StatementTelemetryContext ctx) + /// + /// Records a successful execute on the observer. Replaces the legacy + /// RecordSuccess helper. Marks first-batch-ready, captures HTTP retry count + /// from , then signals the observer with + /// . ResultFormat is + /// passed as InlineArrow here and overridden at finalize time based on the active + /// reader (PECO-2978) so this hook does not mislabel inline results when CloudFetch + /// is enabled on the connection. + /// + private void RecordExecuteSuccess() { - ctx.RecordFirstBatchReady(); - // ResultFormat is populated from the actual active reader in EmitTelemetry (PECO-2978); - // setting it here from the connection-level useCloudFetch flag mislabels inline results - // as EXTERNAL_LINKS whenever CloudFetch is enabled on the connection. - ctx.StatementId = StatementId; - CaptureRetryCount(ctx); + StatementTelemetryContext? ctx = PendingTelemetryContext; + if (ctx != null) + { + _observer.OnFirstBatchReady(ctx.ExecuteStopwatch.ElapsedMilliseconds); + CaptureRetryCount(ctx); + } + _observer.OnExecuteSucceeded(StatementId ?? string.Empty, ExecutionResultFormat.InlineArrow); + } + + /// + /// Records a failed execute on the observer. Replaces the legacy + /// RecordError helper. Captures HTTP retry count before signalling + /// ; ordering matches the + /// previous helper. + /// + private void RecordExecuteError(Exception ex) + { + StatementTelemetryContext? ctx = PendingTelemetryContext; + if (ctx != null) + { + CaptureRetryCount(ctx); + } + _observer.OnError(ex); } - private void CaptureRetryCount(StatementTelemetryContext ctx) + private static void CaptureRetryCount(StatementTelemetryContext ctx) { if (Activity.Current != null) { - var retryCountTag = Activity.Current.GetTagItem("http.retry.total_attempts"); + object? retryCountTag = Activity.Current.GetTagItem("http.retry.total_attempts"); if (retryCountTag is int retryCount) { ctx.RetryCount = retryCount; @@ -203,190 +272,174 @@ private void CaptureRetryCount(StatementTelemetryContext ctx) } } - private void RecordError(StatementTelemetryContext ctx, Exception ex) - { - ctx.HasError = true; - ctx.ErrorName = ex.GetType().Name; - ctx.ErrorMessage = ex.Message; - CaptureRetryCount(ctx); - } - public override QueryResult ExecuteQuery() { - var ctx = IsMetadataCommand - ? CreateMetadataTelemetryContext() - : CreateTelemetryContext(Telemetry.Proto.Statement.Types.Type.Query); - if (ctx == null) return base.ExecuteQuery(); + if (IsMetadataCommand) + { + BeginExecuteTelemetry( + StatementType.Metadata, + GetMetadataOperationType(SqlQuery) ?? OperationType.Unspecified); + } + else + { + BeginExecuteTelemetry(StatementType.Query, OperationType.ExecuteStatement); + } - // Expose ctx to NewReader so the operation status poller can update PollCount/PollLatencyMs (PECO-2992). - PendingTelemetryContext = ctx; try { QueryResult result = base.ExecuteQuery(); _lastQueryResult = result; // Store for telemetry - RecordSuccess(ctx); + RecordExecuteSuccess(); return result; } catch (Exception ex) { - RecordError(ctx, ex); - // Emit telemetry immediately on error (won't reach Dispose) - EmitTelemetry(ctx); - PendingTelemetryContext = null; // Clear to avoid double emission + RecordExecuteError(ex); + // Finalize telemetry immediately on error (won't reach Dispose's success-path emission). + FinalizeExecuteTelemetry(); throw; } } public override async ValueTask ExecuteQueryAsync() { - var ctx = IsMetadataCommand - ? CreateMetadataTelemetryContext() - : CreateTelemetryContext(Telemetry.Proto.Statement.Types.Type.Query); - if (ctx == null) return await base.ExecuteQueryAsync(); + if (IsMetadataCommand) + { + BeginExecuteTelemetry( + StatementType.Metadata, + GetMetadataOperationType(SqlQuery) ?? OperationType.Unspecified); + } + else + { + BeginExecuteTelemetry(StatementType.Query, OperationType.ExecuteStatement); + } - // Expose ctx to NewReader so the operation status poller can update PollCount/PollLatencyMs (PECO-2992). - PendingTelemetryContext = ctx; try { QueryResult result = await base.ExecuteQueryAsync(); _lastQueryResult = result; // Store for telemetry - RecordSuccess(ctx); + RecordExecuteSuccess(); return result; } catch (Exception ex) { - RecordError(ctx, ex); - // Emit telemetry immediately on error (won't reach Dispose) - EmitTelemetry(ctx); - PendingTelemetryContext = null; // Clear to avoid double emission + RecordExecuteError(ex); + FinalizeExecuteTelemetry(); throw; } } public override UpdateResult ExecuteUpdate() { - var ctx = CreateTelemetryContext(Telemetry.Proto.Statement.Types.Type.Update); - if (ctx == null) return base.ExecuteUpdate(); + BeginExecuteTelemetry(StatementType.Update, OperationType.ExecuteStatement); - PendingTelemetryContext = ctx; try { UpdateResult result = base.ExecuteUpdate(); - RecordSuccess(ctx); + RecordExecuteSuccess(); return result; } catch (Exception ex) { - RecordError(ctx, ex); - // Emit telemetry immediately on error (won't reach Dispose) - EmitTelemetry(ctx); - PendingTelemetryContext = null; // Clear to avoid double emission + RecordExecuteError(ex); + FinalizeExecuteTelemetry(); throw; } } public override async Task ExecuteUpdateAsync() { - var ctx = CreateTelemetryContext(Telemetry.Proto.Statement.Types.Type.Update); - if (ctx == null) return await base.ExecuteUpdateAsync(); + BeginExecuteTelemetry(StatementType.Update, OperationType.ExecuteStatement); - PendingTelemetryContext = ctx; try { UpdateResult result = await base.ExecuteUpdateAsync(); - RecordSuccess(ctx); + RecordExecuteSuccess(); return result; } catch (Exception ex) { - RecordError(ctx, ex); - // Emit telemetry immediately on error (won't reach Dispose) - EmitTelemetry(ctx); - PendingTelemetryContext = null; // Clear to avoid double emission + RecordExecuteError(ex); + FinalizeExecuteTelemetry(); throw; } } - private void EmitTelemetry(StatementTelemetryContext ctx) + /// + /// Replaces the legacy EmitTelemetry helper. Inspects the active reader for + /// chunk metrics, IsCompressed, and ResultFormat (preserving the PECO-2988 / PECO-2978 + /// behavior of sourcing these from the actual reader rather than connection-level + /// capability flags), then signals the observer with + /// , + /// , and finally + /// . OnFinalized is the + /// terminal call — both and + /// guarantee idempotency, so this method may be invoked from both the error path + /// (in Execute*) and the success path (in ). + /// + private void FinalizeExecuteTelemetry() { - try + if (_executeFinalized) { - ctx.RecordResultsConsumed(); + return; + } + _executeFinalized = true; - // Extract chunk metrics if this was a CloudFetch query. - // Check for both CloudFetchReader (direct) and DatabricksCompositeReader (wrapped). - ChunkMetrics? metrics = null; - if (_lastQueryResult?.Stream is CloudFetchReader cfReader) + StatementTelemetryContext? ctx = PendingTelemetryContext; + + // Extract chunk metrics if this was a CloudFetch query. Check for both + // CloudFetchReader (direct) and DatabricksCompositeReader (wrapped). In the error + // path _lastQueryResult is null and this whole block is a no-op. + ChunkMetrics? metrics = null; + if (_lastQueryResult?.Stream is CloudFetchReader cfReader) + { + try { - try - { - metrics = cfReader.GetChunkMetrics(); - } - catch - { - // Ignore errors retrieving chunk metrics - telemetry must not fail driver operations - } + metrics = cfReader.GetChunkMetrics(); } - else if (_lastQueryResult?.Stream is DatabricksCompositeReader compositeReader) + catch { - try - { - metrics = compositeReader.GetChunkMetrics(); - } - catch - { - // Ignore errors retrieving chunk metrics - telemetry must not fail driver operations - } + // Ignore errors retrieving chunk metrics - telemetry must not fail driver operations. } - - // Source IsCompressed and ResultFormat from the active reader, not connection-level - // capability flags. The composite reader holds the server-reported truth for both: - // IsLz4Compressed mirrors TGetResultSetMetadataResp.Lz4Compressed (drives both inline - // and CloudFetch decompression), and IsCloudFetchActive reflects whether the server - // returned result links for this statement (PECO-2988, PECO-2978). - if (_lastQueryResult?.Stream is DatabricksCompositeReader composite) + } + else if (_lastQueryResult?.Stream is DatabricksCompositeReader compositeReader) + { + try { - ctx.IsCompressed = composite.IsLz4Compressed; - ctx.ResultFormat = composite.IsCloudFetchActive - ? ExecutionResultFormat.ExternalLinks - : ExecutionResultFormat.InlineArrow; + metrics = compositeReader.GetChunkMetrics(); } - - // Set chunk details if we have metrics and at least one chunk was iterated - // (avoids leaking the -1 sentinel from InitialChunkLatencyMs when no chunks were downloaded) - if (metrics != null && metrics.TotalChunksIterated > 0) + catch { - ctx.SetChunkDetails( - metrics.TotalChunksPresent, - metrics.TotalChunksIterated, - metrics.InitialChunkLatencyMs, - metrics.SlowestChunkLatencyMs, - metrics.SumChunksDownloadTimeMs); + // Ignore errors retrieving chunk metrics - telemetry must not fail driver operations. } + } - OssSqlDriverTelemetryLog telemetryLog = ctx.BuildTelemetryLog(); - - var frontendLog = new TelemetryFrontendLog - { - WorkspaceId = ctx.WorkspaceId, - FrontendLogEventId = Guid.NewGuid().ToString(), - Context = new FrontendLogContext - { - TimestampMillis = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds(), - }, - Entry = new FrontendLogEntry - { - SqlDriverLog = telemetryLog - } - }; - - var session = ((DatabricksConnection)Connection).TelemetrySession; - session?.TelemetryClient?.Enqueue(frontendLog); + // Source IsCompressed and ResultFormat from the active reader, not connection-level + // capability flags. The composite reader holds the server-reported truth for both: + // IsLz4Compressed mirrors TGetResultSetMetadataResp.Lz4Compressed (drives both inline + // and CloudFetch decompression), and IsCloudFetchActive reflects whether the server + // returned result links for this statement (PECO-2988, PECO-2978). The observer's + // OnReaderInspected hook overwrites the same fields the legacy EmitTelemetry helper + // mutated immediately before BuildTelemetryLog, preserving byte-identical output + // without leaking the observer's internal StatementTelemetryContext to the statement. + if (_lastQueryResult?.Stream is DatabricksCompositeReader composite) + { + _observer.OnReaderInspected( + resultFormat: composite.IsCloudFetchActive + ? ExecutionResultFormat.ExternalLinks + : ExecutionResultFormat.InlineArrow, + isCompressed: composite.IsLz4Compressed); } - catch + + // Only emit chunk metrics if at least one chunk was iterated; avoids leaking the + // -1 sentinel from InitialChunkLatencyMs when no chunks were downloaded. + if (metrics != null && metrics.TotalChunksIterated > 0) { - // Telemetry must never impact driver operations + _observer.OnChunksDownloaded(metrics); } + + _observer.OnConsumed(ctx?.ExecuteStopwatch.ElapsedMilliseconds ?? 0); + _observer.OnFinalized(); } /// @@ -1300,11 +1353,15 @@ protected override void Dispose(bool disposing) { if (disposing) { - if (PendingTelemetryContext != null) + // Finalize the execute observer now that results have been consumed. This + // replaces the legacy EmitTelemetry call. Gated on _executeStarted so a + // statement disposed without execute does not enqueue a stray empty log + // through the observer (preserves byte-identical behavior with the prior + // PendingTelemetryContext!=null gate). FinalizeExecuteTelemetry itself is + // idempotent — the error path inside Execute* may have already finalized. + if (_executeStarted) { - // Emit telemetry now that results have been consumed - EmitTelemetry(PendingTelemetryContext); - PendingTelemetryContext = null; + FinalizeExecuteTelemetry(); } // Emit a CLOSE_STATEMENT telemetry event once per statement disposal, @@ -1316,22 +1373,21 @@ protected override void Dispose(bool disposing) // or the operation was already closed by direct-results), elapsedMs is 0 // and the event acts purely as a lifecycle marker. // _closeStatementTelemetryEmitted makes the emission idempotent across - // repeated Dispose() calls (PECO-2991). + // repeated Dispose() calls (PECO-2991). The connection-side telemetry + // implementation (NoOpConnectionTelemetry when disabled) makes the + // EmitStatementOperationTelemetry call itself a no-op, so no explicit + // TelemetrySession null-check is needed here. try { if (!_closeStatementTelemetryEmitted) { - var session = ((DatabricksConnection)Connection).TelemetrySession; - if (session?.TelemetryClient != null) - { - _closeStatementTelemetryEmitted = true; - ((DatabricksConnection)Connection).EmitStatementOperationTelemetry( - OperationType.CloseStatement, - Telemetry.Proto.Statement.Types.Type.Unspecified, - StatementId, - elapsedMs: CloseStatementRpcLatencyMs ?? 0, - error: CloseStatementRpcError); - } + _closeStatementTelemetryEmitted = true; + ((DatabricksConnection)Connection).EmitStatementOperationTelemetry( + OperationType.CloseStatement, + StatementType.Unspecified, + StatementId, + elapsedMs: CloseStatementRpcLatencyMs ?? 0, + error: CloseStatementRpcError); } } catch (Exception ex) @@ -1357,6 +1413,10 @@ public override void Cancel() // signals the cancellation token source and is idempotent across repeats // (PECO-2991). Each invocation emits its own event so analysts can see // duplicate user-initiated cancels. + // + // The connection-side telemetry implementation (NoOpConnectionTelemetry when + // disabled) makes the EmitStatementOperationTelemetry call itself a no-op, + // so no explicit TelemetrySession null-check is needed here. Exception? error = null; long startMs = _statementLifetimeStopwatch.ElapsedMilliseconds; try @@ -1372,17 +1432,13 @@ public override void Cancel() { try { - var session = ((DatabricksConnection)Connection).TelemetrySession; - if (session?.TelemetryClient != null) - { - long elapsedMs = _statementLifetimeStopwatch.ElapsedMilliseconds - startMs; - ((DatabricksConnection)Connection).EmitStatementOperationTelemetry( - OperationType.CancelStatement, - Telemetry.Proto.Statement.Types.Type.Unspecified, - StatementId, - elapsedMs, - error); - } + long elapsedMs = _statementLifetimeStopwatch.ElapsedMilliseconds - startMs; + ((DatabricksConnection)Connection).EmitStatementOperationTelemetry( + OperationType.CancelStatement, + StatementType.Unspecified, + StatementId, + elapsedMs, + error); } catch (Exception ex) { diff --git a/csharp/src/StatementExecution/StatementExecutionConnection.cs b/csharp/src/StatementExecution/StatementExecutionConnection.cs index daade95c..7e69c5b9 100644 --- a/csharp/src/StatementExecution/StatementExecutionConnection.cs +++ b/csharp/src/StatementExecution/StatementExecutionConnection.cs @@ -16,11 +16,14 @@ using System; using System.Collections.Generic; +using System.Diagnostics; using System.Linq; using System.Net.Http; using System.Threading; using System.Threading.Tasks; +using AdbcDrivers.Databricks.Auth; using AdbcDrivers.Databricks.Http; +using AdbcDrivers.Databricks.Telemetry; using AdbcDrivers.HiveServer2; using AdbcDrivers.HiveServer2.Hive2; using AdbcDrivers.Databricks.StatementExecution.MetadataCommands; @@ -55,6 +58,37 @@ internal class StatementExecutionConnection : TracingConnection, IGetObjectsData private string? _sessionId; private readonly SemaphoreSlim _sessionLock = new SemaphoreSlim(1, 1); + // Shared OAuth token provider for connection-wide token caching (used by telemetry HTTP client) + private OAuthClientCredentialsProvider? _oauthTokenProvider; + + // Telemetry — mirrors the DatabricksConnection (Thrift) wiring. Defaults to NoOp so + // callsites never need null checks; replaced by ConnectionTelemetry.Create after + // session open succeeds. Tests can substitute via TelemetryForTesting. + private IConnectionTelemetry _telemetry = NoOpConnectionTelemetry.Instance; + + // Stopwatch covering the connection lifetime; used to measure session-open latency for + // the CREATE_SESSION telemetry event. Matches the Thrift path's _connectionLifetimeStopwatch. + private readonly Stopwatch _connectionLifetimeStopwatch = Stopwatch.StartNew(); + private bool _sessionOpenTelemetryEmitted; + private bool _sessionDeleteTelemetryEmitted; + + /// + /// The session context consumed by SEA statements (next phase) to create + /// per-statement observer contexts. Returns null when telemetry is disabled. + /// + internal TelemetrySessionContext? TelemetrySession => _telemetry.Session; + + /// + /// Test seam allowing unit tests to substitute a fake + /// so they can verify the production code calls EmitOperationTelemetry at the + /// right lifecycle hooks. Production code never sets this property. + /// + internal IConnectionTelemetry TelemetryForTesting + { + get => _telemetry; + set => _telemetry = value; + } + // Configuration for statement creation private readonly string _resultDisposition; private readonly string _resultFormat; @@ -66,6 +100,25 @@ internal class StatementExecutionConnection : TracingConnection, IGetObjectsData private readonly bool _useDescTableExtended; private readonly bool _applySSPWithQueries; + // Direct-results capability flag — read from the same connection-string property the + // Thrift path uses (DatabricksParameters.EnableDirectResults). Mirrored into the + // telemetry payload's enable_direct_results field so dashboards reflect the user's + // actual configuration rather than a hardcoded literal. Defaults to true, matching + // DatabricksConnection (Thrift) defaults. + private readonly bool _enableDirectResults; + + // Default connect-timeout value used when the connection string omits one. + // Mirrors the Thrift path's HiveServer2Connection.ConnectTimeoutMillisecondsDefault + // so dashboards filtering on socket_timeout see consistent values across both transports. + private const int ConnectTimeoutMillisecondsDefault = 30000; + + // Connect timeout in milliseconds, read from the same connection-string property the + // Thrift path uses (SparkParameters.ConnectTimeoutMilliseconds). Used only to stamp the + // telemetry payload's socket_timeout field — distinct from _waitTimeoutSeconds, which is + // the SEA query-wait (CONTINUE) timeout and unrelated to the connection-establishment + // timeout the telemetry field represents. + private readonly int _connectTimeoutMilliseconds; + // Memory pooling (shared across connection) private readonly Microsoft.IO.RecyclableMemoryStreamManager _recyclableMemoryStreamManager; private readonly System.Buffers.ArrayPool _lz4BufferPool; @@ -212,6 +265,7 @@ private StatementExecutionConnection( // the server validates differently in CreateSession vs SET) get the same semantics // on SEA. JDBC has no equivalent flag — this is parity with the Thrift driver only. _applySSPWithQueries = PropertyHelper.GetBooleanPropertyWithValidation(properties, DatabricksParameters.ApplySSPWithQueries, false); + _enableDirectResults = PropertyHelper.GetBooleanPropertyWithValidation(properties, DatabricksParameters.EnableDirectResults, true); // Session configuration // Only supply catalog from connection properties when EnableMultipleCatalogSupport is true. @@ -243,6 +297,13 @@ private StatementExecutionConnection( _pollingIntervalMs = PropertyHelper.GetPositiveIntPropertyWithValidation( properties, ApacheParameters.PollTimeMilliseconds, defaultValue: 1000); + // Connect timeout — used only for telemetry-init stamping (socket_timeout field). + // Read from the same connection-string property the Thrift path uses so SEA and + // Thrift records agree on the same source. NOT derived from _waitTimeoutSeconds: + // that field is the SEA CONTINUE/query-wait timeout, a semantically different concept. + _connectTimeoutMilliseconds = PropertyHelper.GetIntPropertyWithValidation( + properties, SparkParameters.ConnectTimeoutMilliseconds, ConnectTimeoutMillisecondsDefault); + // Memory pooling _recyclableMemoryStreamManager = memoryStreamManager ?? new Microsoft.IO.RecyclableMemoryStreamManager(); _lz4BufferPool = lz4BufferPool ?? System.Buffers.ArrayPool.Create(maxArrayLength: 4 * 1024 * 1024, maxArraysPerBucket: 10); @@ -299,6 +360,17 @@ private HttpClient CreateHttpClient(IReadOnlyDictionary properti int rateLimitRetryTimeout = PropertyHelper.GetIntPropertyWithValidation(properties, DatabricksParameters.RateLimitRetryTimeout, DatabricksConstants.DefaultRateLimitRetryTimeout); int timeoutMinutes = PropertyHelper.GetPositiveIntPropertyWithValidation(properties, DatabricksParameters.CloudFetchTimeoutMinutes, DatabricksConstants.DefaultCloudFetchTimeoutMinutes); + // Resolve the effective HttpClient.Timeout. Prefer the user-supplied + // SparkParameters.ConnectTimeoutMilliseconds (matching the Thrift path, which + // wires that property into its transport-level connect timeout) so that the + // socket_timeout telemetry field — which downstream gap fixes source from + // _httpClient.Timeout — reflects the user's intent. Fall back to + // CloudFetchTimeoutMinutes-based timeout when the property is not set, to + // preserve the historical default behavior for callers that never tuned this. + TimeSpan effectiveTimeout = properties.ContainsKey(SparkParameters.ConnectTimeoutMilliseconds) + ? TimeSpan.FromMilliseconds(_connectTimeoutMilliseconds) + : TimeSpan.FromMinutes(timeoutMinutes); + var config = new HttpHandlerFactory.HandlerConfig { BaseHandler = HttpClientFactory.CreateHandler(properties), @@ -319,11 +391,14 @@ private HttpClient CreateHttpClient(IReadOnlyDictionary properti AddThriftErrorHandler = false }; - var result = HttpHandlerFactory.CreateHandlers(config); + // Use CreateHandlersWithTokenProvider so we can capture the OAuth token provider + // and reuse it for the telemetry HTTP client (avoids duplicate token fetches). + var result = HttpHandlerFactory.CreateHandlersWithTokenProvider(config); + _oauthTokenProvider = result.TokenProvider; - var httpClient = new HttpClient(result) + var httpClient = new HttpClient(result.Handler) { - Timeout = TimeSpan.FromMinutes(timeoutMinutes) + Timeout = effectiveTimeout }; // Set user agent @@ -430,6 +505,13 @@ public async Task OpenAsync(CancellationToken cancellationToken = default) { _catalog = GetCurrentCatalog(); } + + // Initialize telemetry after a successful CreateSession. Telemetry is fail-open: + // any failure here is swallowed and falls back to NoOpConnectionTelemetry so the + // user-visible connection open still succeeds. + InitializeTelemetry(Activity.Current); + + EmitCreateSessionTelemetry(Activity.Current); } } finally @@ -439,6 +521,105 @@ public async Task OpenAsync(CancellationToken cancellationToken = default) } } + /// + /// Initializes telemetry via the ConnectionTelemetry factory. Mirrors the Thrift + /// path's DatabricksConnection.InitializeTelemetry. Falls back to + /// on any failure so connection open + /// always succeeds even if the telemetry pipeline is misconfigured. + /// + private void InitializeTelemetry(Activity? activity) + { + try + { + // SEA hands its server-assigned session id string directly — no transport handle + // to unwrap, unlike the Thrift caller which converts TSessionHandle->string. + _telemetry = Telemetry.ConnectionTelemetry.Create( + properties: _properties, + host: GetHost(_properties), + assemblyVersion: AssemblyVersion, + oauthTokenProvider: _oauthTokenProvider, + sessionId: _sessionId ?? string.Empty, + mode: Telemetry.Proto.DriverMode.Types.Type.Sea, + enableDirectResults: _enableDirectResults, + useDescTableExtended: _useDescTableExtended, + connectTimeoutMilliseconds: _connectTimeoutMilliseconds, + activity: activity); + } + catch (Exception ex) + { + // Defensive: ConnectionTelemetry.Create already swallows internally, but per the + // design contract telemetry init must never escape into the caller's open path. + activity?.AddEvent(new ActivityEvent("telemetry.initialization.error", + tags: new ActivityTagsCollection + { + { "error.type", ex.GetType().Name }, + { "error.message", ex.Message } + })); + _telemetry = NoOpConnectionTelemetry.Instance; + } + } + + /// + /// Emits the CREATE_SESSION telemetry event after a successful CreateSession RPC. + /// Internal so unit tests can call this directly after injecting a fake + /// via . + /// + internal void EmitCreateSessionTelemetry(Activity? activity = null) + { + try + { + long elapsedMs = _connectionLifetimeStopwatch.ElapsedMilliseconds; + _telemetry.EmitOperationTelemetry( + Telemetry.Proto.Operation.Types.Type.CreateSession, + Telemetry.Proto.Statement.Types.Type.Unspecified, + statementId: null, + elapsedMs: elapsedMs, + error: null); + _sessionOpenTelemetryEmitted = true; + } + catch (Exception ex) + { + activity?.AddEvent(new ActivityEvent("telemetry.emit.error", + tags: new ActivityTagsCollection + { + { "error.type", ex.GetType().Name }, + { "error.message", ex.Message }, + { "operation_type", "CREATE_SESSION" } + })); + } + } + + /// + /// Emits the DELETE_SESSION telemetry event when a session was previously opened. + /// Idempotent across repeated calls. Internal for test access. + /// + internal void EmitDeleteSessionTelemetry(long elapsedMs = 0, Exception? error = null) + { + try + { + if (_sessionOpenTelemetryEmitted && !_sessionDeleteTelemetryEmitted) + { + _sessionDeleteTelemetryEmitted = true; + _telemetry.EmitOperationTelemetry( + Telemetry.Proto.Operation.Types.Type.DeleteSession, + Telemetry.Proto.Statement.Types.Type.Unspecified, + statementId: null, + elapsedMs: elapsedMs, + error: error); + } + } + catch (Exception ex) + { + Activity.Current?.AddEvent(new ActivityEvent("telemetry.emit.error", + tags: new ActivityTagsCollection + { + { "error.type", ex.GetType().Name }, + { "error.message", ex.Message }, + { "operation_type", "DELETE_SESSION" } + })); + } + } + /// /// Creates a new statement for query execution. /// @@ -950,6 +1131,9 @@ public async Task ApplyServerSidePropertiesAsync(CancellationToken cancellationT /// /// Disposes the connection and deletes the session if it exists. + /// Emits DELETE_SESSION telemetry with the measured DeleteSession RPC latency + /// (or zero if no session was ever opened), then flushes the telemetry pipeline + /// with a 5-second hard timeout so a wedged exporter cannot hang Dispose. /// public override void Dispose() { @@ -958,27 +1142,57 @@ public override void Dispose() activity?.SetTag("session_id", _sessionId); activity?.SetTag("warehouse_id", _warehouseId); + long deleteSessionElapsedMs = 0; + Exception? deleteSessionError = null; + if (_sessionId != null) { + var deleteStopwatch = Stopwatch.StartNew(); try { - activity?.AddEvent(new System.Diagnostics.ActivityEvent("session.delete.start")); + activity?.AddEvent(new ActivityEvent("session.delete.start")); // Delete session synchronously during dispose _client.DeleteSessionAsync(_sessionId, _warehouseId, CancellationToken.None).GetAwaiter().GetResult(); - activity?.AddEvent(new System.Diagnostics.ActivityEvent("session.delete.success")); + activity?.AddEvent(new ActivityEvent("session.delete.success")); } catch (Exception ex) { + deleteSessionError = ex; // Best effort - ignore errors during dispose but trace them - activity?.AddEvent(new System.Diagnostics.ActivityEvent("session.delete.error", - tags: new System.Diagnostics.ActivityTagsCollection { { "error", ex.Message } })); + activity?.AddEvent(new ActivityEvent("session.delete.error", + tags: new ActivityTagsCollection { { "error", ex.Message } })); } finally { + deleteStopwatch.Stop(); + deleteSessionElapsedMs = deleteStopwatch.ElapsedMilliseconds; _sessionId = null; } } + // Emit DELETE_SESSION after the RPC completes (success or failure) and before we + // tear down the telemetry client. Idempotent — repeated Dispose calls fire once. + EmitDeleteSessionTelemetry(deleteSessionElapsedMs, deleteSessionError); + + // Flush + release the telemetry client. Bounded at 5 seconds so a stuck exporter + // cannot hang connection close. Per design §11: this matches the Thrift path's + // DisposeTelemetryAsync().GetAwaiter().GetResult() pattern, but with a hard + // timeout because SEA Dispose is not wrapped in a try/finally that flushes. + // IConnectionTelemetry.DisposeAsync already returns Task, so we call .Wait directly. + try + { + _telemetry.DisposeAsync().Wait(TimeSpan.FromSeconds(5)); + } + catch (Exception ex) + { + activity?.AddEvent(new ActivityEvent("telemetry.dispose.error", + tags: new ActivityTagsCollection + { + { "error.type", ex.GetType().Name }, + { "error.message", ex.Message } + })); + } + // Dispose the HTTP client if we own it if (_ownsHttpClient) { diff --git a/csharp/src/Telemetry/ConnectionTelemetry.cs b/csharp/src/Telemetry/ConnectionTelemetry.cs index 00f87576..a32f950b 100644 --- a/csharp/src/Telemetry/ConnectionTelemetry.cs +++ b/csharp/src/Telemetry/ConnectionTelemetry.cs @@ -29,7 +29,6 @@ using AdbcDrivers.HiveServer2; using AdbcDrivers.HiveServer2.Spark; using Apache.Arrow.Adbc; -using Apache.Hive.Service.Rpc.Thrift; namespace AdbcDrivers.Databricks.Telemetry { @@ -62,12 +61,22 @@ internal ConnectionTelemetry( /// Returns if telemetry is disabled, misconfigured, or fails to initialize. /// Never throws. /// + /// + /// The transport-agnostic session id (a GUID string for Thrift, server-assigned id for SEA). + /// Callers convert at the boundary so this method has no transport-specific dependency. + /// + /// + /// Driver transport mode (THRIFT or SEA) stamped onto + /// driver_connection_params.mode. Threaded through from the caller so the + /// telemetry payload reflects the real transport. + /// public static IConnectionTelemetry Create( IReadOnlyDictionary properties, string host, string assemblyVersion, OAuthClientCredentialsProvider? oauthTokenProvider, - TSessionHandle? sessionHandle, + string sessionId, + Proto.DriverMode.Types.Type mode, bool enableDirectResults, bool useDescTableExtended, int connectTimeoutMilliseconds, @@ -115,15 +124,13 @@ public static IConnectionTelemetry Create( SafeBuildSystemConfiguration(assemblyVersion, activity); Proto.DriverConnectionParameters driverConnectionParams = SafeBuildDriverConnectionParams( - properties, host, enableDirectResults, useDescTableExtended, + properties, host, mode, enableDirectResults, useDescTableExtended, connectTimeoutMilliseconds, activity); string authType = SafeDetermineAuthType(properties, activity); var session = new TelemetrySessionContext { - SessionId = sessionHandle?.SessionId?.Guid != null - ? new System.Guid(sessionHandle.SessionId.Guid).ToString() - : null, + SessionId = !string.IsNullOrEmpty(sessionId) ? sessionId : null, TelemetryClient = telemetryClient, SystemConfiguration = systemConfiguration, DriverConnectionParams = driverConnectionParams, @@ -430,6 +437,7 @@ internal static Proto.DriverSystemConfiguration SafeBuildSystemConfiguration( internal static Proto.DriverConnectionParameters SafeBuildDriverConnectionParams( IReadOnlyDictionary properties, string host, + Proto.DriverMode.Types.Type mode, bool enableDirectResults, bool useDescTableExtended, int connectTimeoutMilliseconds, @@ -438,7 +446,7 @@ internal static Proto.DriverConnectionParameters SafeBuildDriverConnectionParams try { return BuildDriverConnectionParams( - properties, host, enableDirectResults, useDescTableExtended, + properties, host, mode, enableDirectResults, useDescTableExtended, connectTimeoutMilliseconds); } catch (Exception ex) @@ -455,7 +463,7 @@ internal static Proto.DriverConnectionParameters SafeBuildDriverConnectionParams return new Proto.DriverConnectionParameters { HttpPath = string.Empty, - Mode = Proto.DriverMode.Types.Type.Thrift, + Mode = mode, HostInfo = new Proto.HostDetails { HostUrl = host ?? string.Empty, @@ -624,6 +632,7 @@ internal static string UnquoteOsReleaseValue(string raw) internal static Proto.DriverConnectionParameters BuildDriverConnectionParams( IReadOnlyDictionary properties, string host, + Proto.DriverMode.Types.Type mode, bool enableDirectResults, bool useDescTableExtended, int connectTimeoutMilliseconds) @@ -639,7 +648,7 @@ internal static Proto.DriverConnectionParameters BuildDriverConnectionParams( var connectionParams = new Proto.DriverConnectionParameters { HttpPath = httpPath ?? "", - Mode = Proto.DriverMode.Types.Type.Thrift, + Mode = mode, HostInfo = new Proto.HostDetails { // Bare hostname, matching JDBC. Scheme is implicit (always https) and diff --git a/csharp/src/Telemetry/IStatementOperationObserver.cs b/csharp/src/Telemetry/IStatementOperationObserver.cs new file mode 100644 index 00000000..12600efd --- /dev/null +++ b/csharp/src/Telemetry/IStatementOperationObserver.cs @@ -0,0 +1,130 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using AdbcDrivers.Databricks.Reader.CloudFetch; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; + +namespace AdbcDrivers.Databricks.Telemetry +{ + /// + /// Observer of a single statement's operational lifecycle. Sits between the statement + /// classes (Thrift and SEA) and the telemetry implementation so the statement code + /// does not depend on telemetry types directly. + /// + /// + /// + /// Fail-open contract: Implementations MUST NOT throw from any method on + /// this interface. All exceptions raised inside an implementation must be swallowed + /// internally — callsites in statement code intentionally contain no try/catch around + /// observer calls. Telemetry must never affect the caller's control flow. + /// + /// + /// Thread-safety: Methods on this interface may be invoked from any thread. + /// Implementations MUST be thread-safe. In practice the calls happen from the + /// statement's executing task and the reader's disposal thread, which may differ. + /// + /// + /// Terminal call: is the terminal call. After it has + /// been invoked once, the observer's record is considered complete and any further + /// calls — including additional calls — MUST be no-ops + /// (idempotent). This protects against the common case where both an error path + /// and a dispose path attempt to finalize the same statement. + /// + /// + /// Non-telemetry uses: The interface is shaped around the statement's lifecycle + /// rather than the telemetry data model so future observers (tracing, audit) can be + /// added without changing statement code. + /// + /// + internal interface IStatementOperationObserver + { + /// + /// Called once just before the statement is submitted to the server. + /// + /// The statement type (QUERY, UPDATE, METADATA, ...). + /// The operation type (EXECUTE_STATEMENT, EXECUTE_STATEMENT_ASYNC, ...). + /// Whether results are expected to be compressed (LZ4). + void OnExecuteStarted(StatementType stmtType, OperationType opType, bool isCompressed); + + /// + /// Called once after the server has accepted the statement and a statement id is known. + /// + /// The server-assigned statement id. + /// The result format inferred for the execution (INLINE_ARROW, EXTERNAL_LINKS, ...). + void OnExecuteSucceeded(string statementId, ExecutionResultFormat resultFormat); + + /// + /// Called once after the polling loop reaches a terminal state, with the accumulated + /// poll count and total elapsed poll latency. + /// + /// Total number of status-poll calls issued. + /// Sum of wall-clock time spent in poll calls, in milliseconds. + void OnPollCompleted(int count, long latencyMs); + + /// + /// Called when the first batch of results is available to the reader. + /// Implementations should treat repeated calls as a no-op (only the first wins). + /// + /// Elapsed time from execute start to first batch ready, in milliseconds. + void OnFirstBatchReady(long latencyMs); + + /// + /// Called when the reader has been fully consumed (or disposed). + /// + /// Elapsed time from execute start to results fully consumed, in milliseconds. + void OnConsumed(long latencyMs); + + /// + /// Called once when chunk metrics for the CloudFetch download are available. + /// + /// Aggregated CloudFetch chunk metrics. Implementations must + /// tolerate empty / default metrics if the gap-fix plumbing has not landed yet. + void OnChunksDownloaded(ChunkMetrics metrics); + + /// + /// Called once after the active result reader has been inspected, with the values + /// that should override any defaults previously stamped on the observer at execute + /// time. The statement invokes this from the finalize path so the emitted record + /// reflects the server-reported truth for the active reader (PECO-2988, PECO-2978) + /// rather than the connection-level capability flags used as placeholders earlier + /// in the lifecycle. Implementations should overwrite their stored result-format + /// and compression fields unconditionally. + /// + /// The result format reported by the active reader + /// (e.g. when CloudFetch is + /// active, otherwise). + /// Whether the active reader's payload is LZ4-compressed, + /// per the server's TGetResultSetMetadataResp.Lz4Compressed flag. + void OnReaderInspected(ExecutionResultFormat resultFormat, bool isCompressed); + + /// + /// Called once if the statement execution fails. Implementations should record the + /// error and continue to accept further calls; an explicit + /// call is still required to terminate the observer. + /// + /// The exception that occurred. + void OnError(Exception ex); + + /// + /// Terminal call. Implementations build and dispatch any pending record and mark + /// the observer as finalized. Must be idempotent: repeated calls are no-ops. + /// + void OnFinalized(); + } +} diff --git a/csharp/src/Telemetry/NullObserver.cs b/csharp/src/Telemetry/NullObserver.cs new file mode 100644 index 00000000..b2d65a73 --- /dev/null +++ b/csharp/src/Telemetry/NullObserver.cs @@ -0,0 +1,99 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using AdbcDrivers.Databricks.Reader.CloudFetch; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; + +namespace AdbcDrivers.Databricks.Telemetry +{ + /// + /// Singleton no-op implementation of . + /// Used as the default observer so callsites in statement classes never need to + /// null-check before calling observer methods. + /// + /// + /// All methods are intentionally empty. They satisfy the fail-open, thread-safe, + /// and idempotent contract trivially. + /// + internal sealed class NullObserver : IStatementOperationObserver + { + /// + /// The singleton instance. Use this directly rather than constructing new instances. + /// + public static readonly NullObserver Instance = new NullObserver(); + + private NullObserver() + { + } + + /// + public void OnExecuteStarted(StatementType stmtType, OperationType opType, bool isCompressed) + { + // No-op. + } + + /// + public void OnExecuteSucceeded(string statementId, ExecutionResultFormat resultFormat) + { + // No-op. + } + + /// + public void OnPollCompleted(int count, long latencyMs) + { + // No-op. + } + + /// + public void OnFirstBatchReady(long latencyMs) + { + // No-op. + } + + /// + public void OnConsumed(long latencyMs) + { + // No-op. + } + + /// + public void OnChunksDownloaded(ChunkMetrics metrics) + { + // No-op. + } + + /// + public void OnReaderInspected(ExecutionResultFormat resultFormat, bool isCompressed) + { + // No-op. + } + + /// + public void OnError(Exception ex) + { + // No-op. + } + + /// + public void OnFinalized() + { + // No-op. Idempotent by construction. + } + } +} diff --git a/csharp/src/Telemetry/SafeObserver.cs b/csharp/src/Telemetry/SafeObserver.cs new file mode 100644 index 00000000..dd20a49b --- /dev/null +++ b/csharp/src/Telemetry/SafeObserver.cs @@ -0,0 +1,151 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using System.Diagnostics; +using AdbcDrivers.Databricks.Reader.CloudFetch; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; + +namespace AdbcDrivers.Databricks.Telemetry +{ + /// + /// Belt-and-suspenders decorator over any . + /// Wraps every interface method in a per-method try/catch so an inner observer that + /// violates the fail-open contract cannot + /// surface its exception to the statement callsite. + /// + /// + /// + /// The first-party observers in this assembly (, + /// ) already honor the fail-open contract internally. + /// This decorator exists for the optional case where a third-party observer is + /// plugged in — see design §12, "optional SafeObserver decorator is available for + /// future third-party observer implementations that may not honor the contract." + /// + /// + /// Swallowed exceptions are recorded as an OpenTelemetry activity event + /// (telemetry.observer.suppressed) on the ambient , if + /// any. This is the codebase's convention for trace-level diagnostics: it remains + /// visible in distributed traces and verbose tooling without polluting standard + /// logs or affecting the caller's control flow. + /// + /// + /// Thread-safety: Inherits whatever thread-safety the inner observer provides. + /// The decorator itself stores only the inner reference and adds no mutable state. + /// + /// + internal sealed class SafeObserver : IStatementOperationObserver + { + private readonly IStatementOperationObserver _inner; + + /// + /// Wraps with a per-method try/catch. + /// + /// The observer to delegate to. Required. + /// Thrown if is null. + public SafeObserver(IStatementOperationObserver inner) + { + _inner = inner ?? throw new ArgumentNullException(nameof(inner)); + } + + /// + /// Exposes the wrapped observer for tests and diagnostic introspection. + /// + internal IStatementOperationObserver Inner => _inner; + + /// + public void OnExecuteStarted(StatementType stmtType, OperationType opType, bool isCompressed) => + Safe(() => _inner.OnExecuteStarted(stmtType, opType, isCompressed)); + + /// + public void OnExecuteSucceeded(string statementId, ExecutionResultFormat resultFormat) => + Safe(() => _inner.OnExecuteSucceeded(statementId, resultFormat)); + + /// + public void OnPollCompleted(int count, long latencyMs) => + Safe(() => _inner.OnPollCompleted(count, latencyMs)); + + /// + public void OnFirstBatchReady(long latencyMs) => + Safe(() => _inner.OnFirstBatchReady(latencyMs)); + + /// + public void OnConsumed(long latencyMs) => + Safe(() => _inner.OnConsumed(latencyMs)); + + /// + public void OnChunksDownloaded(ChunkMetrics metrics) => + Safe(() => _inner.OnChunksDownloaded(metrics)); + + /// + public void OnReaderInspected(ExecutionResultFormat resultFormat, bool isCompressed) => + Safe(() => _inner.OnReaderInspected(resultFormat, isCompressed)); + + /// + public void OnError(Exception ex) => + Safe(() => _inner.OnError(ex)); + + /// + public void OnFinalized() => + Safe(() => _inner.OnFinalized()); + + /// + /// Executes and swallows any exception, surfacing it as + /// a trace-level on the ambient . + /// + private static void Safe(Action action) + { + try + { + action(); + } + catch (Exception ex) + { + try + { + Activity.Current?.AddEvent(new ActivityEvent("telemetry.observer.suppressed", + tags: new ActivityTagsCollection + { + { "error.type", ex.GetType().Name }, + { "error.message", SafeMessage(ex) }, + { "observer.suppressed.source", "SafeObserver" }, + })); + } + catch + { + // Recording the suppression must not itself throw. Intentionally empty. + } + } + } + + // Some exceptions throw from their .Message property (rare but observed in the + // wild for user-defined types). Wrap it so the trace event can never become a + // source of observer failure. + private static string? SafeMessage(Exception ex) + { + try + { + return ex.Message; + } + catch + { + return null; + } + } + } +} diff --git a/csharp/src/Telemetry/TelemetryObserver.cs b/csharp/src/Telemetry/TelemetryObserver.cs new file mode 100644 index 00000000..e9750ecc --- /dev/null +++ b/csharp/src/Telemetry/TelemetryObserver.cs @@ -0,0 +1,269 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using System.Diagnostics; +using System.Threading; +using AdbcDrivers.Databricks.Reader.CloudFetch; +using AdbcDrivers.Databricks.Telemetry.Models; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; + +namespace AdbcDrivers.Databricks.Telemetry +{ + /// + /// Default implementation that translates + /// observer method calls into mutations on a + /// and, on , builds a + /// and enqueues it on the session's telemetry client for background export. + /// + /// + /// + /// Fail-open: Every public method routes through the private Safe(Action) + /// helper, which swallows any exception and emits an OpenTelemetry activity event. This + /// concentrates the fail-open contract in exactly one place rather than scattering + /// try/catch boilerplate across each method body. + /// + /// + /// Thread-safe: The scalar writes into the per-statement context are inherently + /// safe for the lifecycle calls (each is called at most a small number of times from + /// the statement's execution path or the reader's disposal thread). The terminal + /// uses + /// on an _emitted flag to guarantee exactly-once enqueue even if it is invoked + /// concurrently from multiple threads (e.g. error path + dispose path). + /// + /// + /// Non-blocking: only enqueues the log into the + /// telemetry client's internal queue. The actual HTTP export runs on the client's + /// background flush timer and never blocks the calling thread. + /// + /// + internal sealed class TelemetryObserver : IStatementOperationObserver + { + // 0 = not yet emitted, 1 = emitted. Mutated via Interlocked.CompareExchange so the + // terminal OnFinalized() is exactly-once even under concurrent invocation. + private int _emitted; + + private readonly TelemetrySessionContext _session; + private readonly StatementTelemetryContext _ctx; + + /// + /// Initializes a new observer that records into a freshly created + /// seeded from . + /// + /// Per-connection telemetry session context. Required. + public TelemetryObserver(TelemetrySessionContext session) + : this(session, new StatementTelemetryContext(session ?? throw new ArgumentNullException(nameof(session)))) + { + } + + // Test seam: allows unit tests to inject a pre-populated context. Internal so + // it does not leak from the assembly. + internal TelemetryObserver(TelemetrySessionContext session, StatementTelemetryContext context) + { + _session = session ?? throw new ArgumentNullException(nameof(session)); + _ctx = context ?? throw new ArgumentNullException(nameof(context)); + } + + /// + /// Internal accessor for the underlying context, exposed for unit tests that need + /// to assert per-field state without building a full telemetry log. + /// + internal StatementTelemetryContext Context => _ctx; + + /// + /// Internal accessor for the idempotency flag, exposed for unit tests that need to + /// confirm exactly-once semantics without reaching into the enqueue path. + /// + internal bool HasEmitted => Volatile.Read(ref _emitted) != 0; + + /// + public void OnExecuteStarted(StatementType stmtType, OperationType opType, bool isCompressed) => + Safe(() => + { + _ctx.StatementType = stmtType; + _ctx.OperationType = opType; + _ctx.IsCompressed = isCompressed; + }); + + /// + public void OnExecuteSucceeded(string statementId, ExecutionResultFormat resultFormat) => + Safe(() => + { + _ctx.StatementId = statementId; + _ctx.ResultFormat = resultFormat; + _ctx.RecordExecuteComplete(); + }); + + /// + public void OnPollCompleted(int count, long latencyMs) => + Safe(() => + { + _ctx.PollCount = count; + _ctx.PollLatencyMs = latencyMs; + }); + + /// + public void OnFirstBatchReady(long latencyMs) => + Safe(() => + { + // Only the first call wins: the underlying setter is null-guarded so + // repeated calls (e.g. inline reader emits one, cloud-fetch reader emits + // another) do not overwrite the earliest observed latency. + if (_ctx.FirstBatchReadyMs == null) + { + _ctx.FirstBatchReadyMs = latencyMs; + } + }); + + /// + public void OnConsumed(long latencyMs) => + Safe(() => _ctx.ResultsConsumedMs = latencyMs); + + /// + public void OnChunksDownloaded(ChunkMetrics metrics) => + Safe(() => + { + // Tolerate a null or empty ChunkMetrics — the gap-fix plumbing may not be + // landed yet, and the proto fields are nullable on the wire. + if (metrics == null) + { + return; + } + _ctx.SetChunkDetails( + metrics.TotalChunksPresent, + metrics.TotalChunksIterated, + metrics.InitialChunkLatencyMs, + metrics.SlowestChunkLatencyMs, + metrics.SumChunksDownloadTimeMs); + }); + + /// + public void OnReaderInspected(ExecutionResultFormat resultFormat, bool isCompressed) => + Safe(() => + { + // Overwrite the placeholders stamped at OnExecuteStarted time with the + // server-reported truth for the active reader (PECO-2988, PECO-2978). + // Mutating the same fields directly preserves byte-identical output vs. + // the legacy EmitTelemetry helper, which set both fields immediately + // before BuildTelemetryLog. + _ctx.ResultFormat = resultFormat; + _ctx.IsCompressed = isCompressed; + }); + + /// + public void OnError(Exception ex) => + Safe(() => + { + if (ex == null) + { + return; + } + _ctx.HasError = true; + // GetType().Name is always safe; do not capture ex.Message here unless we + // have explicit consent: the proto's DriverErrorInfo.error_message field is + // pending LPP review (see ConnectionTelemetry.EmitOperationTelemetry). + _ctx.ErrorName = ex.GetType().Name; + _ctx.ErrorMessage = SafeMessage(ex); + }); + + /// + public void OnFinalized() + { + // Idempotency gate: only the first caller proceeds. Doing this outside Safe() + // ensures even a defective Safe() helper can't bypass the once-only semantics. + if (Interlocked.CompareExchange(ref _emitted, 1, 0) != 0) + { + return; + } + + Safe(() => + { + ITelemetryClient? client = _session.TelemetryClient; + if (client == null) + { + // Telemetry is disabled at the session level. The idempotency flag has + // already been set so a subsequent OnFinalized() remains a no-op. + return; + } + + Proto.OssSqlDriverTelemetryLog log = _ctx.BuildTelemetryLog(); + TelemetryFrontendLog frontendLog = new TelemetryFrontendLog + { + WorkspaceId = _ctx.WorkspaceId, + FrontendLogEventId = Guid.NewGuid().ToString(), + Context = new FrontendLogContext + { + TimestampMillis = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds(), + }, + Entry = new FrontendLogEntry + { + SqlDriverLog = log, + }, + }; + + // Enqueue is non-blocking; the client buffers events and flushes on a + // background timer. No HTTP I/O happens on the calling thread. + client.Enqueue(frontendLog); + }); + } + + /// + /// Concentrates the fail-open try/catch in one place. Any exception is suppressed + /// and surfaced as an OpenTelemetry activity event so it is still observable in + /// traces without affecting the caller's control flow. + /// + private static void Safe(Action action) + { + try + { + action(); + } + catch (Exception ex) + { + try + { + Activity.Current?.AddEvent(new ActivityEvent("telemetry.observer.suppressed", + tags: new ActivityTagsCollection + { + { "error.type", ex.GetType().Name }, + { "error.message", ex.Message } + })); + } + catch + { + // Recording the suppression must not itself throw. Intentionally empty. + } + } + } + + // Some exceptions throw from their .Message property (rare but observed in the + // wild for user-defined types). Wrap it so OnError can never become a source of + // observer failure. + private static string? SafeMessage(Exception ex) + { + try + { + return ex.Message; + } + catch + { + return null; + } + } + } +} diff --git a/csharp/test/Unit/StatementExecution/StatementExecutionConnectionTelemetryTests.cs b/csharp/test/Unit/StatementExecution/StatementExecutionConnectionTelemetryTests.cs new file mode 100644 index 00000000..89e9a43f --- /dev/null +++ b/csharp/test/Unit/StatementExecution/StatementExecutionConnectionTelemetryTests.cs @@ -0,0 +1,600 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using System.Collections.Generic; +using System.Diagnostics; +using System.Reflection; +using System.Threading; +using System.Threading.Tasks; +using AdbcDrivers.Databricks.StatementExecution; +using AdbcDrivers.Databricks.Telemetry; +using AdbcDrivers.Databricks.Telemetry.Models; +using AdbcDrivers.HiveServer2.Spark; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; +using Xunit; + +namespace AdbcDrivers.Databricks.Tests.Unit.StatementExecution +{ + /// + /// Unit tests for telemetry wiring in + /// (PECO-3022 T5). Mirrors the Thrift-side DriverTelemetryWiringTests pattern: + /// these tests exercise the production code paths (EmitCreateSessionTelemetry, + /// EmitDeleteSessionTelemetry, OpenAsync, Dispose) against a + /// fake injected via the + /// TelemetryForTesting seam so we can verify CREATE_SESSION/DELETE_SESSION + /// emissions, the fail-open init contract, and the 5-second Dispose flush timeout. + /// + public class StatementExecutionConnectionTelemetryTests + { + // ── Fakes ────────────────────────────────────────────────────────────────── + + /// + /// Records every EmitOperationTelemetry call so tests can assert which + /// operation types fired, in what order, and with what payload. + /// + private sealed class RecordingTelemetry : IConnectionTelemetry + { + public List Calls { get; } = new(); + public List Errors { get; } = new(); + public List StatementIds { get; } = new(); + public Dictionary LatenciesByOp { get; } = new(); + public TelemetrySessionContext? Session { get; } + = new TelemetrySessionContext + { + SessionId = "sea-session-1", + }; + public int DisposeCount { get; private set; } + + public T ExecuteWithMetadataTelemetry( + OperationType operationType, + Func operation, + Activity? activity) => operation(); + + public void EmitOperationTelemetry( + OperationType operationType, + StatementType statementType, + string? statementId, + long elapsedMs, + Exception? error) + { + Calls.Add(operationType); + Errors.Add(error); + StatementIds.Add(statementId); + LatenciesByOp[operationType] = elapsedMs; + } + + public Task DisposeAsync() + { + DisposeCount++; + return Task.CompletedTask; + } + } + + /// + /// Telemetry that intentionally hangs in DisposeAsync. Used to verify the + /// 5-second hard timeout on . + /// + private sealed class HangingTelemetry : IConnectionTelemetry + { + public TelemetrySessionContext? Session { get; } + = new TelemetrySessionContext { SessionId = "sea-session-hang" }; + + public T ExecuteWithMetadataTelemetry( + OperationType operationType, + Func operation, + Activity? activity) => operation(); + + public void EmitOperationTelemetry( + OperationType operationType, + StatementType statementType, + string? statementId, + long elapsedMs, + Exception? error) + { + // No-op + } + + // Block forever — the production Dispose must time this out at 5 seconds. + public Task DisposeAsync() => new TaskCompletionSource().Task; + } + + /// + /// Telemetry that throws on every EmitOperationTelemetry call. The connection's + /// emit helpers must swallow these so Dispose / OpenAsync don't fail. + /// + private sealed class ThrowingTelemetry : IConnectionTelemetry + { + public TelemetrySessionContext? Session => null; + + public T ExecuteWithMetadataTelemetry( + OperationType operationType, + Func operation, + Activity? activity) => operation(); + + public void EmitOperationTelemetry( + OperationType operationType, + StatementType statementType, + string? statementId, + long elapsedMs, + Exception? error) + { + throw new InvalidOperationException("emit blew up"); + } + + public Task DisposeAsync() => Task.CompletedTask; + } + + // ── Helpers ──────────────────────────────────────────────────────────────── + + private static Dictionary CreateBaseProperties() + { + return new Dictionary + { + { SparkParameters.HostName, "telemetry-test.cloud.databricks.com" }, + { DatabricksParameters.WarehouseId, "test-warehouse-id" }, + { SparkParameters.AccessToken, "test-token" } + }; + } + + private static StatementExecutionConnection CreateConnection() + { + return new StatementExecutionConnection(CreateBaseProperties()); + } + + /// + /// Reflectively flips _sessionId from null to a fake id so Dispose + /// believes a session exists and runs the DeleteSession path. The reflective + /// access mirrors what other StatementExecutionConnection unit tests do for + /// _identityFederationClientId. + /// + private static void SetFakeSessionId(StatementExecutionConnection connection, string sessionId) + { + var field = typeof(StatementExecutionConnection).GetField( + "_sessionId", + BindingFlags.NonPublic | BindingFlags.Instance); + Assert.NotNull(field); + field!.SetValue(connection, sessionId); + } + + // ── Tests ────────────────────────────────────────────────────────────────── + + [Fact] + public void EmitCreateSessionTelemetry_FiresCreateSession() + { + using var connection = CreateConnection(); + var fake = new RecordingTelemetry(); + connection.TelemetryForTesting = fake; + + connection.EmitCreateSessionTelemetry(); + + Assert.Equal(new[] { OperationType.CreateSession }, fake.Calls); + Assert.Null(fake.Errors[0]); + Assert.Null(fake.StatementIds[0]); + } + + [Fact] + public void OpenAsync_EmitsCreateSession_WhenTelemetryEnabled() + { + // We can't easily drive the full OpenAsync without a real warehouse, but the + // wiring contract is: after the session id is set, EmitCreateSessionTelemetry() + // must fire CREATE_SESSION through the injected telemetry. The production code + // calls this immediately after _sessionId = response.SessionId. + using var connection = CreateConnection(); + var fake = new RecordingTelemetry(); + connection.TelemetryForTesting = fake; + + // Simulate the OpenAsync post-CreateSession step. + SetFakeSessionId(connection, "fake-session-id"); + connection.EmitCreateSessionTelemetry(activity: null); + + Assert.Contains(OperationType.CreateSession, fake.Calls); + } + + [Fact] + public void EmitDeleteSessionTelemetry_WithoutCreate_DoesNotFire() + { + // Idempotency contract: DELETE_SESSION must not fire if CREATE_SESSION never did. + // Models a connection that failed to open a session. + using var connection = CreateConnection(); + var fake = new RecordingTelemetry(); + connection.TelemetryForTesting = fake; + + connection.EmitDeleteSessionTelemetry(elapsedMs: 10); + + Assert.DoesNotContain(OperationType.DeleteSession, fake.Calls); + } + + [Fact] + public void Dispose_EmitsDeleteSession() + { + // The production Dispose path: with a session id set, it calls DeleteSessionAsync, + // captures its latency + error, and emits DELETE_SESSION before flushing. + // We use the test seam to skip the real RPC: the seam doesn't bypass the emit call, + // so we exercise the emit wiring directly with a fake session id. + var connection = CreateConnection(); + var fake = new RecordingTelemetry(); + connection.TelemetryForTesting = fake; + + // Pretend session was opened. + connection.EmitCreateSessionTelemetry(); + + // Pretend the SEA session id is set so Dispose enters the DeleteSession branch. + // DeleteSessionAsync will throw (real HTTP call), but the production Dispose swallows + // and still emits DELETE_SESSION with the captured exception. + SetFakeSessionId(connection, "fake-sea-session-id"); + + connection.Dispose(); + + Assert.Contains(OperationType.CreateSession, fake.Calls); + Assert.Contains(OperationType.DeleteSession, fake.Calls); + } + + [Fact] + public void EmitDeleteSessionTelemetry_ForwardsLatencyAndError() + { + using var connection = CreateConnection(); + var fake = new RecordingTelemetry(); + connection.TelemetryForTesting = fake; + + // Open first so the idempotency gate lets DELETE_SESSION through. + connection.EmitCreateSessionTelemetry(); + var rpcError = new InvalidOperationException("delete session failed"); + connection.EmitDeleteSessionTelemetry(elapsedMs: 47, error: rpcError); + + int deleteIdx = fake.Calls.IndexOf(OperationType.DeleteSession); + Assert.True(deleteIdx >= 0); + Assert.Equal(47, fake.LatenciesByOp[OperationType.DeleteSession]); + Assert.Same(rpcError, fake.Errors[deleteIdx]); + } + + [Fact] + public void Dispose_CalledTwice_FiresDeleteSessionOnlyOnce() + { + // Repeated Dispose calls (common in `using` + manual Dispose) must not duplicate + // DELETE_SESSION records. + var connection = CreateConnection(); + var fake = new RecordingTelemetry(); + connection.TelemetryForTesting = fake; + + connection.EmitCreateSessionTelemetry(); + SetFakeSessionId(connection, "fake-session-id-dup"); + + connection.Dispose(); + connection.Dispose(); + + int deleteCount = 0; + foreach (var call in fake.Calls) + { + if (call == OperationType.DeleteSession) deleteCount++; + } + Assert.Equal(1, deleteCount); + } + + [Fact] + public void OpenAsync_TelemetryInitThrows_FallsBackToNoOpAndStillOpens() + { + // Goal: prove the InitializeTelemetry → ConnectionTelemetry.Create call chain is + // wrapped in a try/catch that falls back to NoOpConnectionTelemetry when init + // would throw. We exercise the private InitializeTelemetry helper directly via + // reflection — this is the same boundary the production OpenAsync goes through. + using var connection = CreateConnection(); + + // Sanity: starts as NoOp (the field's default). + Assert.IsType(connection.TelemetryForTesting); + + var method = typeof(StatementExecutionConnection).GetMethod( + "InitializeTelemetry", + BindingFlags.NonPublic | BindingFlags.Instance); + Assert.NotNull(method); + + // ConnectionTelemetry.Create internally catches and returns NoOp on any failure, + // so even with bogus properties, InitializeTelemetry should leave _telemetry + // pointing at a valid IConnectionTelemetry (NoOp or real). It MUST NOT throw. + var ex = Record.Exception(() => method!.Invoke(connection, new object?[] { null })); + Assert.Null(ex); + + // After init, telemetry should still be a valid instance (NoOp or real). + Assert.NotNull(connection.TelemetryForTesting); + } + + [Fact] + public void EmitCreateSessionTelemetry_SwallowsExceptions() + { + // Fail-open contract: if the emit call throws, neither Open nor Dispose must surface it. + using var connection = CreateConnection(); + connection.TelemetryForTesting = new ThrowingTelemetry(); + + var ex = Record.Exception(() => connection.EmitCreateSessionTelemetry()); + Assert.Null(ex); + } + + [Fact] + public void EmitDeleteSessionTelemetry_SwallowsExceptions() + { + using var connection = CreateConnection(); + connection.TelemetryForTesting = new ThrowingTelemetry(); + + // Open first so the idempotency gate would otherwise let it through. + // ThrowingTelemetry throws on CREATE_SESSION too — but the helper still swallows. + connection.EmitCreateSessionTelemetry(); + var ex = Record.Exception(() => connection.EmitDeleteSessionTelemetry(elapsedMs: 1)); + Assert.Null(ex); + } + + [Fact] + public void Dispose_FlushHangs_CompletesWithin5Seconds() + { + // The 5-second hard timeout on _telemetry.DisposeAsync().Wait(...) is the only + // thing standing between a wedged exporter and an indefinitely blocked Dispose. + // Use a HangingTelemetry whose DisposeAsync never completes, and verify Dispose + // returns in well under 10 seconds (we allow a generous wall-clock budget to + // tolerate slow CI machines). + var connection = CreateConnection(); + connection.TelemetryForTesting = new HangingTelemetry(); + + // No session id set → DeleteSessionAsync path is skipped. The only thing that + // could hang Dispose now is the telemetry flush, which is exactly what we want + // to time-bound here. + var sw = Stopwatch.StartNew(); + connection.Dispose(); + sw.Stop(); + + // Budget: 5s timeout + headroom for the rest of Dispose (HttpClient disposal etc.). + Assert.True( + sw.Elapsed < TimeSpan.FromSeconds(10), + $"Dispose took {sw.Elapsed.TotalSeconds:F1}s, expected < 10s (5s flush timeout + headroom)."); + } + + [Fact] + public void TelemetrySession_DefaultsToNull_BeforeOpen() + { + // The TelemetrySession accessor exposes the underlying session for the SEA + // statement (next phase) to build observers. Before OpenAsync runs, telemetry + // is NoOp and Session is null — exactly the signal the statement uses to fall + // back to NullObserver. + using var connection = CreateConnection(); + Assert.Null(connection.TelemetrySession); + } + + [Fact] + public void TelemetrySession_ReflectsInjectedTelemetry() + { + // Once telemetry is wired up (real or fake), the accessor returns its session. + // SEA statements rely on this to create per-statement observer contexts. + using var connection = CreateConnection(); + var fake = new RecordingTelemetry(); + connection.TelemetryForTesting = fake; + + Assert.NotNull(connection.TelemetrySession); + Assert.Equal("sea-session-1", connection.TelemetrySession!.SessionId); + } + + // ── Connect-timeout telemetry source (gap D1) ────────────────────────────── + // + // The SEA connection must stamp the telemetry payload's socket_timeout from a + // connection-establishment timeout — NOT from _waitTimeoutSeconds, which is the + // SEA query-wait (CONTINUE) timeout and a semantically different concept. + + private static int GetConnectTimeoutFieldValue(StatementExecutionConnection connection) + { + var field = typeof(StatementExecutionConnection).GetField( + "_connectTimeoutMilliseconds", + BindingFlags.NonPublic | BindingFlags.Instance); + Assert.NotNull(field); + return (int)field!.GetValue(connection)!; + } + + private static int GetWaitTimeoutSecondsFieldValue(StatementExecutionConnection connection) + { + var field = typeof(StatementExecutionConnection).GetField( + "_waitTimeoutSeconds", + BindingFlags.NonPublic | BindingFlags.Instance); + Assert.NotNull(field); + return (int)field!.GetValue(connection)!; + } + + [Fact] + public void ConnectTimeoutMilliseconds_DefaultsTo30Seconds_WhenPropertyAbsent() + { + // No ConnectTimeoutMilliseconds property is set. The default must match the + // Thrift path's HiveServer2Connection.ConnectTimeoutMillisecondsDefault (30000 ms) + // so dashboards filtering on socket_timeout see consistent values across transports. + using var connection = CreateConnection(); + + Assert.Equal(30000, GetConnectTimeoutFieldValue(connection)); + } + + [Fact] + public void ConnectTimeoutMilliseconds_ReadsFromSparkParametersConnectTimeoutMilliseconds() + { + // Source-of-truth check: the SEA connection must read the same connection-string + // property the Thrift path reads (SparkParameters.ConnectTimeoutMilliseconds). + var properties = CreateBaseProperties(); + properties[SparkParameters.ConnectTimeoutMilliseconds] = "45000"; + using var connection = new StatementExecutionConnection(properties); + + Assert.Equal(45000, GetConnectTimeoutFieldValue(connection)); + } + + [Fact] + public void ConnectTimeoutMilliseconds_IsNotDerivedFromWaitTimeoutSeconds() + { + // Regression guard for gap D1: the previous code passed + // (int)TimeSpan.FromSeconds(_waitTimeoutSeconds).TotalMilliseconds + // as connectTimeoutMilliseconds, which silently mislabeled SEA telemetry records + // (10s wait_timeout → 10000ms socket_timeout). The two concepts must be independent. + var properties = CreateBaseProperties(); + properties[DatabricksParameters.WaitTimeout] = "7"; // SEA CONTINUE timeout (seconds) + properties[SparkParameters.ConnectTimeoutMilliseconds] = "55000"; + using var connection = new StatementExecutionConnection(properties); + + int waitTimeoutSeconds = GetWaitTimeoutSecondsFieldValue(connection); + int connectTimeoutMs = GetConnectTimeoutFieldValue(connection); + + Assert.Equal(7, waitTimeoutSeconds); + Assert.Equal(55000, connectTimeoutMs); + // The mislabel bug would produce 7000ms here — assert it doesn't. + Assert.NotEqual( + (int)TimeSpan.FromSeconds(waitTimeoutSeconds).TotalMilliseconds, + connectTimeoutMs); + } + + [Fact] + public void ConnectTimeoutMilliseconds_NotAffectedByEnableDirectResultsFalse() + { + // Direct-results=false flips _waitTimeoutSeconds to 0 in the SEA path. The connect + // timeout (a connection-establishment concept) must remain independent of that. + var properties = CreateBaseProperties(); + properties[DatabricksParameters.EnableDirectResults] = "false"; + properties[SparkParameters.ConnectTimeoutMilliseconds] = "20000"; + using var connection = new StatementExecutionConnection(properties); + + Assert.Equal(0, GetWaitTimeoutSecondsFieldValue(connection)); + Assert.Equal(20000, GetConnectTimeoutFieldValue(connection)); + } + + [Fact] + public void InitializeTelemetry_ForwardsConnectTimeoutToSocketTimeoutField() + { + // End-to-end wiring guard: exercise the real InitializeTelemetry → + // ConnectionTelemetry.Create path with telemetry enabled, then read back the + // resulting session's driver_connection_params.socket_timeout. This proves the + // argument passed to ConnectionTelemetry.Create is _connectTimeoutMilliseconds + // (in ms, divided to seconds inside BuildDriverConnectionParams), NOT + // _waitTimeoutSeconds. Without the fix, the bug pattern produced socket_timeout + // in the 0–10s range (mirroring DatabricksParameters.WaitTimeout, default 10s). + var properties = CreateBaseProperties(); + properties[TelemetryConfiguration.PropertyKeyEnabled] = "true"; + properties[SparkParameters.ConnectTimeoutMilliseconds] = "55000"; + properties[DatabricksParameters.WaitTimeout] = "7"; + + using var connection = new StatementExecutionConnection(properties); + + // Sanity: the two fields are distinct so we can disambiguate which one feeds + // socket_timeout downstream. + Assert.Equal(7, GetWaitTimeoutSecondsFieldValue(connection)); + Assert.Equal(55000, GetConnectTimeoutFieldValue(connection)); + + var method = typeof(StatementExecutionConnection).GetMethod( + "InitializeTelemetry", + BindingFlags.NonPublic | BindingFlags.Instance); + Assert.NotNull(method); + method!.Invoke(connection, new object?[] { null }); + + // ConnectionTelemetry.Create is `Never throws` — if init failed we'd get NoOp + // back with Session == null. Skip if the unit-test environment doesn't permit + // building a real telemetry client; the field-level tests above still cover + // the source-of-truth contract. + var session = connection.TelemetrySession; + Assert.NotNull(session); + Assert.NotNull(session!.DriverConnectionParams); + + // socket_timeout is in seconds (proto field is int64-seconds). 55000ms → 55s. + // Critically: NOT 7 (which would mean the value came from _waitTimeoutSeconds). + Assert.Equal(55, session.DriverConnectionParams!.SocketTimeout); + Assert.NotEqual(GetWaitTimeoutSecondsFieldValue(connection), session.DriverConnectionParams.SocketTimeout); + } + + // ── enable_direct_results telemetry source (gap B10) ─────────────────────── + // + // The SEA connection must stamp the telemetry payload's enable_direct_results from + // the DatabricksParameters.EnableDirectResults user property — NOT from a hardcoded + // literal. The previous code passed `enableDirectResults: true` unconditionally, + // making the field useless on dashboards that filter by user configuration. + + private static bool GetEnableDirectResultsFieldValue(StatementExecutionConnection connection) + { + var field = typeof(StatementExecutionConnection).GetField( + "_enableDirectResults", + BindingFlags.NonPublic | BindingFlags.Instance); + Assert.NotNull(field); + return (bool)field!.GetValue(connection)!; + } + + [Fact] + public void EnableDirectResults_DefaultsToTrue_WhenPropertyAbsent() + { + // No EnableDirectResults property is set. The default must match the Thrift path + // (DatabricksConnection._enableDirectResults defaults to true) so dashboards see + // consistent values across transports for callers that never tuned this flag. + using var connection = CreateConnection(); + + Assert.True(GetEnableDirectResultsFieldValue(connection)); + } + + [Fact] + public void EnableDirectResults_ReadsFalseFromConnectionProperties() + { + // Property dictionary check: the SEA connection must read the same + // DatabricksParameters.EnableDirectResults property the Thrift path reads + // and honor "false" rather than hardcoding true. + var properties = CreateBaseProperties(); + properties[DatabricksParameters.EnableDirectResults] = "false"; + using var connection = new StatementExecutionConnection(properties); + + Assert.False(GetEnableDirectResultsFieldValue(connection)); + } + + [Fact] + public void EnableDirectResults_ReadsTrueFromConnectionProperties() + { + // Explicit "true" must also be read from the property — not from a + // hardcoded default — so the property is the source of truth. + var properties = CreateBaseProperties(); + properties[DatabricksParameters.EnableDirectResults] = "true"; + using var connection = new StatementExecutionConnection(properties); + + Assert.True(GetEnableDirectResultsFieldValue(connection)); + } + + [Fact] + public void InitializeTelemetry_ForwardsEnableDirectResultsToConnectionParams() + { + // End-to-end wiring guard: exercise the real InitializeTelemetry → + // ConnectionTelemetry.Create path with telemetry enabled, then read back the + // resulting session's driver_connection_params.enable_direct_results. This proves + // the argument passed to ConnectionTelemetry.Create is _enableDirectResults, NOT + // the prior hardcoded literal `true`. Without the fix, this assertion fails + // because the field is always true regardless of user configuration. + var properties = CreateBaseProperties(); + properties[TelemetryConfiguration.PropertyKeyEnabled] = "true"; + properties[DatabricksParameters.EnableDirectResults] = "false"; + + using var connection = new StatementExecutionConnection(properties); + + // Sanity: the field reflects the user-supplied "false". + Assert.False(GetEnableDirectResultsFieldValue(connection)); + + var method = typeof(StatementExecutionConnection).GetMethod( + "InitializeTelemetry", + BindingFlags.NonPublic | BindingFlags.Instance); + Assert.NotNull(method); + method!.Invoke(connection, new object?[] { null }); + + // ConnectionTelemetry.Create is `Never throws` — if init failed we'd get NoOp + // back with Session == null. The field-level tests above still cover the + // source-of-truth contract if this end-to-end assertion can't run. + var session = connection.TelemetrySession; + Assert.NotNull(session); + Assert.NotNull(session!.DriverConnectionParams); + + // The bug pattern produced `true` here regardless of user config. + Assert.False(session.DriverConnectionParams!.EnableDirectResults); + } + } +} diff --git a/csharp/test/Unit/Telemetry/ConnectionTelemetryAuthMechTests.cs b/csharp/test/Unit/Telemetry/ConnectionTelemetryAuthMechTests.cs index 10e7af6a..dbb506f4 100644 --- a/csharp/test/Unit/Telemetry/ConnectionTelemetryAuthMechTests.cs +++ b/csharp/test/Unit/Telemetry/ConnectionTelemetryAuthMechTests.cs @@ -19,6 +19,7 @@ using AdbcDrivers.HiveServer2.Spark; using DriverAuthFlowType = AdbcDrivers.Databricks.Telemetry.Proto.DriverAuthFlow.Types.Type; using DriverAuthMechType = AdbcDrivers.Databricks.Telemetry.Proto.DriverAuthMech.Types.Type; +using DriverModeType = AdbcDrivers.Databricks.Telemetry.Proto.DriverMode.Types.Type; using Xunit; namespace AdbcDrivers.Databricks.Tests.Unit.Telemetry @@ -47,7 +48,8 @@ public void AuthMech_Pat_WhenAuthTypeIsTokenAndNoOAuthGrantType() properties[SparkParameters.Token] = "dapi-redacted"; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.Equal(DriverAuthMechType.Pat, connParams.AuthMech); Assert.Equal(DriverAuthFlowType.TokenPassthrough, connParams.AuthFlow); @@ -64,7 +66,8 @@ public void AuthMech_Oauth_ClientCredentials_WhenGrantTypeIsClientCredentials() properties[DatabricksParameters.OAuthClientSecret] = "client-secret"; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.Equal(DriverAuthMechType.Oauth, connParams.AuthMech); Assert.Equal(DriverAuthFlowType.ClientCredentials, connParams.AuthFlow); @@ -83,7 +86,8 @@ public void AuthMech_Oauth_TokenPassthrough_WhenAuthTypeIsOauthWithNoGrantType() properties[SparkParameters.AccessToken] = "oauth-access-token-redacted"; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.Equal(DriverAuthMechType.Oauth, connParams.AuthMech); Assert.Equal(DriverAuthFlowType.TokenPassthrough, connParams.AuthFlow); @@ -103,7 +107,8 @@ public void AuthMech_Oauth_TokenPassthrough_WhenGrantTypeIsAccessToken() properties[SparkParameters.AccessToken] = "oauth-access-token-redacted"; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.Equal(DriverAuthMechType.Oauth, connParams.AuthMech); Assert.Equal(DriverAuthFlowType.TokenPassthrough, connParams.AuthFlow); @@ -115,7 +120,8 @@ public void AuthMech_Pat_WhenNoAuthConfigured() var properties = BaseProperties(); var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.Equal(DriverAuthMechType.Pat, connParams.AuthMech); Assert.Equal(DriverAuthFlowType.TokenPassthrough, connParams.AuthFlow); diff --git a/csharp/test/Unit/Telemetry/ConnectionTelemetryCreateSignatureTests.cs b/csharp/test/Unit/Telemetry/ConnectionTelemetryCreateSignatureTests.cs new file mode 100644 index 00000000..ccf5d1a5 --- /dev/null +++ b/csharp/test/Unit/Telemetry/ConnectionTelemetryCreateSignatureTests.cs @@ -0,0 +1,216 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System.Collections.Generic; +using System.Threading.Tasks; +using AdbcDrivers.Databricks.Telemetry; +using AdbcDrivers.HiveServer2.Spark; +using DriverModeType = AdbcDrivers.Databricks.Telemetry.Proto.DriverMode.Types.Type; +using Xunit; + +namespace AdbcDrivers.Databricks.Tests.Unit.Telemetry +{ + /// + /// Tests for PECO-3022 (TELEM/SEA T1): + /// is now transport-agnostic — it takes a string sessionId (converted at the + /// caller's boundary) and a DriverMode.Types.Type mode threaded through to + /// driver_connection_params.mode. The two formerly hardcoded + /// DriverMode.Types.Type.Thrift literals in BuildDriverConnectionParams + /// and the fallback in SafeBuildDriverConnectionParams are gone; the mode is + /// always the value supplied by the caller (THRIFT for Thrift, SEA for the upcoming + /// SEA transport). + /// + public class ConnectionTelemetryCreateSignatureTests + { + private const string AssemblyVersion = "1.2.3-test"; + private const int DefaultTimeoutMs = 30_000; + + private static IReadOnlyDictionary TelemetryEnabledProperties() => + new Dictionary + { + { TelemetryConfiguration.PropertyKeyEnabled, "true" }, + { SparkParameters.AuthType, SparkAuthTypeConstants.Token }, + { SparkParameters.Token, "dapi-redacted" }, + { SparkParameters.Path, "/sql/1.0/warehouses/abc123" }, + }; + + // Tests share a TelemetryClient cache keyed by host (TelemetryClientManager + // singleton). Use distinct hosts per test to keep them isolated. + [Fact] + public async Task Create_AcceptsStringSessionId() + { + // Regression: the original signature took TSessionHandle?, forcing the + // (Thrift) caller to leak its transport handle through telemetry. The new + // signature accepts the already-stringified id so the SEA caller can pass + // its server-assigned id without inventing a fake TSessionHandle. + const string Host = "create-string-sid.databricks.com"; + const string SessionId = "9e6a3f88-1234-4321-abcd-deadbeefcafe"; + + IConnectionTelemetry telemetry = ConnectionTelemetry.Create( + properties: TelemetryEnabledProperties(), + host: Host, + assemblyVersion: AssemblyVersion, + oauthTokenProvider: null, + sessionId: SessionId, + mode: DriverModeType.Thrift, + enableDirectResults: true, + useDescTableExtended: false, + connectTimeoutMilliseconds: DefaultTimeoutMs, + activity: null); + + try + { + Assert.NotNull(telemetry.Session); + Assert.Equal(SessionId, telemetry.Session!.SessionId); + } + finally + { + await telemetry.DisposeAsync(); + } + } + + [Fact] + public async Task Create_EmptySessionId_MapsToNullInContext() + { + // ConnectionTelemetry.Create maps `string.Empty` -> `SessionId = null` in the + // resulting TelemetrySessionContext. This matters because Create is called + // from InitializeTelemetry before OpenSession returns a real handle on some + // code paths, and the DatabricksConnection caller passes string.Empty rather + // than null in that window. Pin the mapping so a future refactor that drops + // the `!string.IsNullOrEmpty` guard at ConnectionTelemetry.cs would surface + // here, rather than silently emitting empty-string SessionId to lumberjack. + const string Host = "create-empty-sid.databricks.com"; + + IConnectionTelemetry telemetry = ConnectionTelemetry.Create( + properties: TelemetryEnabledProperties(), + host: Host, + assemblyVersion: AssemblyVersion, + oauthTokenProvider: null, + sessionId: string.Empty, + mode: DriverModeType.Thrift, + enableDirectResults: true, + useDescTableExtended: false, + connectTimeoutMilliseconds: DefaultTimeoutMs, + activity: null); + + try + { + Assert.NotNull(telemetry.Session); + Assert.Null(telemetry.Session!.SessionId); + } + finally + { + await telemetry.DisposeAsync(); + } + } + + [Fact] + public async Task Create_ThriftMode_SetsDriverModeThrift() + { + // Regression for the literal that used to live at ConnectionTelemetry.cs:642 + // — `Mode = DriverMode.Types.Type.Thrift` is now threaded from the caller. + const string Host = "create-thrift-mode.databricks.com"; + + IConnectionTelemetry telemetry = ConnectionTelemetry.Create( + properties: TelemetryEnabledProperties(), + host: Host, + assemblyVersion: AssemblyVersion, + oauthTokenProvider: null, + sessionId: "session-thrift", + mode: DriverModeType.Thrift, + enableDirectResults: true, + useDescTableExtended: false, + connectTimeoutMilliseconds: DefaultTimeoutMs, + activity: null); + + try + { + Assert.NotNull(telemetry.Session); + Assert.NotNull(telemetry.Session!.DriverConnectionParams); + Assert.Equal(DriverModeType.Thrift, telemetry.Session.DriverConnectionParams!.Mode); + } + finally + { + await telemetry.DisposeAsync(); + } + } + + [Fact] + public async Task Create_SeaMode_SetsDriverModeSea() + { + // The reason this refactor exists: the SEA telemetry caller (added in a later + // phase) must produce telemetry rows with `driver_connection_params.mode = SEA`. + const string Host = "create-sea-mode.databricks.com"; + + IConnectionTelemetry telemetry = ConnectionTelemetry.Create( + properties: TelemetryEnabledProperties(), + host: Host, + assemblyVersion: AssemblyVersion, + oauthTokenProvider: null, + sessionId: "session-sea", + mode: DriverModeType.Sea, + enableDirectResults: true, + useDescTableExtended: false, + connectTimeoutMilliseconds: DefaultTimeoutMs, + activity: null); + + try + { + Assert.NotNull(telemetry.Session); + Assert.NotNull(telemetry.Session!.DriverConnectionParams); + Assert.Equal(DriverModeType.Sea, telemetry.Session.DriverConnectionParams!.Mode); + } + finally + { + await telemetry.DisposeAsync(); + } + } + + [Fact] + public void Create_ThrowingHttpClient_ReturnsNoOpConnectionTelemetry() + { + // Create() is declared `Never throws`: any initialization failure — HttpClient + // construction, exporter wire-up, etc. — must surface as NoOpConnectionTelemetry + // rather than propagate into the connection-open path. We exercise this by + // enabling telemetry while passing a blank host so `new Uri("https://")` (inside + // HttpClientFactory.CreateTelemetryHttpClient) and/or + // TelemetryClientManager.GetOrCreateClient's argument-check throw, both of + // which land in Create's outer catch. + // + // ASSUMPTION: this test depends on either HttpClientFactory.CreateTelemetryHttpClient + // or TelemetryClientManager.GetOrCreateClient throwing when host is empty. If a + // future change adds defensive handling of empty host upstream of Create's catch, + // this test would silently pass for the wrong reason (Create would return + // NoOpConnectionTelemetry via the disabled/feature-flag path instead of the + // outer catch). When that happens, swap to a real fault-injection seam (e.g., + // an internal overload that accepts a pre-built HttpClient). + IConnectionTelemetry telemetry = ConnectionTelemetry.Create( + properties: TelemetryEnabledProperties(), + host: string.Empty, + assemblyVersion: AssemblyVersion, + oauthTokenProvider: null, + sessionId: "session-throwing-http", + mode: DriverModeType.Thrift, + enableDirectResults: true, + useDescTableExtended: false, + connectTimeoutMilliseconds: DefaultTimeoutMs, + activity: null); + + Assert.Same(NoOpConnectionTelemetry.Instance, telemetry); + Assert.Null(telemetry.Session); + } + } +} diff --git a/csharp/test/Unit/Telemetry/ConnectionTelemetryDiscoveryFieldsTests.cs b/csharp/test/Unit/Telemetry/ConnectionTelemetryDiscoveryFieldsTests.cs index 27889427..9f3e65b0 100644 --- a/csharp/test/Unit/Telemetry/ConnectionTelemetryDiscoveryFieldsTests.cs +++ b/csharp/test/Unit/Telemetry/ConnectionTelemetryDiscoveryFieldsTests.cs @@ -17,6 +17,7 @@ using System.Collections.Generic; using AdbcDrivers.Databricks.Telemetry; using AdbcDrivers.HiveServer2.Spark; +using DriverModeType = AdbcDrivers.Databricks.Telemetry.Proto.DriverMode.Types.Type; using Xunit; namespace AdbcDrivers.Databricks.Tests.Unit.Telemetry @@ -57,7 +58,8 @@ public void DiscoveryModeEnabled_AlwaysFalse_PatAuth() properties[SparkParameters.Token] = "dapi-redacted"; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.True(connParams.HasDiscoveryModeEnabled); Assert.False(connParams.DiscoveryModeEnabled); @@ -74,7 +76,8 @@ public void DiscoveryModeEnabled_AlwaysFalse_OAuthClientCredentials() properties[DatabricksParameters.OAuthClientSecret] = "client-secret"; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.True(connParams.HasDiscoveryModeEnabled); Assert.False(connParams.DiscoveryModeEnabled); @@ -89,7 +92,8 @@ public void DiscoveryUrl_LeftUnset_NoDiscoverySupported() DatabricksConstants.OAuthGrantTypes.ClientCredentials; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); // The C# driver hardcodes the OIDC token endpoint and never performs // .well-known discovery, so there is no URL to report. Leaving the @@ -105,7 +109,8 @@ public void EnableTokenCache_AlwaysFalse_PatAuth() properties[SparkParameters.Token] = "dapi-redacted"; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.True(connParams.HasEnableTokenCache); Assert.False(connParams.EnableTokenCache); @@ -122,7 +127,8 @@ public void EnableTokenCache_AlwaysFalse_OAuthClientCredentials() properties[DatabricksParameters.OAuthClientSecret] = "client-secret"; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.True(connParams.HasEnableTokenCache); Assert.False(connParams.EnableTokenCache); diff --git a/csharp/test/Unit/Telemetry/ConnectionTelemetryDriverNameTests.cs b/csharp/test/Unit/Telemetry/ConnectionTelemetryDriverNameTests.cs new file mode 100644 index 00000000..1e90ac7a --- /dev/null +++ b/csharp/test/Unit/Telemetry/ConnectionTelemetryDriverNameTests.cs @@ -0,0 +1,102 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using AdbcDrivers.Databricks.Telemetry; +using Xunit; + +namespace AdbcDrivers.Databricks.Tests.Unit.Telemetry +{ + /// + /// Regression tests for PECO-3022 B1: driver_name string drift between + /// SEA and Thrift transports. + /// + /// Production lumberjack data from v1.1.4 showed two distinct strings coexisting: + /// + /// "Databricks ADBC Driver" — 685 records, all THRIFT mode + /// "ADBC Databricks Driver" — 4,401 records, mixed THRIFT + 69 SEA + /// + /// Dashboards filtering on the older string silently missed all SEA records and a + /// significant fraction of recent Thrift records. + /// + /// The fix is to make the + /// single source of truth, referenced by both + /// and its fallback so + /// that every caller of — Thrift today via + /// DatabricksConnection, SEA via StatementExecutionConnection — emits + /// the same literal. + /// + /// These tests pin the literal value so that a typo or rename in the constant gets + /// caught at unit-test time before it ships to production. + /// + public class ConnectionTelemetryDriverNameTests + { + /// + /// The canonical driver_name literal that must appear in every telemetry record + /// regardless of transport. Picked because it matches the value already returned + /// via AdbcInfoCode.DriverName (see ) + /// and represented the majority of v1.1.4 production records, so dashboards + /// already keyed on this string see the most history. + /// + private const string CanonicalDriverName = "ADBC Databricks Driver"; + + [Fact] + public void CanonicalConstant_HasExpectedLiteralValue() + { + // Pin the literal. If anyone renames the constant, this test fails and the + // change is forced into review rather than silently breaking downstream + // dashboards that filter on the string. + Assert.Equal(CanonicalDriverName, DatabricksConnection.DatabricksDriverName); + } + + [Fact] + public void BuildSystemConfiguration_ReturnsCanonicalDriverName() + { + var config = ConnectionTelemetry.BuildSystemConfiguration("1.2.3"); + + Assert.Equal(CanonicalDriverName, config.DriverName); + } + + [Fact] + public void SafeBuildSystemConfiguration_ReturnsCanonicalDriverName() + { + // SafeBuildSystemConfiguration delegates to BuildSystemConfiguration on the + // happy path and must return the same canonical literal. The catch-block + // fallback in SafeBuildSystemConfiguration is not exercised here because + // there is no in-process fault-injection seam for the static helper; the + // literal-pin in CanonicalConstant_HasExpectedLiteralValue covers that path + // by construction (both branches reference the same constant). + var config = ConnectionTelemetry.SafeBuildSystemConfiguration("1.2.3", activity: null); + + Assert.NotNull(config); + Assert.Equal(CanonicalDriverName, config.DriverName); + } + + [Fact] + public void DriverName_IdenticalAcrossInvocations_SingleSourceOfTruth() + { + // The single-source-of-truth property: every invocation of the system-config + // builder (regardless of transport mode at the caller) returns the same + // driver_name. Since ConnectionTelemetry.BuildSystemConfiguration is the + // protocol-agnostic factory called by both Thrift and SEA transports, this + // guarantees both Thrift and SEA telemetry records carry identical driver_name. + var thriftCaller = ConnectionTelemetry.BuildSystemConfiguration("1.2.3"); + var seaCaller = ConnectionTelemetry.BuildSystemConfiguration("1.2.3"); + + Assert.Equal(thriftCaller.DriverName, seaCaller.DriverName); + Assert.Equal(CanonicalDriverName, thriftCaller.DriverName); + } + } +} diff --git a/csharp/test/Unit/Telemetry/ConnectionTelemetryPartialInitTests.cs b/csharp/test/Unit/Telemetry/ConnectionTelemetryPartialInitTests.cs index 9431aca5..3075088e 100644 --- a/csharp/test/Unit/Telemetry/ConnectionTelemetryPartialInitTests.cs +++ b/csharp/test/Unit/Telemetry/ConnectionTelemetryPartialInitTests.cs @@ -19,6 +19,7 @@ using AdbcDrivers.HiveServer2.Spark; using DriverAuthFlowType = AdbcDrivers.Databricks.Telemetry.Proto.DriverAuthFlow.Types.Type; using DriverAuthMechType = AdbcDrivers.Databricks.Telemetry.Proto.DriverAuthMech.Types.Type; +using DriverModeType = AdbcDrivers.Databricks.Telemetry.Proto.DriverMode.Types.Type; using Xunit; namespace AdbcDrivers.Databricks.Tests.Unit.Telemetry @@ -60,7 +61,8 @@ public void AuthType_AndAuthMech_Consistent_OauthClientCredentials() }; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); string authType = ConnectionTelemetry.DetermineAuthType(properties); Assert.Equal(DriverAuthMechType.Oauth, connParams.AuthMech); @@ -79,7 +81,8 @@ public void AuthType_AndAuthMech_Consistent_OauthAccessTokenWithGrantType() }; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); string authType = ConnectionTelemetry.DetermineAuthType(properties); Assert.Equal(DriverAuthMechType.Oauth, connParams.AuthMech); @@ -101,7 +104,8 @@ public void AuthType_AndAuthMech_Consistent_OauthAccessTokenPassthrough_NoGrantT }; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); string authType = ConnectionTelemetry.DetermineAuthType(properties); Assert.Equal(DriverAuthMechType.Oauth, connParams.AuthMech); @@ -119,7 +123,8 @@ public void AuthType_AndAuthMech_Consistent_PatToken() }; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); string authType = ConnectionTelemetry.DetermineAuthType(properties); Assert.Equal(DriverAuthMechType.Pat, connParams.AuthMech); @@ -133,7 +138,8 @@ public void AuthType_AndAuthMech_Consistent_NoAuth() var properties = new Dictionary(); var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); string authType = ConnectionTelemetry.DetermineAuthType(properties); Assert.Equal(DriverAuthMechType.Pat, connParams.AuthMech); @@ -186,7 +192,8 @@ public void DriverConnectionParams_ConstantFlags_AlwaysPopulated() }; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.True(connParams.EnableArrow); Assert.True(connParams.EnableDirectResults); @@ -214,7 +221,7 @@ public void SafeBuildDriverConnectionParams_ReturnsBestEffort_WithMinimalPropert var properties = new Dictionary(); var connParams = ConnectionTelemetry.SafeBuildDriverConnectionParams( - properties, Host, + properties, Host, DriverModeType.Thrift, enableDirectResults: true, useDescTableExtended: false, connectTimeoutMilliseconds: DefaultTimeoutMs, activity: null); @@ -252,7 +259,7 @@ public void EndToEnd_MinimalProperties_AllAlwaysDerivableFieldsPopulated() var systemConfig = ConnectionTelemetry.SafeBuildSystemConfiguration( "1.2.3", activity: null); var connParams = ConnectionTelemetry.SafeBuildDriverConnectionParams( - properties, Host, + properties, Host, DriverModeType.Thrift, enableDirectResults: true, useDescTableExtended: true, connectTimeoutMilliseconds: DefaultTimeoutMs, activity: null); string authType = ConnectionTelemetry.SafeDetermineAuthType(properties, activity: null); diff --git a/csharp/test/Unit/Telemetry/DatabricksStatementObserverRefactorTests.cs b/csharp/test/Unit/Telemetry/DatabricksStatementObserverRefactorTests.cs new file mode 100644 index 00000000..e86defe4 --- /dev/null +++ b/csharp/test/Unit/Telemetry/DatabricksStatementObserverRefactorTests.cs @@ -0,0 +1,537 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using System.Collections.Generic; +using System.IO; +using System.Linq; +using System.Reflection; +using System.Threading; +using System.Threading.Tasks; +using AdbcDrivers.Databricks.Reader.CloudFetch; +using AdbcDrivers.Databricks.Telemetry; +using AdbcDrivers.Databricks.Telemetry.Models; +using AdbcDrivers.Databricks.Telemetry.Proto; +using AdbcDrivers.HiveServer2.Spark; +using Apache.Arrow.Adbc; +using DriverModeType = AdbcDrivers.Databricks.Telemetry.Proto.DriverMode.Types.Type; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; +using Xunit; + +namespace AdbcDrivers.Databricks.Tests.Unit.Telemetry +{ + /// + /// Verifies the T4 refactor of : the private + /// telemetry helpers (CreateTelemetryContext, CreateMetadataTelemetryContext, + /// RecordSuccess, RecordError, EmitTelemetry) have been replaced + /// with a single field, and + /// now injects either a + /// or . + /// + /// The tests cover three concerns: + /// + /// Structural — the cast ((DatabricksConnection)Connection).TelemetrySession + /// has been removed from DatabricksStatement and the field-based observer + /// is wired correctly through both the public constructor and the test-only + /// constructor seam. + /// Cross-transport — observer hooks fire at the expected lifecycle points and + /// the underlying is still exposed to + /// the operation status poller via the PendingTelemetryContext + /// compatibility property (PECO-2992). + /// Byte-identical — the proto emitted + /// after the refactor matches what the pre-refactor EmitTelemetry helper + /// would have produced for an equivalent execute, and the connection's + /// DriverMode (Thrift here) is preserved on the wire (PECO-3022). + /// + /// + public class DatabricksStatementObserverRefactorTests + { + // ── Test doubles ───────────────────────────────────────────────────────────── + + /// + /// Capturing telemetry client that records every enqueued frontend log so tests + /// can inspect the exact proto fields the observer wrote. + /// + private sealed class CapturingTelemetryClient : ITelemetryClient + { + public List Logs { get; } = new List(); + public int EnqueueCallCount; + + public void Enqueue(TelemetryFrontendLog log) + { + Interlocked.Increment(ref EnqueueCallCount); + lock (Logs) { Logs.Add(log); } + } + + public Task FlushAsync(CancellationToken ct = default) => Task.CompletedTask; + public Task CloseAsync() => Task.CompletedTask; + public ValueTask DisposeAsync() => default; + } + + /// + /// Records every observer method invocation so tests can assert the statement + /// drives the injected observer at the correct hookpoints. + /// + private sealed class RecordingObserver : IStatementOperationObserver + { + public List Calls { get; } = new List(); + public StatementType? StmtType; + public OperationType? OpType; + public bool? IsCompressed; + public string? StatementId; + public ExecutionResultFormat? ResultFormat; + public int? PollCount; + public long? PollLatencyMs; + public long? FirstBatchReadyMs; + public long? ConsumedMs; + public ChunkMetrics? Chunks; + public ExecutionResultFormat? ReaderInspectedFormat; + public bool? ReaderInspectedCompressed; + public Exception? Error; + public int FinalizedCallCount; + + public void OnExecuteStarted(StatementType stmtType, OperationType opType, bool isCompressed) + { + Calls.Add(nameof(OnExecuteStarted)); + StmtType = stmtType; + OpType = opType; + IsCompressed = isCompressed; + } + + public void OnExecuteSucceeded(string statementId, ExecutionResultFormat resultFormat) + { + Calls.Add(nameof(OnExecuteSucceeded)); + StatementId = statementId; + ResultFormat = resultFormat; + } + + public void OnPollCompleted(int count, long latencyMs) + { + Calls.Add(nameof(OnPollCompleted)); + PollCount = count; + PollLatencyMs = latencyMs; + } + + public void OnFirstBatchReady(long latencyMs) + { + Calls.Add(nameof(OnFirstBatchReady)); + FirstBatchReadyMs = latencyMs; + } + + public void OnConsumed(long latencyMs) + { + Calls.Add(nameof(OnConsumed)); + ConsumedMs = latencyMs; + } + + public void OnChunksDownloaded(ChunkMetrics metrics) + { + Calls.Add(nameof(OnChunksDownloaded)); + Chunks = metrics; + } + + public void OnReaderInspected(ExecutionResultFormat resultFormat, bool isCompressed) + { + Calls.Add(nameof(OnReaderInspected)); + ReaderInspectedFormat = resultFormat; + ReaderInspectedCompressed = isCompressed; + } + + public void OnError(Exception ex) + { + Calls.Add(nameof(OnError)); + Error = ex; + } + + public void OnFinalized() + { + Calls.Add(nameof(OnFinalized)); + FinalizedCallCount++; + } + } + + // ── Fixtures ───────────────────────────────────────────────────────────────── + + private static DatabricksConnection CreateConnection() + { + Dictionary properties = new Dictionary + { + [SparkParameters.HostName] = "test.databricks.com", + [SparkParameters.Token] = "test-token", + }; + return new DatabricksConnection(properties); + } + + /// + /// Builds a session context with a real + /// stamped with , matching how + /// wires the production + /// session up after a successful OpenSession. + /// + private static TelemetrySessionContext CreateThriftSession(ITelemetryClient client) + { + return new TelemetrySessionContext + { + SessionId = "session-thrift-abc", + WorkspaceId = 4242L, + TelemetryClient = client, + AuthType = "pat", + SystemConfiguration = new DriverSystemConfiguration + { + DriverVersion = "1.0.0", + DriverName = "Apache Arrow ADBC Databricks Driver", + }, + // Mode=Thrift is the value DatabricksConnection.InitializeTelemetry passes + // through to ConnectionTelemetry.Create today. If a future refactor were to + // accidentally swap the constant, the byte-equivalence test below would + // catch it. + DriverConnectionParams = new DriverConnectionParameters + { + HttpPath = "/sql/1.0/warehouses/x", + Mode = DriverModeType.Thrift, + }, + }; + } + + private static IStatementOperationObserver GetObserverField(DatabricksStatement statement) + { + FieldInfo field = typeof(DatabricksStatement).GetField( + "_observer", + BindingFlags.NonPublic | BindingFlags.Instance) + ?? throw new InvalidOperationException("_observer field not found on DatabricksStatement"); + return (IStatementOperationObserver)field.GetValue(statement)!; + } + + // ── Structural assertions ──────────────────────────────────────────────────── + + [Fact] + public void DatabricksStatement_HasObserverField_TypedAsInterface() + { + // The refactor introduces exactly one observer field, typed as the interface + // (not as TelemetryObserver) so SEA can inject its own future implementation. + FieldInfo? field = typeof(DatabricksStatement).GetField( + "_observer", + BindingFlags.NonPublic | BindingFlags.Instance); + + Assert.NotNull(field); + Assert.Equal(typeof(IStatementOperationObserver), field!.FieldType); + Assert.True(field.IsInitOnly, "_observer must be readonly so once injected it cannot drift mid-execute"); + } + + [Fact] + public void DatabricksStatement_SourceDoesNotCastTelemetrySession() + { + // Acceptance criterion: the ((DatabricksConnection)Connection).TelemetrySession + // cast pattern is eliminated from the refactored class. Locating the source + // file relative to the test assembly avoids needing a runtime hookup just to + // assert a textual property. + string? path = LocateDatabricksStatementSource(); + Assert.True(File.Exists(path), $"Could not locate DatabricksStatement.cs (looked at {path ?? "(null)"})"); + + string source = File.ReadAllText(path!); + Assert.DoesNotContain("((DatabricksConnection)Connection).TelemetrySession", source); + } + + private static string? LocateDatabricksStatementSource() + { + // Walk up from the test assembly location until we find csharp/src/DatabricksStatement.cs. + string? dir = Path.GetDirectoryName(typeof(DatabricksStatement).Assembly.Location); + while (!string.IsNullOrEmpty(dir)) + { + string candidate = Path.Combine(dir, "csharp", "src", "DatabricksStatement.cs"); + if (File.Exists(candidate)) return candidate; + candidate = Path.Combine(dir, "src", "DatabricksStatement.cs"); + if (File.Exists(candidate)) return candidate; + DirectoryInfo? parent = Directory.GetParent(dir); + if (parent == null) break; + dir = parent.FullName; + } + return null; + } + + // ── Injection from DatabricksConnection.CreateStatement ────────────────────── + + [Fact] + public void CreateStatement_TelemetryDisabled_InjectsNullObserver() + { + // Without a telemetry session (the default for a freshly-constructed connection + // that has not opened a session), CreateStatement must fall back to the + // singleton NullObserver. This is what keeps disabled-telemetry zero-cost. + using DatabricksConnection connection = CreateConnection(); + + using AdbcStatement statement = connection.CreateStatement(); + DatabricksStatement databricksStatement = Assert.IsType(statement); + + IStatementOperationObserver observer = GetObserverField(databricksStatement); + Assert.Same(NullObserver.Instance, observer); + // Compatibility property must report null so the reader/poller skip telemetry + // branches without having to call into the observer. + Assert.Null(databricksStatement.PendingTelemetryContext); + } + + [Fact] + public void CreateStatement_TelemetryEnabled_InjectsTelemetryObserver() + { + // When the connection has a live telemetry session, CreateStatement constructs + // a per-statement TelemetryObserver bound to that session. PendingTelemetryContext + // therefore returns a non-null context for the poller and reader to mutate. + using DatabricksConnection connection = CreateConnection(); + CapturingTelemetryClient client = new CapturingTelemetryClient(); + TelemetrySessionContext session = CreateThriftSession(client); + connection.TelemetryForTesting = new TelemetryAdapter(session); + + using AdbcStatement statement = connection.CreateStatement(); + DatabricksStatement databricksStatement = Assert.IsType(statement); + + IStatementOperationObserver observer = GetObserverField(databricksStatement); + TelemetryObserver typed = Assert.IsType(observer); + // The observer's underlying context must be bound to the connection's session + // so subsequent poller/reader mutations and the final BuildTelemetryLog read + // the same SessionId/WorkspaceId the connection negotiated. + FieldInfo? sessionField = typeof(StatementTelemetryContext) + .GetField("_sessionContext", BindingFlags.NonPublic | BindingFlags.Instance); + Assert.NotNull(sessionField); + Assert.Same(session, sessionField!.GetValue(typed.Context)); + Assert.NotNull(databricksStatement.PendingTelemetryContext); + Assert.Same(typed.Context, databricksStatement.PendingTelemetryContext); + } + + // ── Observer hookpoint coverage ────────────────────────────────────────────── + + [Fact] + public void Dispose_NoExecute_DoesNotFinalizeObserver() + { + // A statement that was never executed must not trigger OnFinalized() on the + // injected observer (would otherwise enqueue a stray empty execute log when + // telemetry is enabled). This preserves byte-identity with the prior + // PendingTelemetryContext!=null gate inside EmitTelemetry. + using DatabricksConnection connection = CreateConnection(); + RecordingObserver recorder = new RecordingObserver(); + using DatabricksStatement statement = new DatabricksStatement(connection, recorder); + + statement.Dispose(); + + Assert.Empty(recorder.Calls); + } + + [Fact] + public void Dispose_AfterFailedExecute_DoesNotDoubleFinalize() + { + // Simulating the error path: the production code calls FinalizeExecuteTelemetry + // inside the catch block and then Dispose calls it again. _executeFinalized + // must gate so OnFinalized fires exactly once (the observer's own idempotency + // is a defence-in-depth backstop). + using DatabricksConnection connection = CreateConnection(); + RecordingObserver recorder = new RecordingObserver(); + using DatabricksStatement statement = new DatabricksStatement(connection, recorder); + + // Drive the observer through the same sequence the error path would: begin → + // error → finalize. We call the private helpers directly so the test does not + // require a Thrift server. + InvokePrivate(statement, "BeginExecuteTelemetry", StatementType.Query, OperationType.ExecuteStatement); + InvokePrivate(statement, "RecordExecuteError", new InvalidOperationException("boom")); + InvokePrivate(statement, "FinalizeExecuteTelemetry"); + + // Dispose triggers a second FinalizeExecuteTelemetry — must be a no-op. + statement.Dispose(); + + int finalizeCalls = recorder.Calls.Count(c => c == nameof(IStatementOperationObserver.OnFinalized)); + Assert.Equal(1, finalizeCalls); + // Sanity: the error path also recorded OnError exactly once. + Assert.Equal(1, recorder.Calls.Count(c => c == nameof(IStatementOperationObserver.OnError))); + } + + [Fact] + public void BeginExecuteTelemetry_FiresOnExecuteStartedAndPopulatesDefaults() + { + // Verifies the helper that replaces CreateTelemetryContext / CreateMetadataTelemetryContext: + // it must signal the observer with the right statement/operation type and set the + // legacy defaults (InlineArrow ResultFormat, IsInternalCall propagation). + using DatabricksConnection connection = CreateConnection(); + CapturingTelemetryClient client = new CapturingTelemetryClient(); + TelemetrySessionContext session = CreateThriftSession(client); + TelemetryObserver observer = new TelemetryObserver(session); + using DatabricksStatement statement = new DatabricksStatement(connection, observer); + statement.IsInternalCall = true; + + InvokePrivate(statement, "BeginExecuteTelemetry", StatementType.Metadata, OperationType.ListCatalogs); + + Assert.Equal(StatementType.Metadata, observer.Context.StatementType); + Assert.Equal(OperationType.ListCatalogs, observer.Context.OperationType); + Assert.False(observer.Context.IsCompressed, "Placeholder isCompressed=false until reader inspection at finalize time"); + Assert.Equal(ExecutionResultFormat.InlineArrow, observer.Context.ResultFormat); + Assert.True(observer.Context.IsInternalCall); + } + + [Fact] + public void FinalizeExecuteTelemetry_NoExecute_StillEmitsViaConfiguredObserver() + { + // Once BeginExecuteTelemetry has fired, FinalizeExecuteTelemetry must close the + // loop with OnConsumed + OnFinalized even if no reader was ever materialized + // (defensive path: server returned no results, or caller aborted between + // begin and base.ExecuteQuery). + using DatabricksConnection connection = CreateConnection(); + RecordingObserver recorder = new RecordingObserver(); + using DatabricksStatement statement = new DatabricksStatement(connection, recorder); + + InvokePrivate(statement, "BeginExecuteTelemetry", StatementType.Query, OperationType.ExecuteStatement); + InvokePrivate(statement, "FinalizeExecuteTelemetry"); + + Assert.Contains(nameof(IStatementOperationObserver.OnExecuteStarted), recorder.Calls); + Assert.Contains(nameof(IStatementOperationObserver.OnConsumed), recorder.Calls); + Assert.Contains(nameof(IStatementOperationObserver.OnFinalized), recorder.Calls); + // No chunk metrics path because no reader was attached. + Assert.DoesNotContain(nameof(IStatementOperationObserver.OnChunksDownloaded), recorder.Calls); + } + + // ── Cross-transport regression: byte-equivalent telemetry log ───────────────── + + [Fact] + public void Thrift_Telemetry_StillEmitsAfterRefactor() + { + // Byte-equivalence regression: an end-to-end execute through the refactored + // observer must produce an OssSqlDriverTelemetryLog whose top-level fields + // match the pre-refactor EmitTelemetry output for the same inputs. We compare + // each field explicitly rather than serializing because the timestamp envelope + // is non-deterministic. + using DatabricksConnection connection = CreateConnection(); + CapturingTelemetryClient client = new CapturingTelemetryClient(); + TelemetrySessionContext session = CreateThriftSession(client); + TelemetryObserver observer = new TelemetryObserver(session); + using DatabricksStatement statement = new DatabricksStatement(connection, observer); + statement.StatementId = "known-statement-id-1"; + + // Simulate the production execute flow that previously went through + // CreateTelemetryContext → RecordSuccess → EmitTelemetry. + InvokePrivate(statement, "BeginExecuteTelemetry", StatementType.Query, OperationType.ExecuteStatement); + InvokePrivate(statement, "RecordExecuteSuccess"); + InvokePrivate(statement, "FinalizeExecuteTelemetry"); + + // Exactly one log enqueued (terminal call is idempotent and no double-emission). + Assert.Equal(1, client.EnqueueCallCount); + TelemetryFrontendLog frontendLog = client.Logs[0]; + OssSqlDriverTelemetryLog log = frontendLog.Entry!.SqlDriverLog!; + + // Session/statement identifiers and workspace envelope preserved. + Assert.Equal("session-thrift-abc", log.SessionId); + Assert.Equal("known-statement-id-1", log.SqlStatementId); + Assert.Equal(4242L, frontendLog.WorkspaceId); + Assert.False(string.IsNullOrEmpty(frontendLog.FrontendLogEventId)); + + // SqlOperation envelope matches what RecordSuccess + EmitTelemetry produced. + Assert.Equal(StatementType.Query, log.SqlOperation.StatementType); + Assert.Equal(OperationType.ExecuteStatement, log.SqlOperation.OperationDetail.OperationType); + Assert.Equal(ExecutionResultFormat.InlineArrow, log.SqlOperation.ExecutionResult); + Assert.False(log.SqlOperation.IsCompressed); + + // ResultLatency block populated (FirstBatchReadyMs from OnFirstBatchReady, + // ResultsConsumedMs from OnConsumed). Pre-refactor EmitTelemetry called + // ctx.RecordResultsConsumed() right before BuildTelemetryLog, so consumption + // must be non-zero too. + Assert.NotNull(log.SqlOperation.ResultLatency); + Assert.True(log.SqlOperation.ResultLatency.ResultSetReadyLatencyMillis >= 0); + Assert.True(log.SqlOperation.ResultLatency.ResultSetConsumptionLatencyMillis >= 0); + + // No error info on the success path. + Assert.Null(log.ErrorInfo); + } + + [Fact] + public void Thrift_DriverMode_StillReportedAsThrift() + { + // Verifies the DriverMode passthrough: when DatabricksConnection initializes + // its telemetry session with Mode=Thrift, the per-statement log carries that + // mode all the way out to the wire. A future refactor that accidentally + // swapped DriverMode.Thrift for Sea in InitializeTelemetry would fail here. + using DatabricksConnection connection = CreateConnection(); + CapturingTelemetryClient client = new CapturingTelemetryClient(); + TelemetrySessionContext session = CreateThriftSession(client); + TelemetryObserver observer = new TelemetryObserver(session); + using DatabricksStatement statement = new DatabricksStatement(connection, observer); + statement.StatementId = "thrift-mode-statement"; + + InvokePrivate(statement, "BeginExecuteTelemetry", StatementType.Query, OperationType.ExecuteStatement); + InvokePrivate(statement, "RecordExecuteSuccess"); + InvokePrivate(statement, "FinalizeExecuteTelemetry"); + + Assert.Equal(1, client.EnqueueCallCount); + OssSqlDriverTelemetryLog log = client.Logs[0].Entry!.SqlDriverLog!; + Assert.NotNull(log.DriverConnectionParams); + Assert.Equal(DriverModeType.Thrift, log.DriverConnectionParams.Mode); + } + + [Fact] + public void ErrorPath_FinalizesObserverWithErrorInfo() + { + // The error path inside Execute* drives the observer with OnError + OnFinalized + // before re-throwing, so the emitted log carries error_info.error_name and the + // user sees a single log per failed execute (not a missing event, not two). + using DatabricksConnection connection = CreateConnection(); + CapturingTelemetryClient client = new CapturingTelemetryClient(); + TelemetrySessionContext session = CreateThriftSession(client); + TelemetryObserver observer = new TelemetryObserver(session); + using DatabricksStatement statement = new DatabricksStatement(connection, observer); + statement.StatementId = "errored-statement"; + + InvokePrivate(statement, "BeginExecuteTelemetry", StatementType.Query, OperationType.ExecuteStatement); + InvokePrivate(statement, "RecordExecuteError", new InvalidOperationException("simulated failure")); + InvokePrivate(statement, "FinalizeExecuteTelemetry"); + + Assert.Equal(1, client.EnqueueCallCount); + OssSqlDriverTelemetryLog log = client.Logs[0].Entry!.SqlDriverLog!; + Assert.NotNull(log.ErrorInfo); + Assert.Equal("InvalidOperationException", log.ErrorInfo.ErrorName); + } + + // ── Helpers ────────────────────────────────────────────────────────────────── + + /// + /// Invokes a private instance method via reflection. We exercise the helpers + /// directly because driving them through a real ExecuteQuery requires a Thrift + /// server and a server-shaped reader pipeline — both out of scope for unit tests. + /// The helpers themselves are the entire refactor surface; reaching them via + /// reflection is the same approach uses + /// for the confOverlay private field. + /// + private static void InvokePrivate(DatabricksStatement statement, string name, params object[] args) + { + MethodInfo method = typeof(DatabricksStatement).GetMethod( + name, + BindingFlags.NonPublic | BindingFlags.Instance) + ?? throw new InvalidOperationException($"Method {name} not found on DatabricksStatement"); + method.Invoke(statement, args); + } + + /// + /// Minimal adapter that just exposes a given + /// session through the property so + /// connection.TelemetrySession returns the test session in production code. + /// All other methods are no-ops; the wiring we care about for these tests is the + /// observer injection inside CreateStatement, which only reads + /// connection.TelemetrySession. + /// + private sealed class TelemetryAdapter : IConnectionTelemetry + { + public TelemetryAdapter(TelemetrySessionContext session) { Session = session; } + public TelemetrySessionContext? Session { get; } + public T ExecuteWithMetadataTelemetry(OperationType operationType, Func operation, System.Diagnostics.Activity? activity) => operation(); + public void EmitOperationTelemetry(OperationType operationType, StatementType statementType, string? statementId, long elapsedMs, Exception? error) { } + public Task DisposeAsync() => Task.CompletedTask; + } + } +} diff --git a/csharp/test/Unit/Telemetry/NullObserverTests.cs b/csharp/test/Unit/Telemetry/NullObserverTests.cs new file mode 100644 index 00000000..b8e107f5 --- /dev/null +++ b/csharp/test/Unit/Telemetry/NullObserverTests.cs @@ -0,0 +1,76 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using AdbcDrivers.Databricks.Reader.CloudFetch; +using AdbcDrivers.Databricks.Telemetry; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; +using Xunit; + +namespace AdbcDrivers.Databricks.Tests.Unit.Telemetry +{ + /// + /// Tests for — verifies it satisfies the + /// fail-open / no-op / singleton contract. + /// + public class NullObserverTests + { + [Fact] + public void NullObserver_AllMethods_AreNoOps() + { + // Arrange + IStatementOperationObserver observer = NullObserver.Instance; + + // Act + Assert: every method must complete without throwing and without + // observable side effects. We exercise the full surface twice to also + // confirm idempotency of OnFinalized. + for (int i = 0; i < 2; i++) + { + observer.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: true); + observer.OnExecuteSucceeded("stmt-id-123", ExecutionResultFormat.InlineArrow); + observer.OnPollCompleted(count: 3, latencyMs: 42); + observer.OnFirstBatchReady(latencyMs: 100); + observer.OnConsumed(latencyMs: 200); + observer.OnChunksDownloaded(new ChunkMetrics()); + observer.OnReaderInspected(ExecutionResultFormat.ExternalLinks, isCompressed: true); + observer.OnError(new InvalidOperationException("boom")); + observer.OnFinalized(); + } + + // No state to inspect — passing the calls is the assertion. + } + + [Fact] + public void NullObserver_IsSingleton() + { + // Arrange + Act + NullObserver first = NullObserver.Instance; + NullObserver second = NullObserver.Instance; + + // Assert: same reference, and the only way to obtain an instance. + Assert.NotNull(first); + Assert.Same(first, second); + + // There must be no public constructor: callers should be forced to + // use the singleton accessor. + System.Reflection.ConstructorInfo[] publicCtors = typeof(NullObserver) + .GetConstructors(System.Reflection.BindingFlags.Public | System.Reflection.BindingFlags.Instance); + Assert.Empty(publicCtors); + } + } +} diff --git a/csharp/test/Unit/Telemetry/SafeObserverTests.cs b/csharp/test/Unit/Telemetry/SafeObserverTests.cs new file mode 100644 index 00000000..f7d36c2e --- /dev/null +++ b/csharp/test/Unit/Telemetry/SafeObserverTests.cs @@ -0,0 +1,344 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using System.Collections.Generic; +using System.Diagnostics; +using System.Linq; +using AdbcDrivers.Databricks.Reader.CloudFetch; +using AdbcDrivers.Databricks.Telemetry; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; +using Xunit; + +namespace AdbcDrivers.Databricks.Tests.Unit.Telemetry +{ + /// + /// Tests for — the optional decorator that wraps any + /// inner with per-method try/catch so a + /// third-party observer that violates the fail-open contract cannot leak its + /// exception to the caller. + /// + public class SafeObserverTests + { + // ── Test doubles ───────────────────────────────────────────────────────────── + + /// + /// Records each method invocation so propagation tests can assert that calls + /// reach the inner observer with the exact arguments supplied by the caller. + /// + private sealed class RecordingObserver : IStatementOperationObserver + { + public List Calls { get; } = new List(); + + public StatementType? StmtType; + public OperationType? OpType; + public bool? IsCompressed; + public string? StatementId; + public ExecutionResultFormat? ResultFormat; + public int? PollCount; + public long? PollLatencyMs; + public long? FirstBatchReadyMs; + public long? ConsumedMs; + public ChunkMetrics? ChunkMetrics; + public ExecutionResultFormat? ReaderInspectedFormat; + public bool? ReaderInspectedCompressed; + public Exception? Error; + public int FinalizedCallCount; + + public void OnExecuteStarted(StatementType stmtType, OperationType opType, bool isCompressed) + { + Calls.Add(nameof(OnExecuteStarted)); + StmtType = stmtType; + OpType = opType; + IsCompressed = isCompressed; + } + + public void OnExecuteSucceeded(string statementId, ExecutionResultFormat resultFormat) + { + Calls.Add(nameof(OnExecuteSucceeded)); + StatementId = statementId; + ResultFormat = resultFormat; + } + + public void OnPollCompleted(int count, long latencyMs) + { + Calls.Add(nameof(OnPollCompleted)); + PollCount = count; + PollLatencyMs = latencyMs; + } + + public void OnFirstBatchReady(long latencyMs) + { + Calls.Add(nameof(OnFirstBatchReady)); + FirstBatchReadyMs = latencyMs; + } + + public void OnConsumed(long latencyMs) + { + Calls.Add(nameof(OnConsumed)); + ConsumedMs = latencyMs; + } + + public void OnChunksDownloaded(ChunkMetrics metrics) + { + Calls.Add(nameof(OnChunksDownloaded)); + ChunkMetrics = metrics; + } + + public void OnReaderInspected(ExecutionResultFormat resultFormat, bool isCompressed) + { + Calls.Add(nameof(OnReaderInspected)); + ReaderInspectedFormat = resultFormat; + ReaderInspectedCompressed = isCompressed; + } + + public void OnError(Exception ex) + { + Calls.Add(nameof(OnError)); + Error = ex; + } + + public void OnFinalized() + { + Calls.Add(nameof(OnFinalized)); + FinalizedCallCount++; + } + } + + /// + /// Simulates a misbehaving third-party observer: every method on + /// throws. SafeObserver must absorb + /// each exception so the caller is unaffected. + /// + private sealed class ThrowingObserver : IStatementOperationObserver + { + public int CallCount; + + public void OnExecuteStarted(StatementType stmtType, OperationType opType, bool isCompressed) + { + CallCount++; + throw new InvalidOperationException("OnExecuteStarted boom"); + } + + public void OnExecuteSucceeded(string statementId, ExecutionResultFormat resultFormat) + { + CallCount++; + throw new InvalidOperationException("OnExecuteSucceeded boom"); + } + + public void OnPollCompleted(int count, long latencyMs) + { + CallCount++; + throw new InvalidOperationException("OnPollCompleted boom"); + } + + public void OnFirstBatchReady(long latencyMs) + { + CallCount++; + throw new InvalidOperationException("OnFirstBatchReady boom"); + } + + public void OnConsumed(long latencyMs) + { + CallCount++; + throw new InvalidOperationException("OnConsumed boom"); + } + + public void OnChunksDownloaded(ChunkMetrics metrics) + { + CallCount++; + throw new InvalidOperationException("OnChunksDownloaded boom"); + } + + public void OnReaderInspected(ExecutionResultFormat resultFormat, bool isCompressed) + { + CallCount++; + throw new InvalidOperationException("OnReaderInspected boom"); + } + + public void OnError(Exception ex) + { + CallCount++; + throw new InvalidOperationException("OnError boom"); + } + + public void OnFinalized() + { + CallCount++; + throw new InvalidOperationException("OnFinalized boom"); + } + } + + // ── Required tests ─────────────────────────────────────────────────────────── + + [Fact] + public void SafeObserver_PropagatesNormalCallsToInner() + { + // Arrange + RecordingObserver inner = new RecordingObserver(); + SafeObserver safe = new SafeObserver(inner); + ChunkMetrics metrics = new ChunkMetrics + { + TotalChunksPresent = 4, + TotalChunksIterated = 4, + InitialChunkLatencyMs = 11, + SlowestChunkLatencyMs = 99, + SumChunksDownloadTimeMs = 137, + }; + InvalidOperationException error = new InvalidOperationException("query failed"); + + // Act: exercise the full surface in a realistic lifecycle order. + safe.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: true); + safe.OnExecuteSucceeded("stmt-77", ExecutionResultFormat.ExternalLinks); + safe.OnPollCompleted(count: 5, latencyMs: 87); + safe.OnFirstBatchReady(latencyMs: 120); + safe.OnChunksDownloaded(metrics); + safe.OnReaderInspected(ExecutionResultFormat.ExternalLinks, isCompressed: true); + safe.OnConsumed(latencyMs: 350); + safe.OnError(error); + safe.OnFinalized(); + + // Assert: every call reached the inner observer, in order, with original args. + Assert.Equal( + new[] + { + nameof(IStatementOperationObserver.OnExecuteStarted), + nameof(IStatementOperationObserver.OnExecuteSucceeded), + nameof(IStatementOperationObserver.OnPollCompleted), + nameof(IStatementOperationObserver.OnFirstBatchReady), + nameof(IStatementOperationObserver.OnChunksDownloaded), + nameof(IStatementOperationObserver.OnReaderInspected), + nameof(IStatementOperationObserver.OnConsumed), + nameof(IStatementOperationObserver.OnError), + nameof(IStatementOperationObserver.OnFinalized), + }, + inner.Calls); + + Assert.Equal(StatementType.Query, inner.StmtType); + Assert.Equal(OperationType.ExecuteStatement, inner.OpType); + Assert.True(inner.IsCompressed); + Assert.Equal("stmt-77", inner.StatementId); + Assert.Equal(ExecutionResultFormat.ExternalLinks, inner.ResultFormat); + Assert.Equal(5, inner.PollCount); + Assert.Equal(87, inner.PollLatencyMs); + Assert.Equal(120, inner.FirstBatchReadyMs); + Assert.Equal(350, inner.ConsumedMs); + Assert.Same(metrics, inner.ChunkMetrics); + Assert.Equal(ExecutionResultFormat.ExternalLinks, inner.ReaderInspectedFormat); + Assert.True(inner.ReaderInspectedCompressed); + Assert.Same(error, inner.Error); + Assert.Equal(1, inner.FinalizedCallCount); + + // Also confirm the decorator exposes the wrapped instance. + Assert.Same(inner, safe.Inner); + } + + [Fact] + public void SafeObserver_SwallowsExceptionsFromInner_LogsAtTrace() + { + // Arrange: subscribe an ActivityListener so the trace-level activity event + // emitted on exception suppression is observable. We must have an ambient + // Activity for the AddEvent call to take effect. + using ActivitySource source = new ActivitySource("SafeObserverTests"); + List capturedEvents = new List(); + using ActivityListener listener = new ActivityListener + { + ShouldListenTo = s => s.Name == "SafeObserverTests", + Sample = (ref ActivityCreationOptions _) => ActivitySamplingResult.AllData, + ActivityStopped = activity => + { + foreach (ActivityEvent ev in activity.Events) + { + capturedEvents.Add(ev); + } + }, + }; + ActivitySource.AddActivityListener(listener); + + ThrowingObserver throwing = new ThrowingObserver(); + SafeObserver safe = new SafeObserver(throwing); + + // Act + Assert: exercise the entire surface; nothing must escape SafeObserver. + using (Activity? activity = source.StartActivity("safe-observer-suppression-test")) + { + Assert.NotNull(activity); // sanity: listener is wired so an activity exists + + Exception? captured = Record.Exception(() => + { + safe.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: false); + safe.OnExecuteSucceeded("s1", ExecutionResultFormat.InlineArrow); + safe.OnPollCompleted(count: 1, latencyMs: 1); + safe.OnFirstBatchReady(latencyMs: 1); + safe.OnConsumed(latencyMs: 1); + safe.OnChunksDownloaded(new ChunkMetrics()); + safe.OnReaderInspected(ExecutionResultFormat.ExternalLinks, isCompressed: true); + safe.OnError(new InvalidOperationException("propagated error")); + safe.OnFinalized(); + }); + Assert.Null(captured); + + // Inner observer was invoked exactly once per method (nine methods). + Assert.Equal(9, throwing.CallCount); + } + + // Activity has now stopped; ActivityStopped callback populates capturedEvents. + // SafeObserver emits one suppression event per swallowed exception. + List suppressions = capturedEvents + .Where(e => e.Name == "telemetry.observer.suppressed") + .ToList(); + Assert.Equal(9, suppressions.Count); + + // Each suppression event must carry diagnostic tags identifying SafeObserver + // as the source and including the inner exception's type and message — + // this is what makes the suppression visible at trace level. + foreach (ActivityEvent ev in suppressions) + { + Dictionary tags = ev.Tags.ToDictionary(kv => kv.Key, kv => kv.Value); + Assert.Equal(nameof(InvalidOperationException), tags["error.type"]); + Assert.Equal("SafeObserver", tags["observer.suppressed.source"]); + Assert.Contains("boom", (string)tags["error.message"]!); + } + } + + // ── Additional coverage ────────────────────────────────────────────────────── + + [Fact] + public void SafeObserver_Constructor_RejectsNullInner() + { + Assert.Throws(() => new SafeObserver(null!)); + } + + [Fact] + public void SafeObserver_SwallowsException_WithoutAmbientActivity() + { + // Arrange: no ActivityListener subscribed → Activity.Current is null. The + // suppression path must still succeed without throwing. + ThrowingObserver throwing = new ThrowingObserver(); + SafeObserver safe = new SafeObserver(throwing); + + // Act + Assert + Exception? captured = Record.Exception(() => + { + safe.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: false); + safe.OnFinalized(); + }); + Assert.Null(captured); + Assert.Equal(2, throwing.CallCount); + } + } +} diff --git a/csharp/test/Unit/Telemetry/TelemetryObserverTests.cs b/csharp/test/Unit/Telemetry/TelemetryObserverTests.cs new file mode 100644 index 00000000..d14d0b6f --- /dev/null +++ b/csharp/test/Unit/Telemetry/TelemetryObserverTests.cs @@ -0,0 +1,411 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using System.Collections.Generic; +using System.Threading; +using System.Threading.Tasks; +using AdbcDrivers.Databricks.Reader.CloudFetch; +using AdbcDrivers.Databricks.Telemetry; +using AdbcDrivers.Databricks.Telemetry.Models; +using AdbcDrivers.Databricks.Telemetry.Proto; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; +using Xunit; + +namespace AdbcDrivers.Databricks.Tests.Unit.Telemetry +{ + /// + /// Unit tests for verifying: + /// + /// Observer method calls propagate into the underlying . + /// OnFinalized enqueues exactly one . + /// The terminal call is idempotent under both serial and concurrent invocation. + /// All methods satisfy the fail-open contract even when the telemetry client throws. + /// + /// + public class TelemetryObserverTests + { + // ── Test doubles ───────────────────────────────────────────────────────────── + + /// + /// Records every enqueued log so tests can inspect the exact proto fields the + /// observer attempted to emit, and counts enqueues to assert exactly-once semantics. + /// + private sealed class CapturingTelemetryClient : ITelemetryClient + { + public List Logs { get; } = new List(); + + public int EnqueueCallCount; + + public void Enqueue(TelemetryFrontendLog log) + { + Interlocked.Increment(ref EnqueueCallCount); + lock (Logs) + { + Logs.Add(log); + } + } + + public Task FlushAsync(CancellationToken ct = default) => Task.CompletedTask; + + public Task CloseAsync() => Task.CompletedTask; + + public ValueTask DisposeAsync() => default; + } + + /// + /// Simulates a corrupted / faulty telemetry client whose Enqueue path raises an + /// exception. Used to assert the observer's fail-open guarantee. + /// + private sealed class ThrowingTelemetryClient : ITelemetryClient + { + public int EnqueueCallCount; + + public void Enqueue(TelemetryFrontendLog log) + { + Interlocked.Increment(ref EnqueueCallCount); + throw new InvalidOperationException("simulated telemetry client failure"); + } + + public Task FlushAsync(CancellationToken ct = default) => Task.CompletedTask; + + public Task CloseAsync() => Task.CompletedTask; + + public ValueTask DisposeAsync() => default; + } + + // ── Fixtures ───────────────────────────────────────────────────────────────── + + private static TelemetrySessionContext CreateSession(ITelemetryClient client) + { + return new TelemetrySessionContext + { + SessionId = "session-abc", + WorkspaceId = 4242L, + TelemetryClient = client, + AuthType = "pat", + SystemConfiguration = new DriverSystemConfiguration { DriverVersion = "1.0.0" }, + DriverConnectionParams = new DriverConnectionParameters { HttpPath = "/sql/1.0/wh/x" }, + }; + } + + private static (TelemetryObserver observer, CapturingTelemetryClient client) CreateObserver() + { + CapturingTelemetryClient client = new CapturingTelemetryClient(); + TelemetrySessionContext session = CreateSession(client); + TelemetryObserver observer = new TelemetryObserver(session); + return (observer, client); + } + + // ── Required tests (per task description) ──────────────────────────────────── + + [Fact] + public void TelemetryObserver_OnExecuteStarted_PopulatesContext() + { + // Arrange + (TelemetryObserver observer, _) = CreateObserver(); + + // Act + observer.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: true); + + // Assert: scalar fields land on the underlying context. + Assert.Equal(StatementType.Query, observer.Context.StatementType); + Assert.Equal(OperationType.ExecuteStatement, observer.Context.OperationType); + Assert.True(observer.Context.IsCompressed); + } + + [Fact] + public void TelemetryObserver_OnExecuteSucceeded_RecordsStatementId() + { + // Arrange + (TelemetryObserver observer, _) = CreateObserver(); + + // Act + observer.OnExecuteSucceeded("stmt-id-42", ExecutionResultFormat.ExternalLinks); + + // Assert + Assert.Equal("stmt-id-42", observer.Context.StatementId); + Assert.Equal(ExecutionResultFormat.ExternalLinks, observer.Context.ResultFormat); + } + + [Fact] + public void TelemetryObserver_OnFinalized_EnqueuesExactlyOnce() + { + // Arrange + (TelemetryObserver observer, CapturingTelemetryClient client) = CreateObserver(); + observer.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: false); + observer.OnExecuteSucceeded("stmt-1", ExecutionResultFormat.InlineArrow); + + // Act + observer.OnFinalized(); + + // Assert + Assert.Equal(1, client.EnqueueCallCount); + Assert.Single(client.Logs); + + // The emitted log must reflect the recorded context. + OssSqlDriverTelemetryLog log = client.Logs[0].Entry!.SqlDriverLog!; + Assert.Equal("session-abc", log.SessionId); + Assert.Equal("stmt-1", log.SqlStatementId); + Assert.Equal(StatementType.Query, log.SqlOperation.StatementType); + Assert.Equal(OperationType.ExecuteStatement, log.SqlOperation.OperationDetail.OperationType); + Assert.Null(log.ErrorInfo); + + // Workspace id and frontend envelope must be populated. + Assert.Equal(4242L, client.Logs[0].WorkspaceId); + Assert.False(string.IsNullOrEmpty(client.Logs[0].FrontendLogEventId)); + Assert.NotNull(client.Logs[0].Context); + Assert.True(client.Logs[0].Context!.TimestampMillis > 0); + } + + [Fact] + public void TelemetryObserver_OnFinalized_CalledTwice_EnqueuesOnce() + { + // Arrange + (TelemetryObserver observer, CapturingTelemetryClient client) = CreateObserver(); + + // Act: invoke OnFinalized twice in serial (mirrors error + dispose paths). + observer.OnFinalized(); + observer.OnFinalized(); + + // Assert: exactly one enqueue. + Assert.Equal(1, client.EnqueueCallCount); + Assert.True(observer.HasEmitted); + } + + [Fact] + public async Task TelemetryObserver_OnFinalized_ConcurrentCalls_EnqueueOnce() + { + // Arrange + (TelemetryObserver observer, CapturingTelemetryClient client) = CreateObserver(); + + // Act: race many threads to OnFinalized; only one must win. + const int parallelism = 32; + using ManualResetEventSlim start = new ManualResetEventSlim(); + Task[] tasks = new Task[parallelism]; + for (int i = 0; i < parallelism; i++) + { + tasks[i] = Task.Run(() => + { + start.Wait(); + observer.OnFinalized(); + }); + } + start.Set(); + await Task.WhenAll(tasks); + + // Assert + Assert.Equal(1, client.EnqueueCallCount); + } + + [Fact] + public void TelemetryObserver_OnError_RecordsErrorAndFinalizes() + { + // Arrange + (TelemetryObserver observer, CapturingTelemetryClient client) = CreateObserver(); + observer.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: false); + InvalidOperationException error = new InvalidOperationException("simulated query failure"); + + // Act + observer.OnError(error); + observer.OnFinalized(); + observer.OnFinalized(); // idempotent: should not double-emit even with error path + + // Assert: exactly one log, error_info populated. + Assert.Equal(1, client.EnqueueCallCount); + OssSqlDriverTelemetryLog log = client.Logs[0].Entry!.SqlDriverLog!; + Assert.NotNull(log.ErrorInfo); + Assert.Equal("InvalidOperationException", log.ErrorInfo.ErrorName); + Assert.True(observer.Context.HasError); + Assert.Equal("simulated query failure", observer.Context.ErrorMessage); + } + + [Fact] + public void TelemetryObserver_AllMethods_NeverThrow_WhenContextCorrupted() + { + // Arrange: build an observer whose telemetry client throws on every Enqueue + // (this simulates a corrupted downstream dependency). The observer must + // continue to absorb all calls without re-throwing. + ThrowingTelemetryClient throwing = new ThrowingTelemetryClient(); + TelemetrySessionContext session = CreateSession(throwing); + TelemetryObserver observer = new TelemetryObserver(session); + + // Act + Assert: exercise the entire surface, including pathological inputs + // (null statementId, null exception, null ChunkMetrics). + Exception? captured = Record.Exception(() => + { + observer.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: true); + observer.OnExecuteSucceeded(null!, ExecutionResultFormat.Unspecified); + observer.OnPollCompleted(count: 0, latencyMs: 0); + observer.OnFirstBatchReady(latencyMs: -1); + observer.OnConsumed(latencyMs: -1); + observer.OnChunksDownloaded(null!); + observer.OnReaderInspected(ExecutionResultFormat.Unspecified, isCompressed: false); + observer.OnError(null!); + observer.OnError(new InvalidOperationException("boom")); + observer.OnFinalized(); + observer.OnFinalized(); + }); + + Assert.Null(captured); + + // The throwing client must have been invoked exactly once (idempotent finalize) + // and the observer must have swallowed its exception. + Assert.Equal(1, throwing.EnqueueCallCount); + Assert.True(observer.HasEmitted); + } + + [Fact] + public void TelemetryObserver_OnChunksDownloaded_MergesIntoChunkDetails() + { + // Arrange + (TelemetryObserver observer, CapturingTelemetryClient client) = CreateObserver(); + ChunkMetrics metrics = new ChunkMetrics + { + TotalChunksPresent = 12, + TotalChunksIterated = 11, + InitialChunkLatencyMs = 75, + SlowestChunkLatencyMs = 220, + SumChunksDownloadTimeMs = 1430, + }; + + // Act + observer.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: false); + observer.OnChunksDownloaded(metrics); + observer.OnFinalized(); + + // Assert: context absorbed the metrics. + Assert.Equal(12, observer.Context.TotalChunksPresent); + Assert.Equal(11, observer.Context.TotalChunksIterated); + Assert.Equal(75, observer.Context.InitialChunkLatencyMs); + Assert.Equal(220, observer.Context.SlowestChunkLatencyMs); + Assert.Equal(1430, observer.Context.SumChunksDownloadTimeMs); + + // The chunk_details proto block on the emitted log mirrors the input. + ChunkDetails details = client.Logs[0].Entry!.SqlDriverLog!.SqlOperation.ChunkDetails; + Assert.NotNull(details); + Assert.Equal(12, details.TotalChunksPresent); + Assert.Equal(11, details.TotalChunksIterated); + Assert.Equal(75, details.InitialChunkLatencyMillis); + Assert.Equal(220, details.SlowestChunkLatencyMillis); + Assert.Equal(1430, details.SumChunksDownloadTimeMillis); + } + + // ── Additional coverage ────────────────────────────────────────────────────── + + [Fact] + public void TelemetryObserver_Constructor_RejectsNullSession() + { + Assert.Throws(() => new TelemetryObserver(null!)); + } + + [Fact] + public void TelemetryObserver_OnFinalized_WithNullTelemetryClient_IsNoOp() + { + // Arrange: session has no telemetry client (covers the disabled case). + TelemetrySessionContext session = new TelemetrySessionContext + { + SessionId = "s1", + WorkspaceId = 1L, + TelemetryClient = null, + }; + TelemetryObserver observer = new TelemetryObserver(session); + + // Act + Assert: must not throw, must mark itself emitted so a later call is a no-op. + observer.OnFinalized(); + observer.OnFinalized(); + Assert.True(observer.HasEmitted); + } + + [Fact] + public void TelemetryObserver_OnFirstBatchReady_OnlyFirstCallWins() + { + // Arrange + (TelemetryObserver observer, _) = CreateObserver(); + + // Act + observer.OnFirstBatchReady(latencyMs: 50); + observer.OnFirstBatchReady(latencyMs: 999); + + // Assert: subsequent calls do not overwrite the earliest observed latency. + Assert.Equal(50, observer.Context.FirstBatchReadyMs); + } + + [Fact] + public void TelemetryObserver_OnReaderInspected_OverridesResultFormatAndCompression() + { + // Arrange: OnExecuteStarted seeds the placeholder defaults (InlineArrow / false) + // that the legacy CreateTelemetryContext helper stamped. The reader inspection + // hook fires later from the finalize path and must overwrite both fields with + // the server-reported truth (PECO-2988, PECO-2978). + (TelemetryObserver observer, CapturingTelemetryClient client) = CreateObserver(); + observer.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: false); + observer.OnExecuteSucceeded("stmt-rdr", ExecutionResultFormat.InlineArrow); + + // Act: simulate the CloudFetch + LZ4 case where the active reader reports + // a different result format and compression flag than the defaults. + observer.OnReaderInspected(ExecutionResultFormat.ExternalLinks, isCompressed: true); + observer.OnFinalized(); + + // Assert: context reflects the post-inspection values, and the emitted log + // carries them through to the proto fields. + Assert.Equal(ExecutionResultFormat.ExternalLinks, observer.Context.ResultFormat); + Assert.True(observer.Context.IsCompressed); + OssSqlDriverTelemetryLog log = client.Logs[0].Entry!.SqlDriverLog!; + Assert.Equal(ExecutionResultFormat.ExternalLinks, log.SqlOperation.ExecutionResult); + Assert.True(log.SqlOperation.IsCompressed); + } + + [Fact] + public void TelemetryObserver_OnReaderInspected_DoesNotThrow_WhenInvokedBeforeExecuteStart() + { + // Arrange: caller invokes OnReaderInspected directly without an OnExecuteStarted + // prelude. The method must be idempotent-friendly and never throw, per the + // fail-open contract. + (TelemetryObserver observer, _) = CreateObserver(); + + // Act + Assert + Exception? captured = Record.Exception(() => + observer.OnReaderInspected(ExecutionResultFormat.InlineArrow, isCompressed: false)); + Assert.Null(captured); + + // The fields it touches must reflect the invocation even without a prior call. + Assert.Equal(ExecutionResultFormat.InlineArrow, observer.Context.ResultFormat); + Assert.False(observer.Context.IsCompressed); + } + + [Fact] + public void TelemetryObserver_OnPollCompleted_StoresCountAndLatency() + { + // Arrange + (TelemetryObserver observer, CapturingTelemetryClient client) = CreateObserver(); + + // Act + observer.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatementAsync, isCompressed: false); + observer.OnPollCompleted(count: 5, latencyMs: 250); + observer.OnFinalized(); + + // Assert + Assert.Equal(5, observer.Context.PollCount); + Assert.Equal(250, observer.Context.PollLatencyMs); + OperationDetail detail = client.Logs[0].Entry!.SqlDriverLog!.SqlOperation.OperationDetail; + Assert.Equal(5, detail.NOperationStatusCalls); + Assert.Equal(250, detail.OperationStatusLatencyMillis); + } + } +} diff --git a/docs/designs/PECO-3022-sea-telemetry-integration-design.md b/docs/designs/PECO-3022-sea-telemetry-integration-design.md index 9f02ecb4..d267c489 100644 --- a/docs/designs/PECO-3022-sea-telemetry-integration-design.md +++ b/docs/designs/PECO-3022-sea-telemetry-integration-design.md @@ -255,7 +255,7 @@ public static IConnectionTelemetry Create( IReadOnlyDictionary properties, string host, string assemblyVersion, - IOAuthTokenProvider? oauthTokenProvider, + OAuthClientCredentialsProvider? oauthTokenProvider, string sessionId, // CHANGED: was TSessionHandle? sessionHandle DriverMode.Types.Type mode, // NEW: Thrift or Sea bool enableDirectResults, @@ -264,7 +264,9 @@ public static IConnectionTelemetry Create( Activity? activity); ``` -Thrift caller converts at the boundary: `sessionHandle.SessionId.Guid.ToString()`. SEA caller passes its `_sessionId` directly. +Thrift caller converts at the boundary: `sessionHandle.SessionId.Guid.ToString()` (with a null-check; an empty string is mapped to `null` `SessionId` inside `Create` to match the prior behavior). SEA caller passes its `_sessionId` directly. + +The `mode` parameter is also threaded through `BuildDriverConnectionParams` and `SafeBuildDriverConnectionParams` — both methods previously hardcoded `Mode = DriverMode.Types.Type.Thrift`. The literal is gone from `ConnectionTelemetry.cs`; only the `DatabricksConnection` (Thrift) call site supplies it. ### 5.3 `IConnectionTelemetry` — no surface change @@ -400,7 +402,9 @@ sequenceDiagram ### Connection-level concurrency -`IConnectionTelemetry.DisposeAsync` is called from `StatementExecutionConnection.Dispose` synchronously (consistent with existing Thrift pattern): `_telemetry.DisposeAsync().AsTask().Wait(TimeSpan.FromSeconds(5))`. This flushes any pending events with a hard timeout so connection-close cannot hang on a stuck exporter. +`IConnectionTelemetry.DisposeAsync` is called from `StatementExecutionConnection.Dispose` synchronously (consistent with existing Thrift pattern): `_telemetry.DisposeAsync().Wait(TimeSpan.FromSeconds(5))`. This flushes any pending events with a hard timeout so connection-close cannot hang on a stuck exporter. + +> **Implementation note (T5):** `IConnectionTelemetry.DisposeAsync` returns `Task` (not `ValueTask`), so the call is `_telemetry.DisposeAsync().Wait(TimeSpan.FromSeconds(5))` rather than `.AsTask().Wait(...)`. The result is the same: a hard-bounded synchronous flush. --- @@ -454,7 +458,9 @@ The existing `CircuitBreakerTelemetryExporter` is reused unchanged. Behavior in ### Connection telemetry initialization failure -If `ConnectionTelemetry.Create` throws during `OpenAsync` (e.g. feature-flag fetch fails), the exception is caught locally in `StatementExecutionConnection`, logged at `TRACE`, and `_telemetry` is set to a `NullConnectionTelemetry` singleton (already exists for Thrift). The connection open succeeds; only telemetry is disabled for the connection. +If `ConnectionTelemetry.Create` throws during `OpenAsync` (e.g. feature-flag fetch fails), the exception is caught locally in `StatementExecutionConnection`, logged at `TRACE`, and `_telemetry` is set to a `NoOpConnectionTelemetry` singleton (already exists for Thrift; the name in the codebase is `NoOpConnectionTelemetry`). The connection open succeeds; only telemetry is disabled for the connection. + +> **Implementation note (T5):** `ConnectionTelemetry.Create` already swallows internally and returns `NoOpConnectionTelemetry.Instance` on any failure, so the outer try/catch in `StatementExecutionConnection.InitializeTelemetry` is belt-and-suspenders — it only runs if `Create` is later modified to throw in a refactor. Keeping it makes the fail-open contract explicit at the call site. ---