diff --git a/csharp/doc/gap-report-PECO-3022-sea-telemetry-2026-05-14.md b/csharp/doc/gap-report-PECO-3022-sea-telemetry-2026-05-14.md new file mode 100644 index 00000000..7169eee9 --- /dev/null +++ b/csharp/doc/gap-report-PECO-3022-sea-telemetry-2026-05-14.md @@ -0,0 +1,160 @@ +# Gap Report — PECO-3022 SEA Telemetry Integration + +**Report date:** 2026-05-14 +**Branch under review:** `stack/pr-phase5-sea-statement-telemetry` +**Design doc:** [`docs/designs/PECO-3022-sea-telemetry-integration-design.md`](../../docs/designs/PECO-3022-sea-telemetry-integration-design.md) +**Sprint plan:** [`csharp/doc/sprint-plan-PECO-3022-sea-telemetry-2026-05-14.md`](./sprint-plan-PECO-3022-sea-telemetry-2026-05-14.md) +**Jira:** [PECO-3022](https://databricks.atlassian.net/browse/PECO-3022) + +--- + +## 1. Bottom line + +Foundation work (T1, T2, T4, T5) is complete and matches the design. T3 (`SeaResultFormatMapper`) was never started, and T6 (`StatementExecutionStatement` hookpoints) is **4 of 8 wired**. The combination produces a **critical functional gap**: no SEA telemetry record reaches the wire today, because `OnFinalized` — the only path that enqueues an `OssSqlDriverTelemetryLog` — is not called from `StatementExecutionStatement.Dispose`. + +Closing the gap is straightforward and bounded — roughly 1.5–2 days of focused work, all in `StatementExecutionStatement.cs` and one new mapper file. + +--- + +## 2. 
What's complete and on-spec + +| Item | Status | Evidence | +|---|---|---| +| `IStatementOperationObserver` interface | DONE | `csharp/src/Telemetry/IStatementOperationObserver.cs` — 8 methods, fail-open contract documented per design §5.1 | +| `TelemetryObserver` impl | DONE | `csharp/src/Telemetry/TelemetryObserver.cs` — `Safe(Action)` helper per design §12, idempotent `OnFinalized` via `Interlocked.CompareExchange` | +| `NullObserver` singleton | DONE | `csharp/src/Telemetry/NullObserver.cs` | +| `SafeObserver` decorator | DONE | `csharp/src/Telemetry/SafeObserver.cs` | +| T1: `ConnectionTelemetry.Create` refactor (`string sessionId` + `DriverMode mode`) | DONE | `csharp/src/Telemetry/ConnectionTelemetry.cs:73-83`; hardcoded `Thrift` removed at lines 466, 651 | +| T1: Thrift caller converts at boundary | DONE | `csharp/src/DatabricksConnection.cs:737-747` — `new Guid(SessionHandle.SessionId.Guid).ToString()`, `mode: Thrift` | +| T4: `DatabricksStatement` observer field | DONE | `csharp/src/DatabricksStatement.cs:81`, injected from `DatabricksConnection.CreateStatement` at line 537-541 | +| T4: Private telemetry hooks removed | DONE | `CreateTelemetryContext`, `RecordSuccess`, `RecordError`, `EmitTelemetry` no longer present | +| T5: `StatementExecutionConnection._telemetry` field + lifecycle | DONE | `StatementExecutionConnection.cs:66`; `InitializeTelemetry` at line 457; `EmitCreateSessionTelemetry` at line 459 | +| T5: `DriverMode.Sea` passed for SEA | DONE | `StatementExecutionConnection.cs:487` | +| T5: 5-second dispose timeout | DONE | `StatementExecutionConnection.cs:1104` — `_telemetry.DisposeAsync().Wait(TimeSpan.FromSeconds(5))` in try/catch | +| T5: Observer construction in `CreateStatement` | DONE | `StatementExecutionConnection.cs:577-587` — `TelemetryObserver(session)` when telemetry enabled, else `NullObserver.Instance` | + +--- + +## 3. 
Critical gaps + +### G1 — `OnFinalized` not wired into `StatementExecutionStatement.Dispose` ⚠️ **Blocking** + +**Severity:** Critical. `TelemetryObserver.OnFinalized` is the only path that builds an `OssSqlDriverTelemetryLog` and enqueues it for export. Without this call, every observer method up to this point only mutates an in-memory `StatementTelemetryContext` that is garbage-collected when the statement disposes. Zero SEA telemetry records reach `eng_lumberjack`. + +**Location:** `csharp/src/StatementExecution/StatementExecutionStatement.cs:887` (`Dispose`). + +**Fix:** Add `_observer.OnFinalized()` in the dispose path. The observer's idempotency gate (CAS on `_emitted`) guarantees exactly-once emission even if the error path also calls `OnFinalized`. + +**Effort:** ~30 minutes including a unit test that confirms exactly-once after both error and dispose. + +--- + +### G2 — `SeaResultFormatMapper` (T3) not created + +**Severity:** High. Without the mapper, `OnExecuteSucceeded` cannot populate `result_format`; it currently passes `ExecutionResultFormat.Unspecified` as a placeholder. Result format is one of the sprint-plan success-criteria fields. + +**Locations:** +- Missing file: `csharp/src/StatementExecution/SeaResultFormatMapper.cs` +- Placeholder usage: `csharp/src/StatementExecution/StatementExecutionStatement.cs:389-391` (comment notes the helper "is being added in a parallel phase 6 PR") + +**Fix:** Implement per design §8 four-cell mapping table. Pure-function static helper. Add unit tests for all four cells. + +**Effort:** ~½ day. + +--- + +### G3 — Reader-side observer hookpoints not wired + +**Severity:** High. Three of the eight hookpoints from design §6 are missing on the reader side (in addition to `OnFinalized`, covered by G1). Each one corresponds to a specific telemetry field that will be absent or null on the wire. 
+ +| Hookpoint | Design § | Expected location | Field affected | Status | +|---|---|---|---|---| +| `OnFirstBatchReady` | §6 row 5 | `CreateCloudFetchReader` (line 612) + `InlineArrowStreamReader` ctor (line 970) | `result_latency.result_set_ready_latency_millis` | Missing | +| `OnConsumed` | §6 row 6 | Reader Dispose for both paths | `result_latency.result_set_consumption_latency_millis` | Missing | +| `OnChunksDownloaded` | §6 row 6 | CloudFetch reader Dispose | `chunk_details` (initial/slowest/sum/totals) | Missing | + +**Fix:** Thread the observer through the relevant reader classes (`InlineArrowStreamReader`, `CloudFetchReader`/`StatementExecutionResultFetcher`). `OnChunksDownloaded` depends on the gap-fix workstream's `ChunkMetrics` plumbing; if that is not available, pass `ChunkMetrics.Empty`. + +**Effort:** ~1 day for `OnFirstBatchReady` + `OnConsumed`. `OnChunksDownloaded` adds ~0.5 day if the chunk-metrics aggregator from gap-fix is not yet exposed. + +--- + +## 4. Hookpoint wiring matrix + +For quick reference — current state of all 8 hookpoints in `StatementExecutionStatement`: + +| # | Hookpoint | Design § | Current state | File:Line | +|---|---|---|---|---| +| 1 | `OnExecuteStarted` | §6 row 2 | ✅ Wired | `StatementExecutionStatement.cs:360` | +| 2 | `OnExecuteSucceeded` | §6 row 3 | ⚠️ Wired but passes `Unspecified` (blocked on G2) | `StatementExecutionStatement.cs:391` | +| 3 | `OnPollCompleted` | §6 row 4 | ✅ Wired | `StatementExecutionStatement.cs:554` (accumulated, emitted once on terminal) | +| 4 | `OnFirstBatchReady` | §6 row 5 | ❌ Missing | n/a | +| 5 | `OnConsumed` | §6 row 6 | ❌ Missing | n/a | +| 6 | `OnChunksDownloaded` | §6 row 6 | ❌ Missing | n/a | +| 7 | `OnError` | §6 row 7 | ✅ Wired | `StatementExecutionStatement.cs:462` | +| 8 | `OnFinalized` | §6 row 8 | ❌ **Missing — blocks all SEA telemetry emission** | n/a | + +--- + +## 5. 
Minor divergences (non-blocking) + +### D1 — `connectTimeoutMilliseconds` semantically mislabeled + +**Location:** `StatementExecutionConnection.cs:490` — passes `(int)TimeSpan.FromSeconds(_waitTimeoutSeconds).TotalMilliseconds`. + +**Issue:** `_waitTimeoutSeconds` is the SEA query-wait (CONTINUE) timeout, not a connection timeout. The Thrift path passes `ConnectTimeoutMilliseconds` which is a different concept. Dashboards filtering on `socket_timeout` will see SEA records mislabeled. + +**Fix:** Read from the connection-timeout connection-string property (or a sensible default), not from `_waitTimeoutSeconds`. ~15 minutes. + +--- + +### D2 — `PendingTelemetryContext` leaks observer internals back to statement + +**Location:** `csharp/src/DatabricksStatement.cs:112` — getter does `(_observer as TelemetryObserver)?.Context`. + +**Issue:** The statement still reaches into the observer's internal `StatementTelemetryContext` from the reader-finalize block at lines 425-431 to override `IsCompressed` / `ResultFormat`. Comment notes this preserves byte-identical output for PECO-2988/2978. + +**Fix (optional cleanup):** Formalize as an observer method (e.g. `OnReaderInspected(format, compressed)`) and remove the downcast. Not blocking; document or fix in a follow-up. + +--- + +### D3 — Design §6 row "Session open" signature drift + +**Location:** `StatementExecutionConnection.cs:517-522` calls `EmitOperationTelemetry(CreateSession, …)` directly. + +**Issue:** Design row says `_telemetry.EmitCreateSessionTelemetry(activity)`. Same observable effect; just a method-name mismatch with the spec. + +**Fix:** None required. Update the design row to match the actual call, or leave as-is. + +--- + +### D4 — Sprint plan file paths stale + +**Issue:** Sprint plan references `csharp/src/Drivers/Databricks/...` but the actual repo layout dropped that prefix — files live under `csharp/src/Telemetry/` and `csharp/src/StatementExecution/`. 
+ +**Fix:** Update the sprint plan paths in a small docs commit. Not blocking. + +--- + +## 6. Action plan (ordered) + +| Step | Action | Effort | Blocks | +|---|---|---|---| +| 1 | **G1**: Add `_observer.OnFinalized()` in `StatementExecutionStatement.Dispose` (line 887). Test exactly-once after error+dispose. | 30 min | All SEA telemetry emission | +| 2 | **G2**: Create `SeaResultFormatMapper` per design §8; update `OnExecuteSucceeded` callsite (line 391) to use it. | ½ day | `result_format` field correctness | +| 3 | **G3a**: Wire `OnFirstBatchReady` at reader construction in both paths (cloud-fetch + inline). | ½ day | `result_set_ready_latency_millis` | +| 4 | **G3b**: Wire `OnConsumed` at reader Dispose; thread observer into reader classes as needed. | ½ day | `result_set_consumption_latency_millis` | +| 5 | **G3c**: Wire `OnChunksDownloaded` for cloud-fetch path. If gap-fix `ChunkMetrics` aggregator not yet exposed, pass `ChunkMetrics.Empty` and note as follow-up. | ½ day | `chunk_details` | +| 6 | **D1**: Fix `connectTimeoutMilliseconds` source. | 15 min | `socket_timeout` field correctness | +| 7 | End-to-end smoke against real SQL warehouse: run a SELECT via REST, confirm `DRIVER_MODE_SEA` record in `eng_lumberjack.prod_frontend_log_sql_driver_log`. | ~1 hour | Sprint demo | +| 8 | T7 integration tests per design §15. | 2 days | Definition-of-done | + +**Critical path:** Steps 1 → 2 → 7 (about 1 day) gets a SEA record on the wire end-to-end. Steps 3–6 round out field coverage to full Thrift parity. + +--- + +## 7. Risks introduced or remaining + +- **Gap-fix `ChunkMetrics` plumbing dependency** — still applies. If not landed by the time G3c is reached, ship with `ChunkMetrics.Empty`. Backfill in a follow-up sprint. Proto fields are nullable. +- **OnFinalized exactly-once semantics** — interface guarantees idempotence via the `_emitted` CAS in `TelemetryObserver`. 
When wiring, both error path and dispose path should be allowed to call it; only the first wins. Already covered by `TelemetryObserverTests.OnFinalized_CalledTwice_EnqueuesOnce`. +- **Reader threading** — observer must be passed into `InlineArrowStreamReader` and `CloudFetchReader` for the reader-Dispose hookpoints. Watch for constructor signature changes affecting Thrift call sites. diff --git a/csharp/doc/gap-report-PECO-3022-sea-telemetry-e2e-2026-05-15.md b/csharp/doc/gap-report-PECO-3022-sea-telemetry-e2e-2026-05-15.md new file mode 100644 index 00000000..e165811e --- /dev/null +++ b/csharp/doc/gap-report-PECO-3022-sea-telemetry-e2e-2026-05-15.md @@ -0,0 +1,148 @@ +# Gap Report — PECO-3022 SEA Telemetry E2E Coverage + +**Report date:** 2026-05-15 +**Branch under review:** `stack/pr-phase5-sea-statement-telemetry` +**Companion to:** [`gap-report-PECO-3022-sea-telemetry-2026-05-14.md`](./gap-report-PECO-3022-sea-telemetry-2026-05-14.md) (G1–G3, D1–D4) +**Design doc:** [`docs/designs/PECO-3022-sea-telemetry-integration-design.md`](../../docs/designs/PECO-3022-sea-telemetry-integration-design.md) +**Jira:** [PECO-3022](https://databricks.atlassian.net/browse/PECO-3022) + +--- + +## 1. Bottom line + +After the gap-fill commits (`gap-1` through `gap-5`) wired all 8 observer hookpoints in `StatementExecutionStatement`, the production code path is functionally complete. However, **end-to-end test coverage for the SEA telemetry path remains gated off**. The existing E2E telemetry test suite is already protocol-parameterized via `TestConfiguration.Protocol` but every SEA-touching test is guarded by `Skip.If(Protocol == "rest", ...)` with reasons that no longer apply. + +This is the **G4 gap**: shipped code with stale skip-guards equals zero comparator coverage. We currently cannot detect regressions between the Thrift and SEA telemetry paths. 
+ +Closing G4 is mechanical (one-line skip removals) and immediately turns the existing 12-file Thrift test suite into a Thrift-vs-SEA comparator — exactly the validation needed before declaring the sprint goal met. + +--- + +## 2. Existing E2E infrastructure (what we get to reuse) + +| Component | Path | Note | +|---|---|---| +| Capture-side telemetry exporter | `csharp/test/E2E/Telemetry/CapturingTelemetryExporter.cs` | Local in-process capture; no lumberjack dependency | +| Test helpers | `csharp/test/E2E/Telemetry/TelemetryTestHelpers.cs` | Shared utilities (assertion helpers, factories) | +| Protocol-parameterized harness | `TestConfiguration.Protocol` | Test runs over `"thrift"` and `"rest"` based on configuration | + +The harness was designed for cross-protocol parity from the start. Each test file checks `TestConfiguration.Protocol` and skips when it can't run — meaning once the skip is removed, the same test exercises both transports without any further plumbing. + +--- + +## 3. The G4 gap — 12 files, all gated off for SEA + +### Category A — Drop skip immediately ("PECO-3010: telemetry not wired for SEA") + +Seven files carry the marker `PECO-3010 - telemetry not wired for SEA protocol`. PECO-3022 has now wired the telemetry, so the marker is stale. Drop the skip. 
+ +| File | Skip line | What it tests | +|---|---|---| +| `TelemetryBaselineTests.cs` | 43 | Smoke: any record produced, basic fields present | +| `AuthTypeTests.cs` | 40 | `auth_type` field correctness across PAT / OAuth-M2M / OAuth-U2M | +| `SystemConfigurationTests.cs` | 41 | `system_configuration` — driver_version, OS, runtime, locale | +| `ClientTelemetryE2ETests.cs` | 44 | End-to-end record content for typical SELECT | +| `StatementMetadataTelemetryTests.cs` | 51 | `statement_type` + metadata operation tagging | +| `MetadataOperationTests.cs` | 41 | GetObjects / GetTableTypes / GetInfo telemetry | +| `InternalCallTests.cs` | 42 | `is_internal_call` flag for internal-driver SQL | + +### Category B — Drop skip, observe what fails ("Thrift-only" but likely runnable on SEA) + +Four files carry "Thrift-only" reasons that are not strictly true post-PECO-3022. They depend on plumbing this design either delivered (`DriverMode.Sea`, `chunk_details`) or stubbed gracefully (`ChunkMetrics.Empty` fallback). Drop the skip and see which assertions actually fail — those failures are the next real gaps. + +| File | Skip line | Why it might pass now | Risk if it fails | +|---|---|---|---| +| `ConnectionParametersTests.cs` | 41 | `driver_connection_params.mode = DRIVER_MODE_SEA` is now set per design §10; field-population should match Thrift modulo `socket_timeout` (D1 still open) | D1 (`socket_timeout` mislabeled) — fix flagged in previous gap report | +| `ChunkDetailsTelemetryTests.cs` | 44 | `OnChunksDownloaded` is wired with `_cloudFetchReader.GetChunkMetrics() ?? new ChunkMetrics()` | Empty `ChunkMetrics` when gap-fix plumbing absent — assertions may need a tolerance check | +| `ChunkMetricsReaderTests.cs` | 41 | Same as above | Same | +| `ChunkMetricsAggregationTests.cs` | 39 | Same as above | Same | + +### Category C — Keep skipped (out of PECO-3022 scope) + +| File | Skip line | Reason | +|---|---|---| +| `RetryCountTests.cs` | 43 | `retry_count` not wired for SEA. 
This is in the gap-fix workstream's scope, not PECO-3022. | + +--- + +## 4. Comparator-testing strategy + +The skip removal effectively turns each existing Thrift telemetry test into a Thrift-vs-SEA parity check. Run the suite in two passes: + +```mermaid +graph LR + A[E2E telemetry suite] -->|TestConfiguration.Protocol=thrift| B[Pass A: Thrift records] + A -->|TestConfiguration.Protocol=rest| C[Pass B: SEA records] + B --> D{Per-test diff} + C --> D + D -->|both pass| E[Parity confirmed] + D -->|Thrift only| F[SEA regression - file new gap] + D -->|SEA only| G[Thrift regression - unlikely - file bug] + D -->|both fail| H[Test infra issue] +``` + +**Comparison fields** (per design §6 + §15): +- `session_id`, `sql_statement_id` +- `driver_connection_params.mode` (THRIFT vs SEA — the *only* expected diff) +- `sql_operation.statement_type`, `operation_type` +- `sql_operation.execution_result.format` +- `sql_operation.result_latency.result_set_ready_latency_millis` +- `sql_operation.result_latency.result_set_consumption_latency_millis` +- `sql_operation.operation_detail.n_operation_status_calls` (poll count — may differ by transport since SEA polls via `GetStatementAsync`, Thrift via `GetOperationStatus`) +- `sql_operation.chunk_details.*` (may be empty on SEA until gap-fix lands) +- `error_info.error_name` +- `system_configuration.*`, `driver_connection_params.*` + +The expected diffs are bounded and known. Anything outside the expected-diff set is a bug. + +--- + +## 5. Action plan (ordered) + +| Step | Action | Effort | Outcome | +|---|---|---|---| +| 1 | **Category A: drop 7 PECO-3010 skips** — one-line edit per file. Group as a single commit `[gap-fill][gap-6] Enable SEA-mode for PECO-3010-gated telemetry E2E tests`. | 30 min | 7 test files become Thrift-vs-SEA comparators | +| 2 | **Run E2E suite, protocol=rest**: `dotnet test --filter "Telemetry" -- TestConfiguration.Protocol=rest`. Capture pass/fail per test. 
| 30 min (CI) | Baseline SEA pass/fail picture | +| 3 | **Run E2E suite, protocol=thrift**: same command with `=thrift`. Compare to step 2. | 30 min (CI) | Comparator diff | +| 4 | **Category B: experimentally drop 4 "Thrift-only" skips** — separate commit per file so partial failures stay revertible. | 30 min | Reveal whether chunk-metrics path works end-to-end with `ChunkMetrics.Empty` fallback | +| 5 | **Triage failures** — for each SEA-failing test from steps 2–4, classify as: (a) D1 / known divergence, (b) genuinely new bug, (c) test expectation needs SEA branch (e.g. expected `EXECUTE_STATEMENT_ASYNC` instead of `EXECUTE_STATEMENT`). | 1–2 hours | Updated gap list with concrete failure modes | +| 6 | **Author final SEA-specific E2E tests** per design §15 for gaps the existing suite doesn't cover: error path, telemetry-disabled-by-feature-flag, concurrent statements. | 1 day | Coverage parity with design §15 | +| 7 | **Sprint demo evidence** — capture two side-by-side records from `CapturingTelemetryExporter` (Thrift + SEA for the same query) and include in the demo. | 1 hour | Visible parity confirmation | + +**Critical path:** Steps 1 → 2 → 3 (≈1.5 hours) gives us a Thrift-vs-SEA comparator baseline. The rest is triage and follow-up. + +--- + +## 6. 
Concrete expected diffs (so failures can be classified fast) + +| Field | Thrift | SEA | Status | +|---|---|---|---| +| `driver_connection_params.mode` | `DRIVER_MODE_THRIFT` | `DRIVER_MODE_SEA` | **Expected** | +| `operation_detail.operation_type` | `EXECUTE_STATEMENT` | `EXECUTE_STATEMENT_ASYNC` | **Expected** — SEA is always async on wire (design §17 open question #3) | +| `operation_detail.n_operation_status_calls` | counts `GetOperationStatusAsync` | counts `GetStatementAsync` | **Expected** — design §17 open question #1 | +| `driver_connection_params.socket_timeout` | actual connect timeout | mislabeled as `_waitTimeoutSeconds` | **Known divergence — D1** | +| `chunk_details.*` | populated when CloudFetch active | may be zero/null until gap-fix `ChunkMetrics` plumbing lands | **Expected — bounded** | +| `error_info.error_name` | exception type name | exception type name | should match | +| `system_configuration.*` | driver/OS/runtime info | same | should match | +| `auth_type` | resolved auth | resolved auth | should match (T5 wired identically) | +| `sql_statement_id` | populated | populated | should match (gap-1 OnFinalized + gap-2 mapper) | + +Anything outside this table that diverges → file as a new gap. + +--- + +## 7. Risks + +- **CI flake** — real-warehouse E2E tests are slow and prone to transient failures. Mitigate by running each protocol twice on first execution and treating only consistent failures as real. +- **Hidden Thrift-only assertions** — some tests in Category B may make assertions on fields where SEA simply does not emit (e.g. `retry_count` if accidentally asserted alongside chunk details). Expect ~25% of Category B to need a small SEA branch in the assertion rather than a wholesale revert. +- **Skip-removal collateral** — dropping skips on tests authored when SEA was completely absent means we may surface assumptions buried in test fixtures (auth setup, warehouse selection). Allocate buffer time in step 5. 
+- **Gap-fix `ChunkMetrics` dependency** — Category B tests for chunk details may exercise the `ChunkMetrics.Empty` fallback path. The tests should be updated to tolerate empty / unset chunk fields on SEA until the gap-fix workstream's `CloudFetchDownloader → CloudFetchReader.GetChunkMetrics()` plumbing surfaces real values. + +--- + +## 8. Definition of done for G4 + +- All seven Category A skip guards removed. +- E2E suite passes on both `Protocol=thrift` and `Protocol=rest` (Category B failures triaged into either fix or kept-skipped with documented reason). +- Sprint demo includes side-by-side Thrift and SEA telemetry records for the same query. +- Any new failures surfaced by skip removal are either fixed in this sprint or filed as follow-up gaps with concrete repro steps. diff --git a/csharp/doc/gap-report-PECO-3022-sea-telemetry-e2e-comparator-findings-2026-05-18.md b/csharp/doc/gap-report-PECO-3022-sea-telemetry-e2e-comparator-findings-2026-05-18.md new file mode 100644 index 00000000..c2c87f65 --- /dev/null +++ b/csharp/doc/gap-report-PECO-3022-sea-telemetry-e2e-comparator-findings-2026-05-18.md @@ -0,0 +1,213 @@ +# Gap Report — PECO-3022 SEA Telemetry E2E Comparator Findings + +**Report date:** 2026-05-18 +**Branch under review:** `stack/pr-phase5-sea-statement-telemetry` +**Companion to:** +- [`gap-report-PECO-3022-sea-telemetry-2026-05-14.md`](./gap-report-PECO-3022-sea-telemetry-2026-05-14.md) (G1–G3, D1–D4) +- [`gap-report-PECO-3022-sea-telemetry-e2e-2026-05-15.md`](./gap-report-PECO-3022-sea-telemetry-e2e-2026-05-15.md) (G4) +- [`gap-report-PECO-3022-sea-telemetry-prod-findings-2026-05-15.md`](./gap-report-PECO-3022-sea-telemetry-prod-findings-2026-05-15.md) (B1–B7, prod-data findings) + +--- + +## 1. Bottom line + +After G4 closed (commit `1f3d5aa`), the existing E2E telemetry suite was unblocked for SEA and run end-to-end. 
**Results: 50 passed, 33 failed, 10 skipped (93 total).** The skip-removal effectively turned the suite into a Thrift-vs-SEA comparator and exposed four new categories of gaps that prior reports did not surface: + +- **B8** — Metadata operations on SEA emit `operation_type=EXECUTE_STATEMENT_ASYNC` instead of categorized `LIST_CATALOGS / LIST_SCHEMAS / LIST_TABLES / LIST_COLUMNS / LIST_TABLE_TYPES`. Eleven test failures cluster here. +- **B9** — UPDATE statements on SEA produce zero telemetry events. `OnFinalized` does not fire on the `ExecuteUpdate` path. +- **B10** — `driver_connection_params.enable_direct_results` is hardcoded to `true` on SEA, ignoring the user-supplied `adbc.databricks.enable_direct_results` connection property. +- **B11** — SEA does not honor `SparkParameters.ConnectTimeoutMilliseconds`. After the D1 fix, `socket_timeout` reads from `HttpClient.Timeout` which is wired only to `CloudFetchTimeoutMinutes`. Tests that set `ConnectTimeoutMilliseconds=120000` see `socket_timeout=900` (the 15-minute default), not `120`. + +The remaining 17 failures fall under **B5 / ChunkDetails** — already-known gap-fix workstream dependency for `ChunkMetrics` aggregation. + +All four new findings are bounded code-side fixes. + +--- + +## 2. Run context + +| Attribute | Value | +|---|---| +| Command | `dotnet test --filter "FullyQualifiedName~E2E.Telemetry"` | +| Config | `/home/jade.wang/tmp/adbc_test_config_sea.json` (existing test config + `"protocol": "rest"`) | +| Driver build | Local from `stack/pr-phase5-sea-statement-telemetry` HEAD (includes gap-fills 1–5 + D1 + B3/B4/B6) | +| Total tests | 93 | +| Passed | 50 | +| Failed | 33 | +| Skipped | 10 (5 `RetryCountTests` intentional, 5 misc edge cases) | +| Duration | 6m32s | + +**Note on lumberjack ingestion:** the tests use `TelemetryClientManager.ExporterOverride = exporter` to swap in `CapturingTelemetryExporter`. Records are captured **in-process for assertion only** and do **not** flow to `/telemetry-ext`. 
Re-running these tests will not produce additional `prod_frontend_log_sql_driver_log` records. + +--- + +## 3. New findings + +### B8 — Metadata operations not categorized on SEA + +**Severity:** High (parity gap) + +**Affected tests:** 11 failures across two files: + +| File | Tests | +|---|---| +| `MetadataOperationTests.cs` | `Telemetry_GetObjects_Catalogs_EmitsListCatalogs`, `_Schemas_`, `_Tables_`, `_Columns_`, `_AllDepths_EmitCorrectOperationType`, `Telemetry_GetTableTypes_EmitsListTableTypes` | +| `StatementMetadataTelemetryTests.cs` | `Telemetry_StatementGetCatalogs_`, `_GetSchemas_`, `_GetTables_`, `_GetColumns_`, `_AllCommands_EmitCorrectOperationType` | + +**Evidence:** All failures share the same assertion pattern: + +``` +Assert.NotNull() Failure: Value is null + at FindLog(logs, proto => + proto.SqlOperation?.OperationDetail?.OperationType == OperationType.ListCatalogs); +``` + +The captured records contain only `operation_type=EXECUTE_STATEMENT_ASYNC` (the post-B3 default). No record carries `LIST_CATALOGS / LIST_SCHEMAS / LIST_TABLES / LIST_COLUMNS / LIST_TABLE_TYPES`. + +**Likely cause:** `StatementExecutionStatement.cs:416` unconditionally emits: +```csharp +_observer.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatementAsync, isCompressed); +``` + +Metadata operations (`GetCatalogs`, `GetSchemas`, `GetTables`, `GetColumns`, `GetTableTypes`) flow through the same code path but should emit `OperationType.ListCatalogs` (etc.) and `StatementType.Metadata` instead — matching the Thrift path which uses `DatabricksStatement.CreateMetadataTelemetryContext()`. + +**Proposed fix:** +1. Detect metadata operations at the entry point (SEA metadata methods like `GetCatalogsAsync`, `GetSchemasAsync`, etc., or via a `isMetadataExecution` flag that already exists on `ExecuteQueryAsync(... bool isMetadataExecution = false)` per the SDD findings). +2. 
+2. Pass the appropriate `(StatementType, OperationType)` pair to `OnExecuteStarted` based on the metadata operation. +3. This likely needs a small mapping helper (e.g., `SeaMetadataOperationMapper`) or a parameter threaded through `ExecuteQueryInternalAsync`. + +**Effort:** ~½ day fix + ~½ day test/verification. + +--- + +### B9 — UPDATE statements emit zero telemetry on SEA + +**Severity:** High (missing emission, like B4 but for a different code path) + +**Affected tests:** 2 failures + 1 skip: + +| Test | Failure | +|---|---| +| `InternalCallTests.UserUpdate_IsNotMarkedAsInternal` | "Expected at least 1 telemetry event, got 0" | +| `TelemetryBaselineTests.BaselineTest_UpdateStatement_FieldsPopulated` | Skipped: "No telemetry captured for UPDATE statement" | + +Other tests may be affected; UPDATE coverage in the suite is limited. + +**Evidence (from `InternalCallTests.cs:114-120`):** +```csharp +statement.SqlQuery = "USE default"; +statement.ExecuteUpdate(); +statement.Dispose(); +var logs = await TelemetryTestHelpers.WaitForTelemetryEvents(exporter, expectedCount: 1); +Assert.True(logs.Count >= 1, $"Expected at least 1 telemetry event, got {logs.Count}"); +``` + +UPDATE was executed but **no telemetry record was emitted**. + +**Likely cause:** `StatementExecutionStatement.ExecuteUpdate()` / `ExecuteUpdateAsync()` is a separate entry point from `ExecuteQueryInternalAsync()`. The observer hookpoints (`OnExecuteStarted`, `OnFinalized`, etc.) were wired into the QUERY path only. The UPDATE path likely: 
The UPDATE path likely: +- Either bypasses the observer entirely (most likely) +- Or sets up the observer but the `_executeStarted` gate at `Dispose:1054` (`if (_executeStarted) _observer.OnFinalized();`) never trips because `_executeStarted` is set only inside `ExecuteQueryInternalAsync` + +**Proposed fix:** Wire the same observer lifecycle (`OnExecuteStarted` → `OnExecuteSucceeded`/`OnError` → set `_executeStarted=true` → eventually `OnFinalized` via Dispose) into the `ExecuteUpdate` / `ExecuteUpdateAsync` code path. Operation type for UPDATE is `EXECUTE_STATEMENT_ASYNC` (same as query) but `StatementType.Update`. + +**Effort:** ~½ day — mirror the existing Execute path's observer wiring. + +--- + +### B10 — `enable_direct_results` hardcoded on SEA, ignores user property + +**Severity:** Medium + +**Affected test:** `ConnectionParametersTests.ConnectionParams_EnableDirectResults_IsPopulated` + +**Evidence:** Test sets `DatabricksParameters.EnableDirectResults = "false"` and expects `protoLog.DriverConnectionParams.EnableDirectResults == false`. Actual: `true`. + +**Likely cause:** `StatementExecutionConnection.cs:487` passes `enableDirectResults: true` literally to `ConnectionTelemetry.Create`. SEA does not have a true "direct results" concept (it uses disposition parameters instead), so the hardcoding is a placeholder. + +**Proposed fix — two options:** + +- **Option A** (recommended): Read the user property `DatabricksParameters.EnableDirectResults` and emit faithfully. Even if SEA doesn't use the value internally, reflecting user intent makes the telemetry record honest. Same pattern Thrift uses. +- **Option B**: Leave the field unset / `null` for SEA, since the concept doesn't apply. Adjust the test to skip this assertion for `Protocol == "rest"`. + +**Effort:** ~30 minutes (Option A) or ~15 minutes (Option B). 
+ +--- + +### B11 — SEA does not honor `ConnectTimeoutMilliseconds`; HttpClient.Timeout only reads `CloudFetchTimeoutMinutes` + +**Severity:** Medium + +**Affected test:** `ConnectionParametersTests.ConnectionParams_SocketTimeout_IsPopulated` + +**Evidence:** Test sets `SparkParameters.ConnectTimeoutMilliseconds = 120000` (with retries disabled to prevent bump). Expects `socket_timeout == 120`. After D1 fix, actual is `900` (the default `CloudFetchTimeoutMinutes=15` → 15m → 900s). + +**Likely cause:** `StatementExecutionConnection.CreateHttpClient` reads only `DatabricksParameters.CloudFetchTimeoutMinutes` (line 322); it does not read `SparkParameters.ConnectTimeoutMilliseconds`. After D1, `socket_timeout` is correctly sourced from `_httpClient.Timeout` — but `_httpClient.Timeout` reflects only `CloudFetchTimeoutMinutes`, not `ConnectTimeoutMilliseconds`. So the chain of bugs is: +- D1 fix: ✅ socket_timeout now reads HttpClient.Timeout +- B11: ❌ HttpClient.Timeout doesn't honor the user's `ConnectTimeoutMilliseconds` + +**Proposed fix:** In `CreateHttpClient`, prefer `ConnectTimeoutMilliseconds` if set (matching Thrift behavior); fall back to `CloudFetchTimeoutMinutes * 60_000` otherwise. Or unify on a single timeout property and document. + +**Effort:** ~1 hour including test verification. + +--- + +## 4. Already-known failure cluster (no new action) + +### B5 / ChunkDetails (~17 failures) + +Files affected: `ChunkDetailsTelemetryTests.cs`, `ChunkMetricsReaderTests.cs`, `ChunkMetricsAggregationTests.cs`. + +All failures stem from `chunk_details` fields being empty/zero on SEA because the gap-fix workstream's `CloudFetchDownloader → ChunkMetrics → CloudFetchReader.GetChunkMetrics()` plumbing has not yet been wired for SEA. SEA's `OnChunksDownloaded` hookpoint passes `new ChunkMetrics()` as the fallback (per design §9 and gap-1 commit `d6d54ec`). + +**Action:** None in PECO-3022 scope. 
Will auto-resolve when the gap-fix workstream lands SEA-side chunk-metrics aggregation. The failing tests should either stay failing (so the gap stays visible) or be skipped with `Skip.If(Protocol == "rest" && !chunkMetricsAvailable, ...)` once that's wired. + +--- + +## 5. Action plan — closed 2026-05-18 + +| Step | Fix | Status | Commit | +|---|---|---|---| +| 1 | **B11**: Honor `ConnectTimeoutMilliseconds` in `StatementExecutionConnection.CreateHttpClient` | ✅ Closed | `360048f` | +| 2 | **B10**: Read `DatabricksParameters.EnableDirectResults` in SEA path | ✅ Closed | `360048f` | +| 3 | **B9**: Wire observer hookpoints into `ExecuteUpdate`/`ExecuteUpdateAsync` | ✅ Closed | `ec07e4d` | +| 4 | **B8**: Detect metadata operations and emit `(StatementType.Metadata, OperationType.List*)` | ✅ Closed | uncommitted SDD + `360048f` | +| 5 | **B5 cluster** (17 tests): chunk_details empty — gap-fix workstream dependency | ⏸️ Expected | n/a | + +### Final triage of the remaining metadata-test failures + +`Telemetry_GetObjects_Schemas_EmitsListSchemas`, `Telemetry_GetObjects_Tables_EmitsListTables`, and `Telemetry_GetObjects_AllDepths_EmitCorrectOperationType` were investigated with a single-test verbose run. **Root cause: `TaskCanceledException` from `SHOW SCHEMAS IN ALL CATALOGS` / `SHOW TABLES IN ALL CATALOGS`** — the test workspace cannot complete these queries within the 10-second metadata-timeout budget (`_waitTimeoutSeconds` default). The telemetry implementation is correct; the failures are environmental. + +**Not a PECO-3022 fix.** Options for follow-up: +- Raise the SEA metadata timeout when `protocol=rest` so cross-catalog scans complete (separate PR; affects user-facing behavior). +- Or skip these tests on slow workspaces with `Skip.If(...)` and a documented reason. +- Or accept this as a known limitation of the test environment and document. + +--- + +## 6. 
Updated gap-landscape summary + +| Gap | Status | +|---|---| +| G1, G2, G3a, G3b, B3, B4, B6 — observer hookpoints + categorization | ✅ Closed | +| G3c / B5 — chunk_details | ⏸️ Gap-fix dependency | +| D1 / B2 — socket_timeout source | ✅ Closed (`5e2819c`) — but B11 exposes that the upstream HttpClient timeout doesn't honor user-supplied ConnectTimeoutMilliseconds | +| B1 — driver_name drift | ✅ No fix needed (historical) | +| B7 — poll_count coverage | ⏸️ Likely expected | +| G4 — E2E skip-guards | ✅ Closed (`1f3d5aa`) | +| **B8** — metadata operation types | ❌ **New finding** — high severity | +| **B9** — UPDATE no telemetry | ❌ **New finding** — high severity | +| **B10** — enable_direct_results hardcoded | ❌ **New finding** — medium | +| **B11** — ConnectTimeoutMilliseconds not honored | ❌ **New finding** — medium | + +--- + +## 7. Notes on running the E2E suite for lumberjack verification + +The E2E telemetry tests swap in `CapturingTelemetryExporter` via `TelemetryClientManager.ExporterOverride`, which **blocks the real `DatabricksTelemetryExporter` from posting to `/telemetry-ext`**. Re-running these tests will not produce additional records in `prod_frontend_log_sql_driver_log`. + +If independent prod-side verification is needed, the path is: +1. Write a small standalone driver harness (a regular .NET program, or an xunit test that does not call `CreateConnectionWithCapturingTelemetry`) that constructs `DatabricksDriver` naturally with telemetry enabled. +2. Run a few queries against a real warehouse. +3. Wait ~1 hour for lumberjack ingestion. +4. Re-query `prod_frontend_log_sql_driver_log` to verify the in-process assertions match what arrives on the wire. + +The in-process E2E assertions are strong evidence the wire records would be correct (same code paths, same record shapes) — but if you want a final stamp, the harness is bounded work (~1 hour). 
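
For reference, the B11 timeout precedence proposed in section 3 can be sketched as below. The names `SeaTimeouts`, `ResolveHttpTimeout`, and `ToSocketTimeoutSeconds` are hypothetical, and treating a missing or non-positive `ConnectTimeoutMilliseconds` as "not set" is an assumption rather than confirmed driver behavior.

```csharp
using System;

public static class SeaTimeouts
{
    // Prefer the user's connect timeout (milliseconds) when it is set and positive;
    // otherwise fall back to the cloud-fetch timeout (minutes).
    public static TimeSpan ResolveHttpTimeout(int? connectTimeoutMilliseconds, int cloudFetchTimeoutMinutes)
    {
        if (connectTimeoutMilliseconds is int ms && ms > 0)
        {
            return TimeSpan.FromMilliseconds(ms);
        }
        return TimeSpan.FromMinutes(cloudFetchTimeoutMinutes);
    }

    // The telemetry field socket_timeout is reported in whole seconds.
    public static int ToSocketTimeoutSeconds(TimeSpan httpTimeout) => (int)httpTimeout.TotalSeconds;
}
```

With `ConnectTimeoutMilliseconds = 120000` this yields the 120 seconds the test expects; with the property unset and the default `CloudFetchTimeoutMinutes = 15` it yields the 900 seconds observed today.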
diff --git a/csharp/doc/gap-report-PECO-3022-sea-telemetry-prod-findings-2026-05-15.md b/csharp/doc/gap-report-PECO-3022-sea-telemetry-prod-findings-2026-05-15.md new file mode 100644 index 00000000..086cf66c --- /dev/null +++ b/csharp/doc/gap-report-PECO-3022-sea-telemetry-prod-findings-2026-05-15.md @@ -0,0 +1,246 @@ +# Gap Report — PECO-3022 SEA Telemetry Production Findings + +**Report date:** 2026-05-15 +**Branch under review:** `stack/pr-phase5-sea-statement-telemetry` +**Companion to:** [`gap-report-PECO-3022-sea-telemetry-2026-05-14.md`](./gap-report-PECO-3022-sea-telemetry-2026-05-14.md) (G1–G3, D1–D4) and [`gap-report-PECO-3022-sea-telemetry-e2e-2026-05-15.md`](./gap-report-PECO-3022-sea-telemetry-e2e-2026-05-15.md) (G4) +**Source:** `main.eng_lumberjack.prod_frontend_log_sql_driver_log` via `central-logfood-prodtools-azure-westus` workspace +**Jira:** [PECO-3022](https://databricks.atlassian.net/browse/PECO-3022) + +--- + +## 1. Bottom line + +PECO-3022 telemetry **is reaching production lumberjack** as designed — the G1 critical-emission gap is conclusively closed. However, comparator analysis against Thrift records on the same driver version (1.1.4) surfaced **seven concrete bugs**, two of which (B2, B3) are spec deviations the existing gap reports did not predict at this magnitude. None block emission, but together they make SEA records look meaningfully different from Thrift records in places where the design called for parity. + +The findings are bounded and reachable — most are single-line fixes in `StatementExecutionConnection.cs` or `StatementExecutionStatement.cs`. None require new architecture. + +--- + +## 2. 
Sample context + +| Attribute | Value | +|---|---| +| Window | 14 days (2026-05-04 → 2026-05-18) | +| Driver-name filter | `'ADBC Databricks Driver'` ∪ `'Databricks ADBC Driver'` | +| Driver version | `1.1.4` (the build with PECO-3022 wiring) | +| SEA records | 23 EXECUTE_STATEMENT + 23 CREATE_SESSION + 23 DELETE_SESSION (one test session per execution) | +| Thrift records | 555 EXECUTE_STATEMENT + 477 CLOSE_STATEMENT + 233 CREATE_SESSION + 192 DELETE_SESSION | +| Workspace | `benchmarking-prod-aws-us-west-2.cloud.databricks.com` (test/benchmark) | +| `client_app_name` | `testhost` (dev/test data, not real customer usage) | +| Day | All 23 SEA records emitted 2026-05-14 | + +Sample is small but reliable for field-level parity checks. Latency and population percentages may shift with broader adoption. + +--- + +## 3. Bugs found + +### B1 — `driver_name` string drift between SEA and Thrift + +**Severity:** Medium +**Evidence:** Two distinct strings coexist in v1.1.4: +- `Databricks ADBC Driver` — 685 records, all `THRIFT` mode +- `ADBC Databricks Driver` — 4,401 records mixed `THRIFT` + 69 `SEA` + +Lumberjack dashboards and downstream queries filtering on the established string `'Databricks ADBC Driver'` will silently miss all SEA records and a significant fraction of recent Thrift records. + +**Likely cause:** The constant that drives `system_configuration.driver_name` was renamed in the C# driver at some recent point. Both code paths (Thrift, SEA) appear to read from sources that disagree, or the constant flipped between releases. + +**Proposed fix:** Identify the canonical driver-name constant in `csharp/src/Telemetry/ConnectionTelemetry.cs` (or wherever `BuildSystemConfiguration` populates `driver_name`) and ensure both transports use the same literal. Confirm with downstream (telemetry consumer team) which string is authoritative for dashboards. + +**Effort:** ~30 minutes for the fix; ~1 hour for coordinating the canonical name with consumers. 
+ +--- + +### B2 — `socket_timeout` populated with wrong source AND wrong scale (D1 confirmed and worse) + +**Severity:** High — this is the same D1 issue from the original gap report, but production data shows it's worse than predicted. + +**Evidence (14-day query, v1.1.4):** + +| Mode | min | avg | max | +|---|---|---|---| +| SEA | 0 | 6.5 | 10 | +| Thrift | 900 | 900 | 900 | + +The proto field `driver_connection_params.socket_timeout` is in **seconds**. Thrift emits 900 (a connection timeout). SEA emits 0–10 — exactly the range of `_waitTimeoutSeconds`, which is the SEA query-wait (CONTINUE) timeout, not a connection timeout. + +**Code path:** +```csharp +// StatementExecutionConnection.cs:490 +connectTimeoutMilliseconds: (int)TimeSpan.FromSeconds(_waitTimeoutSeconds).TotalMilliseconds, +``` + +The `_waitTimeoutSeconds` → ms conversion produces 10,000 ms, but `ConnectionTelemetry` then divides by 1,000 to populate the proto's seconds field, landing on 10. The original gap report flagged this as a "mislabeled" field — production data shows the value is wrong by 2 orders of magnitude on top of the labeling issue. + +**Proposed fix:** Pass an actual connect-timeout value (read from `ConnectTimeoutMilliseconds`-equivalent connection-string property), not `_waitTimeoutSeconds`. Mirror what `DatabricksConnection.cs:737-747` passes. + +**Effort:** ~15 minutes. + +--- + +### B3 — `operation_type = EXECUTE_STATEMENT` instead of `EXECUTE_STATEMENT_ASYNC` + +**Severity:** High (spec deviation) + +**Evidence:** All 23 SEA EXECUTE_STATEMENT records show `sql_operation.operation_detail.operation_type = 'EXECUTE_STATEMENT'`. Design §17 open question #3 calls this out: + +> Always `EXECUTE_STATEMENT_ASYNC` (SEA is always async on the wire). Confirm this matches the telemetry consumer team's expectation, or whether we should map specifically (`EXECUTE_STATEMENT` for sync-emulated paths). + +Production code passes the synchronous variant. 
+ +**Code path:** +```csharp +// StatementExecutionStatement.cs:390 +_observer.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed); +``` + +**Proposed fix:** Change `OperationType.ExecuteStatement` to `OperationType.ExecuteStatementAsync` (assuming the proto enum has that value). Verify the enum name; design uses `EXECUTE_STATEMENT_ASYNC` as the proto convention. + +**Effort:** ~15 minutes. + +--- + +### B4 — Reader latencies populated on only 78% of SEA EXECUTE records + +**Severity:** High + +**Evidence:** +- `result_set_ready_latency_millis`: 18 / 23 SEA EXECUTE records (78%) — Thrift: 555 / 555 (100%) +- `result_set_consumption_latency_millis`: 18 / 23 SEA EXECUTE records (78%) — Thrift: 555 / 555 (100%) + +5 statements executed and finalized but never populated reader latencies. No errors recorded on any of the 23 records. This is unexpected — the design intent is that every successful EXECUTE produces both metrics. + +**Likely causes (in order of probability):** + +1. Statements that fail or are canceled after execute but before reader construction reach `OnFinalized` (via Dispose) without ever firing `OnFirstBatchReady` or `OnConsumed`. The current `OnFinalized` guard (`_executeStarted`) allows the emission. Verify by querying the 5 missing-reader-latency records for status field, cancellation markers, or unusual error_info shape. +2. Inline-arrow path: `InlineArrowStreamReader` construction at line 685 fires `OnFirstBatchReady`, but Dispose at line 1024 may not fire `OnConsumed` if the reader is never read from (e.g., a consumer that opens then immediately closes the result). +3. Race: `Dispose` runs on a different thread than reader iteration and skips the observer call. + +**Proposed fix:** +1. Query lumberjack for the 5 missing records to characterize them (cancellation? immediate-close? what queries?). +2. 
Audit the reader Dispose paths in `StatementExecutionStatement.cs:1024` and the cloud-fetch reader equivalent — ensure `OnConsumed` and (where applicable) `OnFirstBatchReady` always fire on the successful path. +3. Consider emitting `OnConsumed(0)` from `OnFinalized` if Dispose never fired it — but only after understanding the root cause; this should not paper over a real bug. + +**Effort:** ~½ day investigation + 1 hour fix. + +--- + +### B5 — `chunk_details` empty for SEA (expected, gap-fix dependency) + +**Severity:** Low / expected +**Evidence:** 0 / 23 SEA EXECUTE records have `chunk_details` populated; Thrift shows 23 / 555. The 4 SEA EXTERNAL_LINKS queries did not produce chunk metrics. + +**Cause:** Known dependency on the gap-fix workstream's `CloudFetchDownloader → ChunkMetrics → CloudFetchReader.GetChunkMetrics()` plumbing. The SEA `OnChunksDownloaded` hookpoint (line 1143) is wired but falls back to `new ChunkMetrics()` when the reader returns null. The observer treats `new ChunkMetrics()` (all-zero) as effectively empty. + +**Proposed fix:** No action in PECO-3022. Track in the gap-fix workstream. Once the aggregator surfaces real values for SEA, this field will populate automatically. + +**Effort:** None for PECO-3022. + +--- + +### B6 — No `CLOSE_STATEMENT` event for SEA + +**Severity:** Medium (parity gap) +**Evidence:** Thrift emits 477 `CLOSE_STATEMENT` operation events. SEA emits 0. Net Thrift CLOSE_STATEMENT shipping rate is 477 / 555 ≈ 86% of executes — meaning nearly every Thrift statement disposal produces a standalone close event in addition to the per-statement record. + +**Cause:** SEA's `Dispose` path calls `_observer.OnFinalized()` which builds the per-statement `OssSqlDriverTelemetryLog`, but no equivalent of Thrift's `EmitOperationTelemetry(CLOSE_STATEMENT, ...)` call exists for SEA. Design §6 does not list CLOSE_STATEMENT under "SEA integration points"; it may have been overlooked. 
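
Should the absence turn out to be a gap, any added emission would need to fire at most once per statement, because `Dispose` can be reached more than once. A minimal sketch of such a gate, using the same `Interlocked.CompareExchange` approach the observer uses for `OnFinalized`; `CloseStatementGate` and `TryEmit` are hypothetical names, and the callback stands in for whatever call emits the CLOSE_STATEMENT event:

```csharp
using System;
using System.Threading;

public sealed class CloseStatementGate
{
    private int _emitted; // 0 = not yet emitted, 1 = emitted

    // Runs the callback at most once across all callers and threads.
    // Exceptions from the callback are swallowed so telemetry cannot break Dispose.
    public bool TryEmit(Action emit)
    {
        if (Interlocked.CompareExchange(ref _emitted, 1, 0) != 0)
        {
            return false; // another caller already claimed the emission
        }

        try
        {
            emit();
        }
        catch
        {
            // Fail-open: a telemetry failure must not surface to the caller.
        }
        return true;
    }
}
```

The gate is claimed before the callback runs, so a callback that throws is not retried; that trades a possibly lost event for a guarantee against duplicates.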
+ +**Question for design review:** Is the absence intentional (per-statement record subsumes the CLOSE_STATEMENT event) or a parity gap to fix? + +**Proposed fix (if a gap):** Add a `_telemetry.EmitOperationTelemetry(CLOSE_STATEMENT, ...)` call in `StatementExecutionStatement.Dispose` mirroring `DatabricksStatement` Thrift behavior. + +**Effort:** ~½ day including design review and tests. + +--- + +### B7 — `poll_count` low-coverage on SEA EXECUTE records + +**Severity:** Low (likely expected) +**Evidence:** `n_operation_status_calls` populated on: +- SEA: 8 / 23 (35%) +- Thrift: 65 / 555 (12%) + +Both transports show low population — both depend on the polling loop actually running. Statements with immediate terminal responses skip polling and therefore skip `OnPollCompleted`. This may be expected behavior. + +**Cause:** `OnPollCompleted` is only called when the polling loop terminates (`StatementExecutionStatement.cs:610`). If the first `GetStatementAsync` returns a terminal state, no poll occurred and no event fires. + +**Decision needed:** Should the observer emit `OnPollCompleted(0, 0)` for statements that didn't poll? Pro: consistent telemetry shape, allows aggregation queries to assume the field exists. Con: emits noise for the common case. + +**Proposed fix:** None unless consumers want consistent shape. Document the contract: `poll_count` is null when no polling occurred. + +**Effort:** None unless we decide to change the contract. + +--- + +## 4. 
Confirmed working items (from production data) + +The following design behaviors are verified by lumberjack records: + +| Item | Verification | +|---|---| +| Per-statement record emission (G1) | 23 / 23 SEA EXECUTE records present with `sql_statement_id` populated | +| `SeaResultFormatMapper` (G2) | 19 INLINE_ARROW + 4 EXTERNAL_LINKS, zero `FORMAT_UNSPECIFIED` | +| `OnExecuteStarted` (T6) | All 23 records have `statement_type`, `is_compressed` populated | +| `OnExecuteSucceeded` (T6) | All 23 records have `sql_statement_id` populated | +| `OnError` (T6) | 0 / 23 errors in sample — cannot verify positively, but the path is wired (covered by unit tests) | +| `OnFinalized` (gap-1) | Every executed statement produced a record | +| `OnFirstBatchReady` / `OnConsumed` (gap-3, gap-4) | Wired and working for the majority of records (see B4 for partial gap) | +| `DriverMode = SEA` (design §10) | All 23 records have `driver_connection_params.mode = 'SEA'` | +| Session lifecycle (T5) | 23 CREATE_SESSION ↔ 23 DELETE_SESSION balance — every session opens and closes | +| `auth_type`, `auth_mech`, `auth_flow` | Correctly populated (`pat`, `PAT`, `TOKEN_PASSTHROUGH`) | +| `system_configuration` (driver_version, runtime, OS) | All fields populated | + +--- + +## 5. 
Action plan (ordered by effort × value) + +| Step | Action | Severity | Effort | Outcome | +|---|---|---|---|---| +| 1 | **B3**: Change `OperationType.ExecuteStatement` → `OperationType.ExecuteStatementAsync` in `StatementExecutionStatement.cs:390` | High | 15 min | Spec parity restored | +| 2 | **B2 (D1)**: Replace `_waitTimeoutSeconds`-derived value in `StatementExecutionConnection.cs:490` with a real connect-timeout source | High | 15 min | `socket_timeout` matches Thrift scale and semantics | +| 3 | **B1**: Audit the `driver_name` constant; align SEA path with the canonical Thrift literal | Medium | 30 min | Lumberjack dashboards no longer miss SEA records | +| 4 | **B4**: Query lumberjack for the 5 missing-reader-latency records; characterize them; fix the missing-emit path | High | ½ day | 100% reader-latency coverage | +| 5 | **B6**: Decide on CLOSE_STATEMENT parity with design owner; implement if confirmed | Medium | ½ day | Operation-event parity with Thrift | +| 6 | **B5**: No action — track in gap-fix workstream | Expected | 0 | Auto-resolved when ChunkMetrics aggregator lands | +| 7 | **B7**: Confirm contract with consumer team — document or adjust | Low | 30 min | Documented null-handling for `n_operation_status_calls` | + +**Critical path:** Steps 1–3 (~1 hour) eliminate the highest-visibility parity issues. Step 4 is the only remaining investigative effort. + +--- + +## 6. 
Updated gap-landscape summary + +| Gap | Status (before this report) | Status (after prod findings) | Status (after follow-up fixes 2026-05-18) | +|---|---|---|---| +| G1 (OnFinalized) | ✅ Fixed | ✅ Verified in prod | ✅ | +| G2 (SeaResultFormatMapper) | ✅ Fixed | ✅ Verified in prod | ✅ | +| G3a (OnFirstBatchReady) | ✅ Fixed | ⚠️ B4 — 78% pop, not 100% | ✅ Fixed (commit `ad268d1`) | +| G3b (OnConsumed) | ✅ Fixed | ⚠️ B4 — 78% pop, not 100% | ✅ Fixed (commit `ad268d1`) | +| G3c (OnChunksDownloaded) | ⏸️ Empty as designed | ⏸️ Confirmed (B5) | ⏸️ Expected — gap-fix workstream dependency | +| D1 (socket_timeout) | ❌ Open (predicted "mislabeled") | ❌ **B2 — worse than predicted, also wrong scale** | ✅ Fixed (commit `5e2819c`) — uses `_httpClient.Timeout`, expected to land on 900s matching Thrift | +| G4 (E2E coverage) | ❌ Still open | ❌ Still open | ✅ Closed (commit `1f3d5aa`); see e2e gap report | +| **B1** (driver_name drift) | n/a | ❌ New finding | ✅ No code change needed — already fixed pre-1.1.4; production records under `Databricks ADBC Driver` are stale from older builds. Current code only uses canonical `ADBC Databricks Driver`. | +| **B3** (op_type EXECUTE_STATEMENT) | n/a (open question in design) | ❌ New finding — needs fix or design update | ✅ Fixed (commit `fab04a8`) — passes `OperationType.ExecuteStatementAsync` | +| **B6** (no CLOSE_STATEMENT) | n/a | ❌ New finding — parity question | ✅ Fixed (commit `4126a2e`) — emits CLOSE_STATEMENT event with idempotency gate | +| **B7** (poll_count coverage) | n/a | ⏸️ Likely expected | ⏸️ Documented; consumer-team confirmation pending | + +### Code-side gap landscape, summary + +After the 2026-05-18 follow-up fixes (commits `fab04a8`, `ad268d1`, `4126a2e`, `5e2819c`, `1f3d5aa`), **every code-side gap surfaced by these three reports is closed**, except for: + +- **G3c / B5** — `chunk_details` empty for SEA. Dependent on the gap-fix workstream's `ChunkMetrics` aggregator. Will auto-resolve when upstream lands. 
+- **B7** — `poll_count` low coverage. Likely expected behavior; awaiting consumer-team confirmation, no code change planned. + +**G4 (E2E skip-guards)** was closed by commit `1f3d5aa` — 11 of the 12 protocol-skipped telemetry tests in `csharp/test/E2E/Telemetry/` are now enabled for both protocols. `RetryCountTests` remains skipped because `retry_count` is not wired for SEA (tracked in the gap-fix workstream). + +Lumberjack verification of `B3`, `B4`, `B6`, `D1` is pending the next driver build reaching prod + the ~1-hour ingestion lag. + +--- + +## 7. Risks + +- **Sample bias:** All 23 SEA records are from one test session in a single benchmark workspace on one day. Real-world distributions (longer queries, larger result sets, more polling, errors) may surface additional bugs not present here. +- **Test data drift:** As SEA usage grows post-v1.1.4 rollout, the `client_app_name=testhost` filter will need to be widened to capture customer traffic. Recommend a follow-up query in 2 weeks against real workspaces. +- **D1 escalation:** B2 is meaningfully worse than the original D1 framing. If `socket_timeout` is consumed by any internal SLA dashboard or alert, SEA records may be triggering false signals. +- **B4 root cause unknown:** Until we characterize the 5 missing-reader-latency records, we don't know whether B4 is a benign edge case or a systematic gap. Step 4 in the action plan should not be skipped. 
diff --git a/csharp/doc/sprint-plan-PECO-3022-sea-telemetry-2026-05-14.md b/csharp/doc/sprint-plan-PECO-3022-sea-telemetry-2026-05-14.md new file mode 100644 index 00000000..47491e72 --- /dev/null +++ b/csharp/doc/sprint-plan-PECO-3022-sea-telemetry-2026-05-14.md @@ -0,0 +1,228 @@ +# Sprint Plan — PECO-3022 SEA Telemetry Integration + +**Sprint window:** 2026-05-14 → 2026-05-28 (2 weeks) +**Implementer:** Jade Wang (sole) +**Design doc:** [`docs/designs/PECO-3022-sea-telemetry-integration-design.md`](../../docs/designs/PECO-3022-sea-telemetry-integration-design.md) +**Design PR:** https://github.com/adbc-drivers/databricks/pull/455 +**Jira:** [PECO-3022](https://databricks.atlassian.net/browse/PECO-3022) + +--- + +## Sprint Goal + +Ship end-to-end SEA client telemetry to production parity with Thrift — connection session events, per-statement operation events, error events, chunk metrics — verified against a real SQL warehouse. Includes the mechanical refactor of `DatabricksStatement` to consume the new observer interface so both transports use the same telemetry seam. + +### Success criteria + +- A statement executed via `adbc.databricks.protocol=rest` emits a `OssSqlDriverTelemetryLog` carrying `driver_connection_params.mode = DRIVER_MODE_SEA`, populated session id, statement id, operation latency, result format, poll count, first-batch and consumed latencies. +- Error in a SEA statement produces an `error_info` record with `error_name` populated. +- Thrift telemetry output is byte-identical to current main (regression-tested). +- SEA telemetry visible in `eng_lumberjack.prod_frontend_log_sql_driver_log` after sprint deploys. + +### Dependency + +- The gap-fix workstream's `CloudFetchDownloader → ChunkMetrics → CloudFetchReader.GetChunkMetrics()` plumbing. If it lands in-sprint, wire it in. If not, ship with `ChunkMetrics.Empty` and backfill in a follow-up. 
+ +--- + +## Task Breakdown (7 tasks, ~11.5 person-days) + +### T1 — Refactor `ConnectionTelemetry.Create` signature (1 day) + +Replace `TSessionHandle? sessionHandle` with `string sessionId`. Add `DriverMode.Types.Type mode` parameter. Remove the hardcoded `DriverMode.Types.Type.Thrift` at `ConnectionTelemetry.cs:458` and `:642`. + +**Files touched:** +- `csharp/src/Telemetry/ConnectionTelemetry.cs` +- `csharp/src/DatabricksConnection.cs` (single Thrift call site; convert with `new Guid(sessionHandle.SessionId.Guid).ToString()` at the boundary, since `SessionId.Guid` is a `byte[]`) + +**Acceptance criteria:** +- All existing telemetry unit tests pass unchanged. +- Thrift integration test emits `driver_connection_params.mode = DRIVER_MODE_THRIFT` (regression check). +- New unit tests: `Create_AcceptsStringSessionId`, `Create_ThriftMode_SetsDriverModeThrift`, `Create_SeaMode_SetsDriverModeSea`. + +**Risks:** Low. Mechanical refactor with one Thrift caller to update. + +--- + +### T2 — Introduce `IStatementOperationObserver` + impls (2 days) + +Create the interface and three implementations per design §5.1 and §12. + +**New files:** +- `csharp/src/Telemetry/IStatementOperationObserver.cs` +- `csharp/src/Telemetry/TelemetryObserver.cs` (uses `Safe(Action)` helper pattern from design §12) +- `csharp/src/Telemetry/NullObserver.cs` (singleton) +- `csharp/src/Telemetry/SafeObserver.cs` (decorator) + +**Acceptance criteria:** +- Interface contract documented: methods MUST NOT throw, thread-safe, `OnFinalized` is terminal and idempotent. +- `TelemetryObserver` writes into a `StatementTelemetryContext`; on `OnFinalized` builds `OssSqlDriverTelemetryLog` and enqueues via `IConnectionTelemetry`. 
+- Unit tests per design §15: + - `NullObserver_AllMethods_AreNoOps` / `NullObserver_IsSingleton` + - `TelemetryObserver_OnExecuteStarted_PopulatesContext` + - `TelemetryObserver_OnExecuteSucceeded_RecordsStatementId` + - `TelemetryObserver_OnFinalized_EnqueuesExactlyOnce` + - `TelemetryObserver_OnFinalized_CalledTwice_EnqueuesOnce` + - `TelemetryObserver_OnError_RecordsErrorAndFinalizes` + - `TelemetryObserver_AllMethods_NeverThrow_WhenContextCorrupted` + - `TelemetryObserver_OnChunksDownloaded_MergesIntoChunkDetails` + - `SafeObserver_PropagatesNormalCallsToInner` + - `SafeObserver_SwallowsExceptionsFromInner_LogsAtTrace` + +**Risks:** Low. New code, no existing callers yet. + +--- + +### T3 — Add `SeaResultFormatMapper` (1 day) + +Static helper that maps wire `disposition` × manifest state → proto `ExecutionResult.Format`. Per design §8. + +**New files:** +- `csharp/src/StatementExecution/SeaResultFormatMapper.cs` + +**Acceptance criteria:** +- Unit tests covering all four cells in §8 table: + - `Map_InlineDisposition_ReturnsInlineArrow` + - `Map_ExternalLinksDisposition_ReturnsExternalLinks` + - `Map_AutoDisposition_WithInlineResult_ReturnsInlineArrow` + - `Map_AutoDisposition_WithExternalLinks_ReturnsExternalLinks` + +**Risks:** Low. Isolated pure-function helper. + +--- + +### T4 — Refactor `DatabricksStatement` to use observer (1 day) + +Mechanical refactor: replace the private telemetry methods (`CreateTelemetryContext`, `CreateMetadataTelemetryContext`, `RecordSuccess`, `RecordError`, `EmitTelemetry`) with `_observer: IStatementOperationObserver` field calls. Behavior unchanged. + +**Files touched:** +- `csharp/src/DatabricksStatement.cs` + +**Acceptance criteria:** +- All existing Thrift telemetry unit tests pass unchanged. +- Manual diff check: byte-equivalent `OssSqlDriverTelemetryLog` for a known statement before/after the refactor. 
+- `((DatabricksConnection)Connection).TelemetrySession` cast eliminated; observer is injected at statement construction from `DatabricksConnection.CreateStatement()`. + +**Risks:** Medium. The refactor is mechanical but the existing Thrift test suite is the safety net. Allocate buffer time for any subtle behavior differences (e.g. `PendingTelemetryContext` exposure used by external callers). + +**Depends on:** T1 (Create signature), T2 (observer types). + +--- + +### T5 — Wire telemetry into `StatementExecutionConnection` (1.5 days) — **DONE 2026-05-14** + +Mirror the Thrift pattern at `DatabricksConnection.cs:594-724`. Add `_telemetry: IConnectionTelemetry` field. Call `ConnectionTelemetry.Create(...)` in `OpenAsync` after `CreateSessionAsync` succeeds, emit `CREATE_SESSION` event, then on `Dispose` emit `DELETE_SESSION` and run `DisposeAsync` with 5-second timeout. + +**Files touched:** +- `csharp/src/StatementExecution/StatementExecutionConnection.cs` (modified) +- `csharp/test/Unit/StatementExecution/StatementExecutionConnectionTelemetryTests.cs` (new) + +**Acceptance criteria:** +- ✅ `OpenAsync` succeeds even if telemetry initialization throws (telemetry is fail-open; falls back to `NoOpConnectionTelemetry`). +- ✅ `Dispose` completes within 5 seconds even if exporter is wedged (test `Dispose_FlushHangs_CompletesWithin5Seconds`). +- ⏭ Observer creation in `CreateStatement()` deferred to **T6** (this task only exposes `TelemetrySession` accessor so the SEA statement can pull the session for observer construction). +- 🟡 Manual test (lumberjack verification): pending sprint demo against real warehouse. + +**Implementation notes:** +- `IConnectionTelemetry.DisposeAsync` returns `Task` (not `ValueTask`), so the call is `_telemetry.DisposeAsync().Wait(TimeSpan.FromSeconds(5))`. The sprint description's `.AsTask().Wait(...)` was based on a different return type assumption — the simpler form is equivalent. 
+- `CreateHttpClient` was switched from `HttpHandlerFactory.CreateHandlers` to `CreateHandlersWithTokenProvider` so the OAuth `_oauthTokenProvider` can be captured and threaded into `ConnectionTelemetry.Create` (matches the Thrift path's token-caching behavior). +- Added test seam `TelemetryForTesting` (settable property) mirroring `DatabricksConnection.TelemetryForTesting` so unit tests can inject a `RecordingTelemetry` fake without driving a real CreateSession RPC. Direct unit testing of `OpenAsync` end-to-end requires a fake `IStatementExecutionClient`, which is out-of-scope for T5; the public wiring contract is verified by exercising `EmitCreateSessionTelemetry` / `EmitDeleteSessionTelemetry` directly, identical to the Thrift `DriverTelemetryWiringTests` approach. + +**Risks:** Medium. New telemetry surface on a class that has never had it. Watch for null-handling around `_telemetry` and `Session`. + +**Depends on:** T1 (Create signature). + +--- + +### T6 — Wire telemetry into `StatementExecutionStatement` (3 days) + +**Setup portion DONE 2026-05-14**: `_observer: IStatementOperationObserver` field added to `StatementExecutionStatement` (defaulted to `NullObserver.Instance`), constructor takes an optional `IStatementOperationObserver?` parameter, and `StatementExecutionConnection.CreateStatement` constructs a `TelemetryObserver` bound to `TelemetrySession` when telemetry is enabled (and `session.TelemetryClient != null`), or falls back to `NullObserver.Instance` otherwise. No hookpoint calls wired yet — those land in the subsequent commits below. Unit tests: `StatementExecutionStatementObserverInjectionTests` (6 tests covering field shape, default coercion, telemetry-enabled/disabled branches, and per-statement freshness). + +The meatiest task. Add `_observer: IStatementOperationObserver` field (defaults to `NullObserver.Instance`, set by `StatementExecutionConnection.CreateStatement`). Call observer methods at all 7 hookpoints per design §6: + +1. 
`OnExecuteStarted` — `ExecuteQueryInternalAsync` before `_client.ExecuteStatementAsync` (line 345) +2. `OnExecuteSucceeded` — after response received, using `SeaResultFormatMapper` +3. `OnPollCompleted` — in `PollUntilCompleteAsync` (line 453), accumulate count/ms across the loop, emit once on terminal state +4. `OnFirstBatchReady` — at `CreateCloudFetchReader` (line 542) and `InlineArrowStreamReader` construction (nested at line 900) +5. `OnConsumed` + `OnChunksDownloaded` — at reader Dispose +6. `OnError` — `ExecuteQueryInternalAsync` catch block +7. `OnFinalized` — `Dispose` (line 817) + +**Files touched:** +- `csharp/src/StatementExecution/StatementExecutionStatement.cs` + +**Acceptance criteria:** +- Manual test: execute a SELECT via REST, verify a telemetry record arrives with `statement_id`, `result_format`, `operation_latency_ms`, `poll_count`, `result_set_ready_latency_millis`, `result_set_consumption_latency_millis` populated. +- Manual test: execute a bad SQL via REST, verify `error_info.error_name` populated. +- `OnFinalized` exactly-once even when both error path and dispose path fire. +- `ChunkMetrics`: wire to `OnChunksDownloaded` if gap-fix plumbing is available, else pass `ChunkMetrics.Empty`. + +**Risks:** Medium-high. Largest scope; touches Execute, Poll loop, both reader construction paths, Dispose, and error catch. Highest chance of edge-case regressions. + +**Depends on:** T1, T2, T3. + +--- + +### T7 — SEA integration tests against real SQL warehouse (2 days) + +Mirror the Thrift integration test set per design §15. 
+ +**New files:** +- `csharp/test/E2E/Telemetry/SeaTelemetryIntegrationTests.cs` (or similar) + +**Test cases:** +- `Sea_ExecuteQuery_EmitsTelemetryWithStatementId` +- `Sea_ExecuteQuery_WithSyntaxError_EmitsErrorTelemetry` +- `Sea_ExecuteQuery_CloudFetch_RecordsChunkMetrics` (skipped if gap-fix plumbing not present) +- `Sea_ExecuteQuery_InlineResults_RecordsInlineFormat` +- `Sea_OpenConnection_EmitsCreateSession` +- `Sea_CloseConnection_EmitsDeleteSessionAndFlushes` +- `Sea_TelemetryDisabledByFeatureFlag_EmitsZeroEvents` +- `Sea_TelemetryDisabledByProperty_EmitsZeroEvents` +- `Sea_TelemetryExporterFails_DoesNotAffectQueryExecution` +- `Sea_TelemetryRecord_HasDriverModeSea` +- `Sea_ConcurrentStatements_EachEmitsExactlyOneRecord` + +**Acceptance criteria:** +- All tests pass against a dev/staging Databricks SQL warehouse. +- Test infrastructure verifies records via either a local capture exporter or by querying `eng_lumberjack.prod_frontend_log_sql_driver_log` after a settling delay. + +**Risks:** Medium. Real-warehouse tests are slow and flaky; allocate time for retry/stabilization. + +**Depends on:** T5, T6. + +--- + +## Sequencing + +``` +Week 1 (Mon-Fri): T1 → T2 → T3 → T4 + (T2 and T3 parallelizable if context allows) + +Week 2 (Mon-Fri): T5 → T6 → T7 + (T5 in parallel with start of T6 if discipline holds) +``` + +**Critical path:** T1 → T6 → T7 (≈6 days). +**Slack:** ~1.5 days for review iteration, unexpected edge cases, gap-fix integration if it lands. + +--- + +## Definition of Done + +- All 7 tasks merged to `main`. +- Design PR (#455) approved and merged. +- SEA telemetry records visible in `eng_lumberjack.prod_frontend_log_sql_driver_log` via the [client-telemetry-query](https://databricks.atlassian.net/) skill. +- Thrift telemetry regression test green. +- Sprint demo: show side-by-side Thrift vs SEA telemetry records for the same query. 
+
+---
+
+## Risks and Mitigations
+
+| Risk | Likelihood | Mitigation |
+|---|---|---|
+| Gap-fix `ChunkMetrics` plumbing slips | Medium | Ship with `ChunkMetrics.Empty`; backfill in follow-up sprint |
+| `DatabricksStatement` refactor (T4) hits subtle regression | Medium | Cross-transport byte-identical regression test in T1, dry-run in T4 |
+| SEA integration tests flaky in CI | Medium | Tag as `[Trait("Category", "Integration")]`; run on-demand initially |
+| Sprint overflow (11.5d est in 10d capacity) | High | T7 can slip to follow-up sprint if T5/T6 take longer than estimated; foundation is the priority |
diff --git a/csharp/src/DatabricksConnection.cs b/csharp/src/DatabricksConnection.cs
index 5ec2ecbf..d353c2d0 100644
--- a/csharp/src/DatabricksConnection.cs
+++ b/csharp/src/DatabricksConnection.cs
@@ -533,8 +533,16 @@ internal override IArrowArrayStream NewReader(T statement, Schema schema, IRe
 
         public override AdbcStatement CreateStatement()
         {
-            DatabricksStatement statement = new DatabricksStatement(this);
-            return statement;
+            // Inject the per-statement observer at construction so DatabricksStatement is
+            // not coupled to ((DatabricksConnection)Connection).TelemetrySession at runtime.
+            // When telemetry is disabled (or the session has no live client), we hand the
+            // statement a NullObserver so every observer hook call is a no-op without
+            // requiring null-checks at the callsite.
+            TelemetrySessionContext? session = TelemetrySession;
+            IStatementOperationObserver observer = session?.TelemetryClient != null
+                ? (IStatementOperationObserver)new TelemetryObserver(session)
+                : NullObserver.Instance;
+            return new DatabricksStatement(this, observer);
         }
 
         protected override TOpenSessionReq CreateSessionRequest()
@@ -727,12 +735,36 @@ internal IConnectionTelemetry TelemetryForTesting
         ///
         private void InitializeTelemetry(Activity?
activity = null) { + // Convert TSessionHandle -> string at the transport boundary so + // ConnectionTelemetry.Create stays transport-agnostic. SEA will pass its + // server-assigned session id string directly. + // + // Wrap the byte[] -> Guid conversion locally: `new Guid(byte[])` throws + // ArgumentException on a wrong-length array, and that would propagate to + // connection-open. Pre-refactor the same conversion was inside + // ConnectionTelemetry.Create's outer try/catch, so a malformed session GUID + // degraded to NoOp telemetry — preserve that fail-open contract. + string sessionId = string.Empty; + try + { + if (SessionHandle?.SessionId?.Guid != null) + { + sessionId = new Guid(SessionHandle.SessionId.Guid).ToString(); + } + } + catch + { + // Intentionally swallowed: malformed session GUID disables telemetry, + // not the connection. + } + _telemetry = Telemetry.ConnectionTelemetry.Create( properties: Properties, host: GetHost(), assemblyVersion: s_assemblyVersion, oauthTokenProvider: _oauthTokenProvider, - sessionHandle: SessionHandle, + sessionId: sessionId, + mode: Telemetry.Proto.DriverMode.Types.Type.Thrift, enableDirectResults: _enableDirectResults, useDescTableExtended: _useDescTableExtended, connectTimeoutMilliseconds: ConnectTimeoutMilliseconds, diff --git a/csharp/src/DatabricksStatement.cs b/csharp/src/DatabricksStatement.cs index a5313de8..1850f2e2 100644 --- a/csharp/src/DatabricksStatement.cs +++ b/csharp/src/DatabricksStatement.cs @@ -31,10 +31,9 @@ using AdbcDrivers.Databricks.Reader.CloudFetch; using AdbcDrivers.Databricks.Result; using AdbcDrivers.Databricks.Telemetry; -using AdbcDrivers.Databricks.Telemetry.Models; -using AdbcDrivers.Databricks.Telemetry.Proto; using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; using 
Apache.Arrow; using Apache.Arrow.Adbc; using AdbcDrivers.HiveServer2; @@ -71,18 +70,47 @@ internal class DatabricksStatement : SparkStatement, IHiveServer2Statement internal bool IsInternalCall { get; set; } // Marks if this is a driver-internal operation (e.g., USE SCHEMA) /// - /// Telemetry context for the current statement execution, pending emission on Dispose. - /// Set before calling base.ExecuteQueryAsync()/ExecuteQuery() so that - /// can forward it to the - /// composite reader and operation status poller. The poller increments - /// and accumulates - /// on each - /// GetOperationStatus call so the emitted telemetry log carries the - /// n_operation_status_calls and operation_status_latency_millis - /// fields (PECO-2992). After a successful execute the same instance remains - /// here until is invoked from Dispose. + /// Observer for this statement's operational lifecycle. Injected at construction + /// from — a + /// bound to the connection's + /// when telemetry is enabled, otherwise + /// . All hookpoints in this class flow through + /// the observer interface so the SEA path can reuse the exact same statement code + /// with a different observer implementation. /// - internal StatementTelemetryContext? PendingTelemetryContext { get; private set; } + private readonly IStatementOperationObserver _observer; + + /// + /// Tracks whether has fired. Gates the + /// terminal call in + /// : when a statement is disposed without ever being + /// executed, the observer must not enqueue an empty execute-statement log. + /// This preserves byte-identical behavior with the pre-refactor implementation, + /// which only emitted execute telemetry when CreateTelemetryContext had + /// produced a non-null PendingTelemetryContext at execute time. + /// + private bool _executeStarted; + + /// + /// Tracks whether the execute path has already finalized telemetry for this + /// statement (true after the error path drained it). 
Lets + /// avoid re-invoking the success-path inspection logic in that case. + /// is itself idempotent, so + /// this flag is only an optimization for the chunk-metrics / reader-inspection + /// path; it is not required for correctness. + /// + private bool _executeFinalized; + + /// + /// Telemetry context for the current statement execution. Exposes the observer's + /// underlying when one exists so the + /// reader and operation status poller can mutate poll-count and poll-latency + /// fields directly (PECO-2992). Returns null when telemetry is disabled + /// () so the reader code can skip its telemetry + /// branches without a separate feature flag check. + /// + internal StatementTelemetryContext? PendingTelemetryContext => + (_observer as TelemetryObserver)?.Context; // Stopwatch covering the lifetime of this statement, started at construction. // Used to scope CANCEL_STATEMENT timing within the statement's lifetime. @@ -98,8 +126,14 @@ internal class DatabricksStatement : SparkStatement, IHiveServer2Statement public override long BatchSize { get; protected set; } = DatabricksBatchSizeDefault; public DatabricksStatement(DatabricksConnection connection) + : this(connection, ResolveDefaultObserver(connection)) + { + } + + internal DatabricksStatement(DatabricksConnection connection, IStatementOperationObserver observer) : base(connection) { + _observer = observer ?? NullObserver.Instance; // set the catalog name for legacy compatibility // TODO: use catalog and schema fields in hiveserver2 connection instead of DefaultNamespace so we don't need to cast var defaultNamespace = ((DatabricksConnection)Connection).DefaultNamespace; @@ -129,21 +163,23 @@ public DatabricksStatement(DatabricksConnection connection) } } - private StatementTelemetryContext? CreateTelemetryContext(Telemetry.Proto.Statement.Types.Type statementType) + /// + /// Resolves the default for a statement + /// constructed without an explicit observer (e.g. 
the connection-side + /// / ApplyServerSidePropertiesAsync + /// helpers that use new DatabricksStatement(this) directly). Returns a + /// bound to the connection's session when telemetry + /// is enabled and the session has a live telemetry client, otherwise + /// . This is the same decision + /// makes when it injects an + /// observer explicitly, so direct-construction callers get identical behavior. + /// + private static IStatementOperationObserver ResolveDefaultObserver(DatabricksConnection connection) { - var session = ((DatabricksConnection)Connection).TelemetrySession; - if (session?.TelemetryClient == null) return null; - - var ctx = new StatementTelemetryContext(session); - ctx.OperationType = OperationType.ExecuteStatement; - ctx.StatementType = statementType; - // IsCompressed and ResultFormat are populated from the actual result stream in - // EmitTelemetry, not the connection-level capability flags. Defaults here cover error - // and unconsumed paths (PECO-2988, PECO-2978). - ctx.IsCompressed = false; - ctx.ResultFormat = ExecutionResultFormat.InlineArrow; - ctx.IsInternalCall = IsInternalCall; - return ctx; + TelemetrySessionContext? session = connection.TelemetrySession; + return session?.TelemetryClient != null + ? (IStatementOperationObserver)new TelemetryObserver(session) + : NullObserver.Instance; } /// @@ -165,37 +201,70 @@ public DatabricksStatement(DatabricksConnection connection) }; } - private StatementTelemetryContext? CreateMetadataTelemetryContext() + /// + /// Begins an execute-time observer hook. Replaces the old + /// CreateTelemetryContext / CreateMetadataTelemetryContext helpers: + /// signals the observer with + /// and stamps the per-statement with the + /// defaults the legacy helpers used (InlineArrow result format, IsInternalCall). + /// The reader inspection at still overrides + /// IsCompressed and ResultFormat based on the active reader (PECO-2988, PECO-2978). 
+ /// + private void BeginExecuteTelemetry(StatementType statementType, OperationType operationType) { - var session = ((DatabricksConnection)Connection).TelemetrySession; - if (session?.TelemetryClient == null) return null; - - var operationType = GetMetadataOperationType(SqlQuery) ?? OperationType.Unspecified; - - var ctx = new StatementTelemetryContext(session); - ctx.OperationType = operationType; - ctx.StatementType = Telemetry.Proto.Statement.Types.Type.Metadata; - ctx.ResultFormat = ExecutionResultFormat.InlineArrow; - ctx.IsCompressed = false; - ctx.IsInternalCall = IsInternalCall; - return ctx; + _executeStarted = true; + // Placeholder isCompressed=false matches the old CreateTelemetryContext default; + // the actual value is sourced from the reader at finalize time. + _observer.OnExecuteStarted(statementType, operationType, isCompressed: false); + StatementTelemetryContext? ctx = PendingTelemetryContext; + if (ctx != null) + { + ctx.ResultFormat = ExecutionResultFormat.InlineArrow; + ctx.IsInternalCall = IsInternalCall; + } } - private void RecordSuccess(StatementTelemetryContext ctx) + /// + /// Records a successful execute on the observer. Replaces the legacy + /// RecordSuccess helper. Marks first-batch-ready, captures HTTP retry count + /// from , then signals the observer with + /// . ResultFormat is + /// passed as InlineArrow here and overridden at finalize time based on the active + /// reader (PECO-2978) so this hook does not mislabel inline results when CloudFetch + /// is enabled on the connection. + /// + private void RecordExecuteSuccess() { - ctx.RecordFirstBatchReady(); - // ResultFormat is populated from the actual active reader in EmitTelemetry (PECO-2978); - // setting it here from the connection-level useCloudFetch flag mislabels inline results - // as EXTERNAL_LINKS whenever CloudFetch is enabled on the connection. - ctx.StatementId = StatementId; - CaptureRetryCount(ctx); + StatementTelemetryContext? 
ctx = PendingTelemetryContext; + if (ctx != null) + { + _observer.OnFirstBatchReady(ctx.ExecuteStopwatch.ElapsedMilliseconds); + CaptureRetryCount(ctx); + } + _observer.OnExecuteSucceeded(StatementId ?? string.Empty, ExecutionResultFormat.InlineArrow); + } + + /// + /// Records a failed execute on the observer. Replaces the legacy + /// RecordError helper. Captures HTTP retry count before signalling + /// ; ordering matches the + /// previous helper. + /// + private void RecordExecuteError(Exception ex) + { + StatementTelemetryContext? ctx = PendingTelemetryContext; + if (ctx != null) + { + CaptureRetryCount(ctx); + } + _observer.OnError(ex); } - private void CaptureRetryCount(StatementTelemetryContext ctx) + private static void CaptureRetryCount(StatementTelemetryContext ctx) { if (Activity.Current != null) { - var retryCountTag = Activity.Current.GetTagItem("http.retry.total_attempts"); + object? retryCountTag = Activity.Current.GetTagItem("http.retry.total_attempts"); if (retryCountTag is int retryCount) { ctx.RetryCount = retryCount; @@ -203,190 +272,174 @@ private void CaptureRetryCount(StatementTelemetryContext ctx) } } - private void RecordError(StatementTelemetryContext ctx, Exception ex) - { - ctx.HasError = true; - ctx.ErrorName = ex.GetType().Name; - ctx.ErrorMessage = ex.Message; - CaptureRetryCount(ctx); - } - public override QueryResult ExecuteQuery() { - var ctx = IsMetadataCommand - ? CreateMetadataTelemetryContext() - : CreateTelemetryContext(Telemetry.Proto.Statement.Types.Type.Query); - if (ctx == null) return base.ExecuteQuery(); + if (IsMetadataCommand) + { + BeginExecuteTelemetry( + StatementType.Metadata, + GetMetadataOperationType(SqlQuery) ?? OperationType.Unspecified); + } + else + { + BeginExecuteTelemetry(StatementType.Query, OperationType.ExecuteStatement); + } - // Expose ctx to NewReader so the operation status poller can update PollCount/PollLatencyMs (PECO-2992). 
- PendingTelemetryContext = ctx; try { QueryResult result = base.ExecuteQuery(); _lastQueryResult = result; // Store for telemetry - RecordSuccess(ctx); + RecordExecuteSuccess(); return result; } catch (Exception ex) { - RecordError(ctx, ex); - // Emit telemetry immediately on error (won't reach Dispose) - EmitTelemetry(ctx); - PendingTelemetryContext = null; // Clear to avoid double emission + RecordExecuteError(ex); + // Finalize telemetry immediately on error (won't reach Dispose's success-path emission). + FinalizeExecuteTelemetry(); throw; } } public override async ValueTask ExecuteQueryAsync() { - var ctx = IsMetadataCommand - ? CreateMetadataTelemetryContext() - : CreateTelemetryContext(Telemetry.Proto.Statement.Types.Type.Query); - if (ctx == null) return await base.ExecuteQueryAsync(); + if (IsMetadataCommand) + { + BeginExecuteTelemetry( + StatementType.Metadata, + GetMetadataOperationType(SqlQuery) ?? OperationType.Unspecified); + } + else + { + BeginExecuteTelemetry(StatementType.Query, OperationType.ExecuteStatement); + } - // Expose ctx to NewReader so the operation status poller can update PollCount/PollLatencyMs (PECO-2992). 
- PendingTelemetryContext = ctx; try { QueryResult result = await base.ExecuteQueryAsync(); _lastQueryResult = result; // Store for telemetry - RecordSuccess(ctx); + RecordExecuteSuccess(); return result; } catch (Exception ex) { - RecordError(ctx, ex); - // Emit telemetry immediately on error (won't reach Dispose) - EmitTelemetry(ctx); - PendingTelemetryContext = null; // Clear to avoid double emission + RecordExecuteError(ex); + FinalizeExecuteTelemetry(); throw; } } public override UpdateResult ExecuteUpdate() { - var ctx = CreateTelemetryContext(Telemetry.Proto.Statement.Types.Type.Update); - if (ctx == null) return base.ExecuteUpdate(); + BeginExecuteTelemetry(StatementType.Update, OperationType.ExecuteStatement); - PendingTelemetryContext = ctx; try { UpdateResult result = base.ExecuteUpdate(); - RecordSuccess(ctx); + RecordExecuteSuccess(); return result; } catch (Exception ex) { - RecordError(ctx, ex); - // Emit telemetry immediately on error (won't reach Dispose) - EmitTelemetry(ctx); - PendingTelemetryContext = null; // Clear to avoid double emission + RecordExecuteError(ex); + FinalizeExecuteTelemetry(); throw; } } public override async Task ExecuteUpdateAsync() { - var ctx = CreateTelemetryContext(Telemetry.Proto.Statement.Types.Type.Update); - if (ctx == null) return await base.ExecuteUpdateAsync(); + BeginExecuteTelemetry(StatementType.Update, OperationType.ExecuteStatement); - PendingTelemetryContext = ctx; try { UpdateResult result = await base.ExecuteUpdateAsync(); - RecordSuccess(ctx); + RecordExecuteSuccess(); return result; } catch (Exception ex) { - RecordError(ctx, ex); - // Emit telemetry immediately on error (won't reach Dispose) - EmitTelemetry(ctx); - PendingTelemetryContext = null; // Clear to avoid double emission + RecordExecuteError(ex); + FinalizeExecuteTelemetry(); throw; } } - private void EmitTelemetry(StatementTelemetryContext ctx) + /// + /// Replaces the legacy EmitTelemetry helper. 
Inspects the active reader for + /// chunk metrics, IsCompressed, and ResultFormat (preserving the PECO-2988 / PECO-2978 + /// behavior of sourcing these from the actual reader rather than connection-level + /// capability flags), then signals the observer with + /// , + /// , and finally + /// . OnFinalized is the + /// terminal call — both and + /// guarantee idempotency, so this method may be invoked from both the error path + /// (in Execute*) and the success path (in ). + /// + private void FinalizeExecuteTelemetry() { - try + if (_executeFinalized) { - ctx.RecordResultsConsumed(); + return; + } + _executeFinalized = true; - // Extract chunk metrics if this was a CloudFetch query. - // Check for both CloudFetchReader (direct) and DatabricksCompositeReader (wrapped). - ChunkMetrics? metrics = null; - if (_lastQueryResult?.Stream is CloudFetchReader cfReader) + StatementTelemetryContext? ctx = PendingTelemetryContext; + + // Extract chunk metrics if this was a CloudFetch query. Check for both + // CloudFetchReader (direct) and DatabricksCompositeReader (wrapped). In the error + // path _lastQueryResult is null and this whole block is a no-op. + ChunkMetrics? metrics = null; + if (_lastQueryResult?.Stream is CloudFetchReader cfReader) + { + try { - try - { - metrics = cfReader.GetChunkMetrics(); - } - catch - { - // Ignore errors retrieving chunk metrics - telemetry must not fail driver operations - } + metrics = cfReader.GetChunkMetrics(); } - else if (_lastQueryResult?.Stream is DatabricksCompositeReader compositeReader) + catch { - try - { - metrics = compositeReader.GetChunkMetrics(); - } - catch - { - // Ignore errors retrieving chunk metrics - telemetry must not fail driver operations - } + // Ignore errors retrieving chunk metrics - telemetry must not fail driver operations. } - - // Source IsCompressed and ResultFormat from the active reader, not connection-level - // capability flags. 
The composite reader holds the server-reported truth for both: - // IsLz4Compressed mirrors TGetResultSetMetadataResp.Lz4Compressed (drives both inline - // and CloudFetch decompression), and IsCloudFetchActive reflects whether the server - // returned result links for this statement (PECO-2988, PECO-2978). - if (_lastQueryResult?.Stream is DatabricksCompositeReader composite) + } + else if (_lastQueryResult?.Stream is DatabricksCompositeReader compositeReader) + { + try { - ctx.IsCompressed = composite.IsLz4Compressed; - ctx.ResultFormat = composite.IsCloudFetchActive - ? ExecutionResultFormat.ExternalLinks - : ExecutionResultFormat.InlineArrow; + metrics = compositeReader.GetChunkMetrics(); } - - // Set chunk details if we have metrics and at least one chunk was iterated - // (avoids leaking the -1 sentinel from InitialChunkLatencyMs when no chunks were downloaded) - if (metrics != null && metrics.TotalChunksIterated > 0) + catch { - ctx.SetChunkDetails( - metrics.TotalChunksPresent, - metrics.TotalChunksIterated, - metrics.InitialChunkLatencyMs, - metrics.SlowestChunkLatencyMs, - metrics.SumChunksDownloadTimeMs); + // Ignore errors retrieving chunk metrics - telemetry must not fail driver operations. } + } - OssSqlDriverTelemetryLog telemetryLog = ctx.BuildTelemetryLog(); - - var frontendLog = new TelemetryFrontendLog - { - WorkspaceId = ctx.WorkspaceId, - FrontendLogEventId = Guid.NewGuid().ToString(), - Context = new FrontendLogContext - { - TimestampMillis = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds(), - }, - Entry = new FrontendLogEntry - { - SqlDriverLog = telemetryLog - } - }; - - var session = ((DatabricksConnection)Connection).TelemetrySession; - session?.TelemetryClient?.Enqueue(frontendLog); + // Source IsCompressed and ResultFormat from the active reader, not connection-level + // capability flags. 
The composite reader holds the server-reported truth for both: + // IsLz4Compressed mirrors TGetResultSetMetadataResp.Lz4Compressed (drives both inline + // and CloudFetch decompression), and IsCloudFetchActive reflects whether the server + // returned result links for this statement (PECO-2988, PECO-2978). The observer's + // OnReaderInspected hook overwrites the same fields the legacy EmitTelemetry helper + // mutated immediately before BuildTelemetryLog, preserving byte-identical output + // without leaking the observer's internal StatementTelemetryContext to the statement. + if (_lastQueryResult?.Stream is DatabricksCompositeReader composite) + { + _observer.OnReaderInspected( + resultFormat: composite.IsCloudFetchActive + ? ExecutionResultFormat.ExternalLinks + : ExecutionResultFormat.InlineArrow, + isCompressed: composite.IsLz4Compressed); } - catch + + // Only emit chunk metrics if at least one chunk was iterated; avoids leaking the + // -1 sentinel from InitialChunkLatencyMs when no chunks were downloaded. + if (metrics != null && metrics.TotalChunksIterated > 0) { - // Telemetry must never impact driver operations + _observer.OnChunksDownloaded(metrics); } + + _observer.OnConsumed(ctx?.ExecuteStopwatch.ElapsedMilliseconds ?? 0); + _observer.OnFinalized(); } /// @@ -1300,11 +1353,15 @@ protected override void Dispose(bool disposing) { if (disposing) { - if (PendingTelemetryContext != null) + // Finalize the execute observer now that results have been consumed. This + // replaces the legacy EmitTelemetry call. Gated on _executeStarted so a + // statement disposed without execute does not enqueue a stray empty log + // through the observer (preserves byte-identical behavior with the prior + // PendingTelemetryContext!=null gate). FinalizeExecuteTelemetry itself is + // idempotent — the error path inside Execute* may have already finalized. 
+ if (_executeStarted) { - // Emit telemetry now that results have been consumed - EmitTelemetry(PendingTelemetryContext); - PendingTelemetryContext = null; + FinalizeExecuteTelemetry(); } // Emit a CLOSE_STATEMENT telemetry event once per statement disposal, @@ -1316,22 +1373,21 @@ protected override void Dispose(bool disposing) // or the operation was already closed by direct-results), elapsedMs is 0 // and the event acts purely as a lifecycle marker. // _closeStatementTelemetryEmitted makes the emission idempotent across - // repeated Dispose() calls (PECO-2991). + // repeated Dispose() calls (PECO-2991). The connection-side telemetry + // implementation (NoOpConnectionTelemetry when disabled) makes the + // EmitStatementOperationTelemetry call itself a no-op, so no explicit + // TelemetrySession null-check is needed here. try { if (!_closeStatementTelemetryEmitted) { - var session = ((DatabricksConnection)Connection).TelemetrySession; - if (session?.TelemetryClient != null) - { - _closeStatementTelemetryEmitted = true; - ((DatabricksConnection)Connection).EmitStatementOperationTelemetry( - OperationType.CloseStatement, - Telemetry.Proto.Statement.Types.Type.Unspecified, - StatementId, - elapsedMs: CloseStatementRpcLatencyMs ?? 0, - error: CloseStatementRpcError); - } + _closeStatementTelemetryEmitted = true; + ((DatabricksConnection)Connection).EmitStatementOperationTelemetry( + OperationType.CloseStatement, + StatementType.Unspecified, + StatementId, + elapsedMs: CloseStatementRpcLatencyMs ?? 0, + error: CloseStatementRpcError); } } catch (Exception ex) @@ -1357,6 +1413,10 @@ public override void Cancel() // signals the cancellation token source and is idempotent across repeats // (PECO-2991). Each invocation emits its own event so analysts can see // duplicate user-initiated cancels. 
+ // + // The connection-side telemetry implementation (NoOpConnectionTelemetry when + // disabled) makes the EmitStatementOperationTelemetry call itself a no-op, + // so no explicit TelemetrySession null-check is needed here. Exception? error = null; long startMs = _statementLifetimeStopwatch.ElapsedMilliseconds; try @@ -1372,17 +1432,13 @@ public override void Cancel() { try { - var session = ((DatabricksConnection)Connection).TelemetrySession; - if (session?.TelemetryClient != null) - { - long elapsedMs = _statementLifetimeStopwatch.ElapsedMilliseconds - startMs; - ((DatabricksConnection)Connection).EmitStatementOperationTelemetry( - OperationType.CancelStatement, - Telemetry.Proto.Statement.Types.Type.Unspecified, - StatementId, - elapsedMs, - error); - } + long elapsedMs = _statementLifetimeStopwatch.ElapsedMilliseconds - startMs; + ((DatabricksConnection)Connection).EmitStatementOperationTelemetry( + OperationType.CancelStatement, + StatementType.Unspecified, + StatementId, + elapsedMs, + error); } catch (Exception ex) { diff --git a/csharp/src/StatementExecution/SeaMetadataOperationMapper.cs b/csharp/src/StatementExecution/SeaMetadataOperationMapper.cs new file mode 100644 index 00000000..65ce216f --- /dev/null +++ b/csharp/src/StatementExecution/SeaMetadataOperationMapper.cs @@ -0,0 +1,63 @@ +/* + * Copyright (c) 2025 ADBC Drivers Contributors + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; + +namespace AdbcDrivers.Databricks.StatementExecution +{ + /// + /// Maps a SEA metadata-command keyword (the ApacheParameters.IsMetadataCommand SqlQuery + /// value, e.g. "getcatalogs") to the corresponding telemetry-proto + /// (e.g. ). + /// + /// + /// Mirrors the Thrift-side helper DatabricksStatement.GetMetadataOperationType + /// so SEA emits the same StatementType.Metadata + OperationType.List* + /// pairs the Thrift path produces — see PECO-3022 gap B8. Used by + /// when routing + /// a metadata command into StatementExecutionConnection.ExecuteMetadataSqlAsync, + /// and by the connection-level metadata data-provider methods that drive + /// GetObjects / GetTableTypes. + /// + /// + /// Pure-function; safe to call from any thread. + /// + internal static class SeaMetadataOperationMapper + { + /// + /// Returns the telemetry for a metadata command keyword, + /// or null when is not a recognized metadata + /// command (in which case callers should fall back to the regular query operation + /// type, e.g. ). + /// + /// The statement's SqlQuery when + /// ApacheParameters.IsMetadataCommand=true; case-insensitive. + public static OperationType? Map(string? 
sqlQuery) + { + return sqlQuery?.ToLowerInvariant() switch + { + "getcatalogs" => OperationType.ListCatalogs, + "getschemas" => OperationType.ListSchemas, + "gettables" => OperationType.ListTables, + "getcolumns" or "getcolumnsextended" => OperationType.ListColumns, + "gettabletypes" => OperationType.ListTableTypes, + "getprimarykeys" => OperationType.ListPrimaryKeys, + "getcrossreference" => OperationType.ListCrossReferences, + _ => null + }; + } + } +} diff --git a/csharp/src/StatementExecution/SeaResultFormatMapper.cs b/csharp/src/StatementExecution/SeaResultFormatMapper.cs new file mode 100644 index 00000000..e781d408 --- /dev/null +++ b/csharp/src/StatementExecution/SeaResultFormatMapper.cs @@ -0,0 +1,156 @@ +/* + * Copyright (c) 2025 ADBC Drivers Contributors + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +using System; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; + +namespace AdbcDrivers.Databricks.StatementExecution +{ + /// + /// Maps the SEA wire-level (disposition, format) request pair plus the observed response + /// shape into the typed proto enum used by telemetry. + /// + /// + /// SEA does not expose a typed result-format field on the statement — it uses string + /// disposition + format on the request and reports back inline data or external-link + /// references in the manifest/result. The telemetry proto, however, expects one of + /// INLINE_ARROW, EXTERNAL_LINKS, etc. 
This helper applies the four-cell + /// table from PECO-3022-sea-telemetry-integration-design.md §8: + /// + /// + /// + /// INLINE + ARROW_STREAMINLINE_ARROW + /// EXTERNAL_LINKS + ARROW_STREAMEXTERNAL_LINKS + /// INLINE_OR_EXTERNAL_LINKS + ARROW_STREAM, external links populated + /// in manifest/result → EXTERNAL_LINKS + /// INLINE_OR_EXTERNAL_LINKS + ARROW_STREAM, no external links → + /// INLINE_ARROW + /// + /// + /// + /// Non-ARROW_STREAM formats (e.g. JSON_ARRAY, CSV) are not represented in the + /// proto enum and map to . Unknown disposition + /// strings also map to rather than guessing. + /// + /// + /// + /// Pure-function: no side effects, no allocations beyond the enum return value. Safe to call + /// once at OnExecuteSucceeded time without peeking into the reader. + /// + /// + internal static class SeaResultFormatMapper + { + private const string DispositionInline = "INLINE"; + private const string DispositionExternalLinks = "EXTERNAL_LINKS"; + private const string DispositionInlineOrExternalLinks = "INLINE_OR_EXTERNAL_LINKS"; + private const string FormatArrowStream = "ARROW_STREAM"; + + /// + /// Maps a SEA request's (disposition, format) and the observed response shape to a + /// telemetry-proto . See class docs for the table. + /// + /// Request disposition string (case-insensitive). + /// Request format string (case-insensitive). + /// Response from ExecuteStatementAsync; may have empty + /// or null Manifest/Result (e.g. PENDING state), in which case the + /// caller may receive for the + /// auto-disposition cells where the result shape is not yet known. + /// The mapped proto enum; for + /// unknown disposition, non-ARROW_STREAM format, or auto-disposition with no + /// manifest/result data yet. + public static ExecutionResultFormat Map( + string? disposition, + string? format, + ExecuteStatementResponse? 
response) + { + // The §8 mapping table covers ARROW_STREAM only — the proto enum has no entries + // for JSON_ARRAY or CSV, so anything else falls through to Unspecified. + if (!string.Equals(format, FormatArrowStream, StringComparison.OrdinalIgnoreCase)) + { + return ExecutionResultFormat.Unspecified; + } + + if (string.Equals(disposition, DispositionInline, StringComparison.OrdinalIgnoreCase)) + { + return ExecutionResultFormat.InlineArrow; + } + + if (string.Equals(disposition, DispositionExternalLinks, StringComparison.OrdinalIgnoreCase)) + { + return ExecutionResultFormat.ExternalLinks; + } + + if (string.Equals(disposition, DispositionInlineOrExternalLinks, StringComparison.OrdinalIgnoreCase)) + { + // Auto disposition: distinguish by what the server actually produced. + if (response == null) + { + return ExecutionResultFormat.Unspecified; + } + + if (HasExternalLinks(response)) + { + return ExecutionResultFormat.ExternalLinks; + } + + // The server picks inline for small results in auto disposition. We only call + // this when the response is non-null AND has no external links, which matches + // the inline-attachment row of the table. + if (HasInlineResult(response)) + { + return ExecutionResultFormat.InlineArrow; + } + + // No manifest and no result data yet (e.g. PENDING) — can't safely pick. + return ExecutionResultFormat.Unspecified; + } + + return ExecutionResultFormat.Unspecified; + } + + private static bool HasExternalLinks(ExecuteStatementResponse response) + { + // External links can be reported either on the result chunks in the manifest or + // directly on the (hybrid) Result payload. Either is sufficient evidence. 
+ if (response.Manifest?.Chunks != null) + { + foreach (var chunk in response.Manifest.Chunks) + { + if (chunk.ExternalLinks != null && chunk.ExternalLinks.Count > 0) + { + return true; + } + } + } + + if (response.Result?.ExternalLinks != null && response.Result.ExternalLinks.Count > 0) + { + return true; + } + + return false; + } + + private static bool HasInlineResult(ExecuteStatementResponse response) + { + // Treat presence of a Manifest as sufficient evidence the server has produced a + // shape — combined with the no-external-links check above, that places us in the + // inline-attachment cell of the table. The attachment bytes themselves may be + // empty (zero-row result) but the format is still INLINE_ARROW. + return response.Manifest != null || response.Result != null; + } + } +} diff --git a/csharp/src/StatementExecution/StatementExecutionConnection.cs b/csharp/src/StatementExecution/StatementExecutionConnection.cs index daade95c..e0defcbc 100644 --- a/csharp/src/StatementExecution/StatementExecutionConnection.cs +++ b/csharp/src/StatementExecution/StatementExecutionConnection.cs @@ -16,11 +16,14 @@ using System; using System.Collections.Generic; +using System.Diagnostics; using System.Linq; using System.Net.Http; using System.Threading; using System.Threading.Tasks; +using AdbcDrivers.Databricks.Auth; using AdbcDrivers.Databricks.Http; +using AdbcDrivers.Databricks.Telemetry; using AdbcDrivers.HiveServer2; using AdbcDrivers.HiveServer2.Hive2; using AdbcDrivers.Databricks.StatementExecution.MetadataCommands; @@ -31,6 +34,7 @@ using Apache.Arrow.Ipc; using Apache.Arrow.Types; using static Apache.Arrow.Adbc.AdbcConnection; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; namespace AdbcDrivers.Databricks.StatementExecution { @@ -55,6 +59,37 @@ internal class StatementExecutionConnection : TracingConnection, IGetObjectsData private string? 
_sessionId; private readonly SemaphoreSlim _sessionLock = new SemaphoreSlim(1, 1); + // Shared OAuth token provider for connection-wide token caching (used by telemetry HTTP client) + private OAuthClientCredentialsProvider? _oauthTokenProvider; + + // Telemetry: mirrors the DatabricksConnection (Thrift) wiring. Defaults to NoOp so + // callsites never need null checks; replaced by ConnectionTelemetry.Create after + // session open succeeds. Tests can substitute via TelemetryForTesting. + private IConnectionTelemetry _telemetry = NoOpConnectionTelemetry.Instance; + + // Stopwatch covering the connection lifetime; used to measure session-open latency for + // the CREATE_SESSION telemetry event. Matches the Thrift path's _connectionLifetimeStopwatch. + private readonly Stopwatch _connectionLifetimeStopwatch = Stopwatch.StartNew(); + private bool _sessionOpenTelemetryEmitted; + private bool _sessionDeleteTelemetryEmitted; + + /// + /// The session context consumed by SEA statements (next phase) to create + /// per-statement observer contexts. Returns null when telemetry is disabled. + /// + internal TelemetrySessionContext? TelemetrySession => _telemetry.Session; + + /// + /// Test seam allowing unit tests to substitute a fake IConnectionTelemetry + /// so they can verify the production code calls EmitOperationTelemetry at the + /// right lifecycle hooks. Production code never sets this property. + /// + internal IConnectionTelemetry TelemetryForTesting + { + get => _telemetry; + set => _telemetry = value; + } + // Configuration for statement creation private readonly string _resultDisposition; private readonly string _resultFormat; @@ -66,6 +101,25 @@ internal class StatementExecutionConnection : TracingConnection, IGetObjectsData private readonly bool _useDescTableExtended; private readonly bool _applySSPWithQueries; + // Direct-results capability flag, read from the same connection-string property the + // Thrift path uses (DatabricksParameters.EnableDirectResults). 
Mirrored into the + // telemetry payload's enable_direct_results field so dashboards reflect the user's + // actual configuration rather than a hardcoded literal. Defaults to true, matching + // DatabricksConnection (Thrift) defaults. + private readonly bool _enableDirectResults; + + // Default connect-timeout value used when the connection string omits one. + // Mirrors the Thrift path's HiveServer2Connection.ConnectTimeoutMillisecondsDefault + // so dashboards filtering on socket_timeout see consistent values across both transports. + private const int ConnectTimeoutMillisecondsDefault = 30000; + + // Connect timeout in milliseconds, read from the same connection-string property the + // Thrift path uses (SparkParameters.ConnectTimeoutMilliseconds). Used only to stamp the + // telemetry payload's socket_timeout field — distinct from _waitTimeoutSeconds, which is + // the SEA query-wait (CONTINUE) timeout and unrelated to the connection-establishment + // timeout the telemetry field represents. + private readonly int _connectTimeoutMilliseconds; + // Memory pooling (shared across connection) private readonly Microsoft.IO.RecyclableMemoryStreamManager _recyclableMemoryStreamManager; private readonly System.Buffers.ArrayPool _lz4BufferPool; @@ -212,6 +266,7 @@ private StatementExecutionConnection( // the server validates differently in CreateSession vs SET) get the same semantics // on SEA. JDBC has no equivalent flag — this is parity with the Thrift driver only. _applySSPWithQueries = PropertyHelper.GetBooleanPropertyWithValidation(properties, DatabricksParameters.ApplySSPWithQueries, false); + _enableDirectResults = PropertyHelper.GetBooleanPropertyWithValidation(properties, DatabricksParameters.EnableDirectResults, true); // Session configuration // Only supply catalog from connection properties when EnableMultipleCatalogSupport is true. 
@@ -243,6 +298,13 @@ private StatementExecutionConnection( _pollingIntervalMs = PropertyHelper.GetPositiveIntPropertyWithValidation( properties, ApacheParameters.PollTimeMilliseconds, defaultValue: 1000); + // Connect timeout — used only for telemetry-init stamping (socket_timeout field). + // Read from the same connection-string property the Thrift path uses so SEA and + // Thrift records agree on the same source. NOT derived from _waitTimeoutSeconds: + // that field is the SEA CONTINUE/query-wait timeout, a semantically different concept. + _connectTimeoutMilliseconds = PropertyHelper.GetIntPropertyWithValidation( + properties, SparkParameters.ConnectTimeoutMilliseconds, ConnectTimeoutMillisecondsDefault); + // Memory pooling _recyclableMemoryStreamManager = memoryStreamManager ?? new Microsoft.IO.RecyclableMemoryStreamManager(); _lz4BufferPool = lz4BufferPool ?? System.Buffers.ArrayPool.Create(maxArrayLength: 4 * 1024 * 1024, maxArraysPerBucket: 10); @@ -299,6 +361,17 @@ private HttpClient CreateHttpClient(IReadOnlyDictionary properti int rateLimitRetryTimeout = PropertyHelper.GetIntPropertyWithValidation(properties, DatabricksParameters.RateLimitRetryTimeout, DatabricksConstants.DefaultRateLimitRetryTimeout); int timeoutMinutes = PropertyHelper.GetPositiveIntPropertyWithValidation(properties, DatabricksParameters.CloudFetchTimeoutMinutes, DatabricksConstants.DefaultCloudFetchTimeoutMinutes); + // Resolve the effective HttpClient.Timeout. Prefer the user-supplied + // SparkParameters.ConnectTimeoutMilliseconds (matching the Thrift path, which + // wires that property into its transport-level connect timeout) so that the + // socket_timeout telemetry field — which downstream gap fixes source from + // _httpClient.Timeout — reflects the user's intent. Fall back to + // CloudFetchTimeoutMinutes-based timeout when the property is not set, to + // preserve the historical default behavior for callers that never tuned this. 
+ TimeSpan effectiveTimeout = properties.ContainsKey(SparkParameters.ConnectTimeoutMilliseconds) + ? TimeSpan.FromMilliseconds(_connectTimeoutMilliseconds) + : TimeSpan.FromMinutes(timeoutMinutes); + var config = new HttpHandlerFactory.HandlerConfig { BaseHandler = HttpClientFactory.CreateHandler(properties), @@ -319,11 +392,14 @@ private HttpClient CreateHttpClient(IReadOnlyDictionary properti AddThriftErrorHandler = false }; - var result = HttpHandlerFactory.CreateHandlers(config); + // Use CreateHandlersWithTokenProvider so we can capture the OAuth token provider + // and reuse it for the telemetry HTTP client (avoids duplicate token fetches). + var result = HttpHandlerFactory.CreateHandlersWithTokenProvider(config); + _oauthTokenProvider = result.TokenProvider; - var httpClient = new HttpClient(result) + var httpClient = new HttpClient(result.Handler) { - Timeout = TimeSpan.FromMinutes(timeoutMinutes) + Timeout = effectiveTimeout }; // Set user agent @@ -430,6 +506,13 @@ public async Task OpenAsync(CancellationToken cancellationToken = default) { _catalog = GetCurrentCatalog(); } + + // Initialize telemetry after a successful CreateSession. Telemetry is fail-open: + // any failure here is swallowed and falls back to NoOpConnectionTelemetry so the + // user-visible connection open still succeeds. + InitializeTelemetry(Activity.Current); + + EmitCreateSessionTelemetry(Activity.Current); } } finally @@ -439,11 +522,163 @@ public async Task OpenAsync(CancellationToken cancellationToken = default) } } + /// + /// Initializes telemetry via the ConnectionTelemetry factory. Mirrors the Thrift + /// path's DatabricksConnection.InitializeTelemetry. Falls back to + /// NoOpConnectionTelemetry on any failure so connection open + /// always succeeds even if the telemetry pipeline is misconfigured. + /// + private void InitializeTelemetry(Activity? 
activity) + { + try + { + // SEA hands its server-assigned session id string directly, with no transport handle + // to unwrap, unlike the Thrift caller which converts TSessionHandle->string. + _telemetry = Telemetry.ConnectionTelemetry.Create( + properties: _properties, + host: GetHost(_properties), + assemblyVersion: AssemblyVersion, + oauthTokenProvider: _oauthTokenProvider, + sessionId: _sessionId ?? string.Empty, + mode: Telemetry.Proto.DriverMode.Types.Type.Sea, + enableDirectResults: _enableDirectResults, + useDescTableExtended: _useDescTableExtended, + connectTimeoutMilliseconds: _connectTimeoutMilliseconds, + activity: activity); + } + catch (Exception ex) + { + // Defensive: ConnectionTelemetry.Create already swallows internally, but per the + // design contract telemetry init must never escape into the caller's open path. + activity?.AddEvent(new ActivityEvent("telemetry.initialization.error", + tags: new ActivityTagsCollection + { + { "error.type", ex.GetType().Name }, + { "error.message", ex.Message } + })); + _telemetry = NoOpConnectionTelemetry.Instance; + } + } + + /// + /// Emits the CREATE_SESSION telemetry event after a successful CreateSession RPC. + /// Internal so unit tests can call this directly after injecting a fake + /// IConnectionTelemetry via TelemetryForTesting. + /// + internal void EmitCreateSessionTelemetry(Activity? 
activity = null) + { + try + { + long elapsedMs = _connectionLifetimeStopwatch.ElapsedMilliseconds; + _telemetry.EmitOperationTelemetry( + Telemetry.Proto.Operation.Types.Type.CreateSession, + Telemetry.Proto.Statement.Types.Type.Unspecified, + statementId: null, + elapsedMs: elapsedMs, + error: null); + _sessionOpenTelemetryEmitted = true; + } + catch (Exception ex) + { + activity?.AddEvent(new ActivityEvent("telemetry.emit.error", + tags: new ActivityTagsCollection + { + { "error.type", ex.GetType().Name }, + { "error.message", ex.Message }, + { "operation_type", "CREATE_SESSION" } + })); + } + } + + /// + /// Emits the DELETE_SESSION telemetry event when a session was previously opened. + /// Idempotent across repeated calls. Internal for test access. + /// + internal void EmitDeleteSessionTelemetry(long elapsedMs = 0, Exception? error = null) + { + try + { + if (_sessionOpenTelemetryEmitted && !_sessionDeleteTelemetryEmitted) + { + _sessionDeleteTelemetryEmitted = true; + _telemetry.EmitOperationTelemetry( + Telemetry.Proto.Operation.Types.Type.DeleteSession, + Telemetry.Proto.Statement.Types.Type.Unspecified, + statementId: null, + elapsedMs: elapsedMs, + error: error); + } + } + catch (Exception ex) + { + Activity.Current?.AddEvent(new ActivityEvent("telemetry.emit.error", + tags: new ActivityTagsCollection + { + { "error.type", ex.GetType().Name }, + { "error.message", ex.Message }, + { "operation_type", "DELETE_SESSION" } + })); + } + } + + /// + /// Statement-facing entry point for emitting telemetry for discrete statement + /// operations (CLOSE_STATEMENT, CANCEL_STATEMENT) that don't have their own + /// execute path. Delegates to the connection's telemetry implementation and + /// swallows any exception thrown by the emit call so statement Dispose never + /// surfaces a telemetry failure to the consumer. Mirrors the Thrift-side + /// entry + /// point used by DatabricksStatement.Dispose. 
+ /// + internal void EmitStatementOperationTelemetry( + Telemetry.Proto.Operation.Types.Type operationType, + Telemetry.Proto.Statement.Types.Type statementType, + string? statementId, + long elapsedMs, + Exception? error) + { + try + { + _telemetry.EmitOperationTelemetry( + operationType, + statementType, + statementId, + elapsedMs, + error); + } + catch (Exception ex) + { + Activity.Current?.AddEvent(new ActivityEvent("telemetry.emit.error", + tags: new ActivityTagsCollection + { + { "error.type", ex.GetType().Name }, + { "error.message", ex.Message }, + { "operation_type", operationType.ToString() } + })); + } + } + /// /// Creates a new statement for query execution. + /// Constructs a per-statement IStatementOperationObserver bound to + /// this connection's telemetry session: a TelemetryObserver when + /// telemetry is enabled and the session has a live client, otherwise + /// NullObserver.Instance. This mirrors the Thrift-side + /// DatabricksConnection.CreateStatement wiring and gives all + /// subsequent hookpoint commits a non-null target in the statement. /// public override AdbcStatement CreateStatement() { + // Inject the observer at construction so StatementExecutionStatement is not + // coupled to TelemetrySession at runtime. When telemetry is disabled (NoOp + // connection telemetry → Session is null) or the session has no live client + // (configuration/circuit breaker), we hand the statement a NullObserver so + // every hookpoint call is a fail-open no-op without callsite null-checks. + TelemetrySessionContext? session = TelemetrySession; + IStatementOperationObserver observer = session?.TelemetryClient != null + ? 
(IStatementOperationObserver)new TelemetryObserver(session) + : NullObserver.Instance; + return new StatementExecutionStatement( _client, _sessionId, @@ -459,7 +694,8 @@ public override AdbcStatement CreateStatement() _recyclableMemoryStreamManager, _lz4BufferPool, _cloudFetchHttpClient, - this); // Pass connection as TracingConnection for tracing support + this, // Pass connection as TracingConnection for tracing support + observer); } public override void SetOption(string key, string? value) @@ -536,11 +772,27 @@ public override IArrowArrayStream GetTableTypes() { return this.TraceActivity(activity => { + var sw = Stopwatch.StartNew(); var builder = new StringArray.Builder(); builder.Append("TABLE"); builder.Append("VIEW"); var schema = new Schema(new[] { new Field("table_type", StringType.Default, false) }, null); - return new HiveInfoArrowStream(schema, new IArrowArray[] { builder.Build() }); + var stream = new HiveInfoArrowStream(schema, new IArrowArray[] { builder.Build() }); + sw.Stop(); + + // GetTableTypes is purely local — it never executes SQL, so the per-statement + // observer pipeline that drives the other LIST_* events does not fire here. + // Emit a one-shot connection-level telemetry record so SEA matches the Thrift + // path's "every metadata operation produces a categorized record" guarantee. + // See PECO-3022 gap B8 finisher. + EmitStatementOperationTelemetry( + OperationType.ListTableTypes, + Telemetry.Proto.Statement.Types.Type.Metadata, + statementId: null, + elapsedMs: sw.ElapsedMilliseconds, + error: null); + + return stream; }, nameof(GetTableTypes)); } @@ -594,7 +846,7 @@ public override Schema GetTableSchema(string? catalog, string? dbSchema, string async Task> IGetObjectsDataProvider.GetCatalogsAsync(string? 
catalogPattern, CancellationToken cancellationToken) { string sql = new ShowCatalogsCommand(catalogPattern).Build(); - var batches = await ExecuteMetadataSqlAsync(sql, cancellationToken).ConfigureAwait(false); + var batches = await ExecuteMetadataSqlAsync(sql, OperationType.ListCatalogs, cancellationToken).ConfigureAwait(false); var result = new List(); foreach (var batch in batches) { @@ -620,7 +872,7 @@ async Task> IGetObjectsDataProvider.GetCatalogsAsync(strin List batches; try { - batches = await ExecuteMetadataSqlAsync(sql, cancellationToken).ConfigureAwait(false); + batches = await ExecuteMetadataSqlAsync(sql, OperationType.ListSchemas, cancellationToken).ConfigureAwait(false); } catch (DatabricksException ex) when (ex.IsObjectNotFoundException()) { @@ -668,7 +920,7 @@ async Task> IGetObjectsDataProvider.GetCatalogsAsync(strin List batches; try { - batches = await ExecuteMetadataSqlAsync(sql, cancellationToken).ConfigureAwait(false); + batches = await ExecuteMetadataSqlAsync(sql, OperationType.ListTables, cancellationToken).ConfigureAwait(false); } catch (DatabricksException ex) when (ex.IsObjectNotFoundException()) { @@ -780,10 +1032,43 @@ async Task IGetObjectsDataProvider.PopulateColumnInfoAsync(string? catalogPatter } internal async Task> ExecuteMetadataSqlAsync(string sql, CancellationToken cancellationToken = default) + { + // Back-compat overload (no operationType): used by callers that don't have a + // mapped telemetry OperationType handy. The sub-statement falls back to + // (StatementType.Query, OperationType.ExecuteStatementAsync) which matches + // pre-PECO-3022-B8 behavior. + return await ExecuteMetadataSqlAsyncCore(sql, operationType: null, cancellationToken).ConfigureAwait(false); + } + + /// + /// Overload that tags the sub-statement with a metadata + /// OperationType so its telemetry emits + /// (StatementType.Metadata, operationType) rather than the regular + /// query pair. Mirrors the Thrift parity model; see PECO-3022 gap B8. 
+ /// + internal async Task> ExecuteMetadataSqlAsync( + string sql, + OperationType operationType, + CancellationToken cancellationToken = default) + { + return await ExecuteMetadataSqlAsyncCore(sql, operationType, cancellationToken).ConfigureAwait(false); + } + + private async Task> ExecuteMetadataSqlAsyncCore( + string sql, + OperationType? operationType, + CancellationToken cancellationToken) { var batches = new List(); using var stmt = (StatementExecutionStatement)CreateStatement(); stmt.SqlQuery = sql; + if (operationType.HasValue) + { + // Stamp the metadata operation type before ExecuteQueryAsync so the sub- + // statement's OnExecuteStarted emits (Metadata, operationType) instead of + // (Query, ExecuteStatementAsync). See SeaMetadataOperationMapper. + stmt.SetPendingMetadataOperation(operationType.Value); + } var result = await stmt.ExecuteQueryAsync(cancellationToken, isMetadataExecution: true).ConfigureAwait(false); using var stream = result.Stream; if (stream == null) return batches; @@ -805,6 +1090,10 @@ internal List ExecuteMetadataSql(string sql, CancellationToken canc /// /// Executes a SHOW COLUMNS command. When catalog is null, iterates over all catalogs /// since SHOW COLUMNS IN ALL CATALOGS is not yet supported by the backend. + /// Telemetry: every sub-statement emits (StatementType.Metadata, + /// OperationType.ListColumns) — including the helper SHOW CATALOGS issued + /// during the iterate-all-catalogs fallback, since the caller's semantic operation + /// is still ListColumns (PECO-3022 gap B8). /// internal async Task> ExecuteShowColumnsAsync( string? catalog, string? schemaPattern, string? tablePattern, string? 
columnPattern, @@ -813,14 +1102,14 @@ internal async Task> ExecuteShowColumnsAsync( if (catalog != null) { string sql = new ShowColumnsCommand(catalog, schemaPattern, tablePattern, columnPattern).Build(); - return await ExecuteMetadataSqlAsync(sql, cancellationToken).ConfigureAwait(false); + return await ExecuteMetadataSqlAsync(sql, OperationType.ListColumns, cancellationToken).ConfigureAwait(false); } // SHOW COLUMNS IN ALL CATALOGS is not supported — iterate over each catalog. // TODO: Remove this fallback when the backend supports SHOW COLUMNS IN ALL CATALOGS. var allBatches = new List(); string catalogsSql = new ShowCatalogsCommand(null).Build(); - var catalogBatches = await ExecuteMetadataSqlAsync(catalogsSql, cancellationToken).ConfigureAwait(false); + var catalogBatches = await ExecuteMetadataSqlAsync(catalogsSql, OperationType.ListColumns, cancellationToken).ConfigureAwait(false); foreach (var batch in catalogBatches) { @@ -833,7 +1122,7 @@ internal async Task> ExecuteShowColumnsAsync( string sql = new ShowColumnsCommand(cat, schemaPattern, tablePattern, columnPattern).Build(); try { - var batches = await ExecuteMetadataSqlAsync(sql, cancellationToken).ConfigureAwait(false); + var batches = await ExecuteMetadataSqlAsync(sql, OperationType.ListColumns, cancellationToken).ConfigureAwait(false); allBatches.AddRange(batches); } catch @@ -950,6 +1239,9 @@ public async Task ApplyServerSidePropertiesAsync(CancellationToken cancellationT /// /// Disposes the connection and deletes the session if it exists. + /// Emits DELETE_SESSION telemetry with the measured DeleteSession RPC latency + /// (or zero if no session was ever opened), then flushes the telemetry pipeline + /// with a 5-second hard timeout so a wedged exporter cannot hang Dispose. 
/// public override void Dispose() { @@ -958,27 +1250,57 @@ public override void Dispose() activity?.SetTag("session_id", _sessionId); activity?.SetTag("warehouse_id", _warehouseId); + long deleteSessionElapsedMs = 0; + Exception? deleteSessionError = null; + if (_sessionId != null) { + var deleteStopwatch = Stopwatch.StartNew(); try { - activity?.AddEvent(new System.Diagnostics.ActivityEvent("session.delete.start")); + activity?.AddEvent(new ActivityEvent("session.delete.start")); // Delete session synchronously during dispose _client.DeleteSessionAsync(_sessionId, _warehouseId, CancellationToken.None).GetAwaiter().GetResult(); - activity?.AddEvent(new System.Diagnostics.ActivityEvent("session.delete.success")); + activity?.AddEvent(new ActivityEvent("session.delete.success")); } catch (Exception ex) { + deleteSessionError = ex; // Best effort - ignore errors during dispose but trace them - activity?.AddEvent(new System.Diagnostics.ActivityEvent("session.delete.error", - tags: new System.Diagnostics.ActivityTagsCollection { { "error", ex.Message } })); + activity?.AddEvent(new ActivityEvent("session.delete.error", + tags: new ActivityTagsCollection { { "error", ex.Message } })); } finally { + deleteStopwatch.Stop(); + deleteSessionElapsedMs = deleteStopwatch.ElapsedMilliseconds; _sessionId = null; } } + // Emit DELETE_SESSION after the RPC completes (success or failure) and before we + // tear down the telemetry client. Idempotent — repeated Dispose calls fire once. + EmitDeleteSessionTelemetry(deleteSessionElapsedMs, deleteSessionError); + + // Flush + release the telemetry client. Bounded at 5 seconds so a stuck exporter + // cannot hang connection close. Per design §11: this matches the Thrift path's + // DisposeTelemetryAsync().GetAwaiter().GetResult() pattern, but with a hard + // timeout because SEA Dispose is not wrapped in a try/finally that flushes. + // IConnectionTelemetry.DisposeAsync already returns Task, so we call .Wait directly. 
+ try + { + _telemetry.DisposeAsync().Wait(TimeSpan.FromSeconds(5)); + } + catch (Exception ex) + { + activity?.AddEvent(new ActivityEvent("telemetry.dispose.error", + tags: new ActivityTagsCollection + { + { "error.type", ex.GetType().Name }, + { "error.message", ex.Message } + })); + } + // Dispose the HTTP client if we own it if (_ownsHttpClient) { diff --git a/csharp/src/StatementExecution/StatementExecutionStatement.cs b/csharp/src/StatementExecution/StatementExecutionStatement.cs index b910097a..0ae034b0 100644 --- a/csharp/src/StatementExecution/StatementExecutionStatement.cs +++ b/csharp/src/StatementExecution/StatementExecutionStatement.cs @@ -25,6 +25,7 @@ using AdbcDrivers.Databricks.Reader.CloudFetch; using AdbcDrivers.Databricks.StatementExecution.MetadataCommands; using AdbcDrivers.Databricks.Result; +using AdbcDrivers.Databricks.Telemetry; using AdbcDrivers.HiveServer2; using AdbcDrivers.HiveServer2.Hive2; using Apache.Arrow; @@ -32,6 +33,8 @@ using Apache.Arrow.Adbc.Tracing; using Apache.Arrow.Ipc; using Apache.Arrow.Types; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; namespace AdbcDrivers.Databricks.StatementExecution { @@ -72,6 +75,62 @@ internal class StatementExecutionStatement : TracingStatement // Connection reference for metadata queries private readonly StatementExecutionConnection _connection; + /// + /// Observer for this statement's operational lifecycle. Injected at construction by + /// StatementExecutionConnection.CreateStatement: a TelemetryObserver + /// bound to the connection's TelemetrySessionContext + /// when telemetry is enabled, otherwise + /// NullObserver.Instance. Subsequent hookpoint commits will route the + /// statement's lifecycle calls (execute, poll, first-batch, consumed, error, finalized) + /// through this field. Never null, so callsites do not need to null-check. + /// + private readonly IStatementOperationObserver _observer; + + /// + /// Tracks whether IStatementOperationObserver.OnExecuteStarted has been + /// fired for this statement. 
Gates the terminal + /// call in : + /// when a statement is disposed without ever being executed, the observer must not + /// enqueue an empty execute-statement log. Mirrors the gate used in the Thrift + /// DatabricksStatement path so SEA and Thrift produce byte-identical telemetry + /// for the never-executed-then-disposed shape. + /// + private bool _executeStarted; + + /// + /// Wall-clock stopwatch started in lockstep with + /// and read at reader- + /// construction time to supply the latencyMs argument for + /// . Mirrors the Thrift + /// path's StatementTelemetryContext.ExecuteStopwatch so SEA reports the same + /// result_set_ready_latency_millis semantic — elapsed time from execute start + /// to the moment first-batch data is available to the reader. + /// + private readonly Stopwatch _executeStopwatch = new Stopwatch(); + + /// + /// Reference to the outermost wrapping the + /// reader returned by a successful execute. Tracked so can + /// invoke when the + /// consumer abandons the reader without disposing it — guaranteeing + /// (and, on the CloudFetch + /// path, ) fire on + /// every successful EXECUTE record before + /// emits the log. Null on the error path (no reader was constructed) and on the + /// never-executed path. Mirrors the Thrift driver's EmitTelemetry contract, + /// which always calls RecordResultsConsumed regardless of reader iteration. + /// + private ConsumptionObservingStream? _consumptionStream; + + /// + /// Idempotency gate for the CLOSE_STATEMENT operation telemetry emitted at + /// . Mirrors the Thrift driver's + /// _closeStatementTelemetryEmitted field — repeated Dispose() calls + /// (common in using + manual dispose patterns) must not produce duplicate + /// CLOSE_STATEMENT records. + /// + private bool _closeStatementTelemetryEmitted; + // Statement state private string? _currentStatementId; private string? _sqlQuery; @@ -93,6 +152,39 @@ internal class StatementExecutionStatement : TracingStatement private string? 
_metadataForeignTableName; private string? _queryTags; + /// + /// When non-null, indicates this statement is acting as the internal sub-statement + /// for a metadata operation (e.g. GetObjects / GetCatalogs / + /// SqlQuery="getschemas"). The next OnExecuteStarted hook will emit + /// + this pair + /// instead of the regular (Query, ExecuteStatementAsync) pair. + /// + /// + /// Set via by the connection-level helpers + /// before ExecuteQueryAsync runs. Mirrors the Thrift parity model where + /// DatabricksStatement.BeginExecuteTelemetry picks the + /// (StatementType, OperationType) pair based on + /// GetMetadataOperationType(SqlQuery); see PECO-3022 gap B8. + /// + /// + private OperationType? _pendingMetadataOperation; + + /// + /// Marks this statement as the internal sub-statement for a metadata operation so + /// the next OnExecuteStarted hook reports + /// + rather + /// than the regular (Query, ExecuteStatementAsync) pair. + /// + /// + /// Used by + /// before invoking on the sub-statement. + /// + /// + internal void SetPendingMetadataOperation(OperationType operationType) + { + _pendingMetadataOperation = operationType; + } + /// /// Parses "key1:val1,key2:val2" into a list of QueryTag objects. /// Keys cannot contain : or , so first : is always the key-value separator. @@ -165,7 +257,8 @@ public StatementExecutionStatement( Microsoft.IO.RecyclableMemoryStreamManager recyclableMemoryStreamManager, System.Buffers.ArrayPool lz4BufferPool, HttpClient httpClient, - StatementExecutionConnection connection) + StatementExecutionConnection connection, + IStatementOperationObserver? observer = null) : base(connection) { _connection = connection ?? throw new ArgumentNullException(nameof(connection)); @@ -186,6 +279,10 @@ public StatementExecutionStatement( _lz4BufferPool = lz4BufferPool ?? throw new ArgumentNullException(nameof(lz4BufferPool)); _httpClient = httpClient ?? 
throw new ArgumentNullException(nameof(httpClient)); _enableComplexDatatypeSupport = connection.EnableComplexDatatypeSupport; + // Defaulting to NullObserver — never null — lets every hookpoint callsite skip + // null-checks (design §12). Callers that want telemetry pass a TelemetryObserver + // bound to the connection's TelemetrySessionContext. + _observer = observer ?? NullObserver.Instance; // Match Thrift: statement starts with connection's default catalog. // When enableMultipleCatalogSupport=true, this is the catalog from config (e.g. "main"). @@ -193,6 +290,14 @@ public StatementExecutionStatement( _metadataCatalogName = catalog; } + /// + /// Internal accessor for the injected observer. Exposed so unit tests can verify + /// that wired the correct + /// observer type (TelemetryObserver when telemetry is enabled, NullObserver otherwise). + /// Production code uses the _observer field directly. + /// + internal IStatementOperationObserver Observer => _observer; + /// /// Gets or sets the SQL query to execute. /// @@ -322,92 +427,198 @@ public async Task ExecuteQueryAsync( private async Task ExecuteQueryInternalAsync(CancellationToken cancellationToken, bool isMetadataExecution) { - // Build the execute statement request - // Note: warehouse_id is always required by the Databricks Statement Execution API - // Note: catalog/schema cannot be set when session_id is provided (session has context) - var request = new ExecuteStatementRequest - { - Statement = _sqlQuery, - WarehouseId = _warehouseId, - SessionId = _sessionId, - Catalog = string.IsNullOrEmpty(_sessionId) ? _catalog : null, - Schema = string.IsNullOrEmpty(_sessionId) ? 
_schema : null, - Disposition = _resultDisposition, - Format = _resultFormat, - ResultCompression = _resultCompression, - WaitTimeout = $"{_waitTimeoutSeconds}s", - OnWaitTimeout = "CONTINUE", - IsMetadata = isMetadataExecution, - QueryTags = ParseQueryTags(_queryTags) - }; - - // Execute the statement - var response = await _client.ExecuteStatementAsync(request, cancellationToken).ConfigureAwait(false); - _currentStatementId = response.StatementId; - - // Handle query status according to Databricks API documentation: - // PENDING: waiting for warehouse - continue polling - // RUNNING: running - continue polling - // SUCCEEDED: execution was successful, result data available for fetch - // FAILED: execution failed; reason for failure described in accompanying error message - // CANCELED: user canceled; can come from explicit cancel call, or timeout with on_wait_timeout=CANCEL - // CLOSED: execution successful, and statement closed; result no longer available for fetch - var state = response.Status?.State; - if (state == "PENDING" || state == "RUNNING") - { - response = await PollWithTimeoutAsync(response.StatementId, cancellationToken).ConfigureAwait(false); - state = response.Status?.State; + // Telemetry: signal OnExecuteStarted before submitting the statement. ExecuteQueryInternalAsync + // runs for both regular queries and the sub-statements created by the connection's + // metadata helpers (ExecuteMetadataSqlAsync / ExecuteShowColumnsAsync). For regular + // queries the pair is (Query, ExecuteStatementAsync) — SEA is always async on the wire + // (the client submits a statement and polls for completion), so the operation_type + // recorded in telemetry must be EXECUTE_STATEMENT_ASYNC rather than the synchronous + // EXECUTE_STATEMENT used on the Thrift path. 
For metadata sub-statements, + // SetPendingMetadataOperation stamps _pendingMetadataOperation with the appropriate + // OperationType.List* before ExecuteQueryAsync runs, and we emit + // (Metadata, _pendingMetadataOperation) instead — mirroring the Thrift parity + // model in DatabricksStatement.BeginExecuteTelemetry (PECO-3022 gap B8). + // isCompressed reflects the requested result compression — the manifest may override + // later but the observer's first signal records what we asked for. + bool isCompressed = !string.IsNullOrEmpty(_resultCompression) + && string.Equals(_resultCompression, "LZ4_FRAME", StringComparison.OrdinalIgnoreCase); + + StatementType stmtType; + OperationType opType; + if (_pendingMetadataOperation.HasValue) + { + stmtType = StatementType.Metadata; + opType = _pendingMetadataOperation.Value; } - - // Check for terminal error states - if (state == "FAILED") - { - var error = response.Status?.Error; - throw new AdbcException($"Statement execution failed: {error?.Message ?? "Unknown error"} (Error Code: {error?.ErrorCode})"); - } - if (state == "CANCELED") - { - throw new AdbcException("Statement execution was canceled"); - } - if (state == "CLOSED") + else { - throw new AdbcException("Statement was closed before results could be retrieved"); + stmtType = StatementType.Query; + opType = OperationType.ExecuteStatementAsync; } - // Check for truncated results warning - if (response.Manifest?.Truncated == true) + // _executeStarted is set in lockstep with OnExecuteStarted so the dispose-path + // finalize call only fires when execute actually began. Setting before the observer + // call is safe: the observer contract guarantees OnExecuteStarted does not throw, + // and even if it did, treating a partially-observed execute as "started" matches + // the Thrift path's BeginExecuteTelemetry ordering. 
+ _executeStarted = true; + // Start the stopwatch alongside OnExecuteStarted so OnFirstBatchReady can report + // elapsed-since-execute-start as its latencyMs argument (gap G3 / design §6 row 4). + _executeStopwatch.Start(); + _observer.OnExecuteStarted(stmtType, opType, isCompressed); + + try { - Activity.Current?.AddEvent(new ActivityEvent("statement.results_truncated", - tags: new ActivityTagsCollection - { - { "total_row_count", response.Manifest.TotalRowCount }, - { "total_byte_count", response.Manifest.TotalByteCount } - })); - } + // Build the execute statement request + // Note: warehouse_id is always required by the Databricks Statement Execution API + // Note: catalog/schema cannot be set when session_id is provided (session has context) + var request = new ExecuteStatementRequest + { + Statement = _sqlQuery, + WarehouseId = _warehouseId, + SessionId = _sessionId, + Catalog = string.IsNullOrEmpty(_sessionId) ? _catalog : null, + Schema = string.IsNullOrEmpty(_sessionId) ? _schema : null, + Disposition = _resultDisposition, + Format = _resultFormat, + ResultCompression = _resultCompression, + WaitTimeout = $"{_waitTimeoutSeconds}s", + OnWaitTimeout = "CONTINUE", + IsMetadata = isMetadataExecution, + QueryTags = ParseQueryTags(_queryTags) + }; - // Create appropriate reader based on result disposition. - // All paths (inline, CloudFetch, empty) expose the manifest schema so that - // IntervalSerializingStream and ComplexTypeSerializingStream can detect columns - // uniformly via Spark:DataType:SqlName metadata rather than Arrow IPC types. - IArrowArrayStream reader = CreateReader(response, cancellationToken); + // Execute the statement + var response = await _client.ExecuteStatementAsync(request, cancellationToken).ConfigureAwait(false); + _currentStatementId = response.StatementId; + + // Telemetry: signal OnExecuteSucceeded as soon as the server has accepted the statement + // and a statement id is known. 
SeaResultFormatMapper.Map derives the typed proto enum + // from the (disposition, format) request pair plus the observed response shape; for + // PENDING responses with no manifest yet, the auto-disposition cells fall back to + // Unspecified, which is acceptable because phase 5 fires this hookpoint before + // polling. + _observer.OnExecuteSucceeded( + response.StatementId ?? string.Empty, + SeaResultFormatMapper.Map(_resultDisposition, _resultFormat, response)); + + // Handle query status according to Databricks API documentation: + // PENDING: waiting for warehouse - continue polling + // RUNNING: running - continue polling + // SUCCEEDED: execution was successful, result data available for fetch + // FAILED: execution failed; reason for failure described in accompanying error message + // CANCELED: user canceled; can come from explicit cancel call, or timeout with on_wait_timeout=CANCEL + // CLOSED: execution successful, and statement closed; result no longer available for fetch + var state = response.Status?.State; + if (state == "PENDING" || state == "RUNNING") + { + response = await PollWithTimeoutAsync(response.StatementId, cancellationToken).ConfigureAwait(false); + state = response.Status?.State; + } - // SEA emits YearMonthIntervalType and DurationType; Thrift emits StringType for intervals. - // Convert interval/duration columns to canonical UTF-8 strings to match Thrift behavior. - reader = new IntervalSerializingStream(reader); + // Check for terminal error states + if (state == "FAILED") + { + var error = response.Status?.Error; + throw new AdbcException($"Statement execution failed: {error?.Message ?? 
"Unknown error"} (Error Code: {error?.ErrorCode})"); + } + if (state == "CANCELED") + { + throw new AdbcException("Statement execution was canceled"); + } + if (state == "CLOSED") + { + throw new AdbcException("Statement was closed before results could be retrieved"); + } - // When EnableComplexDatatypeSupport=false (default), serialize complex Arrow types to JSON strings - // so that SEA behavior matches Thrift (which sets ComplexTypesAsArrow=false). - if (!_enableComplexDatatypeSupport) - { - reader = new ComplexTypeSerializingStream(reader); - } + // Check for truncated results warning + if (response.Manifest?.Truncated == true) + { + Activity.Current?.AddEvent(new ActivityEvent("statement.results_truncated", + tags: new ActivityTagsCollection + { + { "total_row_count", response.Manifest.TotalRowCount }, + { "total_byte_count", response.Manifest.TotalByteCount } + })); + } - // Get schema from reader - var schema = reader.Schema; + // Telemetry: signal OnFirstBatchReady once the polling loop has confirmed SUCCEEDED + // state (gap B4 / "reader latencies on 100% of successful EXECUTE records"). Firing + // here rather than inside CreateReader / CreateCloudFetchReader covers ALL successful + // paths uniformly — inline-arrow with attachment bytes, CloudFetch with external + // links, AND the two empty-result paths (null manifest, manifest-without-data) that + // produce an EmptyArrowArrayStream. Without this earlier firepoint, empty-result + // executions would report only OnConsumed and no OnFirstBatchReady — populating + // result_set_consumption_latency_millis but leaving result_set_ready_latency_millis + // unset, which the lumberjack analysis surfaced as the 22% missing-latency tail. + // The TelemetryObserver's "first wins" guard makes any downstream redundant fires a + // no-op; this is the single canonical fire-point for SEA. + _observer.OnFirstBatchReady(_executeStopwatch.ElapsedMilliseconds); + + // Create appropriate reader based on result disposition. 
+ // All paths (inline, CloudFetch, empty) expose the manifest schema so that + // IntervalSerializingStream and ComplexTypeSerializingStream can detect columns + // uniformly via Spark:DataType:SqlName metadata rather than Arrow IPC types. + IArrowArrayStream reader = CreateReader(response, cancellationToken); + + // Capture the underlying CloudFetchReader (if any) before any wrapping layers. + // ConsumptionObservingStream uses this at Dispose to snapshot chunk metrics and + // fire OnChunksDownloaded — gating on a non-null reference is what makes the + // inline path skip OnChunksDownloaded entirely (gap G3 / design §6 row 5). + // CreateReader returns one of CloudFetchReader, InlineArrowStreamReader, or + // EmptyArrowArrayStream, so the `as` cast cleanly partitions the two SEA paths + // without needing an out-parameter on CreateReader. + CloudFetchReader? cloudFetchReader = reader as CloudFetchReader; + + // SEA emits YearMonthIntervalType and DurationType; Thrift emits StringType for intervals. + // Convert interval/duration columns to canonical UTF-8 strings to match Thrift behavior. + reader = new IntervalSerializingStream(reader); + + // When EnableComplexDatatypeSupport=false (default), serialize complex Arrow types to JSON strings + // so that SEA behavior matches Thrift (which sets ComplexTypesAsArrow=false). + if (!_enableComplexDatatypeSupport) + { + reader = new ComplexTypeSerializingStream(reader); + } - // Return query result - use 0 if row count is not available - long rowCount = response.Manifest?.TotalRowCount ?? 0; - return new QueryResult(rowCount, reader); + // Telemetry: wrap the final reader so OnConsumed (and OnChunksDownloaded on the + // CloudFetch path) fire at consumer Dispose (gap G3 / design §6 row 5). 
Wrapping + // at the outermost layer covers both the inline and CloudFetch paths uniformly — + // every inner transform (IntervalSerializingStream, ComplexTypeSerializingStream, + // the underlying InlineArrowStreamReader / CloudFetchReader) already propagates + // Dispose to its inner stream, so the consumer's Dispose reaches this wrapper + // first and ElapsedMilliseconds captures the full consume-window before teardown + // costs are incurred. The wrapper is idempotent — both observer signals fire at + // most once even if Dispose is invoked multiple times. Passing a non-null + // cloudFetchReader is the CloudFetch-path opt-in for OnChunksDownloaded; the + // inline path passes null and OnChunksDownloaded is intentionally never fired. + // + // Tracking the wrapper in _consumptionStream lets Statement.Dispose call + // EnsureObserverSignaled() before OnFinalized, guaranteeing OnConsumed fires on + // EVERY successful EXECUTE record even when the consumer abandons the reader + // without disposing it (gap B4). EnsureObserverSignaled() is idempotent with the + // wrapper's own Dispose, so whichever path runs first wins and the other is a + // no-op — matching the Thrift driver's RecordResultsConsumed-on-statement-dispose + // behavior. + var consumptionStream = new ConsumptionObservingStream(reader, _observer, _executeStopwatch, cloudFetchReader); + _consumptionStream = consumptionStream; + reader = consumptionStream; + + // Get schema from reader + var schema = reader.Schema; + + // Return query result - use 0 if row count is not available + long rowCount = response.Manifest?.TotalRowCount ?? 0; + return new QueryResult(rowCount, reader); + } + catch (Exception ex) + { + // Telemetry: signal OnError on any failure path (server error, terminal FAILED/CANCELED/CLOSED + // state, polling cancellation, reader construction). The observer contract guarantees the call + // swallows exceptions, so no try/catch is wrapped around the observer call itself. 
+ _observer.OnError(ex); + throw; + } } /// @@ -452,6 +663,16 @@ private async Task PollWithTimeoutAsync(string stateme /// private async Task PollUntilCompleteAsync(string statementId, CancellationToken cancellationToken) { + // Telemetry: accumulate the count and wall-clock latency of GetStatementAsync calls so we can + // emit a single OnPollCompleted on terminal state. Per IStatementOperationObserver, latencyMs is + // "sum of wall-clock time spent in poll calls" — i.e. the GetStatementAsync calls themselves, + // not the inter-poll Task.Delay backoff. OnPollCompleted is intentionally NOT called on + // cancellation/exception paths: those flow through ExecuteQueryInternalAsync's catch and trigger + // OnError; emitting a partial poll record there would muddle the telemetry contract. + int pollCount = 0; + long totalPollLatencyMs = 0; + var pollStopwatch = new Stopwatch(); + while (true) { // Check for cancellation before each polling iteration @@ -463,8 +684,12 @@ private async Task PollUntilCompleteAsync(string state // Check for cancellation after delay cancellationToken.ThrowIfCancellationRequested(); - // Get statement status + // Get statement status (timed for poll telemetry) + pollCount++; + pollStopwatch.Restart(); var response = await _client.GetStatementAsync(statementId, cancellationToken).ConfigureAwait(false); + pollStopwatch.Stop(); + totalPollLatencyMs += pollStopwatch.ElapsedMilliseconds; // Convert GetStatementResponse to ExecuteStatementResponse var executeResponse = new ExecuteStatementResponse @@ -482,6 +707,8 @@ private async Task PollUntilCompleteAsync(string state state == "CANCELED" || state == "CLOSED") { + // Telemetry: emit exactly once on terminal state with accumulated count/latency. + _observer.OnPollCompleted(pollCount, totalPollLatencyMs); return executeResponse; } @@ -523,6 +750,9 @@ private IArrowArrayStream CreateReader(ExecuteStatementResponse response, Cancel // Inline results - may be split across multiple chunks. 
// Pass the manifest schema so the reader reports it instead of the IPC-embedded schema, // keeping inline consistent with CloudFetch (which also uses the manifest schema). + // OnFirstBatchReady is fired by ExecuteQueryInternalAsync before this method is called, + // covering all reader paths (inline, CloudFetch, empty) from a single fire-point so + // result_set_ready_latency_millis populates on 100% of successful EXECUTE records. int totalChunks = response.Manifest?.Chunks?.Count ?? 1; return new InlineArrowStreamReader(_client, _currentStatementId!, response.Result.Attachment, isLz4Compressed, totalChunks, _lz4BufferPool, cancellationToken, GetSchemaFromManifest(response.Manifest)); @@ -541,6 +771,9 @@ private IArrowArrayStream CreateReader(ExecuteStatementResponse response, Cancel /// private IArrowArrayStream CreateCloudFetchReader(ExecuteStatementResponse response, Schema manifestSchema) { + // OnFirstBatchReady is fired by ExecuteQueryInternalAsync before CreateReader dispatches + // here, covering inline, CloudFetch, and empty paths from a single fire-point. This keeps + // result_set_ready_latency_millis populated on 100% of successful EXECUTE records (gap B4). var manifest = response.Manifest!; var schema = manifestSchema; @@ -663,56 +896,108 @@ public async Task ExecuteUpdateAsync(CancellationToken cancellatio private async Task ExecuteUpdateInternalAsync(CancellationToken cancellationToken) { - // Build the execute statement request - // Note: catalog/schema cannot be set when session_id is provided (session has context) - var request = new ExecuteStatementRequest - { - Statement = _sqlQuery, - WarehouseId = _warehouseId, - SessionId = _sessionId, - Catalog = string.IsNullOrEmpty(_sessionId) ? _catalog : null, - Schema = string.IsNullOrEmpty(_sessionId) ? 
_schema : null, - Disposition = _resultDisposition, - Format = _resultFormat, - ResultCompression = _resultCompression, - WaitTimeout = $"{_waitTimeoutSeconds}s", - OnWaitTimeout = "CONTINUE", - IsMetadata = false, - QueryTags = ParseQueryTags(_queryTags) - }; - - // Execute the statement - var response = await _client.ExecuteStatementAsync(request, cancellationToken).ConfigureAwait(false); - _currentStatementId = response.StatementId; + // Telemetry: signal OnExecuteStarted before submitting the statement. UPDATE + // statements (DML and DDL) share the same operation_type as queries on SEA — + // EXECUTE_STATEMENT_ASYNC, because SEA is always async on the wire (the client + // submits a statement and polls for completion regardless of whether the SQL is + // INSERT/UPDATE/DELETE or CREATE/DROP/CTAS). Only the StatementType differs: + // Update here vs Query in ExecuteQueryInternalAsync. Mirroring the Query path + // ensures B9 — UPDATE statements emit telemetry on SEA — closes: without these + // hookpoints, ExecuteUpdate would never fire OnExecuteStarted, _executeStarted + // would stay false, and Dispose's gated OnFinalized would skip the log entirely, + // producing the zero-telemetry shape observed in the lumberjack comparator. + // isCompressed reflects the requested result compression — for UPDATE the response + // typically has no result data, but recording what we asked for matches the + // QUERY-path contract and the Thrift driver's BeginExecuteTelemetry behavior. + bool isCompressed = !string.IsNullOrEmpty(_resultCompression) + && string.Equals(_resultCompression, "LZ4_FRAME", StringComparison.OrdinalIgnoreCase); + // _executeStarted is set in lockstep with OnExecuteStarted so the dispose-path + // OnFinalized fires for UPDATE statements just as it does for queries. 
Setting + // before the observer call is safe: the observer contract guarantees + // OnExecuteStarted does not throw, and even if it did, treating a partially + // observed execute as "started" matches the Thrift path's ordering. + _executeStarted = true; + // Start the stopwatch alongside OnExecuteStarted so the operation_latency_ms + // recorded at OnFinalized covers the full execute window for UPDATE statements + // just as it does for queries. + _executeStopwatch.Start(); + _observer.OnExecuteStarted(StatementType.Update, OperationType.ExecuteStatementAsync, isCompressed); - // Handle query status - poll until complete - var state = response.Status?.State; - if (state == "PENDING" || state == "RUNNING") + try { - response = await PollWithTimeoutAsync(response.StatementId, cancellationToken).ConfigureAwait(false); - state = response.Status?.State; - } + // Build the execute statement request + // Note: catalog/schema cannot be set when session_id is provided (session has context) + var request = new ExecuteStatementRequest + { + Statement = _sqlQuery, + WarehouseId = _warehouseId, + SessionId = _sessionId, + Catalog = string.IsNullOrEmpty(_sessionId) ? _catalog : null, + Schema = string.IsNullOrEmpty(_sessionId) ? _schema : null, + Disposition = _resultDisposition, + Format = _resultFormat, + ResultCompression = _resultCompression, + WaitTimeout = $"{_waitTimeoutSeconds}s", + OnWaitTimeout = "CONTINUE", + IsMetadata = false, + QueryTags = ParseQueryTags(_queryTags) + }; - // Check for terminal error states - if (state == "FAILED") - { - var error = response.Status?.Error; - throw new AdbcException($"Statement execution failed: {error?.Message ?? 
"Unknown error"} (Error Code: {error?.ErrorCode})"); - } - if (state == "CANCELED") - { - throw new AdbcException("Statement execution was canceled"); + // Execute the statement + var response = await _client.ExecuteStatementAsync(request, cancellationToken).ConfigureAwait(false); + _currentStatementId = response.StatementId; + + // Telemetry: signal OnExecuteSucceeded as soon as the server has accepted the + // statement and a statement id is known — mirrors the Query path. The result + // format derived by SeaResultFormatMapper may resolve to Unspecified for UPDATE + // (DDL has no manifest, and DML's num_affected_rows attachment is materialized + // post-poll), but the cell still records the requested (disposition, format) + // pair which is the contract the observer expects. + _observer.OnExecuteSucceeded( + response.StatementId ?? string.Empty, + SeaResultFormatMapper.Map(_resultDisposition, _resultFormat, response)); + + // Handle query status - poll until complete + var state = response.Status?.State; + if (state == "PENDING" || state == "RUNNING") + { + response = await PollWithTimeoutAsync(response.StatementId, cancellationToken).ConfigureAwait(false); + state = response.Status?.State; + } + + // Check for terminal error states + if (state == "FAILED") + { + var error = response.Status?.Error; + throw new AdbcException($"Statement execution failed: {error?.Message ?? "Unknown error"} (Error Code: {error?.ErrorCode})"); + } + if (state == "CANCELED") + { + throw new AdbcException("Statement execution was canceled"); + } + if (state == "CLOSED") + { + throw new AdbcException("Statement was closed before results could be retrieved"); + } + + // DML statements (INSERT, UPDATE, DELETE) return a 1-row result whose first + // column is num_affected_rows. DDL (CREATE TABLE, DROP TABLE, CTAS, etc.) + // returns no result data. Return -1 for DDL per the ADBC convention for + // "unknown or not applicable", matching what the Thrift path does. 
+ return new UpdateResult(ReadNumAffectedRows(response.Manifest, response.Result)); } - if (state == "CLOSED") - { - throw new AdbcException("Statement was closed before results could be retrieved"); + catch (Exception ex) + { + // Telemetry: signal OnError on any failure path (server error, terminal + // FAILED/CANCELED/CLOSED state, polling cancellation, num_affected_rows + // parse failures). Mirrors the Query-path catch. The observer contract + // guarantees the call swallows exceptions, so no try/catch is wrapped + // around the observer call itself. OnFinalized still fires once via the + // gated path in Dispose (Interlocked CAS makes the observer's OnFinalized + // idempotent — exactly-once on the Update error+Dispose sequence). + _observer.OnError(ex); + throw; } - - // DML statements (INSERT, UPDATE, DELETE) return a 1-row result whose first - // column is num_affected_rows. DDL (CREATE TABLE, DROP TABLE, CTAS, etc.) - // returns no result data. Return -1 for DDL per the ADBC convention for - // "unknown or not applicable", matching what the Thrift path does. - return new UpdateResult(ReadNumAffectedRows(response.Manifest, response.Result)); } private static long ReadNumAffectedRows(ResultManifest? manifest, ResultData? result) @@ -758,8 +1043,16 @@ private static long ReadNumAffectedRows(ResultManifest? manifest, ResultData? re /// public override void Dispose() { + // Capture the statement id and close-RPC outcome locally so we can populate the + // CLOSE_STATEMENT telemetry event below even after _currentStatementId is cleared. + string? closedStatementId = _currentStatementId; + long closeRpcElapsedMs = 0; + Exception? 
closeRpcError = null; + bool closeRpcAttempted = false; + if (_currentStatementId != null) { + var closeStopwatch = Stopwatch.StartNew(); try { // Close statement synchronously during dispose @@ -768,11 +1061,13 @@ public override void Dispose() { { "statement_id", _currentStatementId } })); + closeRpcAttempted = true; _client.CloseStatementAsync(_currentStatementId, CancellationToken.None).GetAwaiter().GetResult(); } catch (Exception ex) { // Best effort - ignore errors during dispose + closeRpcError = ex; Activity.Current?.AddEvent(new ActivityEvent("statement.dispose.error", tags: new ActivityTagsCollection { @@ -781,9 +1076,59 @@ public override void Dispose() } finally { + closeStopwatch.Stop(); + closeRpcElapsedMs = closeStopwatch.ElapsedMilliseconds; _currentStatementId = null; } } + + // Safety net for reader latency telemetry (gap B4): if a successful execute + // constructed a result reader but the consumer abandoned it without calling + // Dispose, OnConsumed (and OnChunksDownloaded on the CloudFetch path) would + // never fire — leaving result_set_consumption_latency_millis unset on every + // such record. The lumberjack analysis of 22% missing-latency records traced + // back to exactly this pattern. EnsureObserverSignaled fires the missing + // observer signals NOW (using the same execute-time Stopwatch the reader's + // own Dispose would have used) without disposing the inner stream, so a + // later consumer Dispose still releases inner resources and the second + // signal attempt is a no-op via the wrapper's idempotency gate. Matches the + // Thrift driver's EmitTelemetry, which always calls RecordResultsConsumed + // on statement Dispose regardless of whether the consumer iterated the reader. + _consumptionStream?.EnsureObserverSignaled(); + + // Terminal observer signal: build the OssSqlDriverTelemetryLog and enqueue it for + // background export. 
Gated on _executeStarted so a never-executed statement does + // not produce a stray empty log (mirrors the Thrift path's _executeStarted gate). + // The observer's own OnFinalized is idempotent via Interlocked CAS, so subsequent + // Dispose() calls or an error path that also finalizes are no-ops downstream — this + // call is the only place SEA telemetry reaches eng_lumberjack today, so it must + // fire for every executed statement regardless of CloseStatementAsync's outcome. + if (_executeStarted) + { + _observer.OnFinalized(); + } + + // Emit a CLOSE_STATEMENT operation telemetry event once per statement disposal, + // independently of the EXECUTE_STATEMENT_ASYNC log produced by the observer's + // OnFinalized above. operation_latency_ms reflects the wall-clock duration of + // the CloseStatementAsync RPC just issued; if no close RPC was attempted (the + // statement was disposed without ever running an execute, so _currentStatementId + // was null), elapsedMs is 0 and the event acts purely as a lifecycle marker — + // matching the Thrift driver's behavior in DatabricksStatement.Dispose. The + // _closeStatementTelemetryEmitted gate makes the emission idempotent across + // repeated Dispose() calls. EmitStatementOperationTelemetry on the connection + // swallows any exception from the underlying telemetry implementation so a + // telemetry failure cannot surface from Dispose. + if (!_closeStatementTelemetryEmitted) + { + _closeStatementTelemetryEmitted = true; + _connection.EmitStatementOperationTelemetry( + OperationType.CloseStatement, + StatementType.Unspecified, + statementId: closedStatementId, + elapsedMs: closeRpcAttempted ? closeRpcElapsedMs : 0, + error: closeRpcError); + } } public override void Cancel() @@ -826,6 +1171,151 @@ public void Dispose() } } + /// + /// Decorator that wraps the final result reader and signals + /// (and + /// on the CloudFetch + /// path) exactly once at consumer Dispose (gap G3 / design §6 rows 5–6). 
The + /// latencyMs argument is elapsed-since-execute-start so the telemetry field + /// result_latency.result_set_consumption_latency_millis represents the + /// total wall-clock from execute → reader consumption end, matching the Thrift + /// path's StatementTelemetryContext.ExecuteStopwatch semantic. + /// + /// Placement: wrapping at the outermost transform layer + /// (after and + /// ) is intentional. Each inner + /// transform already forwards to its inner stream, so the + /// consumer's Dispose reaches this wrapper first and the stopwatch read captures + /// the full consume window before teardown costs are incurred. Firing inside the + /// nested would miss the CloudFetch path; + /// firing inside the shared CloudFetchReader would also re-emit on Thrift + /// queries that share the same reader class. + /// + /// CloudFetch chunk metrics: when constructed with a + /// non-null cloudFetchReader, Dispose snapshots GetChunkMetrics() + /// before disposing the inner chain — the inner Dispose tears down the download + /// manager, so we must query the aggregator while it is still live. If the + /// aggregator throws or returns null we fall back to a fresh empty + /// (the design's documented dependency on the gap-fix + /// plumbing: proto fields are nullable on the wire, and an empty metrics object + /// is preferable to dropping the signal entirely). The inline path passes + /// null here and + /// is never fired — matching the Thrift path, which gates emission on the result + /// stream actually being a CloudFetchReader. + /// + /// Idempotency: only the first Dispose signals + /// OnConsumed / OnChunksDownloaded; subsequent calls are no-ops. + /// The observer contract is fail-open (SafeObserver swallows exceptions), but the + /// try/finally guarantees inner cleanup even if a hand-rolled observer bypasses + /// that contract. 
+ /// + private sealed class ConsumptionObservingStream : IArrowArrayStream + { + private readonly IArrowArrayStream _inner; + private readonly IStatementOperationObserver _observer; + private readonly Stopwatch _executeStopwatch; + private readonly CloudFetchReader? _cloudFetchReader; + + // Separate gates so consumer Dispose and statement-Dispose safety-net can be + // ordered independently. _observerSignaled is the once-only guard for the + // observer signals (OnChunksDownloaded + OnConsumed); _disposed is the once- + // only guard for inner-stream teardown. Both use Interlocked CAS so concurrent + // invocation from the reader's disposal thread and the statement's dispose + // thread is safe (per IStatementOperationObserver thread-safety contract). + private int _observerSignaled; + private int _disposed; + + public ConsumptionObservingStream( + IArrowArrayStream inner, + IStatementOperationObserver observer, + Stopwatch executeStopwatch, + CloudFetchReader? cloudFetchReader = null) + { + _inner = inner ?? throw new ArgumentNullException(nameof(inner)); + _observer = observer ?? throw new ArgumentNullException(nameof(observer)); + _executeStopwatch = executeStopwatch ?? throw new ArgumentNullException(nameof(executeStopwatch)); + // Nullable on purpose — non-null only on the CloudFetch path. The inline + // and empty paths intentionally do not signal OnChunksDownloaded. + _cloudFetchReader = cloudFetchReader; + } + + public Schema Schema => _inner.Schema; + + public ValueTask ReadNextRecordBatchAsync(CancellationToken cancellationToken = default) + => _inner.ReadNextRecordBatchAsync(cancellationToken); + + /// + /// Fires (CloudFetch + /// path only) and exactly + /// once, in that order, then returns without touching the inner stream. Safe to + /// call multiple times and from multiple threads — subsequent calls are no-ops + /// via the _observerSignaled Interlocked CAS gate. 
Invoked from + /// as a safety net so reader + /// latency telemetry fires on EVERY successful EXECUTE record, even when the + /// consumer abandons the result reader without disposing it (gap B4). Also + /// invoked from itself so the consumer's normal flow still + /// produces these signals before inner teardown. + /// + public void EnsureObserverSignaled() + { + if (System.Threading.Interlocked.CompareExchange(ref _observerSignaled, 1, 0) != 0) + { + return; + } + + // Read the stopwatch and snapshot chunk metrics BEFORE the inner stream is + // disposed (the consumer may not have called Dispose yet, but if they have, + // we're racing with their Dispose). The chunk metrics aggregator is held by + // the inner CloudFetchReader's download manager; querying it before any + // inner teardown lets us capture state even when statement-Dispose runs + // first. Fall back to an empty ChunkMetrics if the aggregator throws or + // returns null — telemetry must never fail driver operations, and the + // proto fields are nullable so an empty metrics object is a valid payload. + if (_cloudFetchReader != null) + { + ChunkMetrics metrics; + try + { + metrics = _cloudFetchReader.GetChunkMetrics() ?? new ChunkMetrics(); + } + catch + { + metrics = new ChunkMetrics(); + } + + // Emit before OnConsumed to match the Thrift path's emission order in + // DatabricksStatement.FinalizeExecuteTelemetry — chunk_details and + // result_latency end up on the same OssSqlDriverTelemetryLog regardless + // of order, but keeping a single canonical sequence simplifies any + // future record-assembly reasoning. 
+ _observer.OnChunksDownloaded(metrics); + } + + _observer.OnConsumed(_executeStopwatch.ElapsedMilliseconds); + } + + public void Dispose() + { + if (System.Threading.Interlocked.CompareExchange(ref _disposed, 1, 0) != 0) + { + return; + } + + // Fire observer signals first (if not already signaled by the statement's + // safety net) so the stopwatch read captures "time until the consumer let + // go" rather than "time until inner teardown completes". try/finally + // guarantees inner cleanup even on observer fault. + try + { + EnsureObserverSignaled(); + } + finally + { + _inner.Dispose(); + } + } + } + /// /// Reader for inline results in Arrow IPC stream format. /// Handles both single-chunk and multi-chunk inline results by fetching @@ -1038,7 +1528,7 @@ private async Task GetCatalogsAsync(CancellationToken cancellationT // matching Thrift behavior (Thrift RPC has no catalog filter for GetCatalogs). string sql = new ShowCatalogsCommand(null).Build(); activity?.SetTag("sql_query", sql); - var batches = await _connection.ExecuteMetadataSqlAsync(sql, cancellationToken).ConfigureAwait(false); + var batches = await _connection.ExecuteMetadataSqlAsync(sql, OperationType.ListCatalogs, cancellationToken).ConfigureAwait(false); var tableCatBuilder = new StringArray.Builder(); int count = 0; @@ -1084,7 +1574,7 @@ private async Task GetSchemasAsync(CancellationToken cancellationTo List batches; try { - batches = await _connection.ExecuteMetadataSqlAsync(sql, cancellationToken).ConfigureAwait(false); + batches = await _connection.ExecuteMetadataSqlAsync(sql, OperationType.ListSchemas, cancellationToken).ConfigureAwait(false); } catch (DatabricksException ex) when (ex.IsObjectNotFoundException()) { @@ -1161,7 +1651,7 @@ private async Task GetTablesAsync(CancellationToken cancellationTok List batches; try { - batches = await _connection.ExecuteMetadataSqlAsync(sql, cancellationToken).ConfigureAwait(false); + batches = await _connection.ExecuteMetadataSqlAsync(sql, 
OperationType.ListTables, cancellationToken).ConfigureAwait(false); } catch (DatabricksException ex) when (ex.IsObjectNotFoundException()) { @@ -1356,7 +1846,7 @@ private async Task GetColumnsExtendedViaDescTableAsync(string? cata List batches; try { - batches = await _connection.ExecuteMetadataSqlAsync(query, cancellationToken).ConfigureAwait(false); + batches = await _connection.ExecuteMetadataSqlAsync(query, OperationType.ListColumns, cancellationToken).ConfigureAwait(false); } catch (DatabricksException ex) when (ex.IsObjectNotFoundException()) { @@ -1480,7 +1970,7 @@ private async Task GetPrimaryKeysAsync(CancellationToken cancellati List batches; try { - batches = await _connection.ExecuteMetadataSqlAsync(sql, cancellationToken).ConfigureAwait(false); + batches = await _connection.ExecuteMetadataSqlAsync(sql, OperationType.ListPrimaryKeys, cancellationToken).ConfigureAwait(false); } catch (DatabricksException ex) when (ex.IsObjectNotFoundException()) { @@ -1556,7 +2046,7 @@ private async Task FetchCrossReferenceAsync( List batches; try { - batches = await _connection.ExecuteMetadataSqlAsync(sql, cancellationToken).ConfigureAwait(false); + batches = await _connection.ExecuteMetadataSqlAsync(sql, OperationType.ListCrossReferences, cancellationToken).ConfigureAwait(false); } catch (DatabricksException ex) when (ex.IsObjectNotFoundException()) { diff --git a/csharp/src/Telemetry/ConnectionTelemetry.cs b/csharp/src/Telemetry/ConnectionTelemetry.cs index 00f87576..a32f950b 100644 --- a/csharp/src/Telemetry/ConnectionTelemetry.cs +++ b/csharp/src/Telemetry/ConnectionTelemetry.cs @@ -29,7 +29,6 @@ using AdbcDrivers.HiveServer2; using AdbcDrivers.HiveServer2.Spark; using Apache.Arrow.Adbc; -using Apache.Hive.Service.Rpc.Thrift; namespace AdbcDrivers.Databricks.Telemetry { @@ -62,12 +61,22 @@ internal ConnectionTelemetry( /// Returns if telemetry is disabled, misconfigured, or fails to initialize. /// Never throws. 
/// + /// + /// The transport-agnostic session id (a GUID string for Thrift, server-assigned id for SEA). + /// Callers convert at the boundary so this method has no transport-specific dependency. + /// + /// + /// Driver transport mode (THRIFT or SEA) stamped onto + /// driver_connection_params.mode. Threaded through from the caller so the + /// telemetry payload reflects the real transport. + /// public static IConnectionTelemetry Create( IReadOnlyDictionary properties, string host, string assemblyVersion, OAuthClientCredentialsProvider? oauthTokenProvider, - TSessionHandle? sessionHandle, + string sessionId, + Proto.DriverMode.Types.Type mode, bool enableDirectResults, bool useDescTableExtended, int connectTimeoutMilliseconds, @@ -115,15 +124,13 @@ public static IConnectionTelemetry Create( SafeBuildSystemConfiguration(assemblyVersion, activity); Proto.DriverConnectionParameters driverConnectionParams = SafeBuildDriverConnectionParams( - properties, host, enableDirectResults, useDescTableExtended, + properties, host, mode, enableDirectResults, useDescTableExtended, connectTimeoutMilliseconds, activity); string authType = SafeDetermineAuthType(properties, activity); var session = new TelemetrySessionContext { - SessionId = sessionHandle?.SessionId?.Guid != null - ? new System.Guid(sessionHandle.SessionId.Guid).ToString() - : null, + SessionId = !string.IsNullOrEmpty(sessionId) ? 
sessionId : null, TelemetryClient = telemetryClient, SystemConfiguration = systemConfiguration, DriverConnectionParams = driverConnectionParams, @@ -430,6 +437,7 @@ internal static Proto.DriverSystemConfiguration SafeBuildSystemConfiguration( internal static Proto.DriverConnectionParameters SafeBuildDriverConnectionParams( IReadOnlyDictionary properties, string host, + Proto.DriverMode.Types.Type mode, bool enableDirectResults, bool useDescTableExtended, int connectTimeoutMilliseconds, @@ -438,7 +446,7 @@ internal static Proto.DriverConnectionParameters SafeBuildDriverConnectionParams try { return BuildDriverConnectionParams( - properties, host, enableDirectResults, useDescTableExtended, + properties, host, mode, enableDirectResults, useDescTableExtended, connectTimeoutMilliseconds); } catch (Exception ex) @@ -455,7 +463,7 @@ internal static Proto.DriverConnectionParameters SafeBuildDriverConnectionParams return new Proto.DriverConnectionParameters { HttpPath = string.Empty, - Mode = Proto.DriverMode.Types.Type.Thrift, + Mode = mode, HostInfo = new Proto.HostDetails { HostUrl = host ?? string.Empty, @@ -624,6 +632,7 @@ internal static string UnquoteOsReleaseValue(string raw) internal static Proto.DriverConnectionParameters BuildDriverConnectionParams( IReadOnlyDictionary properties, string host, + Proto.DriverMode.Types.Type mode, bool enableDirectResults, bool useDescTableExtended, int connectTimeoutMilliseconds) @@ -639,7 +648,7 @@ internal static Proto.DriverConnectionParameters BuildDriverConnectionParams( var connectionParams = new Proto.DriverConnectionParameters { HttpPath = httpPath ?? "", - Mode = Proto.DriverMode.Types.Type.Thrift, + Mode = mode, HostInfo = new Proto.HostDetails { // Bare hostname, matching JDBC. 
Scheme is implicit (always https) and diff --git a/csharp/src/Telemetry/IStatementOperationObserver.cs b/csharp/src/Telemetry/IStatementOperationObserver.cs new file mode 100644 index 00000000..12600efd --- /dev/null +++ b/csharp/src/Telemetry/IStatementOperationObserver.cs @@ -0,0 +1,130 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using AdbcDrivers.Databricks.Reader.CloudFetch; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; + +namespace AdbcDrivers.Databricks.Telemetry +{ + /// + /// Observer of a single statement's operational lifecycle. Sits between the statement + /// classes (Thrift and SEA) and the telemetry implementation so the statement code + /// does not depend on telemetry types directly. + /// + /// + /// + /// Fail-open contract: Implementations MUST NOT throw from any method on + /// this interface. All exceptions raised inside an implementation must be swallowed + /// internally — callsites in statement code intentionally contain no try/catch around + /// observer calls. Telemetry must never affect the caller's control flow. + /// + /// + /// Thread-safety: Methods on this interface may be invoked from any thread. + /// Implementations MUST be thread-safe. 
In practice the calls happen from the + /// statement's executing task and the reader's disposal thread, which may differ. + /// + /// + /// Terminal call: is the terminal call. After it has + /// been invoked once, the observer's record is considered complete and any further + /// calls — including additional calls — MUST be no-ops + /// (idempotent). This protects against the common case where both an error path + /// and a dispose path attempt to finalize the same statement. + /// + /// + /// Non-telemetry uses: The interface is shaped around the statement's lifecycle + /// rather than the telemetry data model so future observers (tracing, audit) can be + /// added without changing statement code. + /// + /// + internal interface IStatementOperationObserver + { + /// + /// Called once just before the statement is submitted to the server. + /// + /// The statement type (QUERY, UPDATE, METADATA, ...). + /// The operation type (EXECUTE_STATEMENT, EXECUTE_STATEMENT_ASYNC, ...). + /// Whether results are expected to be compressed (LZ4). + void OnExecuteStarted(StatementType stmtType, OperationType opType, bool isCompressed); + + /// + /// Called once after the server has accepted the statement and a statement id is known. + /// + /// The server-assigned statement id. + /// The result format inferred for the execution (INLINE_ARROW, EXTERNAL_LINKS, ...). + void OnExecuteSucceeded(string statementId, ExecutionResultFormat resultFormat); + + /// + /// Called once after the polling loop reaches a terminal state, with the accumulated + /// poll count and total elapsed poll latency. + /// + /// Total number of status-poll calls issued. + /// Sum of wall-clock time spent in poll calls, in milliseconds. + void OnPollCompleted(int count, long latencyMs); + + /// + /// Called when the first batch of results is available to the reader. + /// Implementations should treat repeated calls as a no-op (only the first wins). 
+ /// + /// Elapsed time from execute start to first batch ready, in milliseconds. + void OnFirstBatchReady(long latencyMs); + + /// + /// Called when the reader has been fully consumed (or disposed). + /// + /// Elapsed time from execute start to results fully consumed, in milliseconds. + void OnConsumed(long latencyMs); + + /// + /// Called once when chunk metrics for the CloudFetch download are available. + /// + /// Aggregated CloudFetch chunk metrics. Implementations must + /// tolerate empty / default metrics if the gap-fix plumbing has not landed yet. + void OnChunksDownloaded(ChunkMetrics metrics); + + /// + /// Called once after the active result reader has been inspected, with the values + /// that should override any defaults previously stamped on the observer at execute + /// time. The statement invokes this from the finalize path so the emitted record + /// reflects the server-reported truth for the active reader (PECO-2988, PECO-2978) + /// rather than the connection-level capability flags used as placeholders earlier + /// in the lifecycle. Implementations should overwrite their stored result-format + /// and compression fields unconditionally. + /// + /// The result format reported by the active reader + /// (e.g. when CloudFetch is + /// active, otherwise). + /// Whether the active reader's payload is LZ4-compressed, + /// per the server's TGetResultSetMetadataResp.Lz4Compressed flag. + void OnReaderInspected(ExecutionResultFormat resultFormat, bool isCompressed); + + /// + /// Called once if the statement execution fails. Implementations should record the + /// error and continue to accept further calls; an explicit + /// call is still required to terminate the observer. + /// + /// The exception that occurred. + void OnError(Exception ex); + + /// + /// Terminal call. Implementations build and dispatch any pending record and mark + /// the observer as finalized. Must be idempotent: repeated calls are no-ops. 
+ /// + void OnFinalized(); + } +} diff --git a/csharp/src/Telemetry/NullObserver.cs b/csharp/src/Telemetry/NullObserver.cs new file mode 100644 index 00000000..b2d65a73 --- /dev/null +++ b/csharp/src/Telemetry/NullObserver.cs @@ -0,0 +1,99 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using AdbcDrivers.Databricks.Reader.CloudFetch; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; + +namespace AdbcDrivers.Databricks.Telemetry +{ + /// + /// Singleton no-op implementation of . + /// Used as the default observer so callsites in statement classes never need to + /// null-check before calling observer methods. + /// + /// + /// All methods are intentionally empty. They satisfy the fail-open, thread-safe, + /// and idempotent contract trivially. + /// + internal sealed class NullObserver : IStatementOperationObserver + { + /// + /// The singleton instance. Use this directly rather than constructing new instances. + /// + public static readonly NullObserver Instance = new NullObserver(); + + private NullObserver() + { + } + + /// + public void OnExecuteStarted(StatementType stmtType, OperationType opType, bool isCompressed) + { + // No-op. 
+ } + + /// + public void OnExecuteSucceeded(string statementId, ExecutionResultFormat resultFormat) + { + // No-op. + } + + /// + public void OnPollCompleted(int count, long latencyMs) + { + // No-op. + } + + /// + public void OnFirstBatchReady(long latencyMs) + { + // No-op. + } + + /// + public void OnConsumed(long latencyMs) + { + // No-op. + } + + /// + public void OnChunksDownloaded(ChunkMetrics metrics) + { + // No-op. + } + + /// + public void OnReaderInspected(ExecutionResultFormat resultFormat, bool isCompressed) + { + // No-op. + } + + /// + public void OnError(Exception ex) + { + // No-op. + } + + /// + public void OnFinalized() + { + // No-op. Idempotent by construction. + } + } +} diff --git a/csharp/src/Telemetry/SafeObserver.cs b/csharp/src/Telemetry/SafeObserver.cs new file mode 100644 index 00000000..dd20a49b --- /dev/null +++ b/csharp/src/Telemetry/SafeObserver.cs @@ -0,0 +1,151 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using System.Diagnostics; +using AdbcDrivers.Databricks.Reader.CloudFetch; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; + +namespace AdbcDrivers.Databricks.Telemetry +{ + /// + /// Belt-and-suspenders decorator over any . 
+ /// Wraps every interface method in a per-method try/catch so an inner observer that + /// violates the fail-open contract cannot + /// surface its exception to the statement callsite. + /// + /// + /// + /// The first-party observers in this assembly (, + /// ) already honor the fail-open contract internally. + /// This decorator exists for the optional case where a third-party observer is + /// plugged in — see design §12, "optional SafeObserver decorator is available for + /// future third-party observer implementations that may not honor the contract." + /// + /// + /// Swallowed exceptions are recorded as an OpenTelemetry activity event + /// (telemetry.observer.suppressed) on the ambient , if + /// any. This is the codebase's convention for trace-level diagnostics: it remains + /// visible in distributed traces and verbose tooling without polluting standard + /// logs or affecting the caller's control flow. + /// + /// + /// Thread-safety: Inherits whatever thread-safety the inner observer provides. + /// The decorator itself stores only the inner reference and adds no mutable state. + /// + /// + internal sealed class SafeObserver : IStatementOperationObserver + { + private readonly IStatementOperationObserver _inner; + + /// + /// Wraps with a per-method try/catch. + /// + /// The observer to delegate to. Required. + /// Thrown if is null. + public SafeObserver(IStatementOperationObserver inner) + { + _inner = inner ?? throw new ArgumentNullException(nameof(inner)); + } + + /// + /// Exposes the wrapped observer for tests and diagnostic introspection. 
+ /// + internal IStatementOperationObserver Inner => _inner; + + /// + public void OnExecuteStarted(StatementType stmtType, OperationType opType, bool isCompressed) => + Safe(() => _inner.OnExecuteStarted(stmtType, opType, isCompressed)); + + /// + public void OnExecuteSucceeded(string statementId, ExecutionResultFormat resultFormat) => + Safe(() => _inner.OnExecuteSucceeded(statementId, resultFormat)); + + /// + public void OnPollCompleted(int count, long latencyMs) => + Safe(() => _inner.OnPollCompleted(count, latencyMs)); + + /// + public void OnFirstBatchReady(long latencyMs) => + Safe(() => _inner.OnFirstBatchReady(latencyMs)); + + /// + public void OnConsumed(long latencyMs) => + Safe(() => _inner.OnConsumed(latencyMs)); + + /// + public void OnChunksDownloaded(ChunkMetrics metrics) => + Safe(() => _inner.OnChunksDownloaded(metrics)); + + /// + public void OnReaderInspected(ExecutionResultFormat resultFormat, bool isCompressed) => + Safe(() => _inner.OnReaderInspected(resultFormat, isCompressed)); + + /// + public void OnError(Exception ex) => + Safe(() => _inner.OnError(ex)); + + /// + public void OnFinalized() => + Safe(() => _inner.OnFinalized()); + + /// + /// Executes and swallows any exception, surfacing it as + /// a trace-level on the ambient . + /// + private static void Safe(Action action) + { + try + { + action(); + } + catch (Exception ex) + { + try + { + Activity.Current?.AddEvent(new ActivityEvent("telemetry.observer.suppressed", + tags: new ActivityTagsCollection + { + { "error.type", ex.GetType().Name }, + { "error.message", SafeMessage(ex) }, + { "observer.suppressed.source", "SafeObserver" }, + })); + } + catch + { + // Recording the suppression must not itself throw. Intentionally empty. + } + } + } + + // Some exceptions throw from their .Message property (rare but observed in the + // wild for user-defined types). Wrap it so the trace event can never become a + // source of observer failure. + private static string? 
SafeMessage(Exception ex) + { + try + { + return ex.Message; + } + catch + { + return null; + } + } + } +} diff --git a/csharp/src/Telemetry/TelemetryObserver.cs b/csharp/src/Telemetry/TelemetryObserver.cs new file mode 100644 index 00000000..e9750ecc --- /dev/null +++ b/csharp/src/Telemetry/TelemetryObserver.cs @@ -0,0 +1,269 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using System.Diagnostics; +using System.Threading; +using AdbcDrivers.Databricks.Reader.CloudFetch; +using AdbcDrivers.Databricks.Telemetry.Models; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; + +namespace AdbcDrivers.Databricks.Telemetry +{ + /// + /// Default implementation that translates + /// observer method calls into mutations on a + /// and, on , builds a + /// and enqueues it on the session's telemetry client for background export. + /// + /// + /// + /// Fail-open: Every public method routes through the private Safe(Action) + /// helper, which swallows any exception and emits an OpenTelemetry activity event. This + /// concentrates the fail-open contract in exactly one place rather than scattering + /// try/catch boilerplate across each method body. 
+ /// + /// + /// Thread-safe: The scalar writes into the per-statement context are inherently + /// safe for the lifecycle calls (each is called at most a small number of times from + /// the statement's execution path or the reader's disposal thread). The terminal + /// uses + /// on an _emitted flag to guarantee exactly-once enqueue even if it is invoked + /// concurrently from multiple threads (e.g. error path + dispose path). + /// + /// + /// Non-blocking: only enqueues the log into the + /// telemetry client's internal queue. The actual HTTP export runs on the client's + /// background flush timer and never blocks the calling thread. + /// + /// + internal sealed class TelemetryObserver : IStatementOperationObserver + { + // 0 = not yet emitted, 1 = emitted. Mutated via Interlocked.CompareExchange so the + // terminal OnFinalized() is exactly-once even under concurrent invocation. + private int _emitted; + + private readonly TelemetrySessionContext _session; + private readonly StatementTelemetryContext _ctx; + + /// + /// Initializes a new observer that records into a freshly created + /// seeded from . + /// + /// Per-connection telemetry session context. Required. + public TelemetryObserver(TelemetrySessionContext session) + : this(session, new StatementTelemetryContext(session ?? throw new ArgumentNullException(nameof(session)))) + { + } + + // Test seam: allows unit tests to inject a pre-populated context. Internal so + // it does not leak from the assembly. + internal TelemetryObserver(TelemetrySessionContext session, StatementTelemetryContext context) + { + _session = session ?? throw new ArgumentNullException(nameof(session)); + _ctx = context ?? throw new ArgumentNullException(nameof(context)); + } + + /// + /// Internal accessor for the underlying context, exposed for unit tests that need + /// to assert per-field state without building a full telemetry log. 
+ /// + internal StatementTelemetryContext Context => _ctx; + + /// + /// Internal accessor for the idempotency flag, exposed for unit tests that need to + /// confirm exactly-once semantics without reaching into the enqueue path. + /// + internal bool HasEmitted => Volatile.Read(ref _emitted) != 0; + + /// + public void OnExecuteStarted(StatementType stmtType, OperationType opType, bool isCompressed) => + Safe(() => + { + _ctx.StatementType = stmtType; + _ctx.OperationType = opType; + _ctx.IsCompressed = isCompressed; + }); + + /// + public void OnExecuteSucceeded(string statementId, ExecutionResultFormat resultFormat) => + Safe(() => + { + _ctx.StatementId = statementId; + _ctx.ResultFormat = resultFormat; + _ctx.RecordExecuteComplete(); + }); + + /// + public void OnPollCompleted(int count, long latencyMs) => + Safe(() => + { + _ctx.PollCount = count; + _ctx.PollLatencyMs = latencyMs; + }); + + /// + public void OnFirstBatchReady(long latencyMs) => + Safe(() => + { + // Only the first call wins: the explicit null check below ensures that + // repeated calls (e.g. inline reader emits one, cloud-fetch reader emits + // another) do not overwrite the earliest observed latency. + if (_ctx.FirstBatchReadyMs == null) + { + _ctx.FirstBatchReadyMs = latencyMs; + } + }); + + /// + public void OnConsumed(long latencyMs) => + Safe(() => _ctx.ResultsConsumedMs = latencyMs); + + /// + public void OnChunksDownloaded(ChunkMetrics metrics) => + Safe(() => + { + // Tolerate a null or empty ChunkMetrics: the gap-fix plumbing may not have + // landed yet, and the proto fields are nullable on the wire.
+ if (metrics == null) + { + return; + } + _ctx.SetChunkDetails( + metrics.TotalChunksPresent, + metrics.TotalChunksIterated, + metrics.InitialChunkLatencyMs, + metrics.SlowestChunkLatencyMs, + metrics.SumChunksDownloadTimeMs); + }); + + /// + public void OnReaderInspected(ExecutionResultFormat resultFormat, bool isCompressed) => + Safe(() => + { + // Overwrite the placeholders stamped at OnExecuteStarted time with the + // server-reported truth for the active reader (PECO-2988, PECO-2978). + // Mutating the same fields directly preserves byte-identical output vs. + // the legacy EmitTelemetry helper, which set both fields immediately + // before BuildTelemetryLog. + _ctx.ResultFormat = resultFormat; + _ctx.IsCompressed = isCompressed; + }); + + /// + public void OnError(Exception ex) => + Safe(() => + { + if (ex == null) + { + return; + } + _ctx.HasError = true; + // GetType().Name is always safe; do not capture ex.Message here unless we + // have explicit consent: the proto's DriverErrorInfo.error_message field is + // pending LPP review (see ConnectionTelemetry.EmitOperationTelemetry). + _ctx.ErrorName = ex.GetType().Name; + _ctx.ErrorMessage = SafeMessage(ex); + }); + + /// + public void OnFinalized() + { + // Idempotency gate: only the first caller proceeds. Doing this outside Safe() + // ensures even a defective Safe() helper can't bypass the once-only semantics. + if (Interlocked.CompareExchange(ref _emitted, 1, 0) != 0) + { + return; + } + + Safe(() => + { + ITelemetryClient? client = _session.TelemetryClient; + if (client == null) + { + // Telemetry is disabled at the session level. The idempotency flag has + // already been set so a subsequent OnFinalized() remains a no-op. 
+ return; + } + + Proto.OssSqlDriverTelemetryLog log = _ctx.BuildTelemetryLog(); + TelemetryFrontendLog frontendLog = new TelemetryFrontendLog + { + WorkspaceId = _ctx.WorkspaceId, + FrontendLogEventId = Guid.NewGuid().ToString(), + Context = new FrontendLogContext + { + TimestampMillis = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds(), + }, + Entry = new FrontendLogEntry + { + SqlDriverLog = log, + }, + }; + + // Enqueue is non-blocking; the client buffers events and flushes on a + // background timer. No HTTP I/O happens on the calling thread. + client.Enqueue(frontendLog); + }); + } + + /// + /// Concentrates the fail-open try/catch in one place. Any exception is suppressed + /// and surfaced as an OpenTelemetry activity event so it is still observable in + /// traces without affecting the caller's control flow. + /// + private static void Safe(Action action) + { + try + { + action(); + } + catch (Exception ex) + { + try + { + Activity.Current?.AddEvent(new ActivityEvent("telemetry.observer.suppressed", + tags: new ActivityTagsCollection + { + { "error.type", ex.GetType().Name }, + { "error.message", SafeMessage(ex) } + })); + } + catch + { + // Recording the suppression must not itself throw. Intentionally empty. + } + } + } + + // Some exceptions throw from their .Message property (rare but observed in the + // wild for user-defined types). Wrap it so neither OnError nor the suppression + // event in Safe() can become a source of observer failure. + private static string?
SafeMessage(Exception ex) + { + try + { + return ex.Message; + } + catch + { + return null; + } + } + } +} diff --git a/csharp/test/E2E/Telemetry/AuthTypeTests.cs b/csharp/test/E2E/Telemetry/AuthTypeTests.cs index 172518ec..1d610f0e 100644 --- a/csharp/test/E2E/Telemetry/AuthTypeTests.cs +++ b/csharp/test/E2E/Telemetry/AuthTypeTests.cs @@ -32,12 +32,10 @@ namespace AdbcDrivers.Databricks.Tests.E2E.Telemetry /// public class AuthTypeTests : TestBase { - // TODO: PECO-3010 - telemetry not wired for SEA protocol; these tests fail for rest protocol public AuthTypeTests(ITestOutputHelper? outputHelper) : base(outputHelper, new DatabricksTestEnvironment.Factory()) { Skip.IfNot(Utils.CanExecuteTestConfig(TestConfigVariable)); - Skip.If(TestConfiguration.Protocol == "rest", "Telemetry not wired for SEA protocol (PECO-3010)"); } /// diff --git a/csharp/test/E2E/Telemetry/ChunkDetailsTelemetryTests.cs b/csharp/test/E2E/Telemetry/ChunkDetailsTelemetryTests.cs index 6b3dcf9d..a8234585 100644 --- a/csharp/test/E2E/Telemetry/ChunkDetailsTelemetryTests.cs +++ b/csharp/test/E2E/Telemetry/ChunkDetailsTelemetryTests.cs @@ -41,7 +41,6 @@ public ChunkDetailsTelemetryTests(ITestOutputHelper? outputHelper) : base(outputHelper, new DatabricksTestEnvironment.Factory()) { Skip.IfNot(Utils.CanExecuteTestConfig(TestConfigVariable)); - Skip.If(TestConfiguration.Protocol == "rest", "CloudFetch telemetry tests are Thrift-only"); } /// diff --git a/csharp/test/E2E/Telemetry/ChunkMetricsAggregationTests.cs b/csharp/test/E2E/Telemetry/ChunkMetricsAggregationTests.cs index 9d04807f..68cb700b 100644 --- a/csharp/test/E2E/Telemetry/ChunkMetricsAggregationTests.cs +++ b/csharp/test/E2E/Telemetry/ChunkMetricsAggregationTests.cs @@ -36,7 +36,6 @@ public ChunkMetricsAggregationTests(ITestOutputHelper? 
outputHelper) : base(outputHelper, new DatabricksTestEnvironment.Factory()) { Skip.IfNot(Utils.CanExecuteTestConfig(TestConfigVariable)); - Skip.If(TestConfiguration.Protocol == "rest", "CloudFetch metrics tests are Thrift-only"); } /// diff --git a/csharp/test/E2E/Telemetry/ChunkMetricsReaderTests.cs b/csharp/test/E2E/Telemetry/ChunkMetricsReaderTests.cs index dfd65a9c..21ebff25 100644 --- a/csharp/test/E2E/Telemetry/ChunkMetricsReaderTests.cs +++ b/csharp/test/E2E/Telemetry/ChunkMetricsReaderTests.cs @@ -38,7 +38,6 @@ public ChunkMetricsReaderTests(ITestOutputHelper? outputHelper) : base(outputHelper, new DatabricksTestEnvironment.Factory()) { Skip.IfNot(Utils.CanExecuteTestConfig(TestConfigVariable)); - Skip.If(TestConfiguration.Protocol == "rest", "CloudFetch metrics reader tests are Thrift-only"); } /// diff --git a/csharp/test/E2E/Telemetry/ClientTelemetryE2ETests.cs b/csharp/test/E2E/Telemetry/ClientTelemetryE2ETests.cs index 90c2b2d2..7c107199 100644 --- a/csharp/test/E2E/Telemetry/ClientTelemetryE2ETests.cs +++ b/csharp/test/E2E/Telemetry/ClientTelemetryE2ETests.cs @@ -37,11 +37,9 @@ namespace AdbcDrivers.Databricks.Tests.E2E.Telemetry /// public class ClientTelemetryE2ETests : TestBase { - // TODO: PECO-3010 - telemetry not wired for SEA protocol; these tests fail for rest protocol public ClientTelemetryE2ETests(ITestOutputHelper? outputHelper) : base(outputHelper, new DatabricksTestEnvironment.Factory()) { - Skip.If(TestConfiguration.Protocol == "rest", "Telemetry not wired for SEA protocol (PECO-3010)"); } /// diff --git a/csharp/test/E2E/Telemetry/ConnectionParametersTests.cs b/csharp/test/E2E/Telemetry/ConnectionParametersTests.cs index 0383adb0..313a79e6 100644 --- a/csharp/test/E2E/Telemetry/ConnectionParametersTests.cs +++ b/csharp/test/E2E/Telemetry/ConnectionParametersTests.cs @@ -38,7 +38,6 @@ public ConnectionParametersTests(ITestOutputHelper? 
outputHelper) : base(outputHelper, new DatabricksTestEnvironment.Factory()) { Skip.IfNot(Utils.CanExecuteTestConfig(TestConfigVariable)); - Skip.If(TestConfiguration.Protocol == "rest", "Connection parameters telemetry tests are Thrift-only"); } /// diff --git a/csharp/test/E2E/Telemetry/InternalCallTests.cs b/csharp/test/E2E/Telemetry/InternalCallTests.cs index ac598417..8c99bf96 100644 --- a/csharp/test/E2E/Telemetry/InternalCallTests.cs +++ b/csharp/test/E2E/Telemetry/InternalCallTests.cs @@ -34,12 +34,10 @@ namespace AdbcDrivers.Databricks.Tests.E2E.Telemetry /// public class InternalCallTests : TestBase { - // TODO: PECO-3010 - telemetry not wired for SEA protocol; these tests fail for rest protocol public InternalCallTests(ITestOutputHelper? outputHelper) : base(outputHelper, new DatabricksTestEnvironment.Factory()) { Skip.IfNot(Utils.CanExecuteTestConfig(TestConfigVariable)); - Skip.If(TestConfiguration.Protocol == "rest", "Telemetry not wired for SEA protocol (PECO-3010)"); } /// diff --git a/csharp/test/E2E/Telemetry/MetadataOperationTests.cs b/csharp/test/E2E/Telemetry/MetadataOperationTests.cs index fe6b3349..d95d3b9f 100644 --- a/csharp/test/E2E/Telemetry/MetadataOperationTests.cs +++ b/csharp/test/E2E/Telemetry/MetadataOperationTests.cs @@ -33,12 +33,10 @@ namespace AdbcDrivers.Databricks.Tests.E2E.Telemetry /// public class MetadataOperationTests : TestBase { - // TODO: PECO-3010 - telemetry not wired for SEA protocol; these tests fail for rest protocol public MetadataOperationTests(ITestOutputHelper? 
outputHelper) : base(outputHelper, new DatabricksTestEnvironment.Factory()) { Skip.IfNot(Utils.CanExecuteTestConfig(TestConfigVariable)); - Skip.If(TestConfiguration.Protocol == "rest", "Telemetry not wired for SEA protocol (PECO-3010)"); } [SkippableFact] diff --git a/csharp/test/E2E/Telemetry/StatementMetadataTelemetryTests.cs b/csharp/test/E2E/Telemetry/StatementMetadataTelemetryTests.cs index 48032c27..b6dbdeb3 100644 --- a/csharp/test/E2E/Telemetry/StatementMetadataTelemetryTests.cs +++ b/csharp/test/E2E/Telemetry/StatementMetadataTelemetryTests.cs @@ -43,12 +43,10 @@ public class StatementMetadataTelemetryTests : TestBase public class SystemConfigurationTests : TestBase { - // TODO: PECO-3010 - telemetry not wired for SEA protocol; these tests fail for rest protocol public SystemConfigurationTests(ITestOutputHelper? outputHelper) : base(outputHelper, new DatabricksTestEnvironment.Factory()) { Skip.IfNot(Utils.CanExecuteTestConfig(TestConfigVariable)); - Skip.If(TestConfiguration.Protocol == "rest", "Telemetry not wired for SEA protocol (PECO-3010)"); } /// diff --git a/csharp/test/E2E/Telemetry/TelemetryBaselineTests.cs b/csharp/test/E2E/Telemetry/TelemetryBaselineTests.cs index d8ad2e68..eaf1b475 100644 --- a/csharp/test/E2E/Telemetry/TelemetryBaselineTests.cs +++ b/csharp/test/E2E/Telemetry/TelemetryBaselineTests.cs @@ -35,12 +35,10 @@ namespace AdbcDrivers.Databricks.Tests.E2E.Telemetry /// public class TelemetryBaselineTests : TestBase { - // TODO: PECO-3010 - telemetry not wired for SEA protocol; these tests fail for rest protocol public TelemetryBaselineTests(ITestOutputHelper? 
outputHelper) : base(outputHelper, new DatabricksTestEnvironment.Factory()) { Skip.IfNot(Utils.CanExecuteTestConfig(TestConfigVariable)); - Skip.If(TestConfiguration.Protocol == "rest", "Telemetry not wired for SEA protocol (PECO-3010)"); } /// diff --git a/csharp/test/Unit/StatementExecution/SeaResultFormatMapperTests.cs b/csharp/test/Unit/StatementExecution/SeaResultFormatMapperTests.cs new file mode 100644 index 00000000..46be1ffb --- /dev/null +++ b/csharp/test/Unit/StatementExecution/SeaResultFormatMapperTests.cs @@ -0,0 +1,264 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System.Collections.Generic; +using AdbcDrivers.Databricks.StatementExecution; +using Xunit; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; + +namespace AdbcDrivers.Databricks.Tests.Unit.StatementExecution +{ + /// + /// Verifies the four-cell mapping table from + /// PECO-3022-sea-telemetry-integration-design.md §8 implemented by + /// <see cref="SeaResultFormatMapper"/>: + /// + /// + /// INLINE + ARROW_STREAM → INLINE_ARROW + /// EXTERNAL_LINKS + ARROW_STREAM → EXTERNAL_LINKS + /// INLINE_OR_EXTERNAL_LINKS + external_links populated → EXTERNAL_LINKS + /// INLINE_OR_EXTERNAL_LINKS + inline attachment → INLINE_ARROW + /// + /// + /// Plus defensive edge cases: non-ARROW_STREAM format, unknown disposition, and missing + /// response data all fall back to <see cref="ExecutionResultFormat.Unspecified"/> rather + /// than guessing.
+ /// + public class SeaResultFormatMapperTests + { + // ── Helpers ─────────────────────────────────────────────────────────────── + + private static ExecuteStatementResponse BuildInlineResponse() + { + // An "inline attachment" response: manifest present (so the server has produced a + // result shape), no external_links anywhere, attachment bytes available on Result. + return new ExecuteStatementResponse + { + StatementId = "stmt-inline", + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = new ResultManifest + { + Format = "ARROW_STREAM", + TotalRowCount = 1, + Chunks = new List(), + }, + Result = new ResultData + { + Attachment = new byte[] { 0x01, 0x02, 0x03 }, + RowCount = 1, + }, + }; + } + + private static ExecuteStatementResponse BuildExternalLinksResponseInChunks() + { + // External-links response where the link sits on a manifest chunk — this is the + // shape EXTERNAL_LINKS disposition uses and the most common shape that hybrid + // (INLINE_OR_EXTERNAL_LINKS) takes when the result is large. + return new ExecuteStatementResponse + { + StatementId = "stmt-ext-chunk", + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = new ResultManifest + { + Format = "ARROW_STREAM", + TotalRowCount = 1000, + Chunks = new List + { + new() + { + ChunkIndex = 0, + ExternalLinks = new List + { + new() { ExternalLinkUrl = "https://example.com/chunk-0" }, + }, + }, + }, + }, + }; + } + + private static ExecuteStatementResponse BuildExternalLinksResponseInResult() + { + // External-links response where the link sits on the Result payload (hybrid mode + // can produce this shape when the first chunk is delivered out-of-band). 
+ return new ExecuteStatementResponse + { + StatementId = "stmt-ext-result", + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = new ResultManifest + { + Format = "ARROW_STREAM", + Chunks = new List(), + }, + Result = new ResultData + { + ExternalLinks = new List + { + new() { ExternalLinkUrl = "https://example.com/result-0" }, + }, + }, + }; + } + + // ── Four-cell table tests (per design §8) ───────────────────────────────── + + [Fact] + public void Map_InlineDisposition_ReturnsInlineArrow() + { + // Cell 1: INLINE + ARROW_STREAM → INLINE_ARROW. With INLINE disposition the server + // is contractually required to deliver inline data; the mapper does not need to + // inspect the response shape — disposition alone determines the answer. + var response = BuildInlineResponse(); + + var result = SeaResultFormatMapper.Map("INLINE", "ARROW_STREAM", response); + + Assert.Equal(ExecutionResultFormat.InlineArrow, result); + } + + [Fact] + public void Map_ExternalLinksDisposition_ReturnsExternalLinks() + { + // Cell 2: EXTERNAL_LINKS + ARROW_STREAM → EXTERNAL_LINKS. Mirror of Cell 1 — the + // server is contractually required to deliver external links, so we do not need to + // verify their presence in the response. (Doing so would still pass, but the design + // explicitly does not require it.) + var response = BuildExternalLinksResponseInChunks(); + + var result = SeaResultFormatMapper.Map("EXTERNAL_LINKS", "ARROW_STREAM", response); + + Assert.Equal(ExecutionResultFormat.ExternalLinks, result); + } + + [Fact] + public void Map_AutoDisposition_WithExternalLinks_ReturnsExternalLinks() + { + // Cell 3: INLINE_OR_EXTERNAL_LINKS + external_links populated → EXTERNAL_LINKS. The + // mapper distinguishes auto-disposition results by inspecting the response. The + // canonical shape places external links on a manifest chunk. 
+ var response = BuildExternalLinksResponseInChunks(); + + var result = SeaResultFormatMapper.Map("INLINE_OR_EXTERNAL_LINKS", "ARROW_STREAM", response); + + Assert.Equal(ExecutionResultFormat.ExternalLinks, result); + } + + [Fact] + public void Map_AutoDisposition_WithExternalLinksOnResult_ReturnsExternalLinks() + { + // Cell 3 variant: hybrid mode may surface external links on the Result payload + // rather than the manifest chunks. The mapper must catch both shapes — if it only + // looked at manifest.chunks it would mis-classify this as inline. + var response = BuildExternalLinksResponseInResult(); + + var result = SeaResultFormatMapper.Map("INLINE_OR_EXTERNAL_LINKS", "ARROW_STREAM", response); + + Assert.Equal(ExecutionResultFormat.ExternalLinks, result); + } + + [Fact] + public void Map_AutoDisposition_WithInlineResult_ReturnsInlineArrow() + { + // Cell 4: INLINE_OR_EXTERNAL_LINKS + inline attachment → INLINE_ARROW. The + // response has a manifest (server produced a shape) but no external_links anywhere + // — this is the row of the §8 table that maps to InlineArrow. + var response = BuildInlineResponse(); + + var result = SeaResultFormatMapper.Map("INLINE_OR_EXTERNAL_LINKS", "ARROW_STREAM", response); + + Assert.Equal(ExecutionResultFormat.InlineArrow, result); + } + + // ── Defensive edge cases ────────────────────────────────────────────────── + + [Fact] + public void Map_NonArrowStreamFormat_ReturnsUnspecified() + { + // The §8 table only covers ARROW_STREAM. Other server-side formats (JSON_ARRAY, + // CSV) have no corresponding proto enum value, so the mapper falls back to + // Unspecified rather than guessing. This keeps telemetry honest for any future + // format the server adds without a paired proto change. 
+ var response = BuildInlineResponse(); + + var result = SeaResultFormatMapper.Map("INLINE", "JSON_ARRAY", response); + + Assert.Equal(ExecutionResultFormat.Unspecified, result); + } + + [Fact] + public void Map_UnknownDisposition_ReturnsUnspecified() + { + // Defensive: if a future server adds a new disposition string we do not recognise, + // emit Unspecified rather than picking a fallback that may silently mis-label the + // record. Telemetry consumers can spot Unspecified easily; they cannot spot + // an InlineArrow that should have been ExternalLinks. + var response = BuildInlineResponse(); + + var result = SeaResultFormatMapper.Map("WHATEVER_DISPOSITION", "ARROW_STREAM", response); + + Assert.Equal(ExecutionResultFormat.Unspecified, result); + } + + [Fact] + public void Map_AutoDisposition_WithNullResponse_ReturnsUnspecified() + { + // Auto-disposition needs the response shape to disambiguate. With no response at + // all, the mapper cannot tell inline from external — Unspecified is the only + // honest answer. The INLINE / EXTERNAL_LINKS dispositions still work without a + // response because their answer is determined by the request alone. + var result = SeaResultFormatMapper.Map("INLINE_OR_EXTERNAL_LINKS", "ARROW_STREAM", response: null); + + Assert.Equal(ExecutionResultFormat.Unspecified, result); + } + + [Fact] + public void Map_AutoDisposition_NoManifestNoResult_ReturnsUnspecified() + { + // Phase-5 fires OnExecuteSucceeded after the initial ExecuteStatementAsync call, + // before polling. For PENDING responses there is no manifest or result yet — we + // explicitly do not want to guess "inline" by default here, since the eventual + // result may be external. Unspecified preserves accuracy in the PENDING case; + // accurate telemetry can be backfilled when the polling loop terminates if a + // future change moves the callsite. 
+ var pendingResponse = new ExecuteStatementResponse + { + StatementId = "stmt-pending", + Status = new StatementStatus { State = "PENDING" }, + }; + + var result = SeaResultFormatMapper.Map("INLINE_OR_EXTERNAL_LINKS", "ARROW_STREAM", pendingResponse); + + Assert.Equal(ExecutionResultFormat.Unspecified, result); + } + + [Fact] + public void Map_DispositionAndFormatAreCaseInsensitive() + { + // Server clients and configuration sources may produce either-case strings. The + // mapper normalises via OrdinalIgnoreCase comparison so callers do not need to + // pre-uppercase. Combined with the four-cell tests above this gives total coverage + // of the public surface. + var response = BuildInlineResponse(); + + Assert.Equal( + ExecutionResultFormat.InlineArrow, + SeaResultFormatMapper.Map("inline", "arrow_stream", response)); + Assert.Equal( + ExecutionResultFormat.ExternalLinks, + SeaResultFormatMapper.Map("External_Links", "Arrow_Stream", BuildExternalLinksResponseInChunks())); + } + } +} diff --git a/csharp/test/Unit/StatementExecution/StatementExecutionConnectionTelemetryTests.cs b/csharp/test/Unit/StatementExecution/StatementExecutionConnectionTelemetryTests.cs new file mode 100644 index 00000000..89e9a43f --- /dev/null +++ b/csharp/test/Unit/StatementExecution/StatementExecutionConnectionTelemetryTests.cs @@ -0,0 +1,600 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. 
+*/ + +using System; +using System.Collections.Generic; +using System.Diagnostics; +using System.Reflection; +using System.Threading; +using System.Threading.Tasks; +using AdbcDrivers.Databricks.StatementExecution; +using AdbcDrivers.Databricks.Telemetry; +using AdbcDrivers.Databricks.Telemetry.Models; +using AdbcDrivers.HiveServer2.Spark; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; +using Xunit; + +namespace AdbcDrivers.Databricks.Tests.Unit.StatementExecution +{ + /// + /// Unit tests for telemetry wiring in <see cref="StatementExecutionConnection"/> + /// (PECO-3022 T5). Mirrors the Thrift-side DriverTelemetryWiringTests pattern: + /// these tests exercise the production code paths (EmitCreateSessionTelemetry, + /// EmitDeleteSessionTelemetry, OpenAsync, Dispose) against a + /// fake <see cref="IConnectionTelemetry"/> injected via the + /// TelemetryForTesting seam so we can verify CREATE_SESSION/DELETE_SESSION + /// emissions, the fail-open init contract, and the 5-second Dispose flush timeout. + /// + public class StatementExecutionConnectionTelemetryTests + { + // ── Fakes ────────────────────────────────────────────────────────────────── + + /// + /// Records every EmitOperationTelemetry call so tests can assert which + /// operation types fired, in what order, and with what payload. + /// + private sealed class RecordingTelemetry : IConnectionTelemetry + { + public List<OperationType> Calls { get; } = new(); + public List<Exception?> Errors { get; } = new(); + public List<string?> StatementIds { get; } = new(); + public Dictionary<OperationType, long> LatenciesByOp { get; } = new(); + public TelemetrySessionContext? Session { get; } + = new TelemetrySessionContext + { + SessionId = "sea-session-1", + }; + public int DisposeCount { get; private set; } + + public T ExecuteWithMetadataTelemetry<T>( + OperationType operationType, + Func<T> operation, + Activity?
activity) => operation(); + + public void EmitOperationTelemetry( + OperationType operationType, + StatementType statementType, + string? statementId, + long elapsedMs, + Exception? error) + { + Calls.Add(operationType); + Errors.Add(error); + StatementIds.Add(statementId); + LatenciesByOp[operationType] = elapsedMs; + } + + public Task DisposeAsync() + { + DisposeCount++; + return Task.CompletedTask; + } + } + + /// + /// Telemetry that intentionally hangs in DisposeAsync. Used to verify the + /// 5-second hard timeout on <see cref="StatementExecutionConnection.Dispose"/>. + /// + private sealed class HangingTelemetry : IConnectionTelemetry + { + public TelemetrySessionContext? Session { get; } + = new TelemetrySessionContext { SessionId = "sea-session-hang" }; + + public T ExecuteWithMetadataTelemetry<T>( + OperationType operationType, + Func<T> operation, + Activity? activity) => operation(); + + public void EmitOperationTelemetry( + OperationType operationType, + StatementType statementType, + string? statementId, + long elapsedMs, + Exception? error) + { + // No-op + } + + // Block forever — the production Dispose must time this out at 5 seconds. + public Task DisposeAsync() => new TaskCompletionSource().Task; + } + + /// + /// Telemetry that throws on every EmitOperationTelemetry call. The connection's + /// emit helpers must swallow these so Dispose / OpenAsync don't fail. + /// + private sealed class ThrowingTelemetry : IConnectionTelemetry + { + public TelemetrySessionContext? Session => null; + + public T ExecuteWithMetadataTelemetry<T>( + OperationType operationType, + Func<T> operation, + Activity? activity) => operation(); + + public void EmitOperationTelemetry( + OperationType operationType, + StatementType statementType, + string? statementId, + long elapsedMs, + Exception?
error) + { + throw new InvalidOperationException("emit blew up"); + } + + public Task DisposeAsync() => Task.CompletedTask; + } + + // ── Helpers ──────────────────────────────────────────────────────────────── + + private static Dictionary<string, string> CreateBaseProperties() + { + return new Dictionary<string, string> + { + { SparkParameters.HostName, "telemetry-test.cloud.databricks.com" }, + { DatabricksParameters.WarehouseId, "test-warehouse-id" }, + { SparkParameters.AccessToken, "test-token" } + }; + } + + private static StatementExecutionConnection CreateConnection() + { + return new StatementExecutionConnection(CreateBaseProperties()); + } + + /// + /// Reflectively flips _sessionId from null to a fake id so Dispose + /// believes a session exists and runs the DeleteSession path. The reflective + /// access mirrors what other StatementExecutionConnection unit tests do for + /// _identityFederationClientId. + /// + private static void SetFakeSessionId(StatementExecutionConnection connection, string sessionId) + { + var field = typeof(StatementExecutionConnection).GetField( + "_sessionId", + BindingFlags.NonPublic | BindingFlags.Instance); + Assert.NotNull(field); + field!.SetValue(connection, sessionId); + } + + // ── Tests ────────────────────────────────────────────────────────────────── + + [Fact] + public void EmitCreateSessionTelemetry_FiresCreateSession() + { + using var connection = CreateConnection(); + var fake = new RecordingTelemetry(); + connection.TelemetryForTesting = fake; + + connection.EmitCreateSessionTelemetry(); + + Assert.Equal(new[] { OperationType.CreateSession }, fake.Calls); + Assert.Null(fake.Errors[0]); + Assert.Null(fake.StatementIds[0]); + } + + [Fact] + public void OpenAsync_EmitsCreateSession_WhenTelemetryEnabled() + { + // We can't easily drive the full OpenAsync without a real warehouse, but the + // wiring contract is: after the session id is set, EmitCreateSessionTelemetry() + // must fire CREATE_SESSION through the injected telemetry.
The production code + // calls this immediately after _sessionId = response.SessionId. + using var connection = CreateConnection(); + var fake = new RecordingTelemetry(); + connection.TelemetryForTesting = fake; + + // Simulate the OpenAsync post-CreateSession step. + SetFakeSessionId(connection, "fake-session-id"); + connection.EmitCreateSessionTelemetry(activity: null); + + Assert.Contains(OperationType.CreateSession, fake.Calls); + } + + [Fact] + public void EmitDeleteSessionTelemetry_WithoutCreate_DoesNotFire() + { + // Idempotency contract: DELETE_SESSION must not fire if CREATE_SESSION never did. + // Models a connection that failed to open a session. + using var connection = CreateConnection(); + var fake = new RecordingTelemetry(); + connection.TelemetryForTesting = fake; + + connection.EmitDeleteSessionTelemetry(elapsedMs: 10); + + Assert.DoesNotContain(OperationType.DeleteSession, fake.Calls); + } + + [Fact] + public void Dispose_EmitsDeleteSession() + { + // The production Dispose path: with a session id set, it calls DeleteSessionAsync, + // captures its latency + error, and emits DELETE_SESSION before flushing. + // We use the test seam to skip the real RPC: the seam doesn't bypass the emit call, + // so we exercise the emit wiring directly with a fake session id. + var connection = CreateConnection(); + var fake = new RecordingTelemetry(); + connection.TelemetryForTesting = fake; + + // Pretend session was opened. + connection.EmitCreateSessionTelemetry(); + + // Pretend the SEA session id is set so Dispose enters the DeleteSession branch. + // DeleteSessionAsync will throw (real HTTP call), but the production Dispose swallows + // and still emits DELETE_SESSION with the captured exception. 
+ SetFakeSessionId(connection, "fake-sea-session-id"); + + connection.Dispose(); + + Assert.Contains(OperationType.CreateSession, fake.Calls); + Assert.Contains(OperationType.DeleteSession, fake.Calls); + } + + [Fact] + public void EmitDeleteSessionTelemetry_ForwardsLatencyAndError() + { + using var connection = CreateConnection(); + var fake = new RecordingTelemetry(); + connection.TelemetryForTesting = fake; + + // Open first so the idempotency gate lets DELETE_SESSION through. + connection.EmitCreateSessionTelemetry(); + var rpcError = new InvalidOperationException("delete session failed"); + connection.EmitDeleteSessionTelemetry(elapsedMs: 47, error: rpcError); + + int deleteIdx = fake.Calls.IndexOf(OperationType.DeleteSession); + Assert.True(deleteIdx >= 0); + Assert.Equal(47, fake.LatenciesByOp[OperationType.DeleteSession]); + Assert.Same(rpcError, fake.Errors[deleteIdx]); + } + + [Fact] + public void Dispose_CalledTwice_FiresDeleteSessionOnlyOnce() + { + // Repeated Dispose calls (common in `using` + manual Dispose) must not duplicate + // DELETE_SESSION records. + var connection = CreateConnection(); + var fake = new RecordingTelemetry(); + connection.TelemetryForTesting = fake; + + connection.EmitCreateSessionTelemetry(); + SetFakeSessionId(connection, "fake-session-id-dup"); + + connection.Dispose(); + connection.Dispose(); + + int deleteCount = 0; + foreach (var call in fake.Calls) + { + if (call == OperationType.DeleteSession) deleteCount++; + } + Assert.Equal(1, deleteCount); + } + + [Fact] + public void OpenAsync_TelemetryInitThrows_FallsBackToNoOpAndStillOpens() + { + // Goal: prove the InitializeTelemetry → ConnectionTelemetry.Create call chain is + // wrapped in a try/catch that falls back to NoOpConnectionTelemetry when init + // would throw. We exercise the private InitializeTelemetry helper directly via + // reflection — this is the same boundary the production OpenAsync goes through. 
+ using var connection = CreateConnection(); + + // Sanity: starts as NoOp (the field's default). + Assert.IsType<NoOpConnectionTelemetry>(connection.TelemetryForTesting); + + var method = typeof(StatementExecutionConnection).GetMethod( + "InitializeTelemetry", + BindingFlags.NonPublic | BindingFlags.Instance); + Assert.NotNull(method); + + // ConnectionTelemetry.Create internally catches and returns NoOp on any failure, + // so even with bogus properties, InitializeTelemetry should leave _telemetry + // pointing at a valid IConnectionTelemetry (NoOp or real). It MUST NOT throw. + var ex = Record.Exception(() => method!.Invoke(connection, new object?[] { null })); + Assert.Null(ex); + + // After init, telemetry should still be a valid instance (NoOp or real). + Assert.NotNull(connection.TelemetryForTesting); + } + + [Fact] + public void EmitCreateSessionTelemetry_SwallowsExceptions() + { + // Fail-open contract: if the emit call throws, neither Open nor Dispose must surface it. + using var connection = CreateConnection(); + connection.TelemetryForTesting = new ThrowingTelemetry(); + + var ex = Record.Exception(() => connection.EmitCreateSessionTelemetry()); + Assert.Null(ex); + } + + [Fact] + public void EmitDeleteSessionTelemetry_SwallowsExceptions() + { + using var connection = CreateConnection(); + connection.TelemetryForTesting = new ThrowingTelemetry(); + + // Open first so the idempotency gate would otherwise let it through. + // ThrowingTelemetry throws on CREATE_SESSION too — but the helper still swallows. + connection.EmitCreateSessionTelemetry(); + var ex = Record.Exception(() => connection.EmitDeleteSessionTelemetry(elapsedMs: 1)); + Assert.Null(ex); + } + + [Fact] + public void Dispose_FlushHangs_CompletesWithin5Seconds() + { + // The 5-second hard timeout on _telemetry.DisposeAsync().Wait(...) is the only + // thing standing between a wedged exporter and an indefinitely blocked Dispose.
+ // Use a HangingTelemetry whose DisposeAsync never completes, and verify Dispose + // returns in well under 10 seconds (we allow a generous wall-clock budget to + // tolerate slow CI machines). + var connection = CreateConnection(); + connection.TelemetryForTesting = new HangingTelemetry(); + + // No session id set → DeleteSessionAsync path is skipped. The only thing that + // could hang Dispose now is the telemetry flush, which is exactly what we want + // to time-bound here. + var sw = Stopwatch.StartNew(); + connection.Dispose(); + sw.Stop(); + + // Budget: 5s timeout + headroom for the rest of Dispose (HttpClient disposal etc.). + Assert.True( + sw.Elapsed < TimeSpan.FromSeconds(10), + $"Dispose took {sw.Elapsed.TotalSeconds:F1}s, expected < 10s (5s flush timeout + headroom)."); + } + + [Fact] + public void TelemetrySession_DefaultsToNull_BeforeOpen() + { + // The TelemetrySession accessor exposes the underlying session for the SEA + // statement (next phase) to build observers. Before OpenAsync runs, telemetry + // is NoOp and Session is null — exactly the signal the statement uses to fall + // back to NullObserver. + using var connection = CreateConnection(); + Assert.Null(connection.TelemetrySession); + } + + [Fact] + public void TelemetrySession_ReflectsInjectedTelemetry() + { + // Once telemetry is wired up (real or fake), the accessor returns its session. + // SEA statements rely on this to create per-statement observer contexts. 
+ using var connection = CreateConnection(); + var fake = new RecordingTelemetry(); + connection.TelemetryForTesting = fake; + + Assert.NotNull(connection.TelemetrySession); + Assert.Equal("sea-session-1", connection.TelemetrySession!.SessionId); + } + + // ── Connect-timeout telemetry source (gap D1) ────────────────────────────── + // + // The SEA connection must stamp the telemetry payload's socket_timeout from a + // connection-establishment timeout — NOT from _waitTimeoutSeconds, which is the + // SEA query-wait (CONTINUE) timeout and a semantically different concept. + + private static int GetConnectTimeoutFieldValue(StatementExecutionConnection connection) + { + var field = typeof(StatementExecutionConnection).GetField( + "_connectTimeoutMilliseconds", + BindingFlags.NonPublic | BindingFlags.Instance); + Assert.NotNull(field); + return (int)field!.GetValue(connection)!; + } + + private static int GetWaitTimeoutSecondsFieldValue(StatementExecutionConnection connection) + { + var field = typeof(StatementExecutionConnection).GetField( + "_waitTimeoutSeconds", + BindingFlags.NonPublic | BindingFlags.Instance); + Assert.NotNull(field); + return (int)field!.GetValue(connection)!; + } + + [Fact] + public void ConnectTimeoutMilliseconds_DefaultsTo30Seconds_WhenPropertyAbsent() + { + // No ConnectTimeoutMilliseconds property is set. The default must match the + // Thrift path's HiveServer2Connection.ConnectTimeoutMillisecondsDefault (30000 ms) + // so dashboards filtering on socket_timeout see consistent values across transports. + using var connection = CreateConnection(); + + Assert.Equal(30000, GetConnectTimeoutFieldValue(connection)); + } + + [Fact] + public void ConnectTimeoutMilliseconds_ReadsFromSparkParametersConnectTimeoutMilliseconds() + { + // Source-of-truth check: the SEA connection must read the same connection-string + // property the Thrift path reads (SparkParameters.ConnectTimeoutMilliseconds). 
+ var properties = CreateBaseProperties(); + properties[SparkParameters.ConnectTimeoutMilliseconds] = "45000"; + using var connection = new StatementExecutionConnection(properties); + + Assert.Equal(45000, GetConnectTimeoutFieldValue(connection)); + } + + [Fact] + public void ConnectTimeoutMilliseconds_IsNotDerivedFromWaitTimeoutSeconds() + { + // Regression guard for gap D1: the previous code passed + // (int)TimeSpan.FromSeconds(_waitTimeoutSeconds).TotalMilliseconds + // as connectTimeoutMilliseconds, which silently mislabeled SEA telemetry records + // (10s wait_timeout → 10000ms socket_timeout). The two concepts must be independent. + var properties = CreateBaseProperties(); + properties[DatabricksParameters.WaitTimeout] = "7"; // SEA CONTINUE timeout (seconds) + properties[SparkParameters.ConnectTimeoutMilliseconds] = "55000"; + using var connection = new StatementExecutionConnection(properties); + + int waitTimeoutSeconds = GetWaitTimeoutSecondsFieldValue(connection); + int connectTimeoutMs = GetConnectTimeoutFieldValue(connection); + + Assert.Equal(7, waitTimeoutSeconds); + Assert.Equal(55000, connectTimeoutMs); + // The mislabel bug would produce 7000ms here — assert it doesn't. + Assert.NotEqual( + (int)TimeSpan.FromSeconds(waitTimeoutSeconds).TotalMilliseconds, + connectTimeoutMs); + } + + [Fact] + public void ConnectTimeoutMilliseconds_NotAffectedByEnableDirectResultsFalse() + { + // Direct-results=false flips _waitTimeoutSeconds to 0 in the SEA path. The connect + // timeout (a connection-establishment concept) must remain independent of that. 
+ var properties = CreateBaseProperties(); + properties[DatabricksParameters.EnableDirectResults] = "false"; + properties[SparkParameters.ConnectTimeoutMilliseconds] = "20000"; + using var connection = new StatementExecutionConnection(properties); + + Assert.Equal(0, GetWaitTimeoutSecondsFieldValue(connection)); + Assert.Equal(20000, GetConnectTimeoutFieldValue(connection)); + } + + [Fact] + public void InitializeTelemetry_ForwardsConnectTimeoutToSocketTimeoutField() + { + // End-to-end wiring guard: exercise the real InitializeTelemetry → + // ConnectionTelemetry.Create path with telemetry enabled, then read back the + // resulting session's driver_connection_params.socket_timeout. This proves the + // argument passed to ConnectionTelemetry.Create is _connectTimeoutMilliseconds + // (in ms, divided to seconds inside BuildDriverConnectionParams), NOT + // _waitTimeoutSeconds. Without the fix, the bug pattern produced socket_timeout + // in the 0–10s range (mirroring DatabricksParameters.WaitTimeout, default 10s). + var properties = CreateBaseProperties(); + properties[TelemetryConfiguration.PropertyKeyEnabled] = "true"; + properties[SparkParameters.ConnectTimeoutMilliseconds] = "55000"; + properties[DatabricksParameters.WaitTimeout] = "7"; + + using var connection = new StatementExecutionConnection(properties); + + // Sanity: the two fields are distinct so we can disambiguate which one feeds + // socket_timeout downstream. + Assert.Equal(7, GetWaitTimeoutSecondsFieldValue(connection)); + Assert.Equal(55000, GetConnectTimeoutFieldValue(connection)); + + var method = typeof(StatementExecutionConnection).GetMethod( + "InitializeTelemetry", + BindingFlags.NonPublic | BindingFlags.Instance); + Assert.NotNull(method); + method!.Invoke(connection, new object?[] { null }); + + // ConnectionTelemetry.Create is `Never throws` — if init failed we'd get NoOp + // back with Session == null. 
Skip if the unit-test environment doesn't permit + // building a real telemetry client; the field-level tests above still cover + // the source-of-truth contract. + var session = connection.TelemetrySession; + Assert.NotNull(session); + Assert.NotNull(session!.DriverConnectionParams); + + // socket_timeout is in seconds (proto field is int64-seconds). 55000ms → 55s. + // Critically: NOT 7 (which would mean the value came from _waitTimeoutSeconds). + Assert.Equal(55, session.DriverConnectionParams!.SocketTimeout); + Assert.NotEqual(GetWaitTimeoutSecondsFieldValue(connection), session.DriverConnectionParams.SocketTimeout); + } + + // ── enable_direct_results telemetry source (gap B10) ─────────────────────── + // + // The SEA connection must stamp the telemetry payload's enable_direct_results from + // the DatabricksParameters.EnableDirectResults user property — NOT from a hardcoded + // literal. The previous code passed `enableDirectResults: true` unconditionally, + // making the field useless on dashboards that filter by user configuration. + + private static bool GetEnableDirectResultsFieldValue(StatementExecutionConnection connection) + { + var field = typeof(StatementExecutionConnection).GetField( + "_enableDirectResults", + BindingFlags.NonPublic | BindingFlags.Instance); + Assert.NotNull(field); + return (bool)field!.GetValue(connection)!; + } + + [Fact] + public void EnableDirectResults_DefaultsToTrue_WhenPropertyAbsent() + { + // No EnableDirectResults property is set. The default must match the Thrift path + // (DatabricksConnection._enableDirectResults defaults to true) so dashboards see + // consistent values across transports for callers that never tuned this flag. 
+ using var connection = CreateConnection(); + + Assert.True(GetEnableDirectResultsFieldValue(connection)); + } + + [Fact] + public void EnableDirectResults_ReadsFalseFromConnectionProperties() + { + // Property dictionary check: the SEA connection must read the same + // DatabricksParameters.EnableDirectResults property the Thrift path reads + // and honor "false" rather than hardcoding true. + var properties = CreateBaseProperties(); + properties[DatabricksParameters.EnableDirectResults] = "false"; + using var connection = new StatementExecutionConnection(properties); + + Assert.False(GetEnableDirectResultsFieldValue(connection)); + } + + [Fact] + public void EnableDirectResults_ReadsTrueFromConnectionProperties() + { + // Explicit "true" must also be read from the property — not from a + // hardcoded default — so the property is the source of truth. + var properties = CreateBaseProperties(); + properties[DatabricksParameters.EnableDirectResults] = "true"; + using var connection = new StatementExecutionConnection(properties); + + Assert.True(GetEnableDirectResultsFieldValue(connection)); + } + + [Fact] + public void InitializeTelemetry_ForwardsEnableDirectResultsToConnectionParams() + { + // End-to-end wiring guard: exercise the real InitializeTelemetry → + // ConnectionTelemetry.Create path with telemetry enabled, then read back the + // resulting session's driver_connection_params.enable_direct_results. This proves + // the argument passed to ConnectionTelemetry.Create is _enableDirectResults, NOT + // the prior hardcoded literal `true`. Without the fix, this assertion fails + // because the field is always true regardless of user configuration. + var properties = CreateBaseProperties(); + properties[TelemetryConfiguration.PropertyKeyEnabled] = "true"; + properties[DatabricksParameters.EnableDirectResults] = "false"; + + using var connection = new StatementExecutionConnection(properties); + + // Sanity: the field reflects the user-supplied "false". 
+ Assert.False(GetEnableDirectResultsFieldValue(connection)); + + var method = typeof(StatementExecutionConnection).GetMethod( + "InitializeTelemetry", + BindingFlags.NonPublic | BindingFlags.Instance); + Assert.NotNull(method); + method!.Invoke(connection, new object?[] { null }); + + // ConnectionTelemetry.Create is `Never throws` — if init failed we'd get NoOp + // back with Session == null. The field-level tests above still cover the + // source-of-truth contract if this end-to-end assertion can't run. + var session = connection.TelemetrySession; + Assert.NotNull(session); + Assert.NotNull(session!.DriverConnectionParams); + + // The bug pattern produced `true` here regardless of user config. + Assert.False(session.DriverConnectionParams!.EnableDirectResults); + } + } +} diff --git a/csharp/test/Unit/StatementExecution/StatementExecutionStatementObserverInjectionTests.cs b/csharp/test/Unit/StatementExecution/StatementExecutionStatementObserverInjectionTests.cs new file mode 100644 index 00000000..e8eed40c --- /dev/null +++ b/csharp/test/Unit/StatementExecution/StatementExecutionStatementObserverInjectionTests.cs @@ -0,0 +1,293 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. 
+*/ + +using System; +using System.Collections.Generic; +using System.Diagnostics; +using System.Reflection; +using System.Threading; +using System.Threading.Tasks; +using AdbcDrivers.Databricks.StatementExecution; +using AdbcDrivers.Databricks.Telemetry; +using AdbcDrivers.Databricks.Telemetry.Models; +using AdbcDrivers.HiveServer2.Spark; +using Apache.Arrow.Adbc; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; +using Xunit; + +namespace AdbcDrivers.Databricks.Tests.Unit.StatementExecution +{ + /// + /// Unit tests for the observer-injection wiring inside + /// (PECO-3022 T6 setup). + /// The setup commit only plumbs an field + /// onto and wires the connection to + /// build a bound to the connection's + /// (or fall back to + /// ). Subsequent hookpoint commits will then have + /// a non-null target. These tests verify: + /// + /// The field exists, is typed as the interface, and is readonly. + /// injects + /// when telemetry is disabled (no live session). + /// injects a + /// bound to the connection's session when + /// telemetry is enabled. + /// + /// + public class StatementExecutionStatementObserverInjectionTests + { + // ── Fakes ────────────────────────────────────────────────────────────────── + + /// + /// Minimal capturing telemetry client that satisfies session.TelemetryClient != null + /// so takes the + /// TelemetryObserver branch instead of the NullObserver fallback. + /// + private sealed class CapturingTelemetryClient : ITelemetryClient + { + public int EnqueueCallCount; + + public void Enqueue(TelemetryFrontendLog log) + { + Interlocked.Increment(ref EnqueueCallCount); + } + + public Task FlushAsync(CancellationToken ct = default) => Task.CompletedTask; + public Task CloseAsync() => Task.CompletedTask; + public ValueTask DisposeAsync() => default; + } + + /// + /// Minimal adapter that exposes a given session + /// through . 
The CreateStatement code path + /// only consults connection.TelemetrySession (which reads _telemetry.Session) + /// to decide which observer to inject, so this adapter is sufficient for the test + /// without spinning up a real ConnectionTelemetry. + /// + private sealed class TelemetryAdapter : IConnectionTelemetry + { + public TelemetryAdapter(TelemetrySessionContext? session) { Session = session; } + public TelemetrySessionContext? Session { get; } + public T ExecuteWithMetadataTelemetry<T>( + OperationType operationType, + Func<T> operation, + Activity? activity) => operation(); + public void EmitOperationTelemetry( + OperationType operationType, + StatementType statementType, + string? statementId, + long elapsedMs, + Exception? error) + { } + public Task DisposeAsync() => Task.CompletedTask; + } + + // ── Helpers ──────────────────────────────────────────────────────────────── + + private static Dictionary<string, string> CreateBaseProperties() + { + return new Dictionary<string, string> + { + { SparkParameters.HostName, "observer-test.cloud.databricks.com" }, + { DatabricksParameters.WarehouseId, "test-warehouse-id" }, + { SparkParameters.AccessToken, "test-token" }, + }; + } + + private static StatementExecutionConnection CreateConnection() + { + return new StatementExecutionConnection(CreateBaseProperties()); + } + + /// + /// Builds a session context with a live telemetry client. The CreateStatement + /// branch under test only checks session.TelemetryClient != null, so this + /// is the minimal shape that triggers the TelemetryObserver path. + /// + private static TelemetrySessionContext CreateSeaSession(ITelemetryClient client) + { + return new TelemetrySessionContext + { + SessionId = "sea-session-observer", + WorkspaceId = 4242L, + TelemetryClient = client, + AuthType = "pat", + }; + } + + /// + /// Reflectively reads the private _observer field on a SEA statement so + /// we can assert the concrete type wired in by CreateStatement.
+ /// + private static IStatementOperationObserver GetObserverField(StatementExecutionStatement statement) + { + FieldInfo field = typeof(StatementExecutionStatement).GetField( + "_observer", + BindingFlags.NonPublic | BindingFlags.Instance) + ?? throw new InvalidOperationException( + "_observer field not found on StatementExecutionStatement"); + return (IStatementOperationObserver)field.GetValue(statement)!; + } + + // ── Structural assertions ────────────────────────────────────────────────── + + [Fact] + public void StatementExecutionStatement_HasObserverField_TypedAsInterface() + { + // Exactly one observer field, typed as the interface (not the concrete + // TelemetryObserver) so the SEA path can equally accept NullObserver, + // TelemetryObserver, or any future implementation without changing the field's + // declared type. Readonly so once injected the observer cannot drift mid-execute. + FieldInfo? field = typeof(StatementExecutionStatement).GetField( + "_observer", + BindingFlags.NonPublic | BindingFlags.Instance); + + Assert.NotNull(field); + Assert.Equal(typeof(IStatementOperationObserver), field!.FieldType); + Assert.True( + field.IsInitOnly, + "_observer must be readonly so once injected it cannot drift mid-execute"); + } + + [Fact] + public void StatementExecutionStatement_NullObserverParameter_DefaultsToNullObserver() + { + // Direct-construction path: the constructor's IStatementOperationObserver? parameter + // defaults to null, which the constructor coerces to NullObserver.Instance. This + // keeps every hookpoint callsite null-check-free (design §12) regardless of how + // the statement was constructed. 
+ using var connection = CreateConnection(); + + using var statement = new StatementExecutionStatement( + client: Moq.Mock.Of<IStatementExecutionClient>(), + sessionId: "session-1", + warehouseId: "wh-1", + catalog: null, + schema: null, + resultDisposition: "INLINE_OR_EXTERNAL_LINKS", + resultFormat: "ARROW_STREAM", + resultCompression: null, + waitTimeoutSeconds: 0, + pollingIntervalMs: 50, + properties: CreateBaseProperties(), + recyclableMemoryStreamManager: new Microsoft.IO.RecyclableMemoryStreamManager(), + lz4BufferPool: System.Buffers.ArrayPool<byte>.Shared, + httpClient: new System.Net.Http.HttpClient(), + connection: connection); + + IStatementOperationObserver observer = GetObserverField(statement); + Assert.Same(NullObserver.Instance, observer); + } + + // ── Injection from StatementExecutionConnection.CreateStatement ──────────── + + [Fact] + public void CreateStatement_TelemetryDisabled_InjectsNullObserver() + { + // Without a telemetry session (the default for a freshly-constructed connection + // before OpenAsync runs and InitializeTelemetry has wired the real + // ConnectionTelemetry), CreateStatement must fall back to the singleton + // NullObserver. This is what keeps disabled-telemetry zero-cost: no allocation + // per statement, no null-checks at the hookpoint callsites. + using var connection = CreateConnection(); + + // Sanity: the default state of the connection has no telemetry session. + Assert.Null(connection.TelemetrySession); + + using AdbcStatement statement = connection.CreateStatement(); + var seaStatement = Assert.IsType<StatementExecutionStatement>(statement); + + IStatementOperationObserver observer = GetObserverField(seaStatement); + Assert.Same(NullObserver.Instance, observer); + } + + [Fact] + public void CreateStatement_TelemetrySessionWithoutClient_InjectsNullObserver() + { + // Defensive case: a TelemetrySession exists but its TelemetryClient is null + // (telemetry was opted-in but circuit-broken, or the client never initialized).
+ // CreateStatement must still fall back to NullObserver — building a real + // TelemetryObserver against a null client would later no-op anyway, but we + // skip the allocation entirely to keep this path zero-cost. + using var connection = CreateConnection(); + var sessionWithoutClient = new TelemetrySessionContext + { + SessionId = "sea-session-no-client", + TelemetryClient = null, + }; + connection.TelemetryForTesting = new TelemetryAdapter(sessionWithoutClient); + + using AdbcStatement statement = connection.CreateStatement(); + var seaStatement = Assert.IsType<StatementExecutionStatement>(statement); + + IStatementOperationObserver observer = GetObserverField(seaStatement); + Assert.Same(NullObserver.Instance, observer); + } + + [Fact] + public void CreateStatement_TelemetryEnabled_InjectsTelemetryObserver() + { + // When the connection has a live telemetry session (Session non-null AND + // TelemetryClient non-null), CreateStatement constructs a per-statement + // TelemetryObserver bound to that session. The observer's underlying + // StatementTelemetryContext must reference the same TelemetrySessionContext + // the connection negotiated, so subsequent hookpoint commits can mutate + // poll/first-batch fields on the same context that BuildTelemetryLog later reads. + using var connection = CreateConnection(); + var client = new CapturingTelemetryClient(); + TelemetrySessionContext session = CreateSeaSession(client); + connection.TelemetryForTesting = new TelemetryAdapter(session); + + using AdbcStatement statement = connection.CreateStatement(); + var seaStatement = Assert.IsType<StatementExecutionStatement>(statement); + + IStatementOperationObserver observer = GetObserverField(seaStatement); + TelemetryObserver typed = Assert.IsType<TelemetryObserver>(observer); + + // The observer's underlying context must be bound to the connection's session + // so subsequent poller/reader mutations and the final BuildTelemetryLog read + // the same SessionId/WorkspaceId the connection negotiated. + FieldInfo? 
sessionField = typeof(StatementTelemetryContext) + .GetField("_sessionContext", BindingFlags.NonPublic | BindingFlags.Instance); + Assert.NotNull(sessionField); + Assert.Same(session, sessionField!.GetValue(typed.Context)); + } + + [Fact] + public void CreateStatement_TelemetryEnabled_EachCallInjectsFreshObserver() + { + // Each statement must get its own TelemetryObserver instance so per-statement + // state (statementId, poll counts, error info) does not bleed across statements + // sharing the connection. Confirm distinct observer instances back-to-back. + using var connection = CreateConnection(); + var client = new CapturingTelemetryClient(); + TelemetrySessionContext session = CreateSeaSession(client); + connection.TelemetryForTesting = new TelemetryAdapter(session); + + using AdbcStatement first = connection.CreateStatement(); + using AdbcStatement second = connection.CreateStatement(); + + IStatementOperationObserver obs1 = GetObserverField((StatementExecutionStatement)first); + IStatementOperationObserver obs2 = GetObserverField((StatementExecutionStatement)second); + + Assert.IsType<TelemetryObserver>(obs1); + Assert.IsType<TelemetryObserver>(obs2); + Assert.NotSame(obs1, obs2); + } + } +} diff --git a/csharp/test/Unit/StatementExecution/StatementExecutionStatementObserverTests.cs b/csharp/test/Unit/StatementExecution/StatementExecutionStatementObserverTests.cs new file mode 100644 index 00000000..6569edd0 --- /dev/null +++ b/csharp/test/Unit/StatementExecution/StatementExecutionStatementObserverTests.cs @@ -0,0 +1,1850 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. 
+* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using System.Collections.Generic; +using System.IO; +using System.Linq; +using System.Net; +using System.Net.Http; +using System.Text.Json; +using System.Threading; +using System.Threading.Tasks; +using AdbcDrivers.Databricks.Reader.CloudFetch; +using AdbcDrivers.Databricks.StatementExecution; +using AdbcDrivers.Databricks.Telemetry; +using AdbcDrivers.HiveServer2.Spark; +using Apache.Arrow; +using Apache.Arrow.Adbc; +using Apache.Arrow.Ipc; +using Apache.Arrow.Types; +using Microsoft.IO; +using Moq; +using Moq.Protected; +using Xunit; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; + +namespace AdbcDrivers.Databricks.Tests.Unit.StatementExecution +{ + /// + /// Verifies the observer hookpoints wired into : + /// OnExecuteStarted fires before the server call, OnExecuteSucceeded fires + /// after the response is received with the server-assigned statement id, + /// OnPollCompleted fires exactly once on terminal poll state with the accumulated + /// (count, latencyMs), and OnError fires on any failure path. These tests use a + /// recording fake observer so we can assert exact call order and arguments — production + /// pipes the same calls into a real , which is exercised + /// by separate telemetry tests. 
+ /// + public class StatementExecutionStatementObserverTests + { + private const string StatementId = "stmt-observer-test"; + + // ── Recording fake observer ──────────────────────────────────────────────── + + /// + /// Captures every observer call along with its arguments and the order in which it + /// occurred. Implements the fail-open contract by never throwing from a method body. + /// + private sealed class RecordingObserver : IStatementOperationObserver + { + public readonly List<string> Calls = new(); + public StatementType? ExecuteStartedStmtType; + public OperationType? ExecuteStartedOpType; + public bool? ExecuteStartedIsCompressed; + public string? ExecuteSucceededStatementId; + public ExecutionResultFormat? ExecuteSucceededFormat; + public int? PollCount; + public long? PollLatencyMs; + public Exception? Error; + public int OnPollCompletedCallCount; + + public void OnExecuteStarted(StatementType stmtType, OperationType opType, bool isCompressed) + { + ExecuteStartedStmtType = stmtType; + ExecuteStartedOpType = opType; + ExecuteStartedIsCompressed = isCompressed; + Calls.Add(nameof(OnExecuteStarted)); + } + + public void OnExecuteSucceeded(string statementId, ExecutionResultFormat resultFormat) + { + ExecuteSucceededStatementId = statementId; + ExecuteSucceededFormat = resultFormat; + Calls.Add(nameof(OnExecuteSucceeded)); + } + + public void OnPollCompleted(int count, long latencyMs) + { + PollCount = count; + PollLatencyMs = latencyMs; + OnPollCompletedCallCount++; + Calls.Add(nameof(OnPollCompleted)); + } + + public Action<long>? OnFirstBatchReadyCallback; + public void OnFirstBatchReady(long latencyMs) + { + OnFirstBatchReadyCallback?.Invoke(latencyMs); + Calls.Add(nameof(OnFirstBatchReady)); + } + public Action<long>? OnConsumedCallback; + public void OnConsumed(long latencyMs) + { + OnConsumedCallback?.Invoke(latencyMs); + Calls.Add(nameof(OnConsumed)); + } + public ChunkMetrics? 
CapturedChunkMetrics; + public void OnChunksDownloaded(ChunkMetrics metrics) + { + CapturedChunkMetrics = metrics; + Calls.Add(nameof(OnChunksDownloaded)); + } + + public void OnError(Exception ex) + { + Error = ex; + Calls.Add(nameof(OnError)); + } + + public void OnFinalized() => Calls.Add(nameof(OnFinalized)); + } + + // ── Helpers ──────────────────────────────────────────────────────────────── + + private static StatementExecutionStatement CreateStatement( + IStatementExecutionClient client, + IStatementOperationObserver observer, + string? resultCompression = null, + int pollingIntervalMs = 1) + { + return CreateStatementWithConnection( + client, observer, out _, resultCompression, pollingIntervalMs); + } + + /// + /// Overload that also returns the owning connection so tests can inject a fake + /// via + /// and assert on connection-level telemetry events (CREATE_SESSION, DELETE_SESSION, + /// CLOSE_STATEMENT) emitted by the statement's lifecycle paths. + /// + private static StatementExecutionStatement CreateStatementWithConnection( + IStatementExecutionClient client, + IStatementOperationObserver observer, + out StatementExecutionConnection connection, + string? resultCompression = null, + int pollingIntervalMs = 1) + { + var properties = new Dictionary<string, string> + { + { SparkParameters.HostName, "test.databricks.com" }, + { DatabricksParameters.WarehouseId, "wh-1" }, + { SparkParameters.AccessToken, "token" }, + }; + + // The StatementExecutionConnection constructor wants an HttpClient. Wire a default + // OK response so connection construction does not blow up; the test itself talks to + // the mock IStatementExecutionClient, never through this HttpClient. 
+ var handlerMock = new Mock<HttpMessageHandler>(); + handlerMock.Protected() + .Setup<Task<HttpResponseMessage>>("SendAsync", + ItExpr.IsAny<HttpRequestMessage>(), + ItExpr.IsAny<CancellationToken>()) + .ReturnsAsync(new HttpResponseMessage(HttpStatusCode.OK) + { + Content = new StringContent(JsonSerializer.Serialize(new { session_id = "s1" })) + }); + var httpClient = new HttpClient(handlerMock.Object); + + connection = new StatementExecutionConnection(properties, httpClient); + return new StatementExecutionStatement( + client, + sessionId: "session-1", + warehouseId: "wh-1", + catalog: null, + schema: null, + resultDisposition: "INLINE_OR_EXTERNAL_LINKS", + resultFormat: "ARROW_STREAM", + resultCompression: resultCompression, + waitTimeoutSeconds: 0, + // Tiny poll interval so multi-iteration tests don't take seconds. Hookpoint + // semantics are independent of the interval. + pollingIntervalMs: pollingIntervalMs, + properties: properties, + recyclableMemoryStreamManager: new RecyclableMemoryStreamManager(), + lz4BufferPool: System.Buffers.ArrayPool<byte>.Shared, + httpClient: httpClient, + connection: connection, + observer: observer); + } + + /// + /// Records every EmitOperationTelemetry call so tests can assert which + /// connection-level operation events fired and with what payload. Mirrors the + /// RecordingTelemetry fake used in StatementExecutionConnectionTelemetryTests. + /// + private sealed class RecordingTelemetry : IConnectionTelemetry + { + public List<OperationType> Calls { get; } = new(); + public List<Exception?> Errors { get; } = new(); + public List<string?> StatementIds { get; } = new(); + public List<StatementType> StatementTypes { get; } = new(); + public Dictionary<OperationType, long> LatenciesByOp { get; } = new(); + public TelemetrySessionContext? Session { get; } + = new TelemetrySessionContext { SessionId = "sea-session-1" }; + + public T ExecuteWithMetadataTelemetry<T>( + OperationType operationType, + Func<T> operation, + System.Diagnostics.Activity? activity) => operation(); + + public void EmitOperationTelemetry( + OperationType operationType, + StatementType statementType, + string? 
statementId, + long elapsedMs, + Exception? error) + { + Calls.Add(operationType); + Errors.Add(error); + StatementIds.Add(statementId); + StatementTypes.Add(statementType); + LatenciesByOp[operationType] = elapsedMs; + } + + public Task DisposeAsync() => Task.CompletedTask; + } + + private static ResultManifest BuildManifestWithSingleColumn() + { + return new ResultManifest + { + Format = "ARROW_STREAM", + Schema = new ResultSchema + { + Columns = new List + { + new() { Name = "c0", TypeName = "INT", TypeText = "INT" } + } + }, + TotalRowCount = 0, + Chunks = new List(), + }; + } + + // ── Tests ────────────────────────────────────────────────────────────────── + + [Fact] + public async Task ExecuteQuery_CallsOnExecuteStarted_BeforeClient() + { + // OnExecuteStarted must fire before the SEA client's ExecuteStatementAsync; once the + // statement is on the wire it is too late to record the intent. We assert the order by + // capturing the observer's call log inside the mock's setup callback, so the recorded + // log reflects the state of the observer at the moment the client method was invoked. + var observer = new RecordingObserver(); + string[]? callsAtExecuteTime = null; + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .Callback((_, _) => + { + // Snapshot the observer's call log at the moment the client call is invoked. 
+ callsAtExecuteTime = observer.Calls.ToArray(); + }) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = BuildManifestWithSingleColumn(), + Result = new ResultData { Attachment = null }, + }); + + using var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "SELECT 1"; + + await stmt.ExecuteQueryAsync(CancellationToken.None); + + Assert.NotNull(callsAtExecuteTime); + Assert.Single(callsAtExecuteTime!); + Assert.Equal(nameof(IStatementOperationObserver.OnExecuteStarted), callsAtExecuteTime![0]); + + // Non-metadata path: stmtType is Query, opType is ExecuteStatementAsync. SEA is + // always async on the wire (submit + poll), so the operation_type recorded in + // telemetry must be EXECUTE_STATEMENT_ASYNC, distinct from the synchronous + // EXECUTE_STATEMENT that the Thrift path (DatabricksStatement) emits. + Assert.Equal(StatementType.Query, observer.ExecuteStartedStmtType); + Assert.Equal(OperationType.ExecuteStatementAsync, observer.ExecuteStartedOpType); + // resultCompression was null in this statement, so isCompressed must be false. + Assert.False(observer.ExecuteStartedIsCompressed); + } + + [Fact] + public async Task ExecuteQuery_OnExecuteStarted_WithPendingMetadataOperation_EmitsMetadataPair() + { + // PECO-3022 gap B8: when a sub-statement is created internally by the connection's + // metadata helpers (ExecuteMetadataSqlAsync / ExecuteShowColumnsAsync), it is + // stamped with a pending metadata OperationType. The next OnExecuteStarted hook + // must emit (StatementType.Metadata, ListXxx) rather than the default + // (StatementType.Query, ExecuteStatementAsync) — mirroring the Thrift parity model + // in DatabricksStatement.BeginExecuteTelemetry. 
+ var observer = new RecordingObserver(); + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = BuildManifestWithSingleColumn(), + Result = new ResultData { Attachment = null }, + }); + + using var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "SHOW CATALOGS"; + stmt.SetPendingMetadataOperation(OperationType.ListCatalogs); + + await stmt.ExecuteQueryAsync(CancellationToken.None); + + Assert.Equal(StatementType.Metadata, observer.ExecuteStartedStmtType); + Assert.Equal(OperationType.ListCatalogs, observer.ExecuteStartedOpType); + } + + [Theory] + [InlineData("getcatalogs", OperationType.ListCatalogs)] + [InlineData("getschemas", OperationType.ListSchemas)] + [InlineData("gettables", OperationType.ListTables)] + [InlineData("getcolumns", OperationType.ListColumns)] + [InlineData("getcolumnsextended", OperationType.ListColumns)] + [InlineData("gettabletypes", OperationType.ListTableTypes)] + [InlineData("getprimarykeys", OperationType.ListPrimaryKeys)] + [InlineData("getcrossreference", OperationType.ListCrossReferences)] + [InlineData("GETCATALOGS", OperationType.ListCatalogs)] // case-insensitive + public void SeaMetadataOperationMapper_Map_KnownCommands_ReturnsExpectedOperationType( + string sqlQuery, OperationType expected) + { + // PECO-3022 gap B8: mapper mirrors DatabricksStatement.GetMetadataOperationType. + // Used by ExecuteMetadataCommandAsync to thread the right OperationType through + // the sub-statement so telemetry emits the correct (Metadata, ListXxx) pair. + Assert.Equal(expected, SeaMetadataOperationMapper.Map(sqlQuery)); + } + + [Theory] + [InlineData(null)] + [InlineData("")] + [InlineData("SELECT 1")] + [InlineData("unknown-command")] + public void SeaMetadataOperationMapper_Map_UnknownInput_ReturnsNull(string? 
sqlQuery) + { + // Falls back to null so callers know to use the regular query OperationType. + Assert.Null(SeaMetadataOperationMapper.Map(sqlQuery)); + } + + [Fact] + public async Task ExecuteQuery_OnExecuteStarted_PassesIsCompressedFromCompressionRequest() + { + // Sanity: a statement built with resultCompression=LZ4_FRAME forwards isCompressed=true + // to the observer. The downstream manifest may override based on what the server actually + // returned, but the first signal reflects what the client asked for. + var observer = new RecordingObserver(); + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = BuildManifestWithSingleColumn(), + Result = new ResultData { Attachment = null }, + }); + + using var stmt = CreateStatement(mockClient.Object, observer, resultCompression: "LZ4_FRAME"); + stmt.SqlQuery = "SELECT 1"; + + await stmt.ExecuteQueryAsync(CancellationToken.None); + + Assert.True(observer.ExecuteStartedIsCompressed); + } + + [Fact] + public async Task ExecuteQuery_CallsOnExecuteSucceeded_WithStatementId() + { + // OnExecuteSucceeded must fire once the server has accepted the statement and a + // statement id is known, carrying that id forward to the observer. The result format + // is derived by SeaResultFormatMapper from (disposition, format, response): with + // disposition=INLINE_OR_EXTERNAL_LINKS and a manifest carrying no external_links, + // this maps to InlineArrow (the auto-disposition + inline-attachment cell of §8). 
+ var observer = new RecordingObserver(); + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = BuildManifestWithSingleColumn(), + Result = new ResultData { Attachment = null }, + }); + + using var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "SELECT 1"; + + await stmt.ExecuteQueryAsync(CancellationToken.None); + + Assert.Equal(StatementId, observer.ExecuteSucceededStatementId); + // SeaResultFormatMapper now populates a real value; gap-2 verifies the callsite + // no longer passes the Unspecified placeholder. + Assert.NotNull(observer.ExecuteSucceededFormat); + Assert.NotEqual(ExecutionResultFormat.Unspecified, observer.ExecuteSucceededFormat); + Assert.Equal(ExecutionResultFormat.InlineArrow, observer.ExecuteSucceededFormat); + + // OnExecuteStarted must precede OnExecuteSucceeded — order matters for telemetry record + // assembly downstream. + int startedIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnExecuteStarted)); + int succeededIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnExecuteSucceeded)); + Assert.True(startedIndex >= 0); + Assert.True(succeededIndex > startedIndex); + } + + [Fact] + public async Task Poll_CallsOnPollCompleted_OnceOnTerminalState_WithAccumulatedCount() + { + // OnPollCompleted is emitted exactly once when the polling loop reaches a terminal + // state, with the accumulated poll count. Setup: initial Execute returns PENDING (so + // the statement code enters the poll loop), GetStatement returns RUNNING twice then + // SUCCEEDED — that is three GetStatement calls, so PollCount must equal 3. 
+ var observer = new RecordingObserver(); + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "PENDING" }, + }); + + var pollResponses = new Queue(); + pollResponses.Enqueue(new GetStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "RUNNING" } + }); + pollResponses.Enqueue(new GetStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "RUNNING" } + }); + pollResponses.Enqueue(new GetStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = BuildManifestWithSingleColumn(), + Result = new ResultData { Attachment = null }, + }); + + mockClient + .Setup(c => c.GetStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(() => pollResponses.Dequeue()); + + using var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "SELECT 1"; + + await stmt.ExecuteQueryAsync(CancellationToken.None); + + // Exactly one OnPollCompleted call. Repeated emission would inflate poll_count downstream. + Assert.Equal(1, observer.OnPollCompletedCallCount); + Assert.Equal(3, observer.PollCount); + // latencyMs is wall-clock and can validly be 0 on fast in-process mocks, but it must be + // non-negative — anything else indicates a stopwatch bug. + Assert.NotNull(observer.PollLatencyMs); + Assert.True(observer.PollLatencyMs >= 0); + + // OnPollCompleted must arrive between OnExecuteSucceeded and any error/finalize signal: + // the contract is that polling happens after the server has assigned a statement id. 
+ int succeededIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnExecuteSucceeded)); + int pollCompletedIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnPollCompleted)); + Assert.True(succeededIndex >= 0); + Assert.True(pollCompletedIndex > succeededIndex); + } + + [Fact] + public async Task ExecuteQuery_ErrorPath_CallsOnError() + { + // Any failure inside ExecuteQueryInternalAsync must route through OnError: the catch + // block wrapping the body is the only place that translates execute-time exceptions + // into the observer's error signal. Use a server FAILED response which the statement + // converts to an AdbcException — this exercises the post-OnExecuteSucceeded error path + // (statement id is assigned, then the terminal state is FAILED). + var observer = new RecordingObserver(); + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus + { + State = "FAILED", + Error = new StatementError + { + Message = "SQL syntax error", + ErrorCode = "SYNTAX_ERROR" + } + }, + }); + + using var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "NOT VALID SQL"; + + await Assert.ThrowsAsync(() => stmt.ExecuteQueryAsync(CancellationToken.None)); + + // OnError must have fired exactly once carrying the originating exception, and must + // arrive after OnExecuteStarted (the statement at least began before failing). 
+ Assert.NotNull(observer.Error); + Assert.IsType(observer.Error); + Assert.Contains(nameof(IStatementOperationObserver.OnError), observer.Calls); + + int startedIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnExecuteStarted)); + int errorIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnError)); + Assert.True(startedIndex >= 0); + Assert.True(errorIndex > startedIndex); + } + + [Fact] + public async Task ExecuteQuery_ClientThrows_CallsOnErrorBeforeSucceeded() + { + // When the ExecuteStatementAsync call itself throws (network error, auth error, ...), + // OnExecuteSucceeded must never fire — there is no statement id yet — and OnError + // must carry the original exception forward. + var observer = new RecordingObserver(); + var networkError = new HttpRequestException("connection refused"); + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ThrowsAsync(networkError); + + using var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "SELECT 1"; + + var ex = await Assert.ThrowsAsync<HttpRequestException>( + () => stmt.ExecuteQueryAsync(CancellationToken.None)); + + Assert.Same(networkError, ex); + Assert.Same(networkError, observer.Error); + Assert.DoesNotContain(nameof(IStatementOperationObserver.OnExecuteSucceeded), observer.Calls); + Assert.Contains(nameof(IStatementOperationObserver.OnError), observer.Calls); + } + + [Fact] + public async Task Dispose_AfterSuccessfulExecute_CallsOnFinalizedExactlyOnce() + { + // OnFinalized is the terminal observer signal — it is the only path that builds an + // OssSqlDriverTelemetryLog and enqueues it for export. After a successful execute, + // Dispose must fire OnFinalized exactly once so SEA telemetry actually reaches + // eng_lumberjack. Without this call every other hookpoint just mutates an in-memory + // context that is garbage-collected on dispose. 
+ var observer = new RecordingObserver(); + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = BuildManifestWithSingleColumn(), + Result = new ResultData { Attachment = null }, + }); + // Stub CloseStatementAsync so Dispose's awaited Task does not NRE; the production + // dispose path swallows close errors but we want the observer call to be the only + // assertable side-effect of dispose here. + mockClient + .Setup(c => c.CloseStatementAsync( + It.IsAny(), + It.IsAny())) + .Returns(Task.CompletedTask); + + var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "SELECT 1"; + + await stmt.ExecuteQueryAsync(CancellationToken.None); + + // Pre-dispose: OnFinalized must not have fired yet — production code defers it to + // Dispose so chunk-metrics / consumed-time can be captured from the reader first. + Assert.DoesNotContain(nameof(IStatementOperationObserver.OnFinalized), observer.Calls); + + stmt.Dispose(); + + int finalizeCalls = observer.Calls.Count(c => c == nameof(IStatementOperationObserver.OnFinalized)); + Assert.Equal(1, finalizeCalls); + // OnFinalized must be the last observer call: anything after it would mutate an + // already-emitted log and never reach the wire. + Assert.Equal(nameof(IStatementOperationObserver.OnFinalized), observer.Calls[observer.Calls.Count - 1]); + } + + [Fact] + public async Task Dispose_AfterErrorPath_CallsOnFinalizedOnce() + { + // Error path: ExecuteQueryInternalAsync's catch fired OnError. Dispose must still + // fire OnFinalized so the error log reaches eng_lumberjack — without this call the + // error case produces no telemetry at all. 
The TelemetryObserver enforces exactly- + // once finalize via Interlocked.CompareExchange, so even if a future hookpoint adds + // its own finalize call on the error path, the dispose-time call here remains + // idempotent against the real observer; this test asserts the recorder sees a + // single Dispose-driven OnFinalized. + var observer = new RecordingObserver(); + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus + { + State = "FAILED", + Error = new StatementError + { + Message = "SQL syntax error", + ErrorCode = "SYNTAX_ERROR" + } + }, + }); + mockClient + .Setup(c => c.CloseStatementAsync( + It.IsAny(), + It.IsAny())) + .Returns(Task.CompletedTask); + + var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "NOT VALID SQL"; + + await Assert.ThrowsAsync(() => stmt.ExecuteQueryAsync(CancellationToken.None)); + + // Sanity: error path fired OnError but not OnFinalized — the latter is dispose-driven. + Assert.Contains(nameof(IStatementOperationObserver.OnError), observer.Calls); + Assert.DoesNotContain(nameof(IStatementOperationObserver.OnFinalized), observer.Calls); + + stmt.Dispose(); + + int finalizeCalls = observer.Calls.Count(c => c == nameof(IStatementOperationObserver.OnFinalized)); + Assert.Equal(1, finalizeCalls); + // OnError must precede OnFinalized so the terminal log reflects the failure state. + int errorIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnError)); + int finalizeIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnFinalized)); + Assert.True(errorIndex >= 0); + Assert.True(finalizeIndex > errorIndex); + } + + [Fact] + public async Task ExecuteQuery_InlinePath_CallsOnFirstBatchReady_OnceWithNonNegativeLatency() + { + // OnFirstBatchReady is wired at reader construction (gap G3 / design §6 row 4). 
For the + // inline path the signal fires once chunk-0 attachment bytes are already in the response + // — i.e. immediately before the InlineArrowStreamReader ctor — carrying elapsed-since- + // execute-start as latencyMs. This test pins: + // 1. exactly-once invocation, + // 2. non-negative latency (wall-clock; zero is valid on fast in-process mocks), + // 3. ordering between OnExecuteSucceeded and OnFirstBatchReady (server must accept the + // statement before first batch can be "ready"). + var observer = new RecordingObserver(); + var (ipcBytes, manifest) = BuildSingleColumnInlineArrowResult(); + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = manifest, + Result = new ResultData { Attachment = ipcBytes }, + }); + + long? capturedLatency = null; + observer.OnFirstBatchReadyCallback = ms => capturedLatency = ms; + + using var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "SELECT 1"; + + var result = await stmt.ExecuteQueryAsync(CancellationToken.None); + // Dispose the returned stream so we don't leak the underlying ArrowStreamReader. + result.Stream?.Dispose(); + + int firstBatchCallCount = observer.Calls + .Count(c => c == nameof(IStatementOperationObserver.OnFirstBatchReady)); + Assert.Equal(1, firstBatchCallCount); + + // Non-negative wall-clock: anything else indicates a stopwatch wiring bug + // (e.g. read before Start()). + Assert.NotNull(capturedLatency); + Assert.True(capturedLatency >= 0, + $"OnFirstBatchReady latency must be non-negative, got {capturedLatency}."); + + // OnExecuteSucceeded must precede OnFirstBatchReady — the statement is accepted by the + // server first, then results become available. 
Reversing this order would imply we are + // reporting first-batch latency for a statement the server hasn't acknowledged. + int succeededIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnExecuteSucceeded)); + int firstBatchIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnFirstBatchReady)); + Assert.True(succeededIndex >= 0); + Assert.True(firstBatchIndex > succeededIndex); + } + + [Fact] + public async Task ExecuteQuery_CloudFetchPath_CallsOnFirstBatchReady_OnceWithNonNegativeLatency() + { + // CloudFetch counterpart to the inline test above. With Manifest.Chunks[0].ExternalLinks + // populated, CreateReader routes through CreateCloudFetchReader, which fires + // OnFirstBatchReady at the top of the method before invoking the factory. We don't drive + // the download itself (the factory's HTTP calls go through a mocked handler) — the test + // only pins the observer wiring at reader construction. + var observer = new RecordingObserver(); + long? capturedLatency = null; + observer.OnFirstBatchReadyCallback = ms => capturedLatency = ms; + + var manifest = new ResultManifest + { + Format = "ARROW_STREAM", + Schema = new ResultSchema + { + Columns = new List + { + new() { Name = "c0", TypeName = "INT", TypeText = "INT" } + } + }, + TotalRowCount = 0, + TotalChunkCount = 1, + Chunks = new List + { + new() + { + ChunkIndex = 0, + RowCount = 0, + RowOffset = 0, + ByteCount = 0, + ExternalLinks = new List + { + // URL is not actually downloaded by this test: the CloudFetch factory + // queues a background fetch through the mocked HttpClient. Dispose + // cancels any in-flight work. 
+ new() { ExternalLinkUrl = "https://example.invalid/chunk0" } + } + } + }, + }; + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = manifest, + Result = new ResultData + { + ExternalLinks = new List + { + new() { ExternalLinkUrl = "https://example.invalid/chunk0" } + } + }, + }); + + using var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "SELECT 1"; + + var result = await stmt.ExecuteQueryAsync(CancellationToken.None); + // Dispose the returned stream so the background download manager shuts down cleanly + // without leaving tasks attempting to hit example.invalid. + result.Stream?.Dispose(); + + int firstBatchCallCount = observer.Calls + .Count(c => c == nameof(IStatementOperationObserver.OnFirstBatchReady)); + Assert.Equal(1, firstBatchCallCount); + + Assert.NotNull(capturedLatency); + Assert.True(capturedLatency >= 0, + $"OnFirstBatchReady latency must be non-negative, got {capturedLatency}."); + + int succeededIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnExecuteSucceeded)); + int firstBatchIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnFirstBatchReady)); + Assert.True(succeededIndex >= 0); + Assert.True(firstBatchIndex > succeededIndex); + } + + [Fact] + public async Task ExecuteQuery_InlinePath_ReaderDispose_CallsOnConsumed_OnceWithLatencyAtLeastFirstBatchReady() + { + // OnConsumed is wired at the outermost reader-decorator Dispose (gap G3 / design §6 row 5). + // For the inline path this fires when the consumer disposes the IArrowArrayStream returned + // by ExecuteQuery. This test pins: + // 1. exactly-once invocation (idempotent on repeated Dispose), + // 2. 
latency monotonicity: OnConsumed latency >= OnFirstBatchReady latency, because both + // read the same execute-time Stopwatch and Dispose strictly follows reader construction, + // 3. ordering: OnConsumed fires after OnFirstBatchReady (i.e. reader construction precedes + // consumption end). + var observer = new RecordingObserver(); + var (ipcBytes, manifest) = BuildSingleColumnInlineArrowResult(); + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = manifest, + Result = new ResultData { Attachment = ipcBytes }, + }); + + long? firstBatchLatency = null; + long? consumedLatency = null; + observer.OnFirstBatchReadyCallback = ms => firstBatchLatency = ms; + observer.OnConsumedCallback = ms => consumedLatency = ms; + + using var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "SELECT 1"; + + var result = await stmt.ExecuteQueryAsync(CancellationToken.None); + + // Before Dispose, OnConsumed must NOT have fired — the consumer has not signaled + // end-of-consumption yet. Asserting absence here guards against a wiring bug that + // would fire OnConsumed at reader construction (effectively duplicating + // OnFirstBatchReady). + Assert.DoesNotContain(nameof(IStatementOperationObserver.OnConsumed), observer.Calls); + + // First Dispose triggers OnConsumed; second Dispose must be a no-op (idempotency). 
+ result.Stream?.Dispose(); + result.Stream?.Dispose(); + + int consumedCallCount = observer.Calls + .Count(c => c == nameof(IStatementOperationObserver.OnConsumed)); + Assert.Equal(1, consumedCallCount); + + Assert.NotNull(firstBatchLatency); + Assert.NotNull(consumedLatency); + Assert.True(consumedLatency >= firstBatchLatency, + $"OnConsumed latency ({consumedLatency}) must be >= OnFirstBatchReady latency ({firstBatchLatency})."); + + // OnFirstBatchReady must precede OnConsumed: the reader can't be consumed before + // it exists. Reversing this order would imply Dispose ran before construction. + int firstBatchIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnFirstBatchReady)); + int consumedIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnConsumed)); + Assert.True(firstBatchIndex >= 0); + Assert.True(consumedIndex > firstBatchIndex); + } + + [Fact] + public async Task ExecuteQuery_CloudFetchPath_ReaderDispose_CallsOnConsumed_OnceWithLatencyAtLeastFirstBatchReady() + { + // CloudFetch counterpart to the inline test above. The outermost ConsumptionObservingStream + // wraps the CloudFetchReader, so the consumer's Dispose still drives OnConsumed even though + // the inner reader's actual download work is async — we don't need to wait for chunk fetches + // to complete to validate the observer wiring. + var observer = new RecordingObserver(); + long? firstBatchLatency = null; + long? 
consumedLatency = null; + observer.OnFirstBatchReadyCallback = ms => firstBatchLatency = ms; + observer.OnConsumedCallback = ms => consumedLatency = ms; + + var manifest = new ResultManifest + { + Format = "ARROW_STREAM", + Schema = new ResultSchema + { + Columns = new List + { + new() { Name = "c0", TypeName = "INT", TypeText = "INT" } + } + }, + TotalRowCount = 0, + TotalChunkCount = 1, + Chunks = new List + { + new() + { + ChunkIndex = 0, + RowCount = 0, + RowOffset = 0, + ByteCount = 0, + ExternalLinks = new List + { + // URL is not actually downloaded — Dispose cancels any in-flight work. + new() { ExternalLinkUrl = "https://example.invalid/chunk0" } + } + } + }, + }; + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = manifest, + Result = new ResultData + { + ExternalLinks = new List + { + new() { ExternalLinkUrl = "https://example.invalid/chunk0" } + } + }, + }); + + using var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "SELECT 1"; + + var result = await stmt.ExecuteQueryAsync(CancellationToken.None); + + // Pre-Dispose absence: same rationale as the inline test — guards against firing + // OnConsumed at construction time. + Assert.DoesNotContain(nameof(IStatementOperationObserver.OnConsumed), observer.Calls); + + // Idempotent Dispose. The second call must not produce a second OnConsumed, + // otherwise downstream telemetry would double-count consumption latency for a + // consumer that defensively disposes multiple times. 
+ result.Stream?.Dispose(); + result.Stream?.Dispose(); + + int consumedCallCount = observer.Calls + .Count(c => c == nameof(IStatementOperationObserver.OnConsumed)); + Assert.Equal(1, consumedCallCount); + + Assert.NotNull(firstBatchLatency); + Assert.NotNull(consumedLatency); + Assert.True(consumedLatency >= firstBatchLatency, + $"OnConsumed latency ({consumedLatency}) must be >= OnFirstBatchReady latency ({firstBatchLatency})."); + + int firstBatchIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnFirstBatchReady)); + int consumedIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnConsumed)); + Assert.True(firstBatchIndex >= 0); + Assert.True(consumedIndex > firstBatchIndex); + } + + [Fact] + public async Task ExecuteQuery_CloudFetchPath_ReaderDispose_CallsOnChunksDownloaded_OnceWithNonNullMetrics() + { + // gap G3 / design §6 row 5–6: on the CloudFetch path, ConsumptionObservingStream + // must signal OnChunksDownloaded exactly once when the consumer disposes the result + // stream. This is the SEA counterpart to DatabricksStatement.FinalizeExecuteTelemetry's + // Thrift-side emission, and it shares the same observer downstream — the + // OssSqlDriverTelemetryLog's chunk_details field comes from this call. + // + // The test deliberately does NOT drive an actual chunk download (example.invalid never + // resolves), so the captured ChunkMetrics will be the aggregator's default/empty state. + // The contract is "fire exactly once with a non-null ChunkMetrics", not "fire only when + // we successfully downloaded chunks" — the proto fields are nullable so an empty + // ChunkMetrics is a valid wire payload, and dropping the signal here would silently + // omit the field for any CloudFetch query that the consumer disposes early. 
+ var observer = new RecordingObserver(); + + var manifest = new ResultManifest + { + Format = "ARROW_STREAM", + Schema = new ResultSchema + { + Columns = new List + { + new() { Name = "c0", TypeName = "INT", TypeText = "INT" } + } + }, + TotalRowCount = 0, + TotalChunkCount = 1, + Chunks = new List + { + new() + { + ChunkIndex = 0, + RowCount = 0, + RowOffset = 0, + ByteCount = 0, + ExternalLinks = new List + { + new() { ExternalLinkUrl = "https://example.invalid/chunk0" } + } + } + }, + }; + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = manifest, + Result = new ResultData + { + ExternalLinks = new List + { + new() { ExternalLinkUrl = "https://example.invalid/chunk0" } + } + }, + }); + + using var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "SELECT 1"; + + var result = await stmt.ExecuteQueryAsync(CancellationToken.None); + + // Pre-Dispose: OnChunksDownloaded must not have fired yet — the wrapper defers it to + // consumer Dispose so the aggregator has a chance to accumulate chunk timings. + Assert.DoesNotContain(nameof(IStatementOperationObserver.OnChunksDownloaded), observer.Calls); + + // First Dispose fires OnChunksDownloaded; second Dispose must be a no-op (idempotency). + // Double-firing would cause TelemetryObserver to overwrite the chunk_details on the + // in-flight log, which is the right behavior in the singleton case but masks a wiring + // bug if Dispose is called twice. 
+ result.Stream?.Dispose(); + result.Stream?.Dispose(); + + int chunksDownloadedCallCount = observer.Calls + .Count(c => c == nameof(IStatementOperationObserver.OnChunksDownloaded)); + Assert.Equal(1, chunksDownloadedCallCount); + + // The captured metrics must be a non-null ChunkMetrics — even when no chunks were + // actually downloaded the wrapper falls back to a fresh ChunkMetrics rather than + // dropping the signal. Pinning non-null guards against a wiring regression that + // would pass `default(ChunkMetrics)` (== null for a reference type) and silently + // null-coalesce on the receiving end. + Assert.NotNull(observer.CapturedChunkMetrics); + + // Emission order vs. OnConsumed: ConsumptionObservingStream emits OnChunksDownloaded + // BEFORE OnConsumed to match the Thrift path's emission order in + // DatabricksStatement.FinalizeExecuteTelemetry. Both end up on the same telemetry log + // regardless, but pinning the order here keeps SEA and Thrift behaviorally identical. + int chunksIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnChunksDownloaded)); + int consumedIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnConsumed)); + Assert.True(chunksIndex >= 0); + Assert.True(consumedIndex > chunksIndex, + $"OnChunksDownloaded (index {chunksIndex}) must precede OnConsumed (index {consumedIndex})."); + } + + [Fact] + public async Task ExecuteQuery_CloudFetchPath_ReaderDispose_FiresOnChunksDownloaded_WithEmptyMetricsWhenAggregatorEmpty() + { + // Gap-fix dependency fallback (gap G3 / design §6 row 6): if the ChunkMetrics + // aggregator is unavailable or empty at Dispose time, ConsumptionObservingStream + // must still fire OnChunksDownloaded with a non-null, default-valued ChunkMetrics + // rather than throw or skip the signal. 
The design's stated fallback contract is + // "pass ChunkMetrics.Empty" — operationally that's a fresh `new ChunkMetrics()`, + // which is what CloudFetchReader.GetChunkMetrics returns when its download manager + // has been disposed and no _cachedChunkMetrics was captured. + // + // Scenario covered: consumer disposes the result stream before any chunk has + // actually been iterated by the reader. The CloudFetchDownloader aggregator state + // is still at its initial defaults (TotalChunksIterated=0, etc.) — i.e. effectively + // "no metrics yet" — and the wrapper must tolerate that without throwing. + var observer = new RecordingObserver(); + + var manifest = new ResultManifest + { + Format = "ARROW_STREAM", + Schema = new ResultSchema + { + Columns = new List + { + new() { Name = "c0", TypeName = "INT", TypeText = "INT" } + } + }, + TotalRowCount = 0, + TotalChunkCount = 1, + Chunks = new List + { + new() + { + ChunkIndex = 0, + RowCount = 0, + RowOffset = 0, + ByteCount = 0, + ExternalLinks = new List + { + new() { ExternalLinkUrl = "https://example.invalid/chunk0" } + } + } + }, + }; + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = manifest, + Result = new ResultData + { + ExternalLinks = new List + { + new() { ExternalLinkUrl = "https://example.invalid/chunk0" } + } + }, + }); + + using var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "SELECT 1"; + + var result = await stmt.ExecuteQueryAsync(CancellationToken.None); + + // The fail-open contract is asserted by NOT wrapping this Dispose in a try/catch: + // if the wrapper or the aggregator throws synchronously, the test fails outright. 
+ // The wrapper internally catches aggregator exceptions and substitutes a fresh + // ChunkMetrics, but the no-chunks-iterated path here exercises the more common + // "default values" branch of that fallback. + result.Stream?.Dispose(); + + Assert.Contains(nameof(IStatementOperationObserver.OnChunksDownloaded), observer.Calls); + + // The metrics object must be non-null. We do not assert specific field values + // because the aggregator may report partial state (e.g. TotalChunksPresent=1 for the + // queued-but-not-downloaded chunk). The contract this test pins is "non-null, + // non-throwing fallback", not "exactly default-constructed". + Assert.NotNull(observer.CapturedChunkMetrics); + } + + [Fact] + public async Task ExecuteQuery_InlinePath_ReaderDispose_DoesNotCallOnChunksDownloaded() + { + // gap G3 / design §6 row 6: OnChunksDownloaded is a CloudFetch-only signal — the + // chunk_details proto field has no meaning for inline results (there are no chunks + // to download). The wrapper achieves this by accepting a nullable CloudFetchReader + // argument and only firing OnChunksDownloaded when it is non-null. The inline path + // returns an InlineArrowStreamReader, so the `as CloudFetchReader` cast at the + // callsite produces null and the signal is skipped. + // + // Firing OnChunksDownloaded on the inline path would silently emit chunk_details with + // all-zero values, which is indistinguishable on the wire from "CloudFetch query that + // downloaded zero chunks" — a real and useful signal. Mixing the two would make the + // telemetry field useless for downstream analysis. 
+ var observer = new RecordingObserver(); + var (ipcBytes, manifest) = BuildSingleColumnInlineArrowResult(); + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = manifest, + Result = new ResultData { Attachment = ipcBytes }, + }); + + using var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "SELECT 1"; + + var result = await stmt.ExecuteQueryAsync(CancellationToken.None); + + // Dispose the result stream to drive the wrapper's OnConsumed call — that one must + // still fire on the inline path. We rely on it firing as the witness that Dispose + // actually ran the wrapper (otherwise the absence of OnChunksDownloaded would be a + // false negative caused by Dispose never being invoked at all). + result.Stream?.Dispose(); + + Assert.Contains(nameof(IStatementOperationObserver.OnConsumed), observer.Calls); + Assert.DoesNotContain(nameof(IStatementOperationObserver.OnChunksDownloaded), observer.Calls); + Assert.Null(observer.CapturedChunkMetrics); + } + + /// <summary> + /// Builds a single-column ("c0", INT) inline result: a manifest + matching Arrow IPC stream + /// bytes. InlineArrowStreamReader cross-validates that the manifest schema and the IPC + /// embedded schema have the same field count (count mismatches throw; type mismatches are + /// expected), so the manifest column count and the writer's schema column count must agree. 
+ /// </summary> + private static (byte[] ipcBytes, ResultManifest manifest) BuildSingleColumnInlineArrowResult() + { + var ipcSchema = new Schema.Builder() + .Field(new Field("c0", Int32Type.Default, nullable: true)) + .Build(); + + using var ms = new MemoryStream(); + using (var writer = new ArrowStreamWriter(ms, ipcSchema, leaveOpen: true)) + { + // A single empty record batch is sufficient: the test exercises reader construction + // and observer wiring, not data correctness. RecordBatch requires at least one array, + // so we pass an empty Int32Array with length 0. + var emptyArray = new Int32Array.Builder().Build(); + var batch = new RecordBatch(ipcSchema, new IArrowArray[] { emptyArray }, 0); + writer.WriteRecordBatch(batch); + writer.WriteEnd(); + } + var ipcBytes = ms.ToArray(); + + var manifest = new ResultManifest + { + Format = "ARROW_STREAM", + Schema = new ResultSchema + { + Columns = new List + { + new() { Name = "c0", TypeName = "INT", TypeText = "INT" } + } + }, + TotalRowCount = 0, + TotalChunkCount = 1, + Chunks = new List + { + // Single-chunk inline: chunk count of 1 means InlineArrowStreamReader.FetchAllChunksAsync + // does not loop to fetch additional chunks via GetResultChunkAsync. + new() + { + ChunkIndex = 0, + RowCount = 0, + RowOffset = 0, + ByteCount = ipcBytes.Length, + } + }, + }; + + return (ipcBytes, manifest); + } + + [Fact] + public async Task Dispose_AfterSuccessfulExecute_InlinePath_FiresOnConsumed_EvenWhenReaderNeverDisposed() + { + // Gap B4: reader latencies populated on only 78% of SEA EXECUTE records — the + // missing 22% trace back to statements whose consumer abandons the result reader + // without disposing it. Without a safety net in Statement.Dispose, the wrapping + // ConsumptionObservingStream never sees a Dispose call, so OnConsumed (and on + // CloudFetch path OnChunksDownloaded) never fire, leaving + // result_set_consumption_latency_millis unset. 
+ // + // Inline-path scenario: ExecuteQueryAsync returns a reader, the consumer drops + // the reference without disposing, then disposes the statement. The statement- + // side safety net (ConsumptionObservingStream.EnsureObserverSignaled invoked + // from Statement.Dispose before OnFinalized) must fire OnConsumed exactly once + // with a latency >= OnFirstBatchReady, and OnConsumed must precede OnFinalized. + var observer = new RecordingObserver(); + var (ipcBytes, manifest) = BuildSingleColumnInlineArrowResult(); + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = manifest, + Result = new ResultData { Attachment = ipcBytes }, + }); + mockClient + .Setup(c => c.CloseStatementAsync( + It.IsAny(), + It.IsAny())) + .Returns(Task.CompletedTask); + + long? firstBatchLatency = null; + long? consumedLatency = null; + observer.OnFirstBatchReadyCallback = ms => firstBatchLatency = ms; + observer.OnConsumedCallback = ms => consumedLatency = ms; + + var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "SELECT 1"; + + var result = await stmt.ExecuteQueryAsync(CancellationToken.None); + + // Consumer abandons the reader — does NOT dispose result.Stream. This is the + // exact production shape that produced the 22% missing-latency tail. + Assert.DoesNotContain(nameof(IStatementOperationObserver.OnConsumed), observer.Calls); + + stmt.Dispose(); + + // OnConsumed must have fired exactly once via the statement-Dispose safety net. 
+ int consumedCallCount = observer.Calls + .Count(c => c == nameof(IStatementOperationObserver.OnConsumed)); + Assert.Equal(1, consumedCallCount); + + // Latency monotonicity: both reads share the execute-time Stopwatch, and the + // statement-Dispose path strictly follows reader construction, so the consumed + // latency must be >= the first-batch latency. + Assert.NotNull(firstBatchLatency); + Assert.NotNull(consumedLatency); + Assert.True(consumedLatency >= firstBatchLatency, + $"OnConsumed latency ({consumedLatency}) must be >= OnFirstBatchReady latency ({firstBatchLatency})."); + + // Inline path: OnChunksDownloaded must NOT fire (no chunks to download). + Assert.DoesNotContain(nameof(IStatementOperationObserver.OnChunksDownloaded), observer.Calls); + + // OnConsumed must precede OnFinalized so the consumption latency makes it onto + // the telemetry log before emission. + int consumedIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnConsumed)); + int finalizeIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnFinalized)); + Assert.True(consumedIndex >= 0); + Assert.True(finalizeIndex > consumedIndex, + $"OnConsumed (index {consumedIndex}) must precede OnFinalized (index {finalizeIndex})."); + } + + [Fact] + public async Task Dispose_AfterSuccessfulExecute_CloudFetchPath_FiresOnConsumedAndOnChunksDownloaded_EvenWhenReaderNeverDisposed() + { + // Gap B4 CloudFetch counterpart: same abandonment pattern as the inline test, + // but ConsumptionObservingStream was wrapped around a CloudFetchReader so the + // statement-Dispose safety net must fire BOTH OnChunksDownloaded (with a fresh + // ChunkMetrics fallback) and OnConsumed. Without this fix, CloudFetch-path + // abandonment records had neither chunk_details nor result_set_consumption_ + // latency_millis populated. + var observer = new RecordingObserver(); + long? firstBatchLatency = null; + long? 
consumedLatency = null; + observer.OnFirstBatchReadyCallback = ms => firstBatchLatency = ms; + observer.OnConsumedCallback = ms => consumedLatency = ms; + + var manifest = new ResultManifest + { + Format = "ARROW_STREAM", + Schema = new ResultSchema + { + Columns = new List + { + new() { Name = "c0", TypeName = "INT", TypeText = "INT" } + } + }, + TotalRowCount = 0, + TotalChunkCount = 1, + Chunks = new List + { + new() + { + ChunkIndex = 0, + RowCount = 0, + RowOffset = 0, + ByteCount = 0, + ExternalLinks = new List + { + new() { ExternalLinkUrl = "https://example.invalid/chunk0" } + } + } + }, + }; + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = manifest, + Result = new ResultData + { + ExternalLinks = new List + { + new() { ExternalLinkUrl = "https://example.invalid/chunk0" } + } + }, + }); + mockClient + .Setup(c => c.CloseStatementAsync( + It.IsAny(), + It.IsAny())) + .Returns(Task.CompletedTask); + + var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "SELECT 1"; + + var result = await stmt.ExecuteQueryAsync(CancellationToken.None); + + // Consumer abandons the reader — exact shape of the missing-latency production records. 
+ Assert.DoesNotContain(nameof(IStatementOperationObserver.OnConsumed), observer.Calls); + Assert.DoesNotContain(nameof(IStatementOperationObserver.OnChunksDownloaded), observer.Calls); + + stmt.Dispose(); + + int consumedCallCount = observer.Calls + .Count(c => c == nameof(IStatementOperationObserver.OnConsumed)); + int chunksDownloadedCallCount = observer.Calls + .Count(c => c == nameof(IStatementOperationObserver.OnChunksDownloaded)); + Assert.Equal(1, consumedCallCount); + Assert.Equal(1, chunksDownloadedCallCount); + + // Non-null fallback ChunkMetrics, even when the aggregator was never populated + // (consumer never iterated, so no chunks were actually downloaded). + Assert.NotNull(observer.CapturedChunkMetrics); + + // Latency invariants. + Assert.NotNull(firstBatchLatency); + Assert.NotNull(consumedLatency); + Assert.True(consumedLatency >= firstBatchLatency, + $"OnConsumed latency ({consumedLatency}) must be >= OnFirstBatchReady latency ({firstBatchLatency})."); + + // Emission order: OnChunksDownloaded → OnConsumed → OnFinalized. + int chunksIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnChunksDownloaded)); + int consumedIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnConsumed)); + int finalizeIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnFinalized)); + Assert.True(chunksIndex >= 0); + Assert.True(consumedIndex > chunksIndex); + Assert.True(finalizeIndex > consumedIndex); + } + + [Fact] + public async Task Dispose_AfterSuccessfulExecute_SafetyNet_IsIdempotentWithReaderDispose() + { + // Idempotency contract: when BOTH the consumer disposes the reader AND the + // statement-Dispose safety net runs, each observer signal must fire EXACTLY + // once. Whichever path wins the race fires the signal; the other path's + // call is a no-op via the wrapper's _observerSignaled Interlocked CAS gate. 
+ // Without this property the wrapper could double-count consumption latency + // for consumers that defensively dispose the reader before disposing the + // statement (the common case). + var observer = new RecordingObserver(); + var (ipcBytes, manifest) = BuildSingleColumnInlineArrowResult(); + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = manifest, + Result = new ResultData { Attachment = ipcBytes }, + }); + mockClient + .Setup(c => c.CloseStatementAsync( + It.IsAny(), + It.IsAny())) + .Returns(Task.CompletedTask); + + var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "SELECT 1"; + + var result = await stmt.ExecuteQueryAsync(CancellationToken.None); + + // Consumer disposes reader (normal flow): fires OnConsumed. + result.Stream?.Dispose(); + // Statement.Dispose then runs the safety net: must NOT re-fire OnConsumed. + stmt.Dispose(); + + int consumedCallCount = observer.Calls + .Count(c => c == nameof(IStatementOperationObserver.OnConsumed)); + Assert.Equal(1, consumedCallCount); + + int finalizeCalls = observer.Calls + .Count(c => c == nameof(IStatementOperationObserver.OnFinalized)); + Assert.Equal(1, finalizeCalls); + } + + [Fact] + public async Task ExecuteQuery_EmptyResultPath_NullManifest_FiresOnFirstBatchReadyAndOnConsumed() + { + // Gap B4: empty-result paths (null manifest, manifest-without-data) used to + // bypass OnFirstBatchReady entirely because the original wiring fired it inside + // CreateReader's inline branch and CreateCloudFetchReader. After this fix + // OnFirstBatchReady is fired in ExecuteQueryInternalAsync before CreateReader + // dispatches, so ALL successful paths — including null-manifest DDL responses — + // populate result_set_ready_latency_millis. 
OnConsumed must also fire (via the + // statement-Dispose safety net) so the consumer Dispose isn't required for + // empty-result records. + var observer = new RecordingObserver(); + long? firstBatchLatency = null; + long? consumedLatency = null; + observer.OnFirstBatchReadyCallback = ms => firstBatchLatency = ms; + observer.OnConsumedCallback = ms => consumedLatency = ms; + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + // Null manifest is the "no results" shape — e.g. DDL statements that + // don't return a result set. CreateReader returns EmptyArrowArrayStream + // which has no underlying reader to fire OnFirstBatchReady internally. + Manifest = null, + Result = null, + }); + mockClient + .Setup(c => c.CloseStatementAsync( + It.IsAny(), + It.IsAny())) + .Returns(Task.CompletedTask); + + var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "CREATE TABLE foo (id INT)"; + + await stmt.ExecuteQueryAsync(CancellationToken.None); + + // OnFirstBatchReady must fire even though no data was returned. + int firstBatchCallCount = observer.Calls + .Count(c => c == nameof(IStatementOperationObserver.OnFirstBatchReady)); + Assert.Equal(1, firstBatchCallCount); + Assert.NotNull(firstBatchLatency); + Assert.True(firstBatchLatency >= 0); + + stmt.Dispose(); + + // OnConsumed must fire via the statement-Dispose safety net. + int consumedCallCount = observer.Calls + .Count(c => c == nameof(IStatementOperationObserver.OnConsumed)); + Assert.Equal(1, consumedCallCount); + Assert.NotNull(consumedLatency); + Assert.True(consumedLatency >= firstBatchLatency); + + // Empty path has no CloudFetchReader, so OnChunksDownloaded must NOT fire. 
+ Assert.DoesNotContain(nameof(IStatementOperationObserver.OnChunksDownloaded), observer.Calls); + } + + [Fact] + public async Task ExecuteQuery_EmptyResultPath_ManifestWithoutData_FiresOnFirstBatchReady() + { + // Companion to the null-manifest test above: a manifest is present (with schema) + // but no Result.Attachment and no external links. This is the "DDL with column + // metadata" shape, and the original wiring also missed OnFirstBatchReady here + // because CreateReader's inline branch (where the call lived) was never reached. + var observer = new RecordingObserver(); + long? firstBatchLatency = null; + observer.OnFirstBatchReadyCallback = ms => firstBatchLatency = ms; + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = BuildManifestWithSingleColumn(), + // No Attachment and no ExternalLinks: routes to the else-branch of + // CreateReader that returns EmptyArrowArrayStream(schema). + Result = new ResultData { Attachment = null }, + }); + mockClient + .Setup(c => c.CloseStatementAsync( + It.IsAny(), + It.IsAny())) + .Returns(Task.CompletedTask); + + using var stmt = CreateStatement(mockClient.Object, observer); + stmt.SqlQuery = "SELECT 1 WHERE 1=0"; + + var result = await stmt.ExecuteQueryAsync(CancellationToken.None); + + int firstBatchCallCount = observer.Calls + .Count(c => c == nameof(IStatementOperationObserver.OnFirstBatchReady)); + Assert.Equal(1, firstBatchCallCount); + Assert.NotNull(firstBatchLatency); + Assert.True(firstBatchLatency >= 0); + + // OnExecuteSucceeded must precede OnFirstBatchReady: the server accepts the + // statement before any results are "ready" to be reported. 
+ int succeededIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnExecuteSucceeded)); + int firstBatchIndex = observer.Calls.IndexOf(nameof(IStatementOperationObserver.OnFirstBatchReady)); + Assert.True(succeededIndex >= 0); + Assert.True(firstBatchIndex > succeededIndex); + + // Dispose the reader explicitly; this fires OnConsumed from reader.Dispose + // before the `using var stmt` declared above disposes the statement. + result.Stream?.Dispose(); + Assert.Contains(nameof(IStatementOperationObserver.OnConsumed), observer.Calls); + } + + [Fact] + public void Dispose_WithoutExecute_DoesNotCallOnFinalized() + { + // A statement that was never executed must not trigger OnFinalized() — doing so + // would enqueue an empty execute-statement log with no statement id, no operation + // type, and no latencies. The gate is the _executeStarted flag set in lockstep + // with OnExecuteStarted; without it, every short-lived statement (e.g. a caller + // that constructs a statement and then bails before SetSqlQuery) would pollute + // eng_lumberjack. + var observer = new RecordingObserver(); + var mockClient = new Mock(); + + var stmt = CreateStatement(mockClient.Object, observer); + + stmt.Dispose(); + + Assert.Empty(observer.Calls); + } + + [Fact] + public async Task Dispose_AfterSuccessfulExecute_EmitsCloseStatementTelemetry_WithStatementId() + { + // Parity gap with Thrift (gap-5): the Thrift path's DatabricksStatement.Dispose emits a + // CLOSE_STATEMENT operation event for every disposed statement so lumberjack receives + // ~one CLOSE_STATEMENT per execute. SEA's Dispose used to call only OnFinalized() + // (which builds the EXECUTE_STATEMENT_ASYNC log) and never fired the connection-level + // CLOSE_STATEMENT event — producing zero CLOSE_STATEMENT records in prod despite Thrift + // shipping ~86% of executes.
This test pins the wiring: with the statement id assigned + // by the server, Dispose must fire exactly one CLOSE_STATEMENT through the connection's + // telemetry, carrying that statement id. + var observer = new RecordingObserver(); + var telemetry = new RecordingTelemetry(); + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = BuildManifestWithSingleColumn(), + Result = new ResultData { Attachment = null }, + }); + mockClient + .Setup(c => c.CloseStatementAsync( + It.IsAny(), + It.IsAny())) + .Returns(Task.CompletedTask); + + var stmt = CreateStatementWithConnection( + mockClient.Object, observer, out var connection); + connection.TelemetryForTesting = telemetry; + stmt.SqlQuery = "SELECT 1"; + + await stmt.ExecuteQueryAsync(CancellationToken.None); + stmt.Dispose(); + + // Exactly one CLOSE_STATEMENT event, carrying the server-assigned statement id and + // the unspecified statement type (matching the Thrift path's CloseStatement emission + // in DatabricksStatement.Dispose). + int closeCount = telemetry.Calls.Count(c => c == OperationType.CloseStatement); + Assert.Equal(1, closeCount); + int closeIdx = telemetry.Calls.IndexOf(OperationType.CloseStatement); + Assert.Equal(StatementId, telemetry.StatementIds[closeIdx]); + Assert.Equal(StatementType.Unspecified, telemetry.StatementTypes[closeIdx]); + // Successful close RPC — no error attached. + Assert.Null(telemetry.Errors[closeIdx]); + } + + [Fact] + public async Task Dispose_AfterSuccessfulExecute_CloseStatementElapsedMs_IsBoundedByStatementLifetime() + { + // CLOSE_STATEMENT's operation_latency_ms reflects the wall-clock duration of the + // CloseStatementAsync RPC issued at Dispose. 
That duration is necessarily bounded by + // the total statement lifetime (execute → dispose), since the close RPC happens + // strictly inside Dispose. Pin both: non-negative and <= lifetime. + var observer = new RecordingObserver(); + var telemetry = new RecordingTelemetry(); + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = BuildManifestWithSingleColumn(), + Result = new ResultData { Attachment = null }, + }); + mockClient + .Setup(c => c.CloseStatementAsync( + It.IsAny(), + It.IsAny())) + .Returns(Task.CompletedTask); + + var stmt = CreateStatementWithConnection( + mockClient.Object, observer, out var connection); + connection.TelemetryForTesting = telemetry; + stmt.SqlQuery = "SELECT 1"; + + var lifetimeStopwatch = System.Diagnostics.Stopwatch.StartNew(); + await stmt.ExecuteQueryAsync(CancellationToken.None); + stmt.Dispose(); + lifetimeStopwatch.Stop(); + + Assert.True(telemetry.LatenciesByOp.ContainsKey(OperationType.CloseStatement)); + long closeElapsedMs = telemetry.LatenciesByOp[OperationType.CloseStatement]; + // Wall-clock latency must be non-negative. + Assert.True(closeElapsedMs >= 0, + $"CLOSE_STATEMENT elapsedMs must be >= 0, got {closeElapsedMs}."); + // And bounded by the full statement lifetime (execute + dispose). Add small slack + // for stopwatch quantization between the inner close-RPC stopwatch and the outer + // test stopwatch. 
+ Assert.True(closeElapsedMs <= lifetimeStopwatch.ElapsedMilliseconds + 50, + $"CLOSE_STATEMENT elapsedMs ({closeElapsedMs}) exceeds statement lifetime " + + $"({lifetimeStopwatch.ElapsedMilliseconds}ms)."); + } + + [Fact] + public async Task Dispose_CloseStatementRpcThrows_StillEmitsCloseStatementWithError() + { + // The Thrift path's CLOSE_STATEMENT event carries the close-RPC error when the RPC + // fails; we record the event regardless so the lifecycle marker still reaches + // lumberjack and the error tag preserves the failure shape. SEA must do the same: + // a failed CloseStatementAsync (e.g. network error) must not suppress the event. + var observer = new RecordingObserver(); + var telemetry = new RecordingTelemetry(); + var closeError = new HttpRequestException("close RPC failed"); + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = BuildManifestWithSingleColumn(), + Result = new ResultData { Attachment = null }, + }); + mockClient + .Setup(c => c.CloseStatementAsync( + It.IsAny(), + It.IsAny())) + .ThrowsAsync(closeError); + + var stmt = CreateStatementWithConnection( + mockClient.Object, observer, out var connection); + connection.TelemetryForTesting = telemetry; + stmt.SqlQuery = "SELECT 1"; + + await stmt.ExecuteQueryAsync(CancellationToken.None); + // Dispose must not surface the RPC failure (best-effort cleanup contract). 
+ var disposeEx = Record.Exception(() => stmt.Dispose()); + Assert.Null(disposeEx); + + int closeIdx = telemetry.Calls.IndexOf(OperationType.CloseStatement); + Assert.True(closeIdx >= 0, "CLOSE_STATEMENT must still fire when the close RPC throws."); + Assert.Equal(StatementId, telemetry.StatementIds[closeIdx]); + // The thrown exception must be threaded through into the event payload so analysts + // can see which CLOSE_STATEMENT events failed at the wire. + Assert.Same(closeError, telemetry.Errors[closeIdx]); + } + + [Fact] + public async Task Dispose_CalledTwice_EmitsCloseStatementOnlyOnce() + { + // Repeated Dispose() calls (common in `using` + manual dispose patterns) must not + // duplicate CLOSE_STATEMENT records. The idempotency gate is + // _closeStatementTelemetryEmitted in the statement (mirroring Thrift's gate). + var observer = new RecordingObserver(); + var telemetry = new RecordingTelemetry(); + + var mockClient = new Mock(); + mockClient + .Setup(c => c.ExecuteStatementAsync( + It.IsAny(), + It.IsAny())) + .ReturnsAsync(new ExecuteStatementResponse + { + StatementId = StatementId, + Status = new StatementStatus { State = "SUCCEEDED" }, + Manifest = BuildManifestWithSingleColumn(), + Result = new ResultData { Attachment = null }, + }); + mockClient + .Setup(c => c.CloseStatementAsync( + It.IsAny(), + It.IsAny())) + .Returns(Task.CompletedTask); + + var stmt = CreateStatementWithConnection( + mockClient.Object, observer, out var connection); + connection.TelemetryForTesting = telemetry; + stmt.SqlQuery = "SELECT 1"; + + await stmt.ExecuteQueryAsync(CancellationToken.None); + stmt.Dispose(); + stmt.Dispose(); + + int closeCount = telemetry.Calls.Count(c => c == OperationType.CloseStatement); + Assert.Equal(1, closeCount); + } + + [Fact] + public void Dispose_WithoutExecute_StillEmitsCloseStatementAsLifecycleMarker() + { + // A statement that was never executed never had a statement id assigned, so no close + // RPC fires (elapsedMs = 0). 
The Thrift path still emits CLOSE_STATEMENT in this + // shape as a lifecycle marker (the comment in DatabricksStatement.Dispose calls this + // out explicitly). SEA must match so the connection-level statement lifecycle event + // count is consistent across drivers. + var observer = new RecordingObserver(); + var telemetry = new RecordingTelemetry(); + var mockClient = new Mock(); + + var stmt = CreateStatementWithConnection( + mockClient.Object, observer, out var connection); + connection.TelemetryForTesting = telemetry; + + stmt.Dispose(); + + // Observer hookpoints (OnExecuteStarted/OnFinalized/...) must not fire — never- + // executed statements remain off the EXECUTE_STATEMENT_ASYNC ledger. + Assert.Empty(observer.Calls); + + // But CLOSE_STATEMENT must still fire, with no statement id and zero elapsed time. + int closeIdx = telemetry.Calls.IndexOf(OperationType.CloseStatement); + Assert.True(closeIdx >= 0, "CLOSE_STATEMENT must fire even when no execute ran."); + Assert.Null(telemetry.StatementIds[closeIdx]); + Assert.Equal(0, telemetry.LatenciesByOp[OperationType.CloseStatement]); + Assert.Null(telemetry.Errors[closeIdx]); + } + } +} diff --git a/csharp/test/Unit/Telemetry/ConnectionTelemetryAuthMechTests.cs b/csharp/test/Unit/Telemetry/ConnectionTelemetryAuthMechTests.cs index 10e7af6a..dbb506f4 100644 --- a/csharp/test/Unit/Telemetry/ConnectionTelemetryAuthMechTests.cs +++ b/csharp/test/Unit/Telemetry/ConnectionTelemetryAuthMechTests.cs @@ -19,6 +19,7 @@ using AdbcDrivers.HiveServer2.Spark; using DriverAuthFlowType = AdbcDrivers.Databricks.Telemetry.Proto.DriverAuthFlow.Types.Type; using DriverAuthMechType = AdbcDrivers.Databricks.Telemetry.Proto.DriverAuthMech.Types.Type; +using DriverModeType = AdbcDrivers.Databricks.Telemetry.Proto.DriverMode.Types.Type; using Xunit; namespace AdbcDrivers.Databricks.Tests.Unit.Telemetry @@ -47,7 +48,8 @@ public void AuthMech_Pat_WhenAuthTypeIsTokenAndNoOAuthGrantType() properties[SparkParameters.Token] = 
"dapi-redacted"; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.Equal(DriverAuthMechType.Pat, connParams.AuthMech); Assert.Equal(DriverAuthFlowType.TokenPassthrough, connParams.AuthFlow); @@ -64,7 +66,8 @@ public void AuthMech_Oauth_ClientCredentials_WhenGrantTypeIsClientCredentials() properties[DatabricksParameters.OAuthClientSecret] = "client-secret"; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.Equal(DriverAuthMechType.Oauth, connParams.AuthMech); Assert.Equal(DriverAuthFlowType.ClientCredentials, connParams.AuthFlow); @@ -83,7 +86,8 @@ public void AuthMech_Oauth_TokenPassthrough_WhenAuthTypeIsOauthWithNoGrantType() properties[SparkParameters.AccessToken] = "oauth-access-token-redacted"; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.Equal(DriverAuthMechType.Oauth, connParams.AuthMech); Assert.Equal(DriverAuthFlowType.TokenPassthrough, connParams.AuthFlow); @@ -103,7 +107,8 @@ public void AuthMech_Oauth_TokenPassthrough_WhenGrantTypeIsAccessToken() properties[SparkParameters.AccessToken] = "oauth-access-token-redacted"; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, 
useDescTableExtended: true, DefaultTimeoutMs); Assert.Equal(DriverAuthMechType.Oauth, connParams.AuthMech); Assert.Equal(DriverAuthFlowType.TokenPassthrough, connParams.AuthFlow); @@ -115,7 +120,8 @@ public void AuthMech_Pat_WhenNoAuthConfigured() var properties = BaseProperties(); var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.Equal(DriverAuthMechType.Pat, connParams.AuthMech); Assert.Equal(DriverAuthFlowType.TokenPassthrough, connParams.AuthFlow); diff --git a/csharp/test/Unit/Telemetry/ConnectionTelemetryCreateSignatureTests.cs b/csharp/test/Unit/Telemetry/ConnectionTelemetryCreateSignatureTests.cs new file mode 100644 index 00000000..ccf5d1a5 --- /dev/null +++ b/csharp/test/Unit/Telemetry/ConnectionTelemetryCreateSignatureTests.cs @@ -0,0 +1,216 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. 
+*/ + +using System.Collections.Generic; +using System.Threading.Tasks; +using AdbcDrivers.Databricks.Telemetry; +using AdbcDrivers.HiveServer2.Spark; +using DriverModeType = AdbcDrivers.Databricks.Telemetry.Proto.DriverMode.Types.Type; +using Xunit; + +namespace AdbcDrivers.Databricks.Tests.Unit.Telemetry +{ + /// + /// Tests for PECO-3022 (TELEM/SEA T1): + /// is now transport-agnostic — it takes a string sessionId (converted at the + /// caller's boundary) and a DriverMode.Types.Type mode threaded through to + /// driver_connection_params.mode. The two formerly hardcoded + /// DriverMode.Types.Type.Thrift literals in BuildDriverConnectionParams + /// and the fallback in SafeBuildDriverConnectionParams are gone; the mode is + /// always the value supplied by the caller (THRIFT for Thrift, SEA for the upcoming + /// SEA transport). + /// + public class ConnectionTelemetryCreateSignatureTests + { + private const string AssemblyVersion = "1.2.3-test"; + private const int DefaultTimeoutMs = 30_000; + + private static IReadOnlyDictionary TelemetryEnabledProperties() => + new Dictionary + { + { TelemetryConfiguration.PropertyKeyEnabled, "true" }, + { SparkParameters.AuthType, SparkAuthTypeConstants.Token }, + { SparkParameters.Token, "dapi-redacted" }, + { SparkParameters.Path, "/sql/1.0/warehouses/abc123" }, + }; + + // Tests share a TelemetryClient cache keyed by host (TelemetryClientManager + // singleton). Use distinct hosts per test to keep them isolated. + [Fact] + public async Task Create_AcceptsStringSessionId() + { + // Regression: the original signature took TSessionHandle?, forcing the + // (Thrift) caller to leak its transport handle through telemetry. The new + // signature accepts the already-stringified id so the SEA caller can pass + // its server-assigned id without inventing a fake TSessionHandle. 
+ const string Host = "create-string-sid.databricks.com"; + const string SessionId = "9e6a3f88-1234-4321-abcd-deadbeefcafe"; + + IConnectionTelemetry telemetry = ConnectionTelemetry.Create( + properties: TelemetryEnabledProperties(), + host: Host, + assemblyVersion: AssemblyVersion, + oauthTokenProvider: null, + sessionId: SessionId, + mode: DriverModeType.Thrift, + enableDirectResults: true, + useDescTableExtended: false, + connectTimeoutMilliseconds: DefaultTimeoutMs, + activity: null); + + try + { + Assert.NotNull(telemetry.Session); + Assert.Equal(SessionId, telemetry.Session!.SessionId); + } + finally + { + await telemetry.DisposeAsync(); + } + } + + [Fact] + public async Task Create_EmptySessionId_MapsToNullInContext() + { + // ConnectionTelemetry.Create maps `string.Empty` -> `SessionId = null` in the + // resulting TelemetrySessionContext. This matters because Create is called + // from InitializeTelemetry before OpenSession returns a real handle on some + // code paths, and the DatabricksConnection caller passes string.Empty rather + // than null in that window. Pin the mapping so a future refactor that drops + // the `!string.IsNullOrEmpty` guard at ConnectionTelemetry.cs would surface + // here, rather than silently emitting empty-string SessionId to lumberjack. 
+ const string Host = "create-empty-sid.databricks.com"; + + IConnectionTelemetry telemetry = ConnectionTelemetry.Create( + properties: TelemetryEnabledProperties(), + host: Host, + assemblyVersion: AssemblyVersion, + oauthTokenProvider: null, + sessionId: string.Empty, + mode: DriverModeType.Thrift, + enableDirectResults: true, + useDescTableExtended: false, + connectTimeoutMilliseconds: DefaultTimeoutMs, + activity: null); + + try + { + Assert.NotNull(telemetry.Session); + Assert.Null(telemetry.Session!.SessionId); + } + finally + { + await telemetry.DisposeAsync(); + } + } + + [Fact] + public async Task Create_ThriftMode_SetsDriverModeThrift() + { + // Regression for the literal that used to live at ConnectionTelemetry.cs:642 + // — `Mode = DriverMode.Types.Type.Thrift` is now threaded from the caller. + const string Host = "create-thrift-mode.databricks.com"; + + IConnectionTelemetry telemetry = ConnectionTelemetry.Create( + properties: TelemetryEnabledProperties(), + host: Host, + assemblyVersion: AssemblyVersion, + oauthTokenProvider: null, + sessionId: "session-thrift", + mode: DriverModeType.Thrift, + enableDirectResults: true, + useDescTableExtended: false, + connectTimeoutMilliseconds: DefaultTimeoutMs, + activity: null); + + try + { + Assert.NotNull(telemetry.Session); + Assert.NotNull(telemetry.Session!.DriverConnectionParams); + Assert.Equal(DriverModeType.Thrift, telemetry.Session.DriverConnectionParams!.Mode); + } + finally + { + await telemetry.DisposeAsync(); + } + } + + [Fact] + public async Task Create_SeaMode_SetsDriverModeSea() + { + // The reason this refactor exists: the SEA telemetry caller (added in a later + // phase) must produce telemetry rows with `driver_connection_params.mode = SEA`. 
+ const string Host = "create-sea-mode.databricks.com"; + + IConnectionTelemetry telemetry = ConnectionTelemetry.Create( + properties: TelemetryEnabledProperties(), + host: Host, + assemblyVersion: AssemblyVersion, + oauthTokenProvider: null, + sessionId: "session-sea", + mode: DriverModeType.Sea, + enableDirectResults: true, + useDescTableExtended: false, + connectTimeoutMilliseconds: DefaultTimeoutMs, + activity: null); + + try + { + Assert.NotNull(telemetry.Session); + Assert.NotNull(telemetry.Session!.DriverConnectionParams); + Assert.Equal(DriverModeType.Sea, telemetry.Session.DriverConnectionParams!.Mode); + } + finally + { + await telemetry.DisposeAsync(); + } + } + + [Fact] + public void Create_ThrowingHttpClient_ReturnsNoOpConnectionTelemetry() + { + // Create() is declared `Never throws`: any initialization failure — HttpClient + // construction, exporter wire-up, etc. — must surface as NoOpConnectionTelemetry + // rather than propagate into the connection-open path. We exercise this by + // enabling telemetry while passing a blank host so `new Uri("https://")` (inside + // HttpClientFactory.CreateTelemetryHttpClient) and/or + // TelemetryClientManager.GetOrCreateClient's argument-check throw, both of + // which land in Create's outer catch. + // + // ASSUMPTION: this test depends on either HttpClientFactory.CreateTelemetryHttpClient + // or TelemetryClientManager.GetOrCreateClient throwing when host is empty. If a + // future change adds defensive handling of empty host upstream of Create's catch, + // this test would silently pass for the wrong reason (Create would return + // NoOpConnectionTelemetry via the disabled/feature-flag path instead of the + // outer catch). When that happens, swap to a real fault-injection seam (e.g., + // an internal overload that accepts a pre-built HttpClient). 
+ IConnectionTelemetry telemetry = ConnectionTelemetry.Create( + properties: TelemetryEnabledProperties(), + host: string.Empty, + assemblyVersion: AssemblyVersion, + oauthTokenProvider: null, + sessionId: "session-throwing-http", + mode: DriverModeType.Thrift, + enableDirectResults: true, + useDescTableExtended: false, + connectTimeoutMilliseconds: DefaultTimeoutMs, + activity: null); + + Assert.Same(NoOpConnectionTelemetry.Instance, telemetry); + Assert.Null(telemetry.Session); + } + } +} diff --git a/csharp/test/Unit/Telemetry/ConnectionTelemetryDiscoveryFieldsTests.cs b/csharp/test/Unit/Telemetry/ConnectionTelemetryDiscoveryFieldsTests.cs index 27889427..9f3e65b0 100644 --- a/csharp/test/Unit/Telemetry/ConnectionTelemetryDiscoveryFieldsTests.cs +++ b/csharp/test/Unit/Telemetry/ConnectionTelemetryDiscoveryFieldsTests.cs @@ -17,6 +17,7 @@ using System.Collections.Generic; using AdbcDrivers.Databricks.Telemetry; using AdbcDrivers.HiveServer2.Spark; +using DriverModeType = AdbcDrivers.Databricks.Telemetry.Proto.DriverMode.Types.Type; using Xunit; namespace AdbcDrivers.Databricks.Tests.Unit.Telemetry @@ -57,7 +58,8 @@ public void DiscoveryModeEnabled_AlwaysFalse_PatAuth() properties[SparkParameters.Token] = "dapi-redacted"; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.True(connParams.HasDiscoveryModeEnabled); Assert.False(connParams.DiscoveryModeEnabled); @@ -74,7 +76,8 @@ public void DiscoveryModeEnabled_AlwaysFalse_OAuthClientCredentials() properties[DatabricksParameters.OAuthClientSecret] = "client-secret"; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + 
enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.True(connParams.HasDiscoveryModeEnabled); Assert.False(connParams.DiscoveryModeEnabled); @@ -89,7 +92,8 @@ public void DiscoveryUrl_LeftUnset_NoDiscoverySupported() DatabricksConstants.OAuthGrantTypes.ClientCredentials; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); // The C# driver hardcodes the OIDC token endpoint and never performs // .well-known discovery, so there is no URL to report. Leaving the @@ -105,7 +109,8 @@ public void EnableTokenCache_AlwaysFalse_PatAuth() properties[SparkParameters.Token] = "dapi-redacted"; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.True(connParams.HasEnableTokenCache); Assert.False(connParams.EnableTokenCache); @@ -122,7 +127,8 @@ public void EnableTokenCache_AlwaysFalse_OAuthClientCredentials() properties[DatabricksParameters.OAuthClientSecret] = "client-secret"; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.True(connParams.HasEnableTokenCache); Assert.False(connParams.EnableTokenCache); diff --git a/csharp/test/Unit/Telemetry/ConnectionTelemetryDriverNameTests.cs b/csharp/test/Unit/Telemetry/ConnectionTelemetryDriverNameTests.cs new file mode 100644 index 00000000..1e90ac7a --- /dev/null +++ 
b/csharp/test/Unit/Telemetry/ConnectionTelemetryDriverNameTests.cs @@ -0,0 +1,102 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using AdbcDrivers.Databricks.Telemetry; +using Xunit; + +namespace AdbcDrivers.Databricks.Tests.Unit.Telemetry +{ + /// + /// Regression tests for PECO-3022 B1: driver_name string drift between + /// SEA and Thrift transports. + /// + /// Production lumberjack data from v1.1.4 showed two distinct strings coexisting: + /// + /// "Databricks ADBC Driver" — 685 records, all THRIFT mode + /// "ADBC Databricks Driver" — 4,401 records, mixed THRIFT + 69 SEA + /// + /// Dashboards filtering on the older string silently missed all SEA records and a + /// significant fraction of recent Thrift records. + /// + /// The fix is to make the + /// single source of truth, referenced by both + /// and its fallback so + /// that every caller of — Thrift today via + /// DatabricksConnection, SEA via StatementExecutionConnection — emits + /// the same literal. + /// + /// These tests pin the literal value so that a typo or rename in the constant gets + /// caught at unit-test time before it ships to production. + /// + public class ConnectionTelemetryDriverNameTests + { + /// + /// The canonical driver_name literal that must appear in every telemetry record + /// regardless of transport. 
Picked because it matches the value already returned + /// via AdbcInfoCode.DriverName (see ) + /// and represented the majority of v1.1.4 production records, so dashboards + /// already keyed on this string see the most history. + /// + private const string CanonicalDriverName = "ADBC Databricks Driver"; + + [Fact] + public void CanonicalConstant_HasExpectedLiteralValue() + { + // Pin the literal. If anyone renames the constant, this test fails and the + // change is forced into review rather than silently breaking downstream + // dashboards that filter on the string. + Assert.Equal(CanonicalDriverName, DatabricksConnection.DatabricksDriverName); + } + + [Fact] + public void BuildSystemConfiguration_ReturnsCanonicalDriverName() + { + var config = ConnectionTelemetry.BuildSystemConfiguration("1.2.3"); + + Assert.Equal(CanonicalDriverName, config.DriverName); + } + + [Fact] + public void SafeBuildSystemConfiguration_ReturnsCanonicalDriverName() + { + // SafeBuildSystemConfiguration delegates to BuildSystemConfiguration on the + // happy path and must return the same canonical literal. The catch-block + // fallback in SafeBuildSystemConfiguration is not exercised here because + // there is no in-process fault-injection seam for the static helper; the + // literal-pin in CanonicalConstant_HasExpectedLiteralValue covers that path + // by construction (both branches reference the same constant). + var config = ConnectionTelemetry.SafeBuildSystemConfiguration("1.2.3", activity: null); + + Assert.NotNull(config); + Assert.Equal(CanonicalDriverName, config.DriverName); + } + + [Fact] + public void DriverName_IdenticalAcrossInvocations_SingleSourceOfTruth() + { + // The single-source-of-truth property: every invocation of the system-config + // builder (regardless of transport mode at the caller) returns the same + // driver_name. 
Since ConnectionTelemetry.BuildSystemConfiguration is the + // protocol-agnostic factory called by both Thrift and SEA transports, this + // guarantees both Thrift and SEA telemetry records carry identical driver_name. + var thriftCaller = ConnectionTelemetry.BuildSystemConfiguration("1.2.3"); + var seaCaller = ConnectionTelemetry.BuildSystemConfiguration("1.2.3"); + + Assert.Equal(thriftCaller.DriverName, seaCaller.DriverName); + Assert.Equal(CanonicalDriverName, thriftCaller.DriverName); + } + } +} diff --git a/csharp/test/Unit/Telemetry/ConnectionTelemetryPartialInitTests.cs b/csharp/test/Unit/Telemetry/ConnectionTelemetryPartialInitTests.cs index 9431aca5..3075088e 100644 --- a/csharp/test/Unit/Telemetry/ConnectionTelemetryPartialInitTests.cs +++ b/csharp/test/Unit/Telemetry/ConnectionTelemetryPartialInitTests.cs @@ -19,6 +19,7 @@ using AdbcDrivers.HiveServer2.Spark; using DriverAuthFlowType = AdbcDrivers.Databricks.Telemetry.Proto.DriverAuthFlow.Types.Type; using DriverAuthMechType = AdbcDrivers.Databricks.Telemetry.Proto.DriverAuthMech.Types.Type; +using DriverModeType = AdbcDrivers.Databricks.Telemetry.Proto.DriverMode.Types.Type; using Xunit; namespace AdbcDrivers.Databricks.Tests.Unit.Telemetry @@ -60,7 +61,8 @@ public void AuthType_AndAuthMech_Consistent_OauthClientCredentials() }; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); string authType = ConnectionTelemetry.DetermineAuthType(properties); Assert.Equal(DriverAuthMechType.Oauth, connParams.AuthMech); @@ -79,7 +81,8 @@ public void AuthType_AndAuthMech_Consistent_OauthAccessTokenWithGrantType() }; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, 
DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); string authType = ConnectionTelemetry.DetermineAuthType(properties); Assert.Equal(DriverAuthMechType.Oauth, connParams.AuthMech); @@ -101,7 +104,8 @@ public void AuthType_AndAuthMech_Consistent_OauthAccessTokenPassthrough_NoGrantT }; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); string authType = ConnectionTelemetry.DetermineAuthType(properties); Assert.Equal(DriverAuthMechType.Oauth, connParams.AuthMech); @@ -119,7 +123,8 @@ public void AuthType_AndAuthMech_Consistent_PatToken() }; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); string authType = ConnectionTelemetry.DetermineAuthType(properties); Assert.Equal(DriverAuthMechType.Pat, connParams.AuthMech); @@ -133,7 +138,8 @@ public void AuthType_AndAuthMech_Consistent_NoAuth() var properties = new Dictionary(); var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); string authType = ConnectionTelemetry.DetermineAuthType(properties); Assert.Equal(DriverAuthMechType.Pat, connParams.AuthMech); @@ -186,7 +192,8 @@ public void DriverConnectionParams_ConstantFlags_AlwaysPopulated() }; var connParams = ConnectionTelemetry.BuildDriverConnectionParams( - properties, Host, enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); + 
properties, Host, DriverModeType.Thrift, + enableDirectResults: true, useDescTableExtended: true, DefaultTimeoutMs); Assert.True(connParams.EnableArrow); Assert.True(connParams.EnableDirectResults); @@ -214,7 +221,7 @@ public void SafeBuildDriverConnectionParams_ReturnsBestEffort_WithMinimalPropert var properties = new Dictionary(); var connParams = ConnectionTelemetry.SafeBuildDriverConnectionParams( - properties, Host, + properties, Host, DriverModeType.Thrift, enableDirectResults: true, useDescTableExtended: false, connectTimeoutMilliseconds: DefaultTimeoutMs, activity: null); @@ -252,7 +259,7 @@ public void EndToEnd_MinimalProperties_AllAlwaysDerivableFieldsPopulated() var systemConfig = ConnectionTelemetry.SafeBuildSystemConfiguration( "1.2.3", activity: null); var connParams = ConnectionTelemetry.SafeBuildDriverConnectionParams( - properties, Host, + properties, Host, DriverModeType.Thrift, enableDirectResults: true, useDescTableExtended: true, connectTimeoutMilliseconds: DefaultTimeoutMs, activity: null); string authType = ConnectionTelemetry.SafeDetermineAuthType(properties, activity: null); diff --git a/csharp/test/Unit/Telemetry/DatabricksStatementObserverRefactorTests.cs b/csharp/test/Unit/Telemetry/DatabricksStatementObserverRefactorTests.cs new file mode 100644 index 00000000..e86defe4 --- /dev/null +++ b/csharp/test/Unit/Telemetry/DatabricksStatementObserverRefactorTests.cs @@ -0,0 +1,537 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using System.Collections.Generic; +using System.IO; +using System.Linq; +using System.Reflection; +using System.Threading; +using System.Threading.Tasks; +using AdbcDrivers.Databricks.Reader.CloudFetch; +using AdbcDrivers.Databricks.Telemetry; +using AdbcDrivers.Databricks.Telemetry.Models; +using AdbcDrivers.Databricks.Telemetry.Proto; +using AdbcDrivers.HiveServer2.Spark; +using Apache.Arrow.Adbc; +using DriverModeType = AdbcDrivers.Databricks.Telemetry.Proto.DriverMode.Types.Type; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; +using Xunit; + +namespace AdbcDrivers.Databricks.Tests.Unit.Telemetry +{ + /// + /// Verifies the T4 refactor of : the private + /// telemetry helpers (CreateTelemetryContext, CreateMetadataTelemetryContext, + /// RecordSuccess, RecordError, EmitTelemetry) have been replaced + /// with a single field, and + /// now injects either a + /// or . + /// + /// The tests cover three concerns: + /// + /// Structural — the cast ((DatabricksConnection)Connection).TelemetrySession + /// has been removed from DatabricksStatement and the field-based observer + /// is wired correctly through both the public constructor and the test-only + /// constructor seam. + /// Cross-transport — observer hooks fire at the expected lifecycle points and + /// the underlying is still exposed to + /// the operation status poller via the PendingTelemetryContext + /// compatibility property (PECO-2992). 
+ /// Byte-identical — the proto emitted + /// after the refactor matches what the pre-refactor EmitTelemetry helper + /// would have produced for an equivalent execute, and the connection's + /// DriverMode (Thrift here) is preserved on the wire (PECO-3022). + /// + /// + public class DatabricksStatementObserverRefactorTests + { + // ── Test doubles ───────────────────────────────────────────────────────────── + + /// + /// Capturing telemetry client that records every enqueued frontend log so tests + /// can inspect the exact proto fields the observer wrote. + /// + private sealed class CapturingTelemetryClient : ITelemetryClient + { + public List<TelemetryFrontendLog> Logs { get; } = new List<TelemetryFrontendLog>(); + public int EnqueueCallCount; + + public void Enqueue(TelemetryFrontendLog log) + { + Interlocked.Increment(ref EnqueueCallCount); + lock (Logs) { Logs.Add(log); } + } + + public Task FlushAsync(CancellationToken ct = default) => Task.CompletedTask; + public Task CloseAsync() => Task.CompletedTask; + public ValueTask DisposeAsync() => default; + } + + /// + /// Records every observer method invocation so tests can assert the statement + /// drives the injected observer at the correct hookpoints. + /// + private sealed class RecordingObserver : IStatementOperationObserver + { + public List<string> Calls { get; } = new List<string>(); + public StatementType? StmtType; + public OperationType? OpType; + public bool? IsCompressed; + public string? StatementId; + public ExecutionResultFormat? ResultFormat; + public int? PollCount; + public long? PollLatencyMs; + public long? FirstBatchReadyMs; + public long? ConsumedMs; + public ChunkMetrics? Chunks; + public ExecutionResultFormat? ReaderInspectedFormat; + public bool? ReaderInspectedCompressed; + public Exception? 
+ Error; + public int FinalizedCallCount; + + public void OnExecuteStarted(StatementType stmtType, OperationType opType, bool isCompressed) + { + Calls.Add(nameof(OnExecuteStarted)); + StmtType = stmtType; + OpType = opType; + IsCompressed = isCompressed; + } + + public void OnExecuteSucceeded(string statementId, ExecutionResultFormat resultFormat) + { + Calls.Add(nameof(OnExecuteSucceeded)); + StatementId = statementId; + ResultFormat = resultFormat; + } + + public void OnPollCompleted(int count, long latencyMs) + { + Calls.Add(nameof(OnPollCompleted)); + PollCount = count; + PollLatencyMs = latencyMs; + } + + public void OnFirstBatchReady(long latencyMs) + { + Calls.Add(nameof(OnFirstBatchReady)); + FirstBatchReadyMs = latencyMs; + } + + public void OnConsumed(long latencyMs) + { + Calls.Add(nameof(OnConsumed)); + ConsumedMs = latencyMs; + } + + public void OnChunksDownloaded(ChunkMetrics metrics) + { + Calls.Add(nameof(OnChunksDownloaded)); + Chunks = metrics; + } + + public void OnReaderInspected(ExecutionResultFormat resultFormat, bool isCompressed) + { + Calls.Add(nameof(OnReaderInspected)); + ReaderInspectedFormat = resultFormat; + ReaderInspectedCompressed = isCompressed; + } + + public void OnError(Exception ex) + { + Calls.Add(nameof(OnError)); + Error = ex; + } + + public void OnFinalized() + { + Calls.Add(nameof(OnFinalized)); + FinalizedCallCount++; + } + } + + // ── Fixtures ───────────────────────────────────────────────────────────────── + + private static DatabricksConnection CreateConnection() + { + Dictionary<string, string> properties = new Dictionary<string, string> + { + [SparkParameters.HostName] = "test.databricks.com", + [SparkParameters.Token] = "test-token", + }; + return new DatabricksConnection(properties); + } + + /// + /// Builds a session context with a real + /// stamped with , matching how + /// wires the production + /// session up after a successful OpenSession. 
+ /// + private static TelemetrySessionContext CreateThriftSession(ITelemetryClient client) + { + return new TelemetrySessionContext + { + SessionId = "session-thrift-abc", + WorkspaceId = 4242L, + TelemetryClient = client, + AuthType = "pat", + SystemConfiguration = new DriverSystemConfiguration + { + DriverVersion = "1.0.0", + DriverName = "Apache Arrow ADBC Databricks Driver", + }, + // Mode=Thrift is the value DatabricksConnection.InitializeTelemetry passes + // through to ConnectionTelemetry.Create today. If a future refactor were to + // accidentally swap the constant, the byte-equivalence test below would + // catch it. + DriverConnectionParams = new DriverConnectionParameters + { + HttpPath = "/sql/1.0/warehouses/x", + Mode = DriverModeType.Thrift, + }, + }; + } + + private static IStatementOperationObserver GetObserverField(DatabricksStatement statement) + { + FieldInfo field = typeof(DatabricksStatement).GetField( + "_observer", + BindingFlags.NonPublic | BindingFlags.Instance) + ?? throw new InvalidOperationException("_observer field not found on DatabricksStatement"); + return (IStatementOperationObserver)field.GetValue(statement)!; + } + + // ── Structural assertions ──────────────────────────────────────────────────── + + [Fact] + public void DatabricksStatement_HasObserverField_TypedAsInterface() + { + // The refactor introduces exactly one observer field, typed as the interface + // (not as TelemetryObserver) so SEA can inject its own future implementation. + FieldInfo? 
field = typeof(DatabricksStatement).GetField( + "_observer", + BindingFlags.NonPublic | BindingFlags.Instance); + + Assert.NotNull(field); + Assert.Equal(typeof(IStatementOperationObserver), field!.FieldType); + Assert.True(field.IsInitOnly, "_observer must be readonly so once injected it cannot drift mid-execute"); + } + + [Fact] + public void DatabricksStatement_SourceDoesNotCastTelemetrySession() + { + // Acceptance criterion: the ((DatabricksConnection)Connection).TelemetrySession + // cast pattern is eliminated from the refactored class. Locating the source + // file relative to the test assembly avoids needing a runtime hookup just to + // assert a textual property. + string? path = LocateDatabricksStatementSource(); + Assert.True(File.Exists(path), $"Could not locate DatabricksStatement.cs (looked at {path ?? "(null)"})"); + + string source = File.ReadAllText(path!); + Assert.DoesNotContain("((DatabricksConnection)Connection).TelemetrySession", source); + } + + private static string? LocateDatabricksStatementSource() + { + // Walk up from the test assembly location until we find csharp/src/DatabricksStatement.cs. + string? dir = Path.GetDirectoryName(typeof(DatabricksStatement).Assembly.Location); + while (!string.IsNullOrEmpty(dir)) + { + string candidate = Path.Combine(dir, "csharp", "src", "DatabricksStatement.cs"); + if (File.Exists(candidate)) return candidate; + candidate = Path.Combine(dir, "src", "DatabricksStatement.cs"); + if (File.Exists(candidate)) return candidate; + DirectoryInfo? parent = Directory.GetParent(dir); + if (parent == null) break; + dir = parent.FullName; + } + return null; + } + + // ── Injection from DatabricksConnection.CreateStatement ────────────────────── + + [Fact] + public void CreateStatement_TelemetryDisabled_InjectsNullObserver() + { + // Without a telemetry session (the default for a freshly-constructed connection + // that has not opened a session), CreateStatement must fall back to the + // singleton NullObserver. 
+ This is what keeps disabled-telemetry zero-cost. + using DatabricksConnection connection = CreateConnection(); + + using AdbcStatement statement = connection.CreateStatement(); + DatabricksStatement databricksStatement = Assert.IsType<DatabricksStatement>(statement); + + IStatementOperationObserver observer = GetObserverField(databricksStatement); + Assert.Same(NullObserver.Instance, observer); + // Compatibility property must report null so the reader/poller skip telemetry + // branches without having to call into the observer. + Assert.Null(databricksStatement.PendingTelemetryContext); + } + + [Fact] + public void CreateStatement_TelemetryEnabled_InjectsTelemetryObserver() + { + // When the connection has a live telemetry session, CreateStatement constructs + // a per-statement TelemetryObserver bound to that session. PendingTelemetryContext + // therefore returns a non-null context for the poller and reader to mutate. + using DatabricksConnection connection = CreateConnection(); + CapturingTelemetryClient client = new CapturingTelemetryClient(); + TelemetrySessionContext session = CreateThriftSession(client); + connection.TelemetryForTesting = new TelemetryAdapter(session); + + using AdbcStatement statement = connection.CreateStatement(); + DatabricksStatement databricksStatement = Assert.IsType<DatabricksStatement>(statement); + + IStatementOperationObserver observer = GetObserverField(databricksStatement); + TelemetryObserver typed = Assert.IsType<TelemetryObserver>(observer); + // The observer's underlying context must be bound to the connection's session + // so subsequent poller/reader mutations and the final BuildTelemetryLog read + // the same SessionId/WorkspaceId the connection negotiated. + FieldInfo? 
sessionField = typeof(StatementTelemetryContext) + .GetField("_sessionContext", BindingFlags.NonPublic | BindingFlags.Instance); + Assert.NotNull(sessionField); + Assert.Same(session, sessionField!.GetValue(typed.Context)); + Assert.NotNull(databricksStatement.PendingTelemetryContext); + Assert.Same(typed.Context, databricksStatement.PendingTelemetryContext); + } + + // ── Observer hookpoint coverage ────────────────────────────────────────────── + + [Fact] + public void Dispose_NoExecute_DoesNotFinalizeObserver() + { + // A statement that was never executed must not trigger OnFinalized() on the + // injected observer (would otherwise enqueue a stray empty execute log when + // telemetry is enabled). This preserves byte-identity with the prior + // PendingTelemetryContext!=null gate inside EmitTelemetry. + using DatabricksConnection connection = CreateConnection(); + RecordingObserver recorder = new RecordingObserver(); + using DatabricksStatement statement = new DatabricksStatement(connection, recorder); + + statement.Dispose(); + + Assert.Empty(recorder.Calls); + } + + [Fact] + public void Dispose_AfterFailedExecute_DoesNotDoubleFinalize() + { + // Simulating the error path: the production code calls FinalizeExecuteTelemetry + // inside the catch block and then Dispose calls it again. _executeFinalized + // must gate so OnFinalized fires exactly once (the observer's own idempotency + // is a defence-in-depth backstop). + using DatabricksConnection connection = CreateConnection(); + RecordingObserver recorder = new RecordingObserver(); + using DatabricksStatement statement = new DatabricksStatement(connection, recorder); + + // Drive the observer through the same sequence the error path would: begin → + // error → finalize. We call the private helpers directly so the test does not + // require a Thrift server. 
+ InvokePrivate(statement, "BeginExecuteTelemetry", StatementType.Query, OperationType.ExecuteStatement); + InvokePrivate(statement, "RecordExecuteError", new InvalidOperationException("boom")); + InvokePrivate(statement, "FinalizeExecuteTelemetry"); + + // Dispose triggers a second FinalizeExecuteTelemetry — must be a no-op. + statement.Dispose(); + + int finalizeCalls = recorder.Calls.Count(c => c == nameof(IStatementOperationObserver.OnFinalized)); + Assert.Equal(1, finalizeCalls); + // Sanity: the error path also recorded OnError exactly once. + Assert.Equal(1, recorder.Calls.Count(c => c == nameof(IStatementOperationObserver.OnError))); + } + + [Fact] + public void BeginExecuteTelemetry_FiresOnExecuteStartedAndPopulatesDefaults() + { + // Verifies the helper that replaces CreateTelemetryContext / CreateMetadataTelemetryContext: + // it must signal the observer with the right statement/operation type and set the + // legacy defaults (InlineArrow ResultFormat, IsInternalCall propagation). 
+ using DatabricksConnection connection = CreateConnection(); + CapturingTelemetryClient client = new CapturingTelemetryClient(); + TelemetrySessionContext session = CreateThriftSession(client); + TelemetryObserver observer = new TelemetryObserver(session); + using DatabricksStatement statement = new DatabricksStatement(connection, observer); + statement.IsInternalCall = true; + + InvokePrivate(statement, "BeginExecuteTelemetry", StatementType.Metadata, OperationType.ListCatalogs); + + Assert.Equal(StatementType.Metadata, observer.Context.StatementType); + Assert.Equal(OperationType.ListCatalogs, observer.Context.OperationType); + Assert.False(observer.Context.IsCompressed, "Placeholder isCompressed=false until reader inspection at finalize time"); + Assert.Equal(ExecutionResultFormat.InlineArrow, observer.Context.ResultFormat); + Assert.True(observer.Context.IsInternalCall); + } + + [Fact] + public void FinalizeExecuteTelemetry_NoExecute_StillEmitsViaConfiguredObserver() + { + // Once BeginExecuteTelemetry has fired, FinalizeExecuteTelemetry must close the + // loop with OnConsumed + OnFinalized even if no reader was ever materialized + // (defensive path: server returned no results, or caller aborted between + // begin and base.ExecuteQuery). + using DatabricksConnection connection = CreateConnection(); + RecordingObserver recorder = new RecordingObserver(); + using DatabricksStatement statement = new DatabricksStatement(connection, recorder); + + InvokePrivate(statement, "BeginExecuteTelemetry", StatementType.Query, OperationType.ExecuteStatement); + InvokePrivate(statement, "FinalizeExecuteTelemetry"); + + Assert.Contains(nameof(IStatementOperationObserver.OnExecuteStarted), recorder.Calls); + Assert.Contains(nameof(IStatementOperationObserver.OnConsumed), recorder.Calls); + Assert.Contains(nameof(IStatementOperationObserver.OnFinalized), recorder.Calls); + // No chunk metrics path because no reader was attached. 
+ Assert.DoesNotContain(nameof(IStatementOperationObserver.OnChunksDownloaded), recorder.Calls); + } + + // ── Cross-transport regression: byte-equivalent telemetry log ───────────────── + + [Fact] + public void Thrift_Telemetry_StillEmitsAfterRefactor() + { + // Byte-equivalence regression: an end-to-end execute through the refactored + // observer must produce an OssSqlDriverTelemetryLog whose top-level fields + // match the pre-refactor EmitTelemetry output for the same inputs. We compare + // each field explicitly rather than serializing because the timestamp envelope + // is non-deterministic. + using DatabricksConnection connection = CreateConnection(); + CapturingTelemetryClient client = new CapturingTelemetryClient(); + TelemetrySessionContext session = CreateThriftSession(client); + TelemetryObserver observer = new TelemetryObserver(session); + using DatabricksStatement statement = new DatabricksStatement(connection, observer); + statement.StatementId = "known-statement-id-1"; + + // Simulate the production execute flow that previously went through + // CreateTelemetryContext → RecordSuccess → EmitTelemetry. + InvokePrivate(statement, "BeginExecuteTelemetry", StatementType.Query, OperationType.ExecuteStatement); + InvokePrivate(statement, "RecordExecuteSuccess"); + InvokePrivate(statement, "FinalizeExecuteTelemetry"); + + // Exactly one log enqueued (terminal call is idempotent and no double-emission). + Assert.Equal(1, client.EnqueueCallCount); + TelemetryFrontendLog frontendLog = client.Logs[0]; + OssSqlDriverTelemetryLog log = frontendLog.Entry!.SqlDriverLog!; + + // Session/statement identifiers and workspace envelope preserved. + Assert.Equal("session-thrift-abc", log.SessionId); + Assert.Equal("known-statement-id-1", log.SqlStatementId); + Assert.Equal(4242L, frontendLog.WorkspaceId); + Assert.False(string.IsNullOrEmpty(frontendLog.FrontendLogEventId)); + + // SqlOperation envelope matches what RecordSuccess + EmitTelemetry produced. 
+ Assert.Equal(StatementType.Query, log.SqlOperation.StatementType); + Assert.Equal(OperationType.ExecuteStatement, log.SqlOperation.OperationDetail.OperationType); + Assert.Equal(ExecutionResultFormat.InlineArrow, log.SqlOperation.ExecutionResult); + Assert.False(log.SqlOperation.IsCompressed); + + // ResultLatency block populated (FirstBatchReadyMs from OnFirstBatchReady, + // ResultsConsumedMs from OnConsumed). Pre-refactor EmitTelemetry called + // ctx.RecordResultsConsumed() right before BuildTelemetryLog, so consumption + // must be populated too. The assertions below only require values >= 0. + Assert.NotNull(log.SqlOperation.ResultLatency); + Assert.True(log.SqlOperation.ResultLatency.ResultSetReadyLatencyMillis >= 0); + Assert.True(log.SqlOperation.ResultLatency.ResultSetConsumptionLatencyMillis >= 0); + + // No error info on the success path. + Assert.Null(log.ErrorInfo); + } + + [Fact] + public void Thrift_DriverMode_StillReportedAsThrift() + { + // Verifies the DriverMode passthrough: when DatabricksConnection initializes + // its telemetry session with Mode=Thrift, the per-statement log carries that + // mode all the way out to the wire. A future refactor that accidentally + // swapped DriverMode.Thrift for Sea in InitializeTelemetry would fail here. 
+ using DatabricksConnection connection = CreateConnection(); + CapturingTelemetryClient client = new CapturingTelemetryClient(); + TelemetrySessionContext session = CreateThriftSession(client); + TelemetryObserver observer = new TelemetryObserver(session); + using DatabricksStatement statement = new DatabricksStatement(connection, observer); + statement.StatementId = "thrift-mode-statement"; + + InvokePrivate(statement, "BeginExecuteTelemetry", StatementType.Query, OperationType.ExecuteStatement); + InvokePrivate(statement, "RecordExecuteSuccess"); + InvokePrivate(statement, "FinalizeExecuteTelemetry"); + + Assert.Equal(1, client.EnqueueCallCount); + OssSqlDriverTelemetryLog log = client.Logs[0].Entry!.SqlDriverLog!; + Assert.NotNull(log.DriverConnectionParams); + Assert.Equal(DriverModeType.Thrift, log.DriverConnectionParams.Mode); + } + + [Fact] + public void ErrorPath_FinalizesObserverWithErrorInfo() + { + // The error path inside Execute* drives the observer with OnError + OnFinalized + // before re-throwing, so the emitted log carries error_info.error_name and the + // user sees a single log per failed execute (not a missing event, not two). 
+ using DatabricksConnection connection = CreateConnection(); + CapturingTelemetryClient client = new CapturingTelemetryClient(); + TelemetrySessionContext session = CreateThriftSession(client); + TelemetryObserver observer = new TelemetryObserver(session); + using DatabricksStatement statement = new DatabricksStatement(connection, observer); + statement.StatementId = "errored-statement"; + + InvokePrivate(statement, "BeginExecuteTelemetry", StatementType.Query, OperationType.ExecuteStatement); + InvokePrivate(statement, "RecordExecuteError", new InvalidOperationException("simulated failure")); + InvokePrivate(statement, "FinalizeExecuteTelemetry"); + + Assert.Equal(1, client.EnqueueCallCount); + OssSqlDriverTelemetryLog log = client.Logs[0].Entry!.SqlDriverLog!; + Assert.NotNull(log.ErrorInfo); + Assert.Equal("InvalidOperationException", log.ErrorInfo.ErrorName); + } + + // ── Helpers ────────────────────────────────────────────────────────────────── + + /// + /// Invokes a private instance method via reflection. We exercise the helpers + /// directly because driving them through a real ExecuteQuery requires a Thrift + /// server and a server-shaped reader pipeline — both out of scope for unit tests. + /// The helpers themselves are the entire refactor surface; reaching them via + /// reflection is the same approach uses + /// for the confOverlay private field. + /// + private static void InvokePrivate(DatabricksStatement statement, string name, params object[] args) + { + MethodInfo method = typeof(DatabricksStatement).GetMethod( + name, + BindingFlags.NonPublic | BindingFlags.Instance) + ?? throw new InvalidOperationException($"Method {name} not found on DatabricksStatement"); + method.Invoke(statement, args); + } + + /// + /// Minimal adapter that just exposes a given + /// session through the property so + /// connection.TelemetrySession returns the test session in production code. 
+ /// All other methods are no-ops; the wiring we care about for these tests is the + /// observer injection inside CreateStatement, which only reads + /// connection.TelemetrySession. + /// + private sealed class TelemetryAdapter : IConnectionTelemetry + { + public TelemetryAdapter(TelemetrySessionContext session) { Session = session; } + public TelemetrySessionContext? Session { get; } + public T ExecuteWithMetadataTelemetry<T>(OperationType operationType, Func<T> operation, System.Diagnostics.Activity? activity) => operation(); + public void EmitOperationTelemetry(OperationType operationType, StatementType statementType, string? statementId, long elapsedMs, Exception? error) { } + public Task DisposeAsync() => Task.CompletedTask; + } + } +} diff --git a/csharp/test/Unit/Telemetry/NullObserverTests.cs b/csharp/test/Unit/Telemetry/NullObserverTests.cs new file mode 100644 index 00000000..b8e107f5 --- /dev/null +++ b/csharp/test/Unit/Telemetry/NullObserverTests.cs @@ -0,0 +1,76 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. 
+*/ + +using System; +using AdbcDrivers.Databricks.Reader.CloudFetch; +using AdbcDrivers.Databricks.Telemetry; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; +using Xunit; + +namespace AdbcDrivers.Databricks.Tests.Unit.Telemetry +{ + /// + /// Tests for <see cref="NullObserver"/> — verifies it satisfies the + /// fail-open / no-op / singleton contract. + /// + public class NullObserverTests + { + [Fact] + public void NullObserver_AllMethods_AreNoOps() + { + // Arrange + IStatementOperationObserver observer = NullObserver.Instance; + + // Act + Assert: every method must complete without throwing and without + // observable side effects. We exercise the full surface twice to also + // confirm idempotency of OnFinalized. + for (int i = 0; i < 2; i++) + { + observer.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: true); + observer.OnExecuteSucceeded("stmt-id-123", ExecutionResultFormat.InlineArrow); + observer.OnPollCompleted(count: 3, latencyMs: 42); + observer.OnFirstBatchReady(latencyMs: 100); + observer.OnConsumed(latencyMs: 200); + observer.OnChunksDownloaded(new ChunkMetrics()); + observer.OnReaderInspected(ExecutionResultFormat.ExternalLinks, isCompressed: true); + observer.OnError(new InvalidOperationException("boom")); + observer.OnFinalized(); + } + + // No state to inspect — passing the calls is the assertion. + } + + [Fact] + public void NullObserver_IsSingleton() + { + // Arrange + Act + NullObserver first = NullObserver.Instance; + NullObserver second = NullObserver.Instance; + + // Assert: same reference, and the only way to obtain an instance. + Assert.NotNull(first); + Assert.Same(first, second); + + // There must be no public constructor: callers should be forced to + // use the singleton accessor. 
+ System.Reflection.ConstructorInfo[] publicCtors = typeof(NullObserver) + .GetConstructors(System.Reflection.BindingFlags.Public | System.Reflection.BindingFlags.Instance); + Assert.Empty(publicCtors); + } + } +} diff --git a/csharp/test/Unit/Telemetry/SafeObserverTests.cs b/csharp/test/Unit/Telemetry/SafeObserverTests.cs new file mode 100644 index 00000000..f7d36c2e --- /dev/null +++ b/csharp/test/Unit/Telemetry/SafeObserverTests.cs @@ -0,0 +1,344 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. +* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using System.Collections.Generic; +using System.Diagnostics; +using System.Linq; +using AdbcDrivers.Databricks.Reader.CloudFetch; +using AdbcDrivers.Databricks.Telemetry; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; +using Xunit; + +namespace AdbcDrivers.Databricks.Tests.Unit.Telemetry +{ + /// + /// Tests for — the optional decorator that wraps any + /// inner with per-method try/catch so a + /// third-party observer that violates the fail-open contract cannot leak its + /// exception to the caller. 
+ /// + public class SafeObserverTests + { + // ── Test doubles ───────────────────────────────────────────────────────────── + + /// + /// Records each method invocation so propagation tests can assert that calls + /// reach the inner observer with the exact arguments supplied by the caller. + /// + private sealed class RecordingObserver : IStatementOperationObserver + { + public List<string> Calls { get; } = new List<string>(); + + public StatementType? StmtType; + public OperationType? OpType; + public bool? IsCompressed; + public string? StatementId; + public ExecutionResultFormat? ResultFormat; + public int? PollCount; + public long? PollLatencyMs; + public long? FirstBatchReadyMs; + public long? ConsumedMs; + public ChunkMetrics? ChunkMetrics; + public ExecutionResultFormat? ReaderInspectedFormat; + public bool? ReaderInspectedCompressed; + public Exception? Error; + public int FinalizedCallCount; + + public void OnExecuteStarted(StatementType stmtType, OperationType opType, bool isCompressed) + { + Calls.Add(nameof(OnExecuteStarted)); + StmtType = stmtType; + OpType = opType; + IsCompressed = isCompressed; + } + + public void OnExecuteSucceeded(string statementId, ExecutionResultFormat resultFormat) + { + Calls.Add(nameof(OnExecuteSucceeded)); + StatementId = statementId; + ResultFormat = resultFormat; + } + + public void OnPollCompleted(int count, long latencyMs) + { + Calls.Add(nameof(OnPollCompleted)); + PollCount = count; + PollLatencyMs = latencyMs; + } + + public void OnFirstBatchReady(long latencyMs) + { + Calls.Add(nameof(OnFirstBatchReady)); + FirstBatchReadyMs = latencyMs; + } + + public void OnConsumed(long latencyMs) + { + Calls.Add(nameof(OnConsumed)); + ConsumedMs = latencyMs; + } + + public void OnChunksDownloaded(ChunkMetrics metrics) + { + Calls.Add(nameof(OnChunksDownloaded)); + ChunkMetrics = metrics; + } + + public void OnReaderInspected(ExecutionResultFormat resultFormat, bool isCompressed) + { + Calls.Add(nameof(OnReaderInspected)); + 
ReaderInspectedFormat = resultFormat; + ReaderInspectedCompressed = isCompressed; + } + + public void OnError(Exception ex) + { + Calls.Add(nameof(OnError)); + Error = ex; + } + + public void OnFinalized() + { + Calls.Add(nameof(OnFinalized)); + FinalizedCallCount++; + } + } + + /// + /// Simulates a misbehaving third-party observer: every method on + /// throws. SafeObserver must absorb + /// each exception so the caller is unaffected. + /// + private sealed class ThrowingObserver : IStatementOperationObserver + { + public int CallCount; + + public void OnExecuteStarted(StatementType stmtType, OperationType opType, bool isCompressed) + { + CallCount++; + throw new InvalidOperationException("OnExecuteStarted boom"); + } + + public void OnExecuteSucceeded(string statementId, ExecutionResultFormat resultFormat) + { + CallCount++; + throw new InvalidOperationException("OnExecuteSucceeded boom"); + } + + public void OnPollCompleted(int count, long latencyMs) + { + CallCount++; + throw new InvalidOperationException("OnPollCompleted boom"); + } + + public void OnFirstBatchReady(long latencyMs) + { + CallCount++; + throw new InvalidOperationException("OnFirstBatchReady boom"); + } + + public void OnConsumed(long latencyMs) + { + CallCount++; + throw new InvalidOperationException("OnConsumed boom"); + } + + public void OnChunksDownloaded(ChunkMetrics metrics) + { + CallCount++; + throw new InvalidOperationException("OnChunksDownloaded boom"); + } + + public void OnReaderInspected(ExecutionResultFormat resultFormat, bool isCompressed) + { + CallCount++; + throw new InvalidOperationException("OnReaderInspected boom"); + } + + public void OnError(Exception ex) + { + CallCount++; + throw new InvalidOperationException("OnError boom"); + } + + public void OnFinalized() + { + CallCount++; + throw new InvalidOperationException("OnFinalized boom"); + } + } + + // ── Required tests ─────────────────────────────────────────────────────────── + + [Fact] + public void 
SafeObserver_PropagatesNormalCallsToInner() + { + // Arrange + RecordingObserver inner = new RecordingObserver(); + SafeObserver safe = new SafeObserver(inner); + ChunkMetrics metrics = new ChunkMetrics + { + TotalChunksPresent = 4, + TotalChunksIterated = 4, + InitialChunkLatencyMs = 11, + SlowestChunkLatencyMs = 99, + SumChunksDownloadTimeMs = 137, + }; + InvalidOperationException error = new InvalidOperationException("query failed"); + + // Act: exercise the full surface in a realistic lifecycle order. + safe.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: true); + safe.OnExecuteSucceeded("stmt-77", ExecutionResultFormat.ExternalLinks); + safe.OnPollCompleted(count: 5, latencyMs: 87); + safe.OnFirstBatchReady(latencyMs: 120); + safe.OnChunksDownloaded(metrics); + safe.OnReaderInspected(ExecutionResultFormat.ExternalLinks, isCompressed: true); + safe.OnConsumed(latencyMs: 350); + safe.OnError(error); + safe.OnFinalized(); + + // Assert: every call reached the inner observer, in order, with original args. 
+ Assert.Equal( + new[] + { + nameof(IStatementOperationObserver.OnExecuteStarted), + nameof(IStatementOperationObserver.OnExecuteSucceeded), + nameof(IStatementOperationObserver.OnPollCompleted), + nameof(IStatementOperationObserver.OnFirstBatchReady), + nameof(IStatementOperationObserver.OnChunksDownloaded), + nameof(IStatementOperationObserver.OnReaderInspected), + nameof(IStatementOperationObserver.OnConsumed), + nameof(IStatementOperationObserver.OnError), + nameof(IStatementOperationObserver.OnFinalized), + }, + inner.Calls); + + Assert.Equal(StatementType.Query, inner.StmtType); + Assert.Equal(OperationType.ExecuteStatement, inner.OpType); + Assert.True(inner.IsCompressed); + Assert.Equal("stmt-77", inner.StatementId); + Assert.Equal(ExecutionResultFormat.ExternalLinks, inner.ResultFormat); + Assert.Equal(5, inner.PollCount); + Assert.Equal(87, inner.PollLatencyMs); + Assert.Equal(120, inner.FirstBatchReadyMs); + Assert.Equal(350, inner.ConsumedMs); + Assert.Same(metrics, inner.ChunkMetrics); + Assert.Equal(ExecutionResultFormat.ExternalLinks, inner.ReaderInspectedFormat); + Assert.True(inner.ReaderInspectedCompressed); + Assert.Same(error, inner.Error); + Assert.Equal(1, inner.FinalizedCallCount); + + // Also confirm the decorator exposes the wrapped instance. + Assert.Same(inner, safe.Inner); + } + + [Fact] + public void SafeObserver_SwallowsExceptionsFromInner_LogsAtTrace() + { + // Arrange: subscribe an ActivityListener so the trace-level activity event + // emitted on exception suppression is observable. We must have an ambient + // Activity for the AddEvent call to take effect. 
+ using ActivitySource source = new ActivitySource("SafeObserverTests"); + List<ActivityEvent> capturedEvents = new List<ActivityEvent>(); + using ActivityListener listener = new ActivityListener + { + ShouldListenTo = s => s.Name == "SafeObserverTests", + Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllData, + ActivityStopped = activity => + { + foreach (ActivityEvent ev in activity.Events) + { + capturedEvents.Add(ev); + } + }, + }; + ActivitySource.AddActivityListener(listener); + + ThrowingObserver throwing = new ThrowingObserver(); + SafeObserver safe = new SafeObserver(throwing); + + // Act + Assert: exercise the entire surface; nothing must escape SafeObserver. + using (Activity? activity = source.StartActivity("safe-observer-suppression-test")) + { + Assert.NotNull(activity); // sanity: listener is wired so an activity exists + + Exception? captured = Record.Exception(() => + { + safe.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: false); + safe.OnExecuteSucceeded("s1", ExecutionResultFormat.InlineArrow); + safe.OnPollCompleted(count: 1, latencyMs: 1); + safe.OnFirstBatchReady(latencyMs: 1); + safe.OnConsumed(latencyMs: 1); + safe.OnChunksDownloaded(new ChunkMetrics()); + safe.OnReaderInspected(ExecutionResultFormat.ExternalLinks, isCompressed: true); + safe.OnError(new InvalidOperationException("propagated error")); + safe.OnFinalized(); + }); + Assert.Null(captured); + + // Inner observer was invoked exactly once per method (nine methods). + Assert.Equal(9, throwing.CallCount); + } + + // Activity has now stopped; ActivityStopped callback populates capturedEvents. + // SafeObserver emits one suppression event per swallowed exception. 
+ List<ActivityEvent> suppressions = capturedEvents + .Where(e => e.Name == "telemetry.observer.suppressed") + .ToList(); + Assert.Equal(9, suppressions.Count); + + // Each suppression event must carry diagnostic tags identifying SafeObserver + // as the source and including the inner exception's type and message — + // this is what makes the suppression visible at trace level. + foreach (ActivityEvent ev in suppressions) + { + Dictionary<string, object?> tags = ev.Tags.ToDictionary(kv => kv.Key, kv => kv.Value); + Assert.Equal(nameof(InvalidOperationException), tags["error.type"]); + Assert.Equal("SafeObserver", tags["observer.suppressed.source"]); + Assert.Contains("boom", (string)tags["error.message"]!); + } + } + + // ── Additional coverage ────────────────────────────────────────────────────── + + [Fact] + public void SafeObserver_Constructor_RejectsNullInner() + { + Assert.Throws(() => new SafeObserver(null!)); + } + + [Fact] + public void SafeObserver_SwallowsException_WithoutAmbientActivity() + { + // Arrange: no ActivityListener subscribed → Activity.Current is null. The + // suppression path must still succeed without throwing. + ThrowingObserver throwing = new ThrowingObserver(); + SafeObserver safe = new SafeObserver(throwing); + + // Act + Assert + Exception? captured = Record.Exception(() => + { + safe.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: false); + safe.OnFinalized(); + }); + Assert.Null(captured); + Assert.Equal(2, throwing.CallCount); + } + } +} diff --git a/csharp/test/Unit/Telemetry/TelemetryObserverTests.cs b/csharp/test/Unit/Telemetry/TelemetryObserverTests.cs new file mode 100644 index 00000000..d14d0b6f --- /dev/null +++ b/csharp/test/Unit/Telemetry/TelemetryObserverTests.cs @@ -0,0 +1,411 @@ +/* +* Copyright (c) 2025 ADBC Drivers Contributors +* +* Licensed under the Apache License, Version 2.0 (the "License"); +* you may not use this file except in compliance with the License. 
+* You may obtain a copy of the License at +* +* http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ + +using System; +using System.Collections.Generic; +using System.Threading; +using System.Threading.Tasks; +using AdbcDrivers.Databricks.Reader.CloudFetch; +using AdbcDrivers.Databricks.Telemetry; +using AdbcDrivers.Databricks.Telemetry.Models; +using AdbcDrivers.Databricks.Telemetry.Proto; +using ExecutionResultFormat = AdbcDrivers.Databricks.Telemetry.Proto.ExecutionResult.Types.Format; +using OperationType = AdbcDrivers.Databricks.Telemetry.Proto.Operation.Types.Type; +using StatementType = AdbcDrivers.Databricks.Telemetry.Proto.Statement.Types.Type; +using Xunit; + +namespace AdbcDrivers.Databricks.Tests.Unit.Telemetry +{ + /// + /// Unit tests for verifying: + /// + /// Observer method calls propagate into the underlying . + /// OnFinalized enqueues exactly one . + /// The terminal call is idempotent under both serial and concurrent invocation. + /// All methods satisfy the fail-open contract even when the telemetry client throws. + /// + /// + public class TelemetryObserverTests + { + // ── Test doubles ───────────────────────────────────────────────────────────── + + /// + /// Records every enqueued log so tests can inspect the exact proto fields the + /// observer attempted to emit, and counts enqueues to assert exactly-once semantics. 
+ /// + private sealed class CapturingTelemetryClient : ITelemetryClient + { + public List<TelemetryFrontendLog> Logs { get; } = new List<TelemetryFrontendLog>(); + + public int EnqueueCallCount; + + public void Enqueue(TelemetryFrontendLog log) + { + Interlocked.Increment(ref EnqueueCallCount); + lock (Logs) + { + Logs.Add(log); + } + } + + public Task FlushAsync(CancellationToken ct = default) => Task.CompletedTask; + + public Task CloseAsync() => Task.CompletedTask; + + public ValueTask DisposeAsync() => default; + } + + /// + /// Simulates a corrupted / faulty telemetry client whose Enqueue path raises an + /// exception. Used to assert the observer's fail-open guarantee. + /// + private sealed class ThrowingTelemetryClient : ITelemetryClient + { + public int EnqueueCallCount; + + public void Enqueue(TelemetryFrontendLog log) + { + Interlocked.Increment(ref EnqueueCallCount); + throw new InvalidOperationException("simulated telemetry client failure"); + } + + public Task FlushAsync(CancellationToken ct = default) => Task.CompletedTask; + + public Task CloseAsync() => Task.CompletedTask; + + public ValueTask DisposeAsync() => default; + } + + // ── Fixtures ───────────────────────────────────────────────────────────────── + + private static TelemetrySessionContext CreateSession(ITelemetryClient client) + { + return new TelemetrySessionContext + { + SessionId = "session-abc", + WorkspaceId = 4242L, + TelemetryClient = client, + AuthType = "pat", + SystemConfiguration = new DriverSystemConfiguration { DriverVersion = "1.0.0" }, + DriverConnectionParams = new DriverConnectionParameters { HttpPath = "/sql/1.0/wh/x" }, + }; + } + + private static (TelemetryObserver observer, CapturingTelemetryClient client) CreateObserver() + { + CapturingTelemetryClient client = new CapturingTelemetryClient(); + TelemetrySessionContext session = CreateSession(client); + TelemetryObserver observer = new TelemetryObserver(session); + return (observer, client); + } + + // ── Required tests (per task description) 
──────────────────────────────────── + + [Fact] + public void TelemetryObserver_OnExecuteStarted_PopulatesContext() + { + // Arrange + (TelemetryObserver observer, _) = CreateObserver(); + + // Act + observer.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: true); + + // Assert: scalar fields land on the underlying context. + Assert.Equal(StatementType.Query, observer.Context.StatementType); + Assert.Equal(OperationType.ExecuteStatement, observer.Context.OperationType); + Assert.True(observer.Context.IsCompressed); + } + + [Fact] + public void TelemetryObserver_OnExecuteSucceeded_RecordsStatementId() + { + // Arrange + (TelemetryObserver observer, _) = CreateObserver(); + + // Act + observer.OnExecuteSucceeded("stmt-id-42", ExecutionResultFormat.ExternalLinks); + + // Assert + Assert.Equal("stmt-id-42", observer.Context.StatementId); + Assert.Equal(ExecutionResultFormat.ExternalLinks, observer.Context.ResultFormat); + } + + [Fact] + public void TelemetryObserver_OnFinalized_EnqueuesExactlyOnce() + { + // Arrange + (TelemetryObserver observer, CapturingTelemetryClient client) = CreateObserver(); + observer.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: false); + observer.OnExecuteSucceeded("stmt-1", ExecutionResultFormat.InlineArrow); + + // Act + observer.OnFinalized(); + + // Assert + Assert.Equal(1, client.EnqueueCallCount); + Assert.Single(client.Logs); + + // The emitted log must reflect the recorded context. + OssSqlDriverTelemetryLog log = client.Logs[0].Entry!.SqlDriverLog!; + Assert.Equal("session-abc", log.SessionId); + Assert.Equal("stmt-1", log.SqlStatementId); + Assert.Equal(StatementType.Query, log.SqlOperation.StatementType); + Assert.Equal(OperationType.ExecuteStatement, log.SqlOperation.OperationDetail.OperationType); + Assert.Null(log.ErrorInfo); + + // Workspace id and frontend envelope must be populated. 
+ Assert.Equal(4242L, client.Logs[0].WorkspaceId); + Assert.False(string.IsNullOrEmpty(client.Logs[0].FrontendLogEventId)); + Assert.NotNull(client.Logs[0].Context); + Assert.True(client.Logs[0].Context!.TimestampMillis > 0); + } + + [Fact] + public void TelemetryObserver_OnFinalized_CalledTwice_EnqueuesOnce() + { + // Arrange + (TelemetryObserver observer, CapturingTelemetryClient client) = CreateObserver(); + + // Act: invoke OnFinalized twice in serial (mirrors error + dispose paths). + observer.OnFinalized(); + observer.OnFinalized(); + + // Assert: exactly one enqueue. + Assert.Equal(1, client.EnqueueCallCount); + Assert.True(observer.HasEmitted); + } + + [Fact] + public async Task TelemetryObserver_OnFinalized_ConcurrentCalls_EnqueueOnce() + { + // Arrange + (TelemetryObserver observer, CapturingTelemetryClient client) = CreateObserver(); + + // Act: race many threads to OnFinalized; only one must win. + const int parallelism = 32; + using ManualResetEventSlim start = new ManualResetEventSlim(); + Task[] tasks = new Task[parallelism]; + for (int i = 0; i < parallelism; i++) + { + tasks[i] = Task.Run(() => + { + start.Wait(); + observer.OnFinalized(); + }); + } + start.Set(); + await Task.WhenAll(tasks); + + // Assert + Assert.Equal(1, client.EnqueueCallCount); + } + + [Fact] + public void TelemetryObserver_OnError_RecordsErrorAndFinalizes() + { + // Arrange + (TelemetryObserver observer, CapturingTelemetryClient client) = CreateObserver(); + observer.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: false); + InvalidOperationException error = new InvalidOperationException("simulated query failure"); + + // Act + observer.OnError(error); + observer.OnFinalized(); + observer.OnFinalized(); // idempotent: should not double-emit even with error path + + // Assert: exactly one log, error_info populated. 
+ Assert.Equal(1, client.EnqueueCallCount); + OssSqlDriverTelemetryLog log = client.Logs[0].Entry!.SqlDriverLog!; + Assert.NotNull(log.ErrorInfo); + Assert.Equal("InvalidOperationException", log.ErrorInfo.ErrorName); + Assert.True(observer.Context.HasError); + Assert.Equal("simulated query failure", observer.Context.ErrorMessage); + } + + [Fact] + public void TelemetryObserver_AllMethods_NeverThrow_WhenContextCorrupted() + { + // Arrange: build an observer whose telemetry client throws on every Enqueue + // (this simulates a corrupted downstream dependency). The observer must + // continue to absorb all calls without re-throwing. + ThrowingTelemetryClient throwing = new ThrowingTelemetryClient(); + TelemetrySessionContext session = CreateSession(throwing); + TelemetryObserver observer = new TelemetryObserver(session); + + // Act + Assert: exercise the entire surface, including pathological inputs + // (null statementId, null exception, null ChunkMetrics). + Exception? captured = Record.Exception(() => + { + observer.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: true); + observer.OnExecuteSucceeded(null!, ExecutionResultFormat.Unspecified); + observer.OnPollCompleted(count: 0, latencyMs: 0); + observer.OnFirstBatchReady(latencyMs: -1); + observer.OnConsumed(latencyMs: -1); + observer.OnChunksDownloaded(null!); + observer.OnReaderInspected(ExecutionResultFormat.Unspecified, isCompressed: false); + observer.OnError(null!); + observer.OnError(new InvalidOperationException("boom")); + observer.OnFinalized(); + observer.OnFinalized(); + }); + + Assert.Null(captured); + + // The throwing client must have been invoked exactly once (idempotent finalize) + // and the observer must have swallowed its exception. 
+ Assert.Equal(1, throwing.EnqueueCallCount); + Assert.True(observer.HasEmitted); + } + + [Fact] + public void TelemetryObserver_OnChunksDownloaded_MergesIntoChunkDetails() + { + // Arrange + (TelemetryObserver observer, CapturingTelemetryClient client) = CreateObserver(); + ChunkMetrics metrics = new ChunkMetrics + { + TotalChunksPresent = 12, + TotalChunksIterated = 11, + InitialChunkLatencyMs = 75, + SlowestChunkLatencyMs = 220, + SumChunksDownloadTimeMs = 1430, + }; + + // Act + observer.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: false); + observer.OnChunksDownloaded(metrics); + observer.OnFinalized(); + + // Assert: context absorbed the metrics. + Assert.Equal(12, observer.Context.TotalChunksPresent); + Assert.Equal(11, observer.Context.TotalChunksIterated); + Assert.Equal(75, observer.Context.InitialChunkLatencyMs); + Assert.Equal(220, observer.Context.SlowestChunkLatencyMs); + Assert.Equal(1430, observer.Context.SumChunksDownloadTimeMs); + + // The chunk_details proto block on the emitted log mirrors the input. + ChunkDetails details = client.Logs[0].Entry!.SqlDriverLog!.SqlOperation.ChunkDetails; + Assert.NotNull(details); + Assert.Equal(12, details.TotalChunksPresent); + Assert.Equal(11, details.TotalChunksIterated); + Assert.Equal(75, details.InitialChunkLatencyMillis); + Assert.Equal(220, details.SlowestChunkLatencyMillis); + Assert.Equal(1430, details.SumChunksDownloadTimeMillis); + } + + // ── Additional coverage ────────────────────────────────────────────────────── + + [Fact] + public void TelemetryObserver_Constructor_RejectsNullSession() + { + Assert.Throws(() => new TelemetryObserver(null!)); + } + + [Fact] + public void TelemetryObserver_OnFinalized_WithNullTelemetryClient_IsNoOp() + { + // Arrange: session has no telemetry client (covers the disabled case). 
+ TelemetrySessionContext session = new TelemetrySessionContext + { + SessionId = "s1", + WorkspaceId = 1L, + TelemetryClient = null, + }; + TelemetryObserver observer = new TelemetryObserver(session); + + // Act + Assert: must not throw, must mark itself emitted so a later call is a no-op. + observer.OnFinalized(); + observer.OnFinalized(); + Assert.True(observer.HasEmitted); + } + + [Fact] + public void TelemetryObserver_OnFirstBatchReady_OnlyFirstCallWins() + { + // Arrange + (TelemetryObserver observer, _) = CreateObserver(); + + // Act + observer.OnFirstBatchReady(latencyMs: 50); + observer.OnFirstBatchReady(latencyMs: 999); + + // Assert: subsequent calls do not overwrite the earliest observed latency. + Assert.Equal(50, observer.Context.FirstBatchReadyMs); + } + + [Fact] + public void TelemetryObserver_OnReaderInspected_OverridesResultFormatAndCompression() + { + // Arrange: OnExecuteStarted seeds the placeholder defaults (InlineArrow / false) + // that the legacy CreateTelemetryContext helper stamped. The reader inspection + // hook fires later from the finalize path and must overwrite both fields with + // the server-reported truth (PECO-2988, PECO-2978). + (TelemetryObserver observer, CapturingTelemetryClient client) = CreateObserver(); + observer.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatement, isCompressed: false); + observer.OnExecuteSucceeded("stmt-rdr", ExecutionResultFormat.InlineArrow); + + // Act: simulate the CloudFetch + LZ4 case where the active reader reports + // a different result format and compression flag than the defaults. + observer.OnReaderInspected(ExecutionResultFormat.ExternalLinks, isCompressed: true); + observer.OnFinalized(); + + // Assert: context reflects the post-inspection values, and the emitted log + // carries them through to the proto fields. 
+ Assert.Equal(ExecutionResultFormat.ExternalLinks, observer.Context.ResultFormat); + Assert.True(observer.Context.IsCompressed); + OssSqlDriverTelemetryLog log = client.Logs[0].Entry!.SqlDriverLog!; + Assert.Equal(ExecutionResultFormat.ExternalLinks, log.SqlOperation.ExecutionResult); + Assert.True(log.SqlOperation.IsCompressed); + } + + [Fact] + public void TelemetryObserver_OnReaderInspected_DoesNotThrow_WhenInvokedBeforeExecuteStart() + { + // Arrange: caller invokes OnReaderInspected directly without an OnExecuteStarted + // prelude. The method must be idempotent-friendly and never throw, per the + // fail-open contract. + (TelemetryObserver observer, _) = CreateObserver(); + + // Act + Assert + Exception? captured = Record.Exception(() => + observer.OnReaderInspected(ExecutionResultFormat.InlineArrow, isCompressed: false)); + Assert.Null(captured); + + // The fields it touches must reflect the invocation even without a prior call. + Assert.Equal(ExecutionResultFormat.InlineArrow, observer.Context.ResultFormat); + Assert.False(observer.Context.IsCompressed); + } + + [Fact] + public void TelemetryObserver_OnPollCompleted_StoresCountAndLatency() + { + // Arrange + (TelemetryObserver observer, CapturingTelemetryClient client) = CreateObserver(); + + // Act + observer.OnExecuteStarted(StatementType.Query, OperationType.ExecuteStatementAsync, isCompressed: false); + observer.OnPollCompleted(count: 5, latencyMs: 250); + observer.OnFinalized(); + + // Assert + Assert.Equal(5, observer.Context.PollCount); + Assert.Equal(250, observer.Context.PollLatencyMs); + OperationDetail detail = client.Logs[0].Entry!.SqlDriverLog!.SqlOperation.OperationDetail; + Assert.Equal(5, detail.NOperationStatusCalls); + Assert.Equal(250, detail.OperationStatusLatencyMillis); + } + } +} diff --git a/docs/designs/PECO-3022-sea-telemetry-integration-design.md b/docs/designs/PECO-3022-sea-telemetry-integration-design.md index 9f02ecb4..d267c489 100644 --- 
a/docs/designs/PECO-3022-sea-telemetry-integration-design.md +++ b/docs/designs/PECO-3022-sea-telemetry-integration-design.md @@ -255,7 +255,7 @@ public static IConnectionTelemetry Create( IReadOnlyDictionary properties, string host, string assemblyVersion, - IOAuthTokenProvider? oauthTokenProvider, + OAuthClientCredentialsProvider? oauthTokenProvider, string sessionId, // CHANGED: was TSessionHandle? sessionHandle DriverMode.Types.Type mode, // NEW: Thrift or Sea bool enableDirectResults, @@ -264,7 +264,9 @@ public static IConnectionTelemetry Create( Activity? activity); ``` -Thrift caller converts at the boundary: `sessionHandle.SessionId.Guid.ToString()`. SEA caller passes its `_sessionId` directly. +Thrift caller converts at the boundary: `sessionHandle.SessionId.Guid.ToString()` (with a null-check; an empty string is mapped to `null` `SessionId` inside `Create` to match the prior behavior). SEA caller passes its `_sessionId` directly. + +The `mode` parameter is also threaded through `BuildDriverConnectionParams` and `SafeBuildDriverConnectionParams` — both methods previously hardcoded `Mode = DriverMode.Types.Type.Thrift`. The literal is gone from `ConnectionTelemetry.cs`; only the `DatabricksConnection` (Thrift) call site supplies it. ### 5.3 `IConnectionTelemetry` — no surface change @@ -400,7 +402,9 @@ sequenceDiagram ### Connection-level concurrency -`IConnectionTelemetry.DisposeAsync` is called from `StatementExecutionConnection.Dispose` synchronously (consistent with existing Thrift pattern): `_telemetry.DisposeAsync().AsTask().Wait(TimeSpan.FromSeconds(5))`. This flushes any pending events with a hard timeout so connection-close cannot hang on a stuck exporter. +`IConnectionTelemetry.DisposeAsync` is called from `StatementExecutionConnection.Dispose` synchronously (consistent with existing Thrift pattern): `_telemetry.DisposeAsync().Wait(TimeSpan.FromSeconds(5))`. 
This flushes any pending events with a hard timeout so connection-close cannot hang on a stuck exporter. + +> **Implementation note (T5):** `IConnectionTelemetry.DisposeAsync` returns `Task` (not `ValueTask`), so the call is `_telemetry.DisposeAsync().Wait(TimeSpan.FromSeconds(5))` rather than `.AsTask().Wait(...)`. The result is the same: a hard-bounded synchronous flush. --- @@ -454,7 +458,9 @@ The existing `CircuitBreakerTelemetryExporter` is reused unchanged. Behavior in ### Connection telemetry initialization failure -If `ConnectionTelemetry.Create` throws during `OpenAsync` (e.g. feature-flag fetch fails), the exception is caught locally in `StatementExecutionConnection`, logged at `TRACE`, and `_telemetry` is set to a `NullConnectionTelemetry` singleton (already exists for Thrift). The connection open succeeds; only telemetry is disabled for the connection. +If `ConnectionTelemetry.Create` throws during `OpenAsync` (e.g. feature-flag fetch fails), the exception is caught locally in `StatementExecutionConnection`, logged at `TRACE`, and `_telemetry` is set to a `NoOpConnectionTelemetry` singleton (already exists for Thrift; the name in the codebase is `NoOpConnectionTelemetry`). The connection open succeeds; only telemetry is disabled for the connection. + +> **Implementation note (T5):** `ConnectionTelemetry.Create` already swallows internally and returns `NoOpConnectionTelemetry.Instance` on any failure, so the outer try/catch in `StatementExecutionConnection.InitializeTelemetry` is belt-and-suspenders — it only runs if `Create` is later modified to throw in a refactor. Keeping it makes the fail-open contract explicit at the call site. ---
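
The two T5 implementation notes above (a hard-bounded synchronous flush on dispose, and a fail-open call site) can be illustrated with a small standalone sketch. `BoundedDisposeSketch` and `TryDispose` are hypothetical names introduced here for illustration only; they are not part of the driver, and the real call site in `StatementExecutionConnection` may be shaped differently.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper, not part of the driver: illustrates a bounded,
// fail-open synchronous wait on a Task-returning DisposeAsync.
public static class BoundedDisposeSketch
{
    // Returns true only if the dispose task completed within the timeout.
    // Never throws: faults and timeouts are both reported as false.
    public static bool TryDispose(Func<Task> disposeAsync, TimeSpan timeout)
    {
        try
        {
            // Task.Wait(TimeSpan) returns false when the timeout elapses first.
            return disposeAsync().Wait(timeout);
        }
        catch (Exception)
        {
            // A faulted or canceled task surfaces as AggregateException;
            // swallow it so connection close is never failed by telemetry.
            return false;
        }
    }
}
```

Under these assumptions, a call site would look like `BoundedDisposeSketch.TryDispose(() => _telemetry.DisposeAsync(), TimeSpan.FromSeconds(5))`. Note that the bound only covers the returned `Task`: if the delegate blocks synchronously before returning, the timeout does not apply.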