Skip to content

feat(csharp): add enable_fast_metadata_query flag for DESC TABLE EXTENDED#456

Open
jadewang-db wants to merge 3 commits into
mainfrom
PECO-3021
Open

feat(csharp): add enable_fast_metadata_query flag for DESC TABLE EXTENDED#456
jadewang-db wants to merge 3 commits into
mainfrom
PECO-3021

Conversation

@jadewang-db
Copy link
Copy Markdown
Collaborator

@jadewang-db jadewang-db commented May 14, 2026

Summary

Adds opt-in adbc.databricks.enable_fast_metadata_query flag (default false) that pairs the new SQL modifier STATIC ONLY (runtime PR #198486) on DESC TABLE EXTENDED AS JSON with the matching off-WLM routing signal per protocol.

Both signals are required together — STATIC ONLY alone still queues through WLM; off-WLM routing alone still does the full metadata scan.

Protocol SQL emitted Off-WLM signal Conditions
Thrift DESC TABLE EXTENDED <t> STATIC ONLY AS JSON RunAsync=false on descStmt flag set AND warehouse path (/sql/1.0/warehouses/{id} or /sql/1.0/endpoints/{id}). General-cluster paths: flag is ignored.
SEA DESC TABLE EXTENDED <t> STATIC ONLY AS JSON x-databricks-sea-can-run-fully-sync: true (already sent by ExecuteMetadataSqlAsync) flag set. SEA is always-warehouse.

Implementation

  • DatabricksConnection.IsWarehousePath matches the path regex (cached) — only used to gate the Thrift opt-in.
  • DatabricksStatement.RunAsyncOverride is a new internal nullable bool; when non-null it takes precedence over the connection-level RunAsyncInThrift inside SetStatementProperties. The override is set on the per-call descStmt in GetColumnsExtendedAsync rather than mutating connection state.
  • StatementExecutionConnection.EnableFastMetadataQuery parses the same flag for SEA.

Fallback safety

STATIC ONLY is rejected with parse error 42601 (INVALID_STATIC_ONLY_USAGE) on servers without PR #198486. The existing catch (HiveServer2Exception ex) when (ex.SqlState == "42601" || ex.SqlState == "20000") in GetColumnsExtendedAsync already handles that — falls back to GetColumns + GetPrimaryKeys + GetCrossReference.

Test plan

  • Added unit tests for the new parameter constant.
  • Added unit tests for IsWarehousePath covering /sql/1.0/warehouses/{id}, /sql/1.0/endpoints/{id} (with/without trailing slash, with/without query strings), /sql/protocolv1/... clusters, empty path, and the AdbcOptions.Uri fallback.
  • Added unit tests for UseFastMetadataQuery covering all four combinations of (flag, warehouse-path).
  • Full dotnet test --filter "FullyQualifiedName~Unit." suite: 638 passed, 0 failed.
  • E2E verification against a DBSQL warehouse running PR #198486 is gated on that runtime change reaching the test env; existing 42601 fallback protects pre-rollout.

Closes PECO-3021

This pull request and its description were written by Isaac.

Copy link
Copy Markdown
Collaborator Author

@jadewang-db jadewang-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-review

Overall the change is small and well-scoped. The Thrift-side opt-in plus warehouse-path gating matches the JIRA, and SEA was correctly identified as already wired via x-databricks-sea-can-run-fully-sync. One high-priority PR-hygiene issue around the branch state, plus a handful of low-priority simplifications inline.

Positive observations

  • UseFastMetadataQuery = _enableFastMetadataQuery && IsWarehousePath is the right composition; unit tests enumerate the full truth table.
  • Per-statement RunAsyncOverride is the right surface — mutating connection state would have been worse.
  • Existing 42601/20000 fallback in GetColumnsExtendedAsync still protects against servers that reject the new shape, so rollout is safe.
  • Activity tag fast_metadata_query will make the flag's effect easy to confirm in production traces.

HIGH — branch is significantly behind main

The PR diff lists 11 files (+277/-9), but the commit f8f66af only touches 6 files (+200/-3). The extras come from telemetry refactors that landed on main while this branch was idle (e.g., PECO-2978/2988/2992 in DatabricksStatement.cs and ConnectionTelemetry.cs), plus d02dae2 which was already merged via #401.

$ git log origin/main..PECO-3021 --oneline
f8f66af feat(csharp): add enable_fast_metadata_query flag for DESC TABLE EXTENDED
d02dae2 fix(csharp): emit bare hostname for telemetry host_url to match JDBC   ← already merged via #401

Recommend git rebase origin/main and force-push so the diff cleanly reflects only PECO-3021's work.

Test coverage note

Coverage of new code is exclusively unit-level. Wire behavior (server actually receiving RunAsync=false) is not asserted — relies on the existing E2E harness once runtime PR #198486 deploys. Worth mentioning the intentional E2E gap in the PR body.

Comment thread csharp/src/DatabricksStatement.cs Outdated
Comment thread csharp/src/DatabricksConnection.cs Outdated
Comment thread csharp/src/DatabricksStatement.cs Outdated
Comment thread csharp/src/DatabricksStatement.cs
Comment thread csharp/src/DatabricksConnection.cs
Comment thread csharp/src/DatabricksStatement.cs Outdated
Copy link
Copy Markdown
Collaborator Author

@jadewang-db jadewang-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-review pass on PECO-3021. Overall the design is clean — the AND-gate of flag + warehouse-path on Thrift, the per-statement RunAsyncOverride to avoid mutating connection state, and the reuse of the existing 42601/20000 fallback for old servers are all the right shapes. The unit-test matrix on IsWarehousePath and UseFastMetadataQuery is thorough.

Findings below are mostly Low/Medium polish — no Critical or High issues. One scope/PR-hygiene note: this PR also carries commit d02dae2 (telemetry host_url bare-hostname fix, PECO-2987) which the description doesn't mention. Consider either splitting it out or extending the PR body so reviewers know both changes are in flight.

Comment thread csharp/src/DatabricksConnection.cs Outdated
Comment thread csharp/src/DatabricksConnection.cs Outdated
Comment thread csharp/src/DatabricksStatement.cs Outdated
Comment thread csharp/test/E2E/Telemetry/ConnectionParametersTests.cs Outdated
@jadewang-db jadewang-db changed the base branch from main to PECO-2987 May 22, 2026 18:57
jadewang-db pushed a commit that referenced this pull request May 22, 2026
- Drop `_isWarehousePathCached` field; `IsWarehousePath` getter now
  recomputes on each call. Removes the `Nullable<bool>` non-atomic
  read/write concern and simplifies the property.
- Scope the `IndexOf('?')` query-string strip to the raw-Path branch
  only; `Uri.AbsolutePath` already strips it for the URI branch.
- Replace the defensive `descStmt is DatabricksStatement` pattern in
  `GetColumnsExtendedAsync` with an unconditional cast — if a future
  refactor breaks the invariant, a loud `InvalidCastException` is
  better than silently emitting a fast-metadata query that still goes
  through WLM.
- Extract `var connection = (DatabricksConnection)Connection;` to
  reuse a single local for `CanUseDescTableExtended` and
  `UseFastMetadataQuery` instead of casting twice.
- Drop overly-strict `Assert.DoesNotContain(":", actualHostUrl)` from
  the host_url regression test; the equality check and `"://"` check
  together enforce the bare-hostname contract without rejecting IPv6
  literals.

Co-authored-by: Isaac
@jadewang-db jadewang-db changed the base branch from PECO-2987 to main May 22, 2026 18:59
@jadewang-db jadewang-db deleted the PECO-3021 branch May 22, 2026 20:00
@jadewang-db jadewang-db restored the PECO-3021 branch May 22, 2026 20:01
@jadewang-db jadewang-db reopened this May 22, 2026
@jadewang-db jadewang-db requested a review from msrathore-db May 26, 2026 21:18
// telemetry event emitted at statement Dispose, instead of a meaningless 0.
internal long? CloseStatementRpcLatencyMs { get; set; }
internal Exception? CloseStatementRpcError { get; set; }
internal bool? RunAsyncOverride { get; set; }
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this extra flag? Who is passing this in?
Can we just rely on runAsyncInThrift?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runAsyncInThrift is the user's connection-level setting (adbc.databricks.enable_run_async_in_thrift_op, defaults to true) — flipping it on the connection would affect all their statements, and mutating it around the desc-stmt call would race with any concurrent statements on the same connection.

RunAsyncOverride is an internal per-statement mechanism — no user surface — used exclusively by GetColumnsExtendedAsync to flip RunAsync=false on its own internal descStmt while the user's other statements keep their configured setting. The pairing with the STATIC ONLY SQL modifier is what tells the warehouse to route the desc-stmt off the WLM path; the connection-level flag alone can't express "this single statement, no others."

Open to a different shape if you have one in mind — happy to refactor if e.g. you'd prefer a constructor param or a dedicated ExecuteWithRunAsyncOff() helper instead of the override property.

Comment thread csharp/src/Telemetry/ConnectionTelemetry.cs
Copy link
Copy Markdown
Collaborator

@msrathore-db msrathore-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart from Eric's comment

Jade Wang added 2 commits May 27, 2026 16:01
…NDED

Adds the SQL modifier STATIC ONLY (runtime PR #198486) to DESC TABLE EXTENDED
AS JSON so a DBSQL warehouse can skip Delta log / Mesa RPCs / other expensive
I/O. STATIC ONLY only takes effect when paired with off-WLM routing, so the
driver also flips the matching protocol signal:

- Thrift: RunAsync=false on the descStmt — gated on warehouse path
  (/sql/1.0/warehouses/{id} or /sql/1.0/endpoints/{id}) so the signal is
  never sent to general-purpose clusters.
- SEA: ExecuteMetadataSqlAsync already sends x-databricks-sea-can-run-fully-sync;
  SEA is always-warehouse, so the flag alone gates the SQL change there.

Older servers that don't recognise STATIC ONLY return parse error 42601,
caught by the existing fallback in GetColumnsExtendedAsync.

Co-authored-by: Isaac
- Drop `_isWarehousePathCached` field; `IsWarehousePath` getter now
  recomputes on each call. Removes the `Nullable<bool>` non-atomic
  read/write concern and simplifies the property.
- Scope the `IndexOf('?')` query-string strip to the raw-Path branch
  only; `Uri.AbsolutePath` already strips it for the URI branch.
- Replace the defensive `descStmt is DatabricksStatement` pattern in
  `GetColumnsExtendedAsync` with an unconditional cast — if a future
  refactor breaks the invariant, a loud `InvalidCastException` is
  better than silently emitting a fast-metadata query that still goes
  through WLM.
- Extract `var connection = (DatabricksConnection)Connection;` to
  reuse a single local for `CanUseDescTableExtended` and
  `UseFastMetadataQuery` instead of casting twice.
- Drop overly-strict `Assert.DoesNotContain(":", actualHostUrl)` from
  the host_url regression test; the equality check and `"://"` check
  together enforce the bare-hostname contract without rejecting IPv6
  literals.

Co-authored-by: Isaac
The runtime grammar (SqlBaseParser.g4) accepts modifiers in this order:

    identifierReference partitionSpec? describeColName? (AS JSON)? (STATIC ONLY)?

The driver was emitting `STATIC ONLY AS JSON` (modifiers swapped), which
the parser rejects with `42601 PARSE_SYNTAX_ERROR at 'AS'`. The existing
`42601` catch in `GetColumnsExtendedAsync` would have masked the bug —
falling back silently to the slow path even on warehouses that *have*
the runtime patch (PECO-3022 / runtime PR #198486), defeating the
optimization with no signal to the operator.

E2E-verified against a custom DBR image built from ~/runtime on the
`stack/describe-table-static-only` branch (commit 485e630), deployed
to a SQL warehouse on benchmarking-staging-aws-us-west-2:

- flag=true  + warehouse: emits `DESC TABLE EXTENDED <t> AS JSON STATIC ONLY`,
  status FINISHED, returns JSON metadata.
- flag=false + warehouse: emits `DESC TABLE EXTENDED <t> AS JSON`,
  status FINISHED.
- flag=true  + cluster path: `UseFastMetadataQuery` returns false
  (existing unit test `UseFastMetadataQuery_RequiresFlagAndWarehousePath`).

Also adds `FastMetadataQueryE2ETest.cs` with three E2E checks (driver
emission, runtime acceptance of the keyword, flag-disabled control)
and tightens docstrings to spell out the emitted SQL explicitly so the
order is harder to flip in a future refactor.

Co-authored-by: Isaac
@jadewang-db jadewang-db added this pull request to the merge queue May 29, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 29, 2026
@jadewang-db jadewang-db added this pull request to the merge queue May 30, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks May 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants