fix(csharp): emit result.bytes_downloaded on inline Dispose by eric-wang-1990 · Pull Request #493 · adbc-drivers/databricks

eric-wang-1990 · 2026-05-28T07:21:48Z

Motivation

DatabricksCompositeReader.Dispose emitted an asymmetric span shape across
result delivery paths:

- CloudFetch path: reader.active_reader_type = "CloudFetchReader"
  and result.bytes_downloaded = <N> (a real byte counter).
- Inline path: reader.active_reader_type only; no byte counter.

The byte counter is exactly the kind of signal debug tooling and
dashboards filter on, and missing it on inline disposes means
operators see a blank field for half their workloads.

What changed

csharp/src/Reader/DatabricksReader.cs: added a _totalBytesConsumed
field that accumulates the wire-side TSparkArrowBatch.Batch.Length
for every inline batch passed through ProcessFetchedBatches. In
Dispose(bool) it emits
Activity.Current?.SetTag(StatementExecutionEvent.ResultBytesDownloaded, _totalBytesConsumed),
identical to the call CloudFetchReader.Dispose already makes.

Because BaseDatabricksReader.Dispose opens no activity of its own,
Activity.Current at the moment either reader's Dispose runs is
the DatabricksCompositeReader.Dispose span. The tag therefore lands,
on both paths, on exactly the span that issue #485 ("tracing(csharp):
inline Dispose path missing result.bytes_downloaded (CloudFetch has
it)") calls out as asymmetric.
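The mechanism described above can be sketched in isolation. This is a minimal illustration, not the driver's code: InlineReaderSketch, InlineReaderDemo and the source name "Sketch.CompositeReader" are hypothetical, and the tag key is assumed to be the literal string quoted in this description. It shows only that a tag set via Activity.Current inside a Dispose that starts no activity of its own ends up on the caller's span.

```csharp
using System;
using System.Diagnostics;

// Hypothetical stand-in for the inline reader; not the driver's actual class.
public sealed class InlineReaderSketch : IDisposable
{
    // Assumed tag key, copied from the tag name quoted in this description.
    public const string ResultBytesDownloaded = "result.bytes_downloaded";

    private long _totalBytesConsumed;
    private bool _disposed;

    // Called once per fetched batch with the payload exactly as received.
    public void ProcessFetchedBatch(byte[] wireBatch)
    {
        _totalBytesConsumed += wireBatch.Length;
    }

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        // No activity is started here, so Activity.Current is whatever
        // span the caller has open at this point.
        Activity.Current?.SetTag(ResultBytesDownloaded, _totalBytesConsumed);
    }
}

public static class InlineReaderDemo
{
    private static readonly ActivitySource Source =
        new ActivitySource("Sketch.CompositeReader");

    // Returns the byte count observed on the enclosing span, or null if
    // the span was not created or the tag was not set.
    public static long? Run()
    {
        // Without a listener, StartActivity returns null and nothing is recorded.
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == "Sketch.CompositeReader",
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllDataAndRecorded,
        };
        ActivitySource.AddActivityListener(listener);

        var reader = new InlineReaderSketch();
        reader.ProcessFetchedBatch(new byte[128]);
        reader.ProcessFetchedBatch(new byte[72]);

        // Plays the role of the composite reader's Dispose span.
        using var span = Source.StartActivity("DatabricksCompositeReader.Dispose");
        reader.Dispose();
        return span?.GetTagItem(InlineReaderSketch.ResultBytesDownloaded) as long?;
    }
}
```

Calling InlineReaderDemo.Run() reads the tag back from the outer span, which is how the two batch sizes fed in above become visible to a caller that never touched the reader's counter directly.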

Byte-counting convention

CloudFetch counts IDownloadResult.Size, the chunk's size as
delivered by the downloader, before any LZ4 decompression. To match
those semantics for inline data, this patch counts the
pre-decompression batch.Batch.Length (the bytes received over the
Thrift connection, captured as originalSize in the existing decompress
branch). Picking pre-decompression keeps both paths reporting "bytes the
reader received from the wire" rather than mixing wire bytes on one path
with decompressed bytes on the other.
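To make the difference between the two candidate counts concrete, here is a small sketch. It is an illustration under stated assumptions: GZip from the .NET base class library stands in for LZ4 (which the driver uses but which is not in the BCL), and WireByteCountingSketch and its members are hypothetical names. The point is only that the size is captured before decompression, as with originalSize in the text above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class WireByteCountingSketch
{
    // GZip is a stand-in codec for this sketch; the driver itself uses LZ4.
    public static byte[] Compress(byte[] raw)
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        {
            gzip.Write(raw, 0, raw.Length);
        }
        // ToArray remains valid after the stream has been closed.
        return output.ToArray();
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using var input = new MemoryStream(compressed);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        gzip.CopyTo(output);
        return output.ToArray();
    }

    // Returns both totals so the two conventions can be compared side by side.
    public static (long WireBytes, long DecompressedBytes) Consume(
        IEnumerable<byte[]> wireBatches)
    {
        long wireBytes = 0;
        long decompressedBytes = 0;
        foreach (byte[] batch in wireBatches)
        {
            // Capture the size before decompressing: this is the wire-side count.
            int originalSize = batch.Length;
            byte[] decoded = Decompress(batch);

            wireBytes += originalSize;
            decompressedBytes += decoded.Length;
        }
        return (wireBytes, decompressedBytes);
    }
}
```

For compressible data the two totals diverge, so a dashboard that mixed one convention per path would compare unlike quantities. Reporting WireBytes on both paths avoids that.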

The tag name is reused as-is even though "downloaded" is technically a
stretch for inline data (which arrives via the Thrift connection rather
than a separate cloud download). The whole point of the issue is that
dashboards and queries filtering on this tag should work uniformly for
both readers, and renaming would defeat that. This rationale is
documented in a code comment.

Test added (RED → GREEN)

csharp/test/E2E/CloseOperationE2ETest.cs — new
Dispose_InlineReader_EmitsBytesDownloaded_Issue485. It:

- Sets Protocol=thrift, UseCloudFetch=false,
  EnableDirectResults=false so the driver is forced through the
  inline result path.
- Runs SELECT * FROM range(1, 100) and reads every batch.
- Disposes the reader and statement.
- Asserts the captured DatabricksCompositeReader.Dispose span tags
  contain reader.active_reader_type = "DatabricksReader" (sanity:
  we exercised the inline path) and result.bytes_downloaded with
  a strictly positive value.

The test class's ActivityListener was extended to snapshot
activity.TagObjects alongside the existing event capture so the
assertion can inspect span tags.
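A listener that snapshots tags when a span stops might look roughly like the following. This is a hedged sketch rather than the test class's actual code: TagCapturingListener, TagCaptureDemo, the source name "Sketch.TagCapture" and the hard-coded tag values are all hypothetical. Tag values are stored as strings here purely to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: records the tags of every stopped activity from one source.
public sealed class TagCapturingListener : IDisposable
{
    private readonly ActivityListener _listener;
    private readonly object _gate = new object();
    private readonly List<KeyValuePair<string, Dictionary<string, string>>> _stopped =
        new List<KeyValuePair<string, Dictionary<string, string>>>();

    public TagCapturingListener(string sourceName)
    {
        _listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == sourceName,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllDataAndRecorded,
            ActivityStopped = activity =>
            {
                // Copy the tags now; the snapshot is what assertions inspect later.
                var tags = new Dictionary<string, string>();
                foreach (var tag in activity.TagObjects)
                {
                    tags[tag.Key] = tag.Value?.ToString() ?? string.Empty;
                }
                lock (_gate)
                {
                    _stopped.Add(new KeyValuePair<string, Dictionary<string, string>>(
                        activity.OperationName, tags));
                }
            },
        };
        ActivitySource.AddActivityListener(_listener);
    }

    // Returns the tags of the most recently stopped activity with this name,
    // or an empty dictionary if none was captured.
    public IReadOnlyDictionary<string, string> TagsFor(string operationName)
    {
        lock (_gate)
        {
            for (int i = _stopped.Count - 1; i >= 0; i--)
            {
                if (_stopped[i].Key == operationName) return _stopped[i].Value;
            }
        }
        return new Dictionary<string, string>();
    }

    public void Dispose() => _listener.Dispose();
}

public static class TagCaptureDemo
{
    public static IReadOnlyDictionary<string, string> Run()
    {
        const string sourceName = "Sketch.TagCapture";
        using var source = new ActivitySource(sourceName);
        using var capture = new TagCapturingListener(sourceName);

        using (var span = source.StartActivity("DatabricksCompositeReader.Dispose"))
        {
            // Illustrative values only; a real run would get these from the driver.
            span?.SetTag("reader.active_reader_type", "DatabricksReader");
            span?.SetTag("result.bytes_downloaded", 4096L);
        } // Disposing the activity stops it, which triggers ActivityStopped.

        return capture.TagsFor("DatabricksCompositeReader.Dispose");
    }
}
```

An assertion in the style of the new test would then check that the returned dictionary contains both keys and that the byte count parses to a strictly positive number.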

RED (before the fix)

Failed AdbcDrivers.Databricks.Tests.CloseOperationE2ETest.Dispose_InlineReader_EmitsBytesDownloaded_Issue485
Error: Expected 'result.bytes_downloaded' tag on DatabricksCompositeReader.Dispose
       for the inline result path (issue #485). ... Got tags: [reader.active_reader_type]

GREEN (after the fix)

Passed!  - Failed: 0, Passed: 1, Skipped: 0, Total: 1

Regression check

- CloseOperationE2ETest: 4/4 pass (existing 3 theory cases +
  the new test).
- DriverTests: 35/35 pass.
- TelemetryTagRegistryTests: 31/31 pass.

All against sf10/pecotesting, Thrift protocol.

Closes #485

This pull request and its description were written by Isaac.

…iling for #485)

Adds Dispose_InlineReader_EmitsBytesDownloaded_Issue485 to CloseOperationE2ETest. The test forces the Thrift inline result path (UseCloudFetch=false, EnableDirectResults=false), reads a small range() result through DatabricksReader, disposes the reader, and asserts that the DatabricksCompositeReader.Dispose span carries a result.bytes_downloaded tag with a strictly positive value, matching the tag CloudFetchReader already emits on the same span.

This commit is RED: against the unpatched driver the test fails with "Got tags: [reader.active_reader_type]", proving the inline path is missing the byte counter (issue #485).

Refs #485
Co-authored-by: Isaac

…y with CloudFetch

DatabricksCompositeReader.Dispose previously emitted result.bytes_downloaded only on the CloudFetch path. CloudFetchReader.Dispose calls Activity.Current?.SetTag(...) and Activity.Current at that point is the composite Dispose span (BaseDatabricksReader.Dispose opens no activity of its own), which is why the tag lands on the composite span. The inline DatabricksReader path emitted no such tag, leaving the composite Dispose span asymmetric: CloudFetch carried both reader.active_reader_type and result.bytes_downloaded, while inline carried only the reader-type tag.

This patch:

- Adds a _totalBytesConsumed counter to DatabricksReader, incremented in ProcessFetchedBatches with the wire-side TSparkArrowBatch.Batch.Length (i.e. pre-LZ4-decompression bytes received over Thrift). This matches CloudFetchReader's convention of counting downloaded chunk sizes (DownloadResult.Size, set after the chunk is downloaded but before decompression).
- Emits the same Activity.Current?.SetTag(ResultBytesDownloaded, ...) call in DatabricksReader.Dispose so the inline path lands on the composite Dispose span identically to CloudFetch.

The tag name "result.bytes_downloaded" is kept identical even though "downloaded" is technically a stretch for inline data (which arrives via the Thrift connection rather than a separate cloud download), because dashboards and queries filtering on this tag should work uniformly for both reader paths. That uniformity is the higher priority and is why issue #485 was filed.

Closes #485
Co-authored-by: Isaac

eric-wang-1990 added 2 commits May 28, 2026 00:18

eric-wang-1990 marked this pull request as ready for review May 28, 2026 23:30

eric-wang-1990 requested review from jadewang-db and msrathore-db May 28, 2026 23:30

eric-wang-1990 changed the title ~~tracing(csharp): emit result.bytes_downloaded on inline Dispose~~ fix(csharp): emit result.bytes_downloaded on inline Dispose May 29, 2026

eric-wang-1990 wants to merge 2 commits into main from tracing/485-inline-dispose-bytes-parity
