fix(csharp): emit result.bytes_downloaded on inline Dispose #493

Open
eric-wang-1990 wants to merge 2 commits into main from tracing/485-inline-dispose-bytes-parity

@eric-wang-1990
Collaborator

Motivation

DatabricksCompositeReader.Dispose emitted an asymmetric span shape across
result delivery paths:

  • CloudFetch path: reader.active_reader_type = "CloudFetchReader"
    and result.bytes_downloaded = <N> (a real byte counter).
  • Inline path: reader.active_reader_type only; no byte counter.

The byte counter is exactly the kind of signal debug tooling and
dashboards filter on, and missing it on inline disposes means
operators see a blank field for half their workloads.

What changed

  • csharp/src/Reader/DatabricksReader.cs — added a _totalBytesConsumed
    field that accumulates the wire-side TSparkArrowBatch.Batch.Length
    for every inline batch passed through ProcessFetchedBatches. In
    Dispose(bool) it emits Activity.Current?.SetTag(StatementExecutionEvent.ResultBytesDownloaded, _totalBytesConsumed),
    identical to the call CloudFetchReader.Dispose already makes.

    Because BaseDatabricksReader.Dispose opens no activity of its own,
    Activity.Current at the moment either reader's Dispose runs is
    the DatabricksCompositeReader.Dispose span, i.e. the tag lands on
    exactly the span that issue #485 ("tracing(csharp): inline Dispose
    path missing result.bytes_downloaded (CloudFetch has it)") calls out
    as asymmetric, on both paths.
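
The ambient-span behaviour described above can be illustrated with a minimal sketch. It is not the driver's code: the `InnerReader` class, the `Sketch.CompositeReader` source name, the span name, and the literal tag strings are hypothetical stand-ins, and only the `System.Diagnostics` calls are real APIs. It assumes that an inner `Dispose` which starts no activity of its own sees the caller's span as `Activity.Current`.

```csharp
using System;
using System.Diagnostics;

public static class Program
{
    // Hypothetical source name; the driver's real ActivitySource is not shown in this PR.
    private static readonly ActivitySource Source = new ActivitySource("Sketch.CompositeReader");

    // Hypothetical stand-in for an inner reader that opens no activity of its own.
    private sealed class InnerReader : IDisposable
    {
        public long TotalBytesConsumed { get; set; }

        public void Dispose()
        {
            // Activity.Current is whatever span the caller started.
            Activity.Current?.SetTag("result.bytes_downloaded", TotalBytesConsumed);
        }
    }

    // Returns the tag value observed on the outer span after the inner Dispose ran.
    public static object RunSketch()
    {
        // Without a sampling listener, StartActivity returns null and nothing is recorded.
        using var listener = new ActivityListener
        {
            ShouldListenTo = s => s.Name == "Sketch.CompositeReader",
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
        };
        ActivitySource.AddActivityListener(listener);

        var inner = new InnerReader { TotalBytesConsumed = 1234 };
        using (var span = Source.StartActivity("CompositeReader.Dispose"))
        {
            span?.SetTag("reader.active_reader_type", "DatabricksReader");
            inner.Dispose(); // tags the outer span, not a span of its own
            return span?.GetTagItem("result.bytes_downloaded");
        }
    }

    public static void Main()
    {
        Console.WriteLine(RunSketch());
    }
}
```

Under these assumptions the byte-count tag ends up on the outer span next to the reader-type tag, which is the shape the fix aims for on both paths.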

Byte-counting convention

CloudFetch counts IDownloadResult.Size — the chunk's size as
delivered by the downloader, before any LZ4 decompression. To match
that semantic for inline data, this patch counts the
pre-decompression batch.Batch.Length (the bytes received over the
Thrift connection, captured as originalSize in the existing decompress
branch). Picking pre-decompression keeps both paths reporting "bytes the
reader received from the wire" rather than mixing wire bytes on one path
with decompressed bytes on the other.
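
To make the convention concrete, the sketch below counts a batch's length before decompressing it and compares that with the decoded size. It is illustrative only: GZip from the .NET base class library stands in for LZ4 so the example needs no extra packages, and `Compress`, `Decompress`, and `Consume` are hypothetical helpers, not driver methods.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class Program
{
    // Stand-in codec: GZip replaces LZ4 here purely to avoid a package dependency.
    public static byte[] Compress(byte[] raw)
    {
        using var buffer = new MemoryStream();
        using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
        {
            gzip.Write(raw, 0, raw.Length);
        }
        return buffer.ToArray(); // ToArray still works after the stream is closed
    }

    public static byte[] Decompress(byte[] wire)
    {
        using var input = new GZipStream(new MemoryStream(wire), CompressionMode.Decompress);
        using var output = new MemoryStream();
        input.CopyTo(output);
        return output.ToArray();
    }

    // Accumulates both sizes so the difference between the two conventions is visible.
    public static (long WireBytes, long DecodedBytes) Consume(byte[][] wireBatches)
    {
        long wireBytes = 0;
        long decodedBytes = 0;
        foreach (byte[] batch in wireBatches)
        {
            wireBytes += batch.Length;                // counted before decompression
            decodedBytes += Decompress(batch).Length; // what a post-decompression counter would see
        }
        return (wireBytes, decodedBytes);
    }

    public static void Main()
    {
        byte[] raw = new byte[64 * 1024]; // zero-filled, so highly compressible
        byte[][] batches = { Compress(raw), Compress(raw) };
        var (wire, decoded) = Consume(batches);
        Console.WriteLine($"wire={wire} decoded={decoded}");
    }
}
```

With compressible data the two totals diverge widely, which is why mixing the conventions across readers would make the tag incomparable between paths.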

The tag name is reused as-is even though "downloaded" is technically a
stretch for inline data (which arrives via the Thrift connection rather
than a separate cloud download). The whole point of the issue is that
dashboards and queries filtering on this tag should work uniformly for
both readers, and renaming would defeat that. The rationale is
documented in a code comment.

Test added (RED → GREEN)

csharp/test/E2E/CloseOperationE2ETest.cs — new
Dispose_InlineReader_EmitsBytesDownloaded_Issue485. It:

  1. Sets Protocol=thrift, UseCloudFetch=false,
    EnableDirectResults=false so the driver is forced through the
    inline result path.
  2. Runs SELECT * FROM range(1, 100) and reads every batch.
  3. Disposes the reader and statement.
  4. Asserts the captured DatabricksCompositeReader.Dispose span tags
    contain reader.active_reader_type = "DatabricksReader" (sanity:
    we exercised the inline path) and result.bytes_downloaded with
    a strictly positive value.

The test class's ActivityListener was extended to snapshot
activity.TagObjects alongside the existing event capture so the
assertion can inspect span tags.
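
A listener that snapshots tags when a span stops could look roughly like the sketch below. It is an assumption-laden illustration rather than the test's actual code: the `Sketch.Tests` source name, the span name, and the hard-coded byte count are hypothetical, and the real test captures spans produced by the driver instead of creating one itself.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class Program
{
    // Starts one span, tags it, and returns the tags the listener saw when it stopped.
    public static Dictionary<string, object> CaptureDisposeTags()
    {
        using var source = new ActivitySource("Sketch.Tests"); // hypothetical source name
        var captured = new Dictionary<string, object>();

        using var listener = new ActivityListener
        {
            ShouldListenTo = s => s.Name == "Sketch.Tests",
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
            // Snapshot the tags at stop time, after every SetTag call has been made.
            ActivityStopped = activity =>
            {
                foreach (var tag in activity.TagObjects)
                {
                    captured[tag.Key] = tag.Value;
                }
            },
        };
        ActivitySource.AddActivityListener(listener);

        using (var span = source.StartActivity("Reader.Dispose"))
        {
            span?.SetTag("reader.active_reader_type", "DatabricksReader");
            span?.SetTag("result.bytes_downloaded", 4096L); // illustrative value only
        } // disposing the span stops it and triggers ActivityStopped

        return captured;
    }

    public static void Main()
    {
        Dictionary<string, object> tags = CaptureDisposeTags();
        bool hasBytes = tags.TryGetValue("result.bytes_downloaded", out object bytes)
            && Convert.ToInt64(bytes) > 0;
        Console.WriteLine(hasBytes ? "bytes tag present" : "bytes tag missing");
    }
}
```

Capturing in `ActivityStopped` rather than `ActivityStarted` matters here because the tags in question are set during Dispose, shortly before the span ends.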

RED (before the fix)

Failed AdbcDrivers.Databricks.Tests.CloseOperationE2ETest.Dispose_InlineReader_EmitsBytesDownloaded_Issue485
Error: Expected 'result.bytes_downloaded' tag on DatabricksCompositeReader.Dispose
       for the inline result path (issue #485). ... Got tags: [reader.active_reader_type]

GREEN (after the fix)

Passed!  - Failed: 0, Passed: 1, Skipped: 0, Total: 1

Regression check

  • CloseOperationE2ETest: 4/4 pass (existing 3 theory cases +
    the new test).
  • DriverTests: 35/35 pass.
  • TelemetryTagRegistryTests: 31/31 pass.

All runs were against sf10/pecotesting using the Thrift protocol.

Closes #485

This pull request and its description were written by Isaac.

…iling for #485)

Adds Dispose_InlineReader_EmitsBytesDownloaded_Issue485 to CloseOperationE2ETest.
The test forces the Thrift inline result path (UseCloudFetch=false,
EnableDirectResults=false), reads a small range() result through
DatabricksReader, disposes the reader, and asserts that the
DatabricksCompositeReader.Dispose span carries a result.bytes_downloaded
tag with a strictly positive value — matching the tag CloudFetchReader
already emits on the same span.

This commit is RED: against the unpatched driver the test fails with
"Got tags: [reader.active_reader_type]", proving the inline path is
missing the byte counter (issue #485).

Refs #485

Co-authored-by: Isaac
…y with CloudFetch

DatabricksCompositeReader.Dispose previously emitted result.bytes_downloaded
only on the CloudFetch path. CloudFetchReader.Dispose calls
Activity.Current?.SetTag(...) and Activity.Current at that point is the
composite Dispose span (BaseDatabricksReader.Dispose opens no activity of
its own), which is why the tag lands on the composite span. The inline
DatabricksReader path emitted no such tag, leaving the composite Dispose
span asymmetric: CloudFetch carried both reader.active_reader_type and
result.bytes_downloaded, while inline carried only the reader-type tag.

This patch:

- Adds a _totalBytesConsumed counter to DatabricksReader, incremented in
  ProcessFetchedBatches with the wire-side TSparkArrowBatch.Batch.Length
  (i.e. pre-LZ4-decompression bytes received over Thrift). This matches
  CloudFetchReader's convention of counting downloaded chunk sizes
  (DownloadResult.Size, set after the chunk is downloaded but before
  decompression).
- Emits the same Activity.Current?.SetTag(ResultBytesDownloaded, ...) call
  in DatabricksReader.Dispose so the inline path lands on the composite
  Dispose span identically to CloudFetch.

The tag name "result.bytes_downloaded" is kept identical even though
"downloaded" is technically a stretch for inline data (which arrives via
the Thrift connection rather than a separate cloud download), because
dashboards and queries filtering on this tag should work uniformly for
both reader paths — that uniformity is the higher priority and is why
issue #485 was filed.

Closes #485

Co-authored-by: Isaac
@eric-wang-1990 eric-wang-1990 marked this pull request as ready for review May 28, 2026 23:30
@eric-wang-1990 eric-wang-1990 changed the title tracing(csharp): emit result.bytes_downloaded on inline Dispose fix(csharp): emit result.bytes_downloaded on inline Dispose May 29, 2026

Development

Successfully merging this pull request may close these issues.

tracing(csharp): inline Dispose path missing result.bytes_downloaded (CloudFetch has it)