fix(csharp): emit result.bytes_downloaded on inline Dispose#493
Open
eric-wang-1990 wants to merge 2 commits into
Open
fix(csharp): emit result.bytes_downloaded on inline Dispose#493eric-wang-1990 wants to merge 2 commits into
eric-wang-1990 wants to merge 2 commits into
Conversation
…iling for #485) Adds Dispose_InlineReader_EmitsBytesDownloaded_Issue485 to CloseOperationE2ETest. The test forces the Thrift inline result path (UseCloudFetch=false, EnableDirectResults=false), reads a small range() result through DatabricksReader, disposes the reader, and asserts that the DatabricksCompositeReader.Dispose span carries a result.bytes_downloaded tag with a strictly positive value — matching the tag CloudFetchReader already emits on the same span. This commit is RED: against the unpatched driver the test fails with "Got tags: [reader.active_reader_type]", proving the inline path is missing the byte counter (issue #485). Refs #485 Co-authored-by: Isaac
…y with CloudFetch DatabricksCompositeReader.Dispose previously emitted result.bytes_downloaded only on the CloudFetch path. CloudFetchReader.Dispose calls Activity.Current?.SetTag(...) and Activity.Current at that point is the composite Dispose span (BaseDatabricksReader.Dispose opens no activity of its own), which is why the tag lands on the composite span. The inline DatabricksReader path emitted no such tag, leaving the composite Dispose span asymmetric: CloudFetch carried both reader.active_reader_type and result.bytes_downloaded, while inline carried only the reader-type tag. This patch: - Adds a _totalBytesConsumed counter to DatabricksReader, incremented in ProcessFetchedBatches with the wire-side TSparkArrowBatch.Batch.Length (i.e. pre-LZ4-decompression bytes received over Thrift). This matches CloudFetchReader's convention of counting downloaded chunk sizes (DownloadResult.Size, set after the chunk is downloaded but before decompression). - Emits the same Activity.Current?.SetTag(ResultBytesDownloaded, ...) call in DatabricksReader.Dispose so the inline path lands on the composite Dispose span identically to CloudFetch. The tag name "result.bytes_downloaded" is kept identical even though "downloaded" is technically a stretch for inline data (which arrives via the Thrift connection rather than a separate cloud download), because dashboards and queries filtering on this tag should work uniformly for both reader paths — that uniformity is the higher priority and is why issue #485 was filed. Closes #485 Co-authored-by: Isaac
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
DatabricksCompositeReader.Disposeemitted an asymmetric span shape acrossresult delivery paths:
reader.active_reader_type = "CloudFetchReader"and
result.bytes_downloaded = <N>(a real byte counter).reader.active_reader_typeonly; no byte counter.The byte counter is exactly the kind of signal debug tooling and
dashboards filter on, and missing it on inline disposes means
operators see a blank field for half their workloads.
What changed
csharp/src/Reader/DatabricksReader.cs— added a_totalBytesConsumedfield that accumulates the wire-side
TSparkArrowBatch.Batch.Lengthfor every inline batch passed through
ProcessFetchedBatches. InDispose(bool)it emitsActivity.Current?.SetTag( StatementExecutionEvent.ResultBytesDownloaded, _totalBytesConsumed),identical to the call
CloudFetchReader.Disposealready makes.Because
BaseDatabricksReader.Disposeopens no activity of its own,Activity.Currentat the moment either reader'sDisposeruns isthe
DatabricksCompositeReader.Disposespan — i.e. the tag lands onexactly the span issue tracing(csharp): inline Dispose path missing result.bytes_downloaded (CloudFetch has it) #485 calls out as asymmetric, on both paths.
Byte-counting convention
CloudFetch counts
IDownloadResult.Size— the chunk's size asdelivered by the downloader, before any LZ4 decompression. To match
that semantic for inline data, this patch counts the
pre-decompression
batch.Batch.Length(the bytes received over theThrift connection, captured as
originalSizein the existing decompressbranch). Picking pre-decompression keeps both paths reporting "bytes the
reader received from the wire" rather than mixing wire bytes on one path
with decompressed bytes on the other.
The tag name is reused as-is even though "downloaded" is technically a
stretch for inline data (which arrives via the Thrift connection rather
than a separate cloud download). The whole point of the issue is that
dashboards and queries filtering on this tag should work uniformly for
both readers, and renaming would defeat that — documented in a code
comment.
Test added (RED → GREEN)
csharp/test/E2E/CloseOperationE2ETest.cs— newDispose_InlineReader_EmitsBytesDownloaded_Issue485. It:Protocol=thrift,UseCloudFetch=false,EnableDirectResults=falseso the driver is forced through theinline result path.
SELECT * FROM range(1, 100)and reads every batch.DatabricksCompositeReader.Disposespan tagscontain
reader.active_reader_type = "DatabricksReader"(sanity:we exercised the inline path) and
result.bytes_downloadedwitha strictly positive value.
The test class's
ActivityListenerwas extended to snapshotactivity.TagObjectsalongside the existing event capture so theassertion can inspect span tags.
RED (before the fix)
GREEN (after the fix)
Regression check
CloseOperationE2ETest: 4/4 pass (existing 3 theory cases +the new test).
DriverTests: 35/35 pass.TelemetryTagRegistryTests: 31/31 pass.All against
sf10/pecotesting, Thrift protocol.Closes #485
This pull request and its description were written by Isaac.