tracing(csharp): ReadNextRecordBatchAsync span explosion drives 95% of TraceId cardinality #476

@eric-wang-1990

Severity: High

Observation

Under load, ReadNextRecordBatchAsync dominates trace volume and TraceId cardinality. In a sweep of all 33 E2E test classes with OTEL_TRACES_EXPORTER=adbcfile:

  • MemoryStressTests (3 tests against 5M-row queries): 24,632 ReadNextRecordBatchAsync spans / 18.5 MB / 52.5% of bytes, plus 12,033 DatabricksReader.ProcessFetchedBatches spans / 15.5 MB / 44.1%. Together these two span types account for 96.6% of trace bytes for an extract workload.
  • 12,316 of 12,958 distinct TraceIds (95%) are single-span root ReadNextRecordBatchAsync traces, i.e. each "pull next batch" call starts a fresh top-level trace. That is exactly half of the 24,632 ReadNextRecordBatchAsync spans.
  • Reproducible across every CloudFetch / extract test, with the same half-are-roots ratio: CloudFetchE2ETest 1,580 ReadNext spans (790 roots); CloudFetchStressTests 2,692 ReadNext spans (1,346 roots).

This is both a real APM ingest-cost issue (span volume) and a backend cardinality issue (throwaway TraceIds clutter trace-list views).

Workaround that exists today

The events on DatabricksReader.ProcessFetchedBatches (decompress_start, decompress_completed, deserialize_batch) DO carry per-batch metrics. So users can filter by event name and get per-batch detail without the span volume.

Suggested fix

Pick one (or combine):

  1. Aggregate: emit one span per N batches (e.g. N=100) with summed metrics instead of one per batch.
  2. Demote ReadNextRecordBatchAsync to an event on the parent statement span, so it doesn't create its own activity.
  3. At minimum, chain ReadNextRecordBatchAsync to the parent statement TraceId so the 12K roots collapse to 1 root per statement. (Partial overlap with the "no driver-session root" issue.)
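
A minimal sketch of how options 1 and 3 could combine, assuming the reader can capture the statement span's ActivityContext while that span is still live. Everything named here (the "Sketch.Reader" source, BatchSpanAggregator, the "ReadRecordBatches" span name, the tag keys) is hypothetical and not taken from the driver; only the System.Diagnostics calls are real API.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical sketch: source name, class name, span name and tag keys are
// illustrative only and do not come from the driver.
public sealed class BatchSpanAggregator : IDisposable
{
    internal static readonly ActivitySource Source = new ActivitySource("Sketch.Reader");

    private readonly ActivityContext _statementContext; // captured from the statement span
    private readonly int _batchesPerSpan;
    private Activity? _span;
    private int _batches;
    private long _rows;

    public BatchSpanAggregator(ActivityContext statementContext, int batchesPerSpan)
    {
        if (batchesPerSpan <= 0) throw new ArgumentOutOfRangeException(nameof(batchesPerSpan));
        _statementContext = statementContext;
        _batchesPerSpan = batchesPerSpan;
    }

    // Call once per batch pulled by the reader.
    public void RecordBatch(long rowCount)
    {
        // Option 3: pass the statement context as an explicit parent so the
        // span joins the statement's trace instead of starting a new root.
        _span ??= Source.StartActivity("ReadRecordBatches", ActivityKind.Internal, _statementContext);
        _batches++;
        _rows += rowCount;
        // Option 1: close the span only after N batches.
        if (_batches >= _batchesPerSpan) Flush();
    }

    // Writes the summed metrics onto the open span (if any) and ends it.
    public void Flush()
    {
        if (_span != null)
        {
            _span.SetTag("batch.count", _batches);
            _span.SetTag("row.count", _rows);
            _span.Dispose();
            _span = null;
        }
        _batches = 0;
        _rows = 0;
    }

    public void Dispose() => Flush();
}

public static class Demo
{
    public static void Main()
    {
        var stopped = new List<Activity>();
        // Without a listener, StartActivity returns null and nothing is recorded.
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == "Sketch.Reader",
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllDataAndRecorded,
            ActivityStopped = activity => stopped.Add(activity),
        };
        ActivitySource.AddActivityListener(listener);

        using Activity? statement = BatchSpanAggregator.Source.StartActivity("ExecuteStatement");
        if (statement is null) return;

        using (var aggregator = new BatchSpanAggregator(statement.Context, batchesPerSpan: 100))
        {
            for (int i = 0; i < 250; i++) aggregator.RecordBatch(rowCount: 1000);
        }

        int spans = 0;
        bool sameTrace = true;
        foreach (Activity a in stopped)
        {
            if (a.OperationName != "ReadRecordBatches") continue;
            spans++;
            sameTrace &= a.TraceId == statement.TraceId;
        }
        Console.WriteLine($"{spans} aggregate spans, same trace as statement: {sameTrace}");
    }
}
```

In this demo, 250 batches at N=100 close two full aggregate spans plus one partial span on Dispose, all sharing the statement's TraceId. One caveat: StartActivity also makes the new activity Activity.Current, so a long-lived aggregate span can become the ambient parent of other spans started while it is open. Option 2 (an event on the statement span via Activity.AddEvent) avoids that, at the cost of losing a per-batch duration.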

Evidence

  • Traces.MemoryStressTests.20260527_154049/ (35MB / 37,950 spans / 12,958 TraceIds — ReadNext + ProcessFetchedBatches = 96.6% of bytes)
  • Traces.CloudFetchE2ETest.20260527_144809/ (1,580 ReadNext spans, 790 as roots)
  • Traces.CloudFetchStressTests.20260527_154205/ (2,692 ReadNext, 1,346 roots)

Found during a tracer-output bugbash of the file exporter; full evaluation report available on request.

Labels: enhancement