feat(csharp): apply adbc.spark.data_type_conv on SEA path (PECO-3060) [SEA] by eric-wang-1990 · Pull Request #469 · adbc-drivers/databricks

eric-wang-1990 · 2026-05-20T09:43:40Z

What's Changed

PECO-3060 — M3 remaining parameter gap. The Statement Execution (SEA/REST) path
now honours `adbc.spark.data_type_conv`, matching the Thrift path's
`HiveServer2SchemaParser.GetArrowType` semantics:

- `scalar` (default): DATE → Date32, DECIMAL → Decimal128, TIMESTAMP → Timestamp,
  FLOAT → Float. These are native Arrow types, identical to today's SEA behaviour.
- `none`: DATE / DECIMAL / TIMESTAMP → String; FLOAT → Double. The
  conversion-sensitive scalars surface as strings (FLOAT is widened to double), so
  SEA output matches Thrift output.
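The two modes above can be summarised as a small lookup. This is a minimal sketch, not the driver's code: `ConversionMode` and `TypeMappingSketch.MapPrimitive` are hypothetical names, and the Arrow types are represented by plain strings so the example has no dependencies.

```csharp
using System;

// Hypothetical stand-in for the driver's DataTypeConversion enum.
public enum ConversionMode { Scalar, None }

public static class TypeMappingSketch
{
    // Returns the name of the Arrow type that one of the four
    // conversion-sensitive SQL scalars surfaces as under the given mode.
    public static string MapPrimitive(string sqlType, ConversionMode mode)
    {
        bool native = mode == ConversionMode.Scalar;
        switch (sqlType.ToUpperInvariant())
        {
            case "DATE": return native ? "Date32" : "String";
            case "DECIMAL": return native ? "Decimal128" : "String";
            case "TIMESTAMP": return native ? "Timestamp" : "String";
            case "FLOAT": return native ? "Float" : "Double";
            default:
                // Other types are outside the scope of this sketch.
                throw new ArgumentException(
                    $"Not a conversion-sensitive scalar: {sqlType}", nameof(sqlType));
        }
    }
}
```

For example, `TypeMappingSketch.MapPrimitive("FLOAT", ConversionMode.None)` yields `"Double"`, while the same call with `ConversionMode.Scalar` yields `"Float"`.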

- `StatementExecutionConnection` now parses `SparkParameters.DataTypeConv` via
  `DataTypeConversionParser` (same precedence as `SparkHttpConnection`) and
  exposes the parsed `DataTypeConversion` enum to the statement.
- `ArrowTypeParser.MapPrimitiveType` branches on the flag for the four affected
  scalars. A new `ScalarConversionStream` wrapper, layered after
  `IntervalSerializingStream` / `ComplexTypeSerializingStream` only when the mode is
  `none`, converts the native Date32 / Timestamp / Decimal128 / Float arrays into
  matching `StringArray` / `DoubleArray` so the manifest schema and batch data agree.
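To illustrate the kind of per-value work a wrapper like `ScalarConversionStream` has to do, here is a rough sketch on plain arrays. It does not use the Apache Arrow API or the driver's code; `ScalarConversionSketch` and its methods are hypothetical, and the `yyyy-MM-dd` date format is an assumption, since the PR does not state the exact string layout.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class ScalarConversionSketch
{
    private static readonly DateTime Epoch =
        new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Date32 values are days since 1970-01-01; nulls pass through as nulls.
    // The output format is an assumption made for this example.
    public static string[] DatesToStrings(int?[] daysSinceEpoch) =>
        daysSinceEpoch
            .Select(d => d.HasValue
                ? Epoch.AddDays(d.Value).ToString("yyyy-MM-dd", CultureInfo.InvariantCulture)
                : null)
            .ToArray();

    // FLOAT is widened to double in `none` mode; nulls pass through as nulls.
    public static double?[] FloatsToDoubles(float?[] values) =>
        values
            .Select(v => v.HasValue ? (double?)v.Value : null)
            .ToArray();
}
```

The point of the sketch is that the converted batch must have the same length and null positions as the native one, so the rewritten data lines up with the String / Double columns declared in the schema.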

Why

PECO-3060: Jun 10 cutoff per the M3 plan. Until now, the SEA path
unconditionally returned native typed columns. Users setting
`adbc.spark.data_type_conv=none` (or its alias `adbc.hive.data_type_conv=none`)
got Date32/Decimal128/Timestamp from SEA but String from Thrift, a protocol-visible
behaviour difference. This change makes the two paths behave identically for the
affected scalars.
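As a sketch of how the option might be resolved from connection properties: the key name and the `scalar` default come from this description, but `DataTypeConvOptionSketch`, `ConversionMode`, the case-insensitive matching, and the error on unknown values are assumptions for illustration. The `adbc.hive.data_type_conv` alias is left out because its precedence is not spelled out here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the driver's DataTypeConversion enum.
public enum ConversionMode { Scalar, None }

public static class DataTypeConvOptionSketch
{
    public const string Key = "adbc.spark.data_type_conv";

    // Resolves the mode from a property bag; a missing or blank value
    // falls back to the documented default, `scalar`.
    public static ConversionMode Resolve(IReadOnlyDictionary<string, string> properties)
    {
        if (!properties.TryGetValue(Key, out string raw) || string.IsNullOrWhiteSpace(raw))
        {
            return ConversionMode.Scalar;
        }

        switch (raw.Trim().ToLowerInvariant())
        {
            case "scalar": return ConversionMode.Scalar;
            case "none": return ConversionMode.None;
            default:
                // Rejecting unknown values is an assumption of this sketch.
                throw new ArgumentOutOfRangeException(
                    nameof(properties), raw, $"Unsupported value for {Key}");
        }
    }
}
```

A caller would pass something like `new Dictionary<string, string> { ["adbc.spark.data_type_conv"] = "none" }` and receive `ConversionMode.None`; an empty dictionary resolves to `ConversionMode.Scalar`.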

Red → Green proof

Before fix (`ExecuteQuery_DataTypeConv_None_SerializesScalarTypesToStrings`,
SEA + `data_type_conv=none`):

```
Assert.Equal() Failure: Values differ
Expected: String
Actual: Date32
```

After fix:

```
Passed AdbcDrivers.Databricks.Tests.E2E.StatementExecution.StatementExecutionDriverE2ETests.ExecuteQuery_DataTypeConv_None_SerializesScalarTypesToStrings [1 s]
Passed AdbcDrivers.Databricks.Tests.E2E.StatementExecution.StatementExecutionDriverE2ETests.ExecuteQuery_DataTypeConv_Scalar_KeepsNativeTypes [1 s]
```

Full `StatementExecutionDriverE2ETests` suite (13 tests) passes — no regressions in
adjacent SEA tests.

Files touched

- `csharp/src/ArrowTypeParser.cs`: primitive type mapping consults the new flag.
- `csharp/src/ScalarConversionStream.cs`: new wrapper that converts native arrays
  when the mode is `none`. It uses the same detection pattern
  (`Spark:DataType:SqlName` metadata) as the existing Interval/ComplexType wrappers.
- `csharp/src/StatementExecution/StatementExecutionConnection.cs`: parses and
  exposes `DataTypeConversion`.
- `csharp/src/StatementExecution/StatementExecutionStatement.cs`: threads the flag
  through the manifest schema mapping and the reader pipeline.
- `csharp/test/E2E/StatementExecution/StatementExecutionDriverE2ETests.cs`:
  `SkippableFact` red→green coverage for both `none` and `scalar` modes.

Manual verification

- `dotnet build` is green on `netstandard2.0` and `net8.0`.
- New E2E tests pass against the `pecotesting` warehouse.
- The `StatementExecutionDriverE2ETests` class passes (13/13).
- Default behaviour (`scalar`) is unchanged; existing SEA tests are untouched.

… [SEA]

Honor data_type_conv on the Statement Execution (REST/SEA) path so it matches the Thrift path's HiveServer2SchemaParser semantics:

- scalar (default): DATE -> Date32, DECIMAL -> Decimal128, TIMESTAMP -> Timestamp, FLOAT -> Float (native Arrow types, unchanged from current behaviour).
- none: DATE / DECIMAL / TIMESTAMP -> String; FLOAT -> Double (widened).

StatementExecutionConnection now parses SparkParameters.DataTypeConv via DataTypeConversionParser (same precedence as SparkHttpConnection) and exposes the resulting DataTypeConversion to StatementExecutionStatement. The schema mapping in ArrowTypeParser.MapPrimitiveType branches on the flag for the four conversion-sensitive scalars; a new ScalarConversionStream wrapper, layered between IntervalSerializingStream and ComplexTypeSerializingStream when the mode is none, converts the native Date32/Timestamp/Decimal128/Float arrays into matching StringArray / DoubleArray so the schema and batch data agree.

E2E coverage: ExecuteQuery_DataTypeConv_None_SerializesScalarTypesToStrings proves DATE/TIMESTAMP/DECIMAL columns surface as StringType under adbc.spark.data_type_conv=none; ExecuteQuery_DataTypeConv_Scalar_KeepsNativeTypes guards the default mode.

PECO-3060

eric-wang-1990 wants to merge 1 commit into main from feat/csharp/PECO-3060-sea-data-type-conv
