feat(csharp): apply adbc.spark.data_type_conv on SEA path (PECO-3060) [SEA] #469
Open
eric-wang-1990 wants to merge 1 commit into
… [SEA]

Honor data_type_conv on the Statement Execution (REST/SEA) path so it matches the Thrift path's HiveServer2SchemaParser semantics:

- scalar (default): DATE -> Date32, DECIMAL -> Decimal128, TIMESTAMP -> Timestamp, FLOAT -> Float (native Arrow types, unchanged from current behaviour).
- none: DATE / DECIMAL / TIMESTAMP -> String; FLOAT -> Double (widened).

StatementExecutionConnection now parses SparkParameters.DataTypeConv via DataTypeConversionParser (same precedence as SparkHttpConnection) and exposes the resulting DataTypeConversion to StatementExecutionStatement. The schema mapping in ArrowTypeParser.MapPrimitiveType branches on the flag for the four conversion-sensitive scalars; a new ScalarConversionStream wrapper, layered between IntervalSerializingStream and ComplexTypeSerializingStream when the mode is none, converts the native Date32/Timestamp/Decimal128/Float arrays into matching StringArray / DoubleArray so the schema and batch data agree.

E2E coverage: ExecuteQuery_DataTypeConv_None_SerializesScalarTypesToStrings proves DATE/TIMESTAMP/DECIMAL columns surface as StringType under adbc.spark.data_type_conv=none; ExecuteQuery_DataTypeConv_Scalar_KeepsNativeTypes guards the default mode.

PECO-3060
What's Changed
PECO-3060 — M3 remaining parameter gap. The Statement Execution (SEA/REST) path
now honours `adbc.spark.data_type_conv`, matching the Thrift path's
`HiveServer2SchemaParser.GetArrowType` semantics:
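- `scalar` (default): DATE → Date32, DECIMAL → Decimal128, TIMESTAMP → Timestamp, FLOAT → Float. These are native Arrow types, identical to today's SEA behaviour.
- `none`: DATE / DECIMAL / TIMESTAMP → String and FLOAT → Double. The conversion-sensitive scalars are returned as strings (or widened to double) so SEA matches Thrift output regardless of protocol.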
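
As a rough illustration of the mapping described above, the four conversion-sensitive scalars can be pictured as a small lookup keyed on the mode. This is a minimal sketch, not the code in `ArrowTypeParser.MapPrimitiveType`: `ConvMode`, `ScalarTypeMapping`, and `ArrowTypeFor` are hypothetical names, and Arrow types are represented as plain strings so the example needs no Arrow dependency.

```csharp
using System;

// Hypothetical stand-in for the driver's DataTypeConversion enum (assumption).
public enum ConvMode { Scalar, None }

public static class ScalarTypeMapping
{
    // Returns the name of the Arrow type a column of the given SQL type
    // would surface as under the given mode, per the PR description.
    public static string ArrowTypeFor(string sqlType, ConvMode mode)
    {
        bool native = mode == ConvMode.Scalar;
        switch (sqlType.Trim().ToUpperInvariant())
        {
            case "DATE": return native ? "Date32" : "String";
            case "DECIMAL": return native ? "Decimal128" : "String";
            case "TIMESTAMP": return native ? "Timestamp" : "String";
            case "FLOAT": return native ? "Float" : "Double"; // widened, not stringified
            default:
                // Only the four conversion-sensitive scalars are covered here.
                throw new ArgumentException(
                    "Not a conversion-sensitive scalar: " + sqlType, nameof(sqlType));
        }
    }
}
```

For example, `ScalarTypeMapping.ArrowTypeFor("DATE", ConvMode.None)` yields `"String"`, while the same call with `ConvMode.Scalar` yields `"Date32"`.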
`StatementExecutionConnection` now parses `SparkParameters.DataTypeConv` via
`DataTypeConversionParser` (same precedence as `SparkHttpConnection`) and
exposes the parsed `DataTypeConversion` enum to the statement.
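
The option lookup might look roughly like the sketch below. It is illustrative only: the real logic lives in `DataTypeConversionParser`, whose accepted values and key precedence are not shown in this PR. The names `ConvSetting` and `ConvOptionSketch`, the choice to check the `adbc.spark.` key before the `adbc.hive.` alias, and the rejection of values other than `scalar` and `none` are all assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the driver's DataTypeConversion enum (assumption).
public enum ConvSetting { Scalar, None }

public static class ConvOptionSketch
{
    public const string SparkKey = "adbc.spark.data_type_conv";
    public const string HiveKey = "adbc.hive.data_type_conv";

    // Resolves the mode from connection properties.
    // Assumption: the Spark key is consulted first, then the Hive alias.
    public static ConvSetting Resolve(IReadOnlyDictionary<string, string> properties)
    {
        string raw;
        if (!properties.TryGetValue(SparkKey, out raw) &&
            !properties.TryGetValue(HiveKey, out raw))
        {
            return ConvSetting.Scalar; // default mode per the PR description
        }

        switch ((raw ?? string.Empty).Trim().ToLowerInvariant())
        {
            case "scalar": return ConvSetting.Scalar;
            case "none": return ConvSetting.None;
            default:
                throw new ArgumentOutOfRangeException(
                    nameof(properties), raw, "Unsupported data_type_conv value");
        }
    }
}
```

A caller would pass its connection properties, for example a dictionary containing `adbc.spark.data_type_conv` set to `none`, and hand the resolved value to the statement.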
`ArrowTypeParser.MapPrimitiveType` branches on the flag for the four affected
scalars; a new `ScalarConversionStream` wrapper — layered after
`IntervalSerializingStream` / `ComplexTypeSerializingStream` only when the mode is
`none` — converts the native Date32 / Timestamp / Decimal128 / Float arrays into
matching `StringArray` / `DoubleArray` so the manifest schema and batch data agree.
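
To make the per-value conversion concrete, here is a minimal sketch of what turning native values into strings or doubles could involve. It does not reproduce `ScalarConversionStream`: the class and method names are hypothetical, `System.Decimal` stands in for Decimal128, and the `yyyy-MM-dd` date format is an assumption about the string form rather than something this PR states. Timestamps are left out because their string format and time-zone handling are not described here. The one fixed fact used is that Arrow Date32 stores days since 1970-01-01.

```csharp
using System;
using System.Globalization;

public static class ScalarValueSketch
{
    private static readonly DateTime UnixEpoch =
        new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Arrow Date32 is a count of days since the Unix epoch.
    // The ISO "yyyy-MM-dd" output format is an assumption.
    public static string Date32ToString(int daysSinceEpoch) =>
        UnixEpoch.AddDays(daysSinceEpoch)
                 .ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);

    // System.Decimal keeps its scale, so trailing zeros survive formatting.
    public static string DecimalToString(decimal value) =>
        value.ToString(CultureInfo.InvariantCulture);

    // Widening float to double is an implicit, lossless conversion in C#.
    public static double FloatToDouble(float value) => value;
}
```

Applied column by column, conversions of this kind are what keep the batch data consistent with a schema that advertises String or Double for these columns.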
Why
PECO-3060 — Jun 10 cutoff per the M3 plan. Until now, the SEA path
unconditionally returned native typed columns. Users setting
`adbc.spark.data_type_conv=none` (or its alias `adbc.hive.data_type_conv=none`)
got Date32/Decimal128/Timestamp from SEA but String from Thrift — a protocol-visible
behaviour difference. This change makes the two paths behaviourally identical.
Red → Green proof
Before fix (`ExecuteQuery_DataTypeConv_None_SerializesScalarTypesToStrings`,
SEA + `data_type_conv=none`):
```
Assert.Equal() Failure: Values differ
Expected: String
Actual: Date32
```
After fix:
```
Passed AdbcDrivers.Databricks.Tests.E2E.StatementExecution.StatementExecutionDriverE2ETests.ExecuteQuery_DataTypeConv_None_SerializesScalarTypesToStrings [1 s]
Passed AdbcDrivers.Databricks.Tests.E2E.StatementExecution.StatementExecutionDriverE2ETests.ExecuteQuery_DataTypeConv_Scalar_KeepsNativeTypes [1 s]
```
Full `StatementExecutionDriverE2ETests` suite (13 tests) passes — no regressions in
adjacent SEA tests.
Files touched
- … when `none`. Same detection pattern (`Spark:DataType:SqlName` metadata) as the existing Interval/ComplexType wrappers.
- … exposes `DataTypeConversion`.
- … through the manifest schema mapping and the reader pipeline.
- … `SkippableFact` red→green coverage for both `none` and `scalar` modes.
Manual verification