[GLUTEN-12008][VL] Align Expand projection types with output #12009
jianzhenwu wants to merge 2 commits into apache:main from
Conversation
Why does this problem occur? Does the native side produce an unexpected result?
@jinchengchenghh Thank you, and sorry for the late reply. I think this is a bug in Gluten. Here is the SQL that reproduces the exception: issue 12008
Hi @JkSelf, please help review.
test("Expand with round(avg(decimal)) and multiple distinct aggregates") {
@jianzhenwu Thanks for your fix.
I tried reproducing the issue in my local environment using the SQL you provided (#12008 (comment) and the two tests here) and the latest main branch (82644d3) with Spark 3.5, but I was unable to reproduce it.
In Spark, the projection expressions are passed directly from ExpandExec and should already be aligned with the output schema: https://github.com/apache/spark/blob/c26a127ba33137f36d55bf95cac71471e2a1704f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L1398-L1407. Could you provide more details on your environment, or help investigate why this occurs on your side?
Thanks for your help!
I encountered this problem using Spark 3.2. I believe it's also possible to reproduce the problem using Spark 3.3. I've tried using AI to explain the issue.
Spark 3.3 can reproduce the issue because its physical ExpandExec contains this decimal expression shape:
CAST((case_dd_decimal26 + case_ccb_decimal26) AS DECIMAL(27,10)) + case_fsv_decimal27
Spark declares the Expand output column as:
DECIMAL(27,10)
The null rows in the same Expand column are also:
CAST(NULL AS DECIMAL(27,10))
But when Velox compiles the non-null decimal arithmetic row, it infers:
DECIMAL(28,10)
So native ExpandNode sees mixed types in the same output column:
row 0: DECIMAL(28,10)
row 1: DECIMAL(27,10)
Then Velox fails with:
The projections type does not match across different rows in the same column.
Got: DECIMAL(27, 10), DECIMAL(28, 10)
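The widening follows from the standard decimal-addition typing rule (scale = max(s1, s2); precision = max(p1 - s1, p2 - s2) + scale + 1). A minimal sketch of that arithmetic, assuming the `case_*` operands are DECIMAL(26,10) and DECIMAL(27,10) as their suffixes suggest:

```python
def add_result_type(p1, s1, p2, s2, max_precision=38):
    """Result type of decimal addition under the common ANSI-style rule:
      scale     = max(s1, s2)
      precision = max(p1 - s1, p2 - s2) + scale + 1 (capped at max_precision)
    """
    scale = max(s1, s2)
    precision = min(max(p1 - s1, p2 - s2) + scale + 1, max_precision)
    return precision, scale

# Spark 3.3 plan shape:
#   CAST((case_dd + case_ccb) AS DECIMAL(27,10)) + case_fsv
inner = add_result_type(26, 10, 26, 10)  # (27, 10); the CAST keeps (27, 10)
outer = add_result_type(27, 10, 27, 10)  # (28, 10): wider than the declared
                                         # Expand output DECIMAL(27,10)
print(inner, outer)  # (27, 10) (28, 10)
```

This matches the mismatch in the error above: the null-literal rows stay at the declared DECIMAL(27,10), while the arithmetic row is inferred as DECIMAL(28,10).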
Spark 3.5 does not reproduce it because the generated ExpandExec expression is different:
(case_dd_decimal25 + case_ccb_decimal25) + case_fsv_decimal25
It does not insert the intermediate:
CAST(... AS DECIMAL(27,10))
that Spark 3.3 has. With this Spark 3.5 plan shape, Velox’s inferred type stays compatible with Spark’s Expand output type, so all projection rows in the Expand column remain consistent.
So the difference is not the SQL result type. Both Spark versions declare the Expand output as DECIMAL(27,10). The difference is the internal decimal expression tree Spark generates before Gluten/Velox conversion. Spark 3.3’s tree causes Velox to widen one projection row to DECIMAL(28,10); Spark 3.5’s tree does not.
Hi @JkSelf, do you think this fix is correct to address the issue in the Spark 3.2 scenario?
What changes are proposed in this pull request?
This PR fixes Expand projection/output type alignment for Velox.
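Conceptually, the alignment inserts a cast wherever a projection row's type differs from the declared output attribute type, so every row of a column carries one type. A minimal Python sketch of that idea (illustrative only; the real implementation is the Scala ExpandExecTransformer using Spark Cast expressions, and the dict-based expressions here are hypothetical stand-ins):

```python
def align_projections(projection_rows, output_types):
    """Wrap any projection whose type differs from the declared Expand
    output type in an explicit cast, so all rows of the same column
    report an identical type."""
    aligned = []
    for row in projection_rows:
        aligned.append([
            expr if expr["type"] == out_type
            else {"cast": expr, "type": out_type}
            for expr, out_type in zip(row, output_types)
        ])
    return aligned

# One Expand column whose non-null row was inferred wider than the output:
rows = [
    [{"name": "sum_expr", "type": "DECIMAL(28,10)"}],  # widened by inference
    [{"name": "NULL", "type": "DECIMAL(27,10)"}],      # null-literal row
]
aligned = align_projections(rows, ["DECIMAL(27,10)"])
# Both rows now report DECIMAL(27,10) for the column.
```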
The main changes are:
- Align ExpandExec projection expressions with the corresponding ExpandExec.output attribute type before native conversion.
- Pull out inserted cast expressions via PullOutPreProject, so the native ExpandRel still only contains fields or literals, which is required by Velox.
- Cast null literals in ExpandExecTransformer directly to the output type.
- Return ValidationResult.failed(...) instead of generating an invalid native Expand plan.
- Add a check in doTransform to prevent inconsistent Expand projections from reaching Velox.
How was this patch tested?
Passed:
Added regression coverage in VeloxExpandSuite for:
Was this patch authored or co-authored using generative AI tooling?
Yes, this patch was co-authored using generative AI tooling.
Related issue: #12008