[GLUTEN-12008][VL] Align Expand projection types with output #12009

Open

jianzhenwu wants to merge 2 commits into apache:main from jianzhenwu:GLUTEN-12008-align-expand-output-types

jianzhenwu commented Apr 29, 2026 (edited by github-actions Bot)

What changes are proposed in this pull request?

This PR fixes Expand projection/output type alignment for Velox.

The main changes are:

Align ExpandExec projection expressions with the corresponding ExpandExec.output attribute type before native conversion.
Push non-literal type-alignment casts into PullOutPreProject, so that the native ExpandRel still contains only fields or literals, as Velox requires.
Align null literals inside ExpandExecTransformer directly to the output type.
Add validation-time diagnostics for remaining projection/output type mismatches and return ValidationResult.failed(...) instead of generating an invalid native Expand plan.
Keep a hard guard in doTransform to prevent inconsistent Expand projections from reaching Velox.
Add Velox regression tests for decimal expressions with multiple distinct aggregates.
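
The alignment and validation steps listed above can be sketched roughly as follows. This is a simplified, self-contained model and not the code in this PR: `Expr`, `Field`, `Literal`, `Add`, `Cast`, `align`, `alignAll`, and `mismatches` are hypothetical names, and types are plain strings rather than Spark `DataType`s. In particular, how the PR treats non-null literals is not stated above, so the sketch simply casts them.

```scala
// Hypothetical, simplified model; these are not Gluten's or Spark's real classes.
object ExpandAlignmentSketch {
  sealed trait Expr { def dataType: String }
  final case class Field(name: String, dataType: String) extends Expr
  final case class Literal(value: Option[Any], dataType: String) extends Expr
  final case class Add(left: Expr, right: Expr, dataType: String) extends Expr
  final case class Cast(child: Expr, dataType: String) extends Expr

  /** Align one projection expression with the type of its output column. */
  def align(expr: Expr, outputType: String): Expr = expr match {
    case e if e.dataType == outputType => e                         // already aligned
    case Literal(None, _)              => Literal(None, outputType) // retype the null literal in place
    case e                             => Cast(e, outputType)       // cast; a candidate for the pre-project
  }

  /** Align every projection row against the declared output types. */
  def alignAll(projections: Seq[Seq[Expr]], output: Seq[String]): Seq[Seq[Expr]] =
    projections.map(row => row.zip(output).map { case (e, t) => align(e, t) })

  /** Validation: (row, column) positions whose type still differs from the output. */
  def mismatches(projections: Seq[Seq[Expr]], output: Seq[String]): Seq[(Int, Int)] =
    for {
      (row, r) <- projections.zipWithIndex
      ((e, t), c) <- row.zip(output).zipWithIndex
      if e.dataType != t
    } yield (r, c)
}
```

In this model, a non-empty result from `mismatches` after alignment would correspond to the case where validation fails and the plan is not converted to a native Expand.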

How was this patch tested?

Passed:

git diff --check

Passed:

mvn -Pbackends-velox -pl backends-velox -am \
  -DskipTests -DskipUTs \
  -Dscalastyle.skip=true \
  -Dcheckstyle.skip=true \
  -Dspotless.check.skip=true \
  -Dlicense.skip=true \
  clean test-compile

Added regression coverage in VeloxExpandSuite for:

round(avg(decimal)) with multiple distinct aggregates
decimal CASE WHEN sum with multiple distinct aggregates

Was this patch authored or co-authored using generative AI tooling?

Yes, this patch was co-authored using generative AI tooling.

Related issue: #12008

github-actions Bot added the CORE and VELOX labels

github-actions Bot commented Apr 29, 2026

Run Gluten Clickhouse CI on x86

jianzhenwu force-pushed the GLUTEN-12008-align-expand-output-types branch from 8e6fd05 to ff3bfaf on April 29, 2026 08:44

jianzhenwu force-pushed the GLUTEN-12008-align-expand-output-types branch from ff3bfaf to b600431 on April 29, 2026 09:19


Commit b1de6d1: [GLUTEN-12008][VL] Align Expand projection types with output

jianzhenwu force-pushed the GLUTEN-12008-align-expand-output-types branch from b600431 to b1de6d1 on April 29, 2026 10:24


Commit 4320cc9: [GLUTEN-12008][VL] Cast Expand decimal pre-project output

jinchengchenghh (Contributor) commented Apr 30, 2026

Why does this problem occur? Does the native side produce an unexpected result?

jianzhenwu (Author) commented May 2, 2026 (edited)

Why does this problem occur? Does the native side produce an unexpected result?

@jinchengchenghh Thank you for your reply, and sorry for the late response. I think this is a bug in Gluten. The SQL that reproduces the exception is in issue #12008.

jinchengchenghh requested a review from JkSelf on May 5, 2026 11:07

jianzhenwu (Author) commented May 7, 2026

Hi @JkSelf, please help review.

JkSelf reviewed

backends-velox/src/test/scala/org/apache/spark/sql/execution/VeloxExpandSuite.scala

                   }
                 }
+                test("Expand with round(avg(decimal)) and multiple distinct aggregates") {

JkSelf (Contributor) commented May 7, 2026

@jianzhenwu Thanks for the fix.

I tried to reproduce the issue locally using the SQL you provided (#12008 (comment) and the two tests here) on the latest main branch (82644d3) with Spark 3.5, but I was unable to reproduce it.

In Spark, the projection expressions are passed directly from ExpandExec and should already be aligned with the output schema (https://github.com/apache/spark/blob/c26a127ba33137f36d55bf95cac71471e2a1704f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L1398-L1407). Could you provide more details about your environment, or help investigate why this occurs on your side?

Thanks for your help!

jianzhenwu (Author) commented May 8, 2026

I encountered this problem on Spark 3.2, and I believe it can also be reproduced on Spark 3.3. I used AI to help explain the issue.

Spark 3.3 can reproduce the issue because its physical ExpandExec contains this decimal expression shape:

CAST((case_dd_decimal26 + case_ccb_decimal26) AS DECIMAL(27,10)) + case_fsv_decimal27

Spark declares the Expand output column as:

DECIMAL(27,10)

The null rows in the same Expand column are also:

CAST(NULL AS DECIMAL(27,10))

But when Velox compiles the non-null decimal arithmetic row, it infers:

DECIMAL(28,10)

So native ExpandNode sees mixed types in the same output column:

row 0: DECIMAL(28,10)
row 1: DECIMAL(27,10)

Then Velox fails with:

The projections type does not match across different rows in the same column.
Got: DECIMAL(27, 10), DECIMAL(28, 10)
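
The kind of per-column consistency check that this error message describes can be sketched as follows. This is only an illustration of the idea, not Velox's actual implementation: `ExpandColumnTypeCheck` and `inconsistentColumns` are hypothetical names, types are plain strings, and the extra BIGINT column in the usage note is made up for the example.

```scala
// Hypothetical sketch; it models only the idea of the check, not Velox's ExpandNode.
object ExpandColumnTypeCheck {
  /** For each column index whose rows disagree, the distinct types seen, in row order. */
  def inconsistentColumns(rowTypes: Seq[Seq[String]]): Map[Int, Seq[String]] = {
    require(
      rowTypes.nonEmpty && rowTypes.forall(_.size == rowTypes.head.size),
      "all projection rows must have the same width")
    // Transpose rows into columns, then keep the columns that contain more than one type.
    rowTypes.transpose.zipWithIndex.collect {
      case (types, col) if types.distinct.size > 1 => col -> types.distinct
    }.toMap
  }
}
```

For two rows typed `Seq("BIGINT", "DECIMAL(28, 10)")` and `Seq("BIGINT", "DECIMAL(27, 10)")`, the function reports column 1 with both decimal types, which mirrors the mixed-type column described above.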

Spark 3.5 does not reproduce it because the generated ExpandExec expression is different:

(case_dd_decimal25 + case_ccb_decimal25) + case_fsv_decimal25

It does not insert the intermediate:

CAST(... AS DECIMAL(27,10))

that Spark 3.3 has. With this Spark 3.5 plan shape, Velox’s inferred type stays compatible with Spark’s Expand output type, so all projection rows in the Expand column remain consistent.

So the difference is not the SQL result type. Both Spark versions declare the Expand output as DECIMAL(27,10). The difference is the internal decimal expression tree Spark generates before Gluten/Velox conversion. Spark 3.3’s tree causes Velox to widen one projection row to DECIMAL(28,10); Spark 3.5’s tree does not.
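
To make the widening concrete, the sketch below applies the usual SQL-style result-type rule for decimal addition (scale = max of the scales; precision = max of the integer digits + scale + 1) to the two expression shapes. The operand types are an assumption: they are inferred from the column-name suffixes (decimal25, decimal27) with scale 10, which the discussion does not state explicitly. `DecimalAddTypeSketch`, `Dec`, and `addType` are hypothetical names, and precision is simply capped at 38 without the scale adjustment a real engine may apply.

```scala
// Hypothetical sketch of the decimal addition result-type rule.
object DecimalAddTypeSketch {
  final case class Dec(precision: Int, scale: Int)

  val MaxPrecision = 38

  /** Result type of l + r: keep the larger scale, add one integer digit for the carry. */
  def addType(l: Dec, r: Dec): Dec = {
    val scale = math.max(l.scale, r.scale)
    val intDigits = math.max(l.precision - l.scale, r.precision - r.scale)
    // Simplification: cap the precision only; no scale adjustment on overflow.
    Dec(math.min(intDigits + scale + 1, MaxPrecision), scale)
  }
}
```

Under these assumptions, the Spark 3.3 shape adds a DECIMAL(27,10) cast result to a DECIMAL(27,10) column, so `addType(Dec(27, 10), Dec(27, 10))` gives `Dec(28, 10)`. The Spark 3.5 shape adds three DECIMAL(25,10) columns, so `addType(addType(Dec(25, 10), Dec(25, 10)), Dec(25, 10))` gives `Dec(27, 10)`, which matches the declared Expand output type.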

jianzhenwu (Author) commented May 11, 2026

Hi @JkSelf, do you think this fix correctly addresses the issue in the Spark 3.2 scenario?
