Skip to content

[GLUTEN-12008][VL] Align Expand projection types with output#12009

Open
jianzhenwu wants to merge 2 commits intoapache:mainfrom
jianzhenwu:GLUTEN-12008-align-expand-output-types
Open

[GLUTEN-12008][VL] Align Expand projection types with output#12009
jianzhenwu wants to merge 2 commits intoapache:mainfrom
jianzhenwu:GLUTEN-12008-align-expand-output-types

Conversation

@jianzhenwu
Copy link
Copy Markdown

@jianzhenwu jianzhenwu commented Apr 29, 2026

What changes are proposed in this pull request?

This PR fixes Expand projection/output type alignment for Velox.

The main changes are:

  • Align ExpandExec projection expressions with the corresponding ExpandExec.output attribute type before native conversion.
  • Push non-literal type alignment casts into PullOutPreProject, so native ExpandRel still only contains fields or literals, which is required by Velox.
  • Align null literals inside ExpandExecTransformer directly to the output type.
  • Add validation-time diagnostics for remaining projection/output type mismatches and return ValidationResult.failed(...) instead of generating an invalid native Expand plan.
  • Keep a hard guard in doTransform to prevent inconsistent Expand projections from reaching Velox.
  • Add Velox regression tests for decimal expressions with multiple distinct aggregates.

How was this patch tested?

  • Passed:
git diff --check

Passed:

mvn -Pbackends-velox -pl backends-velox -am \
  -DskipTests -DskipUTs \
  -Dscalastyle.skip=true \
  -Dcheckstyle.skip=true \
  -Dspotless.check.skip=true \
  -Dlicense.skip=true \
  clean test-compile

Added regression coverage in VeloxExpandSuite for:

  • round(avg(decimal)) with multiple distinct aggregates
  • decimal CASE WHEN sum with multiple distinct aggregates

Was this patch authored or co-authored using generative AI tooling?

Yes, this patch was co-authored using generative AI tooling.

Related issue: #12008

@github-actions github-actions Bot added CORE works for Gluten Core VELOX labels Apr 29, 2026
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@jianzhenwu jianzhenwu force-pushed the GLUTEN-12008-align-expand-output-types branch from 8e6fd05 to ff3bfaf Compare April 29, 2026 08:44
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@jianzhenwu jianzhenwu force-pushed the GLUTEN-12008-align-expand-output-types branch from ff3bfaf to b600431 Compare April 29, 2026 09:19
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@jianzhenwu jianzhenwu force-pushed the GLUTEN-12008-align-expand-output-types branch from b600431 to b1de6d1 Compare April 29, 2026 10:24
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@jinchengchenghh
Copy link
Copy Markdown
Contributor

Why this problem occurs? Does the native side produces unexpected result?

@jianzhenwu
Copy link
Copy Markdown
Author

jianzhenwu commented May 2, 2026

Why this problem occurs? Does the native side produces unexpected result?

@jinchengchenghh Thank you for your reply. Sorry for the late reply. I think this is a bug in Gluten. Here is the SQL that reproduces the exception. issue 12008

@jinchengchenghh jinchengchenghh requested a review from JkSelf May 5, 2026 11:07
@jianzhenwu
Copy link
Copy Markdown
Author

hi @JkSelf pls help review.

}
}

test("Expand with round(avg(decimal)) and multiple distinct aggregates") {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jianzhenwu Thanks for your fixing.

I tried reproducing the issue in my local environment using the SQL you provided (#12008 (comment) and the two tests here) and the latest main branch(82644d3) with Spark 3.5, but I was unable to reproduce it.

In Spark, the projection expressions are passed directly from ExpandExec and should already be aligned with the output schema https://github.com/apache/spark/blob/c26a127ba33137f36d55bf95cac71471e2a1704f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala#L1398-L1407. Could you provide more details on your environment or help investigate why this occurs on your side?

Thanks for your help!

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I encountered this problem using Spark 3.2. I believe it's also possible to reproduce the problem using Spark 3.3. I've tried using AI to explain the issue.

Spark 3.3 can reproduce the issue because its physical ExpandExec contains this decimal expression shape:

CAST((case_dd_decimal26 + case_ccb_decimal26) AS DECIMAL(27,10)) + case_fsv_decimal27

Spark declares the Expand output column as:

DECIMAL(27,10)

The null rows in the same Expand column are also:

CAST(NULL AS DECIMAL(27,10))

But when Velox compiles the non-null decimal arithmetic row, it infers:

DECIMAL(28,10)

So native ExpandNode sees mixed types in the same output column:

row 0: DECIMAL(28,10)
row 1: DECIMAL(27,10)

Then Velox fails with:

The projections type does not match across different rows in the same column.
Got: DECIMAL(27, 10), DECIMAL(28, 10)

Spark 3.5 does not reproduce it because the generated ExpandExec expression is different:

(case_dd_decimal25 + case_ccb_decimal25) + case_fsv_decimal25

It does not insert the intermediate:

CAST(... AS DECIMAL(27,10))

that Spark 3.3 has. With this Spark 3.5 plan shape, Velox’s inferred type stays compatible with Spark’s Expand output type, so all projection rows in the Expand column remain consistent.

So the difference is not the SQL result type. Both Spark versions declare the Expand output as DECIMAL(27,10). The difference is the internal decimal expression tree Spark generates before Gluten/Velox conversion. Spark 3.3’s tree causes Velox to widen one projection row to DECIMAL(28,10); Spark 3.5’s tree does not.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @JkSelf Do you think this fix is ​​correct to address the issue in the Spark32 scenario?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CORE works for Gluten Core VELOX

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants