[GLUTEN-10992][VL] Fix MatchError for KeyGroupedPartitioning in native shuffle#12335

Open
brijrajk wants to merge 1 commit into
apache:mainfrom
brijrajk:fix/10992-keygrouped-partitioning-fallback

@brijrajk brijrajk commented Jun 22, 2026

What changes were proposed in this pull request?

When Spark 4.0's V2 bucketing shuffle (spark.sql.v2.bucketing.shuffle.enabled=true) is used in a join where only one side reports partitioning, Spark generates a ShuffleExchangeExec with KeyGroupedPartitioning as its output partitioning.

The default case _ => in VeloxSparkPlanExecApi.genColumnarShuffleExchange created a ColumnarShuffleExchangeExec for this node without validation. When the query executed, ExecUtil.genShuffleDependency crashed with a scala.MatchError because its pattern match on the partitioning type had no case for KeyGroupedPartitioning.

Changes:

  • VeloxSparkPlanExecApi.genColumnarShuffleExchange: add an explicit case _: KeyGroupedPartitioning => ahead of the default case. The new case adds a fallback tag and returns the vanilla ShuffleExchangeExec, so a ColumnarShuffleExchangeExec is never created for this unsupported partitioning type.
  • ExecUtil.genShuffleDependency: add an explicit wildcard case other => that throws GlutenNotSupportException instead of the cryptic scala.MatchError, as a defensive guard for any future unknown partitioning types.
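
For reviewers who want the shape of the two changes without opening the diff, here is a minimal, self-contained sketch of the pattern. It is not the actual Gluten code: the partitioning classes, NotSupportedException, VanillaShuffle, ColumnarShuffle, and nativePartitionerName are simplified, hypothetical stand-ins for the Spark and Gluten types named above, and the fallback tag is modeled as a plain string.

```scala
// Hypothetical stand-ins for Spark's partitioning classes (not the real types).
trait Partitioning
case class HashPartitioning(numPartitions: Int) extends Partitioning
case object SinglePartition extends Partitioning
case class KeyGroupedPartitioning(numPartitions: Int) extends Partitioning

// Hypothetical stand-in for GlutenNotSupportException.
class NotSupportedException(msg: String) extends RuntimeException(msg)

// Hypothetical stand-ins for the vanilla and columnar shuffle plan nodes.
sealed trait ShufflePlan
case class VanillaShuffle(partitioning: Partitioning, fallbackReason: String) extends ShufflePlan
case class ColumnarShuffle(partitioning: Partitioning) extends ShufflePlan

object ShuffleSketch {
  // Planning time: an unsupported partitioning is routed to the vanilla
  // shuffle before the default case can build a columnar node for it.
  def genColumnarShuffleExchange(p: Partitioning): ShufflePlan = p match {
    case k: KeyGroupedPartitioning =>
      VanillaShuffle(k, "KeyGroupedPartitioning is not supported by native shuffle")
    case other =>
      ColumnarShuffle(other)
  }

  // Execution time: the wildcard case turns an unhandled partitioning into a
  // descriptive exception instead of a scala.MatchError.
  def nativePartitionerName(p: Partitioning): String = p match {
    case SinglePartition     => "single"
    case _: HashPartitioning => "hash"
    case other =>
      throw new NotSupportedException(s"Unsupported partitioning for native shuffle: $other")
  }
}
```

In this sketch the planning-time case is the actual fix, while the execution-time wildcard only matters if some other unsupported partitioning slips through in the future.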

How was this patch tested?

The existing testGluten("SPARK-41471: shuffle one side: only one side reports partitioning") tests in GlutenKeyGroupedPartitioningSuite (both spark40 and spark41) reproduce the crash exactly. They set V2_BUCKETING_SHUFFLE_ENABLED=true with only one bucketed side, which triggers a ShuffleExchangeExec with KeyGroupedPartitioning output, and then call checkAnswer. After this fix, these tests pass without a MatchError.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (https://claude.ai/code)

Related issue: #10992

…e shuffle

When Spark 4.0's V2 bucketing shuffle (spark.sql.v2.bucketing.shuffle.enabled=true)
is used in a join where only one side reports partitioning, Spark generates a
ShuffleExchangeExec with KeyGroupedPartitioning as its output. The default
case in VeloxSparkPlanExecApi.genColumnarShuffleExchange created a
ColumnarShuffleExchangeExec for this node, which then crashed with a
scala.MatchError in ExecUtil.genShuffleDependency because KeyGroupedPartitioning
was not handled in the native partitioning match.

Fix by adding an explicit KeyGroupedPartitioning case to genColumnarShuffleExchange
that marks the shuffle for fallback to vanilla Spark. Also harden
ExecUtil.genShuffleDependency with an explicit wildcard that throws
GlutenNotSupportException instead of a cryptic MatchError for any future
unknown partitioning types.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions github-actions Bot added the VELOX label Jun 22, 2026