[MINOR] Upgrade Spark 4.0 to 4.0.2 #12180

Merged
yaooqinn merged 2 commits into apache:main from yaooqinn:users/kentyao/spark-4.0.2 on Jun 4, 2026

yaooqinn commented May 29, 2026

What changes were proposed in this pull request?

Bumps the spark-4.0 profile from 4.0.1 to 4.0.2 (patch release).

Sibling of #12177 (Spark 4.1 → 4.1.2), kept as a separate PR per one-concern-per-PR.

Files touched (6 / +7 −7)

  • pom.xml / tools/gluten-it/pom.xml: spark.version
  • .github/workflows/util/install-spark-resources.sh: download version
  • .github/workflows/velox_backend_x86.yml: step names
  • docs/get-started/{build-guide,getting-started}.md: supported versions

delta.version (also 4.0.1, but for Delta Lake) is intentionally not touched.

Why no shim code change

4.0.2 is a maintenance release. Unlike 4.1.2 (#12177), it does not revert any binary signatures the spark40 shim depends on — SPARK-55337 (MemoryStream binary-compat reversion) only lives on the 4.1.x branch.

Upstream fixes worth watching CI for

| JIRA | Why it might matter to Gluten |
| --- | --- |
| SPARK-54439 | SPJ KeyGroupedPartitioning + join key size mismatch; may shift plan-stability files |
| SPARK-53434 | ColumnarRow#get should check isNullAt; touches columnar row path |
| SPARK-54917 | ORC bumped to 2.1.4; Velox uses its own ORC reader, so this only affects Spark vectorized fallback |
| SPARK-54753 | persist/unpersist memory leak; observability for cache users |

No Gluten-side code change is needed for any of the above.

How was this patch tested?

Relying on the existing Spark 4.0 CI matrix (gluten-ut/spark40 + velox_backend_x86 4.0 jobs).

Was this patch authored or co-authored using generative AI tooling?

Yes

Generated-by: claude-opus-4.7

Bumps the spark-4.0 profile from 4.0.1 to 4.0.2 (patch release).

Changes:
- pom.xml / tools/gluten-it/pom.xml: spark.version
- .github/workflows/util/install-spark-resources.sh: download version
- .github/workflows/velox_backend_x86.yml: step names
- docs/get-started/{build-guide,getting-started}.md: supported versions

No shim code changes are required: 4.0.2 is a maintenance release with
no public API changes, and unlike 4.1.2 (SPARK-55337) it does not revert
any binary signatures that the spark40 shim depends on.

Notable upstream fixes that may affect Gluten behaviour (no Gluten code
change needed, but worth watching CI for plan-stability / metrics diffs):
- SPARK-54439 SPJ KeyGroupedPartitioning + join key size mismatch
- SPARK-53434 ColumnarRow#get should check isNullAt
- SPARK-54917 Upgrade ORC to 2.1.4

Generated-by: claude-opus-4.7
github-actions Bot added the CORE, INFRA, TOOLS, and DOCS labels May 29, 2026
@github-actions

Run Gluten Clickhouse CI on x86

zhouyuan commented Jun 2, 2026

@yaooqinn the KeyGroupedPartitioningSuite is re-written in Gluten tests:
https://github.com/apache/gluten/blob/main/gluten-ut/spark40/src/test/scala/org/apache/spark/sql/connector/GlutenKeyGroupedPartitioningSuite.scala#L1039

@yaooqinn yaooqinn force-pushed the users/kentyao/spark-4.0.2 branch from ebca2f5 to a51b5cb Compare June 3, 2026 04:35
@yaooqinn yaooqinn force-pushed the users/kentyao/spark-4.0.2 branch from a51b5cb to ebca2f5 Compare June 3, 2026 05:15
@yaooqinn yaooqinn force-pushed the users/kentyao/spark-4.0.2 branch from ebca2f5 to a51b5cb Compare June 3, 2026 05:31

… 4.0

Spark 4.0.2 picks up SPARK-54439 (apache/spark#53142), a correctness fix
in KeyGroupedShuffleSpec.createPartitioning() with two new tests in
KeyGroupedPartitioningSuite. The vanilla tests use the base collectShuffles
helper which only matches ShuffleExchangeExec, so they fail under Gluten
where the shuffle is a ColumnarShuffleExchangeExec.

Rather than excluding them, port them as testGluten overrides (same pattern
as the existing SPARK-41471 tests) so they reuse the columnar-aware
collectShuffles helper and keep coverage of the correctness fix.

Locally verified on Velox backend (Spark 4.0.2, Scala 2.13): both new tests
pass (shuffles.size == 1 and checkAnswer), with no change to the set of
pre-existing suite failures.

Generated-by: Claude Opus 4.8
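
The failure mode described in the commit message above can be illustrated with a small, self-contained sketch. It does not use the real Spark or Gluten classes: `PlanNode`, `Scan`, `Join`, `RowShuffle`, and `ColumnarShuffle` are hypothetical toy stand-ins (the last two playing the roles of `ShuffleExchangeExec` and `ColumnarShuffleExchangeExec`), and both helper functions are assumptions made for illustration, not the actual `collectShuffles` implementations. The point is only that a helper pattern-matching on the row-based shuffle type finds nothing in a plan whose shuffle is columnar.

```scala
// Toy plan nodes; hypothetical stand-ins, NOT the real Spark/Gluten classes.
sealed trait PlanNode { def children: Seq[PlanNode] }

case class Scan(name: String) extends PlanNode {
  val children: Seq[PlanNode] = Nil
}

case class Join(left: PlanNode, right: PlanNode) extends PlanNode {
  val children: Seq[PlanNode] = Seq(left, right)
}

// Plays the role of vanilla Spark's ShuffleExchangeExec.
case class RowShuffle(child: PlanNode) extends PlanNode {
  val children: Seq[PlanNode] = Seq(child)
}

// Plays the role of Gluten's ColumnarShuffleExchangeExec.
case class ColumnarShuffle(child: PlanNode) extends PlanNode {
  val children: Seq[PlanNode] = Seq(child)
}

object CollectShufflesSketch {
  // Pre-order traversal that keeps every node the partial function accepts.
  def collect[T](plan: PlanNode)(pf: PartialFunction[PlanNode, T]): Seq[T] =
    pf.lift(plan).toList ++ plan.children.flatMap(c => collect(c)(pf))

  // Assumed shape of a helper that only matches the row-based shuffle.
  def vanillaCollectShuffles(plan: PlanNode): Seq[PlanNode] =
    collect(plan) { case s: RowShuffle => s }

  // Assumed shape of a columnar-aware helper that matches both shuffle kinds.
  def columnarAwareCollectShuffles(plan: PlanNode): Seq[PlanNode] =
    collect(plan) {
      case s: RowShuffle      => s
      case s: ColumnarShuffle => s
    }

  def main(args: Array[String]): Unit = {
    // A join where only one side is shuffled, and that shuffle is columnar.
    val plan = Join(Scan("items"), ColumnarShuffle(Scan("purchases")))

    // The row-only helper sees no shuffles, so a `size == 1` check would fail.
    assert(vanillaCollectShuffles(plan).isEmpty)

    // The columnar-aware helper finds the single shuffle.
    assert(columnarAwareCollectShuffles(plan).size == 1)

    println(s"row-only: ${vanillaCollectShuffles(plan).size}, " +
      s"columnar-aware: ${columnarAwareCollectShuffles(plan).size}")
  }
}
```

In this toy model, an upstream test that asserts on the number of collected shuffles fails under the row-only helper even though the plan is correct, which is why porting the tests onto a columnar-aware helper keeps the coverage rather than excluding them.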
@yaooqinn yaooqinn force-pushed the users/kentyao/spark-4.0.2 branch from a51b5cb to fbb4804 Compare June 3, 2026 16:03

@yaooqinn yaooqinn merged commit 6565a40 into apache:main Jun 4, 2026
109 of 110 checks passed
