[GLUTEN-11539][VL] Improve error message for unsupported spark.io.compression.codec in native shuffle#12333

Merged
marin-ma merged 1 commit into apache:main from brijrajk:fix/11539-shuffle-codec-fallback
Jun 24, 2026

@brijrajk commented Jun 22, 2026

Contributor

What changes are proposed in this pull request?

Fixes #11539

When spark.io.compression.codec is set to a codec not supported by the Gluten native shuffle writer (e.g. snappy, none), GlutenShuffleUtils.getCompressionCodec() was throwing an IllegalArgumentException with a message that only listed the supported codecs but gave no guidance on how to resolve the issue.

This PR improves the error message to clearly tell users:

  1. Which codecs Gluten shuffle supports
  2. That the configured codec is not supported
  3. How to fix it: configure spark.gluten.sql.columnar.shuffle.codec explicitly

Before:

java.lang.IllegalArgumentException: The value of spark.io.compression.codec should be one of lz4, zstd, but was snappy

After:

java.lang.IllegalArgumentException: Gluten shuffle only supports lz4, zstd. snappy is not supported. You may configure spark.gluten.sql.columnar.shuffle.codec to lz4 or zstd.

The explicit spark.gluten.sql.columnar.shuffle.codec path is unchanged — it still validates and throws with its own message when set to an unsupported value.
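
The resolution order described above can be sketched roughly as follows. This is a minimal illustration, not the actual `GlutenShuffleUtils.getCompressionCodec()` code: the object and method names are hypothetical, a plain `Map` stands in for `SparkConf`, and the wording of the message on the explicit-config path is assumed. The `lz4` fallback reflects Spark's documented default for `spark.io.compression.codec`.

```scala
import java.util.Locale

// Hypothetical sketch; names and structure are assumptions, not Gluten's code.
object ShuffleCodecSketch {
  // Config keys as named in the PR description.
  val GlutenCodecKey = "spark.gluten.sql.columnar.shuffle.codec"
  val SparkCodecKey = "spark.io.compression.codec"

  // Codecs the PR says the native shuffle writer supports.
  val Supported: Seq[String] = Seq("lz4", "zstd")

  def resolveCodec(conf: Map[String, String]): String = {
    val supportedList = Supported.mkString(", ")
    val supportedChoice = Supported.mkString(" or ")
    conf.get(GlutenCodecKey).map(_.toLowerCase(Locale.ROOT)) match {
      case Some(codec) =>
        // Explicit Gluten setting wins; it is validated with its own message
        // (the wording below is assumed for illustration).
        require(
          Supported.contains(codec),
          s"$GlutenCodecKey must be one of $supportedList, but was $codec")
        codec
      case None =>
        // Only the Spark-wide codec is set (or nothing is set at all).
        val codec = conf.getOrElse(SparkCodecKey, "lz4").toLowerCase(Locale.ROOT)
        if (!Supported.contains(codec)) {
          throw new IllegalArgumentException(
            s"Gluten shuffle only supports $supportedList. $codec is not supported. " +
              s"You may configure $GlutenCodecKey to $supportedChoice.")
        }
        codec
    }
  }
}
```

With this sketch, a configuration that sets only `spark.io.compression.codec=snappy` throws with the new message, while adding `spark.gluten.sql.columnar.shuffle.codec=zstd` to the same configuration resolves to `zstd`.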

Files changed

  • GlutenShuffleUtils.scala — replaces the logWarning + silent fallback in the case None branch with a clear IllegalArgumentException pointing users to spark.gluten.sql.columnar.shuffle.codec
  • MiscOperatorSuite.scala — updates the GLUTEN-11539 regression test to assert the exception and message instead of the old fallback behaviour

How was this patch tested?

The existing regression test (GLUTEN-11539: unsupported spark.io.compression.codec throws with actionable message in MiscOperatorSuite) was updated to use intercept[IllegalArgumentException] and assert that the message contains "snappy is not supported" and the spark.gluten.sql.columnar.shuffle.codec key.
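
The shape of that assertion can be illustrated without ScalaTest. The sketch below is not the test from `MiscOperatorSuite`: `codecOrThrow` is a hypothetical stand-in for the code under test, and `messageOfIllegalArgument` is a plain-Scala analogue of ScalaTest's `intercept[IllegalArgumentException]`, which returns the caught exception so its message can be inspected.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical sketch; not the actual regression test.
object CodecMessageCheck {
  // Stand-in for the code under test (assumed behaviour, taken from the PR text).
  def codecOrThrow(sparkCodec: String): String =
    if (Set("lz4", "zstd").contains(sparkCodec)) sparkCodec
    else
      throw new IllegalArgumentException(
        s"Gluten shuffle only supports lz4, zstd. $sparkCodec is not supported. " +
          "You may configure spark.gluten.sql.columnar.shuffle.codec to lz4 or zstd.")

  // Returns the message of the expected exception; fails if the body
  // completes normally, and rethrows any other kind of exception.
  def messageOfIllegalArgument(body: => Any): String = Try(body) match {
    case Failure(e: IllegalArgumentException) => e.getMessage
    case Failure(other) => throw other
    case Success(value) =>
      throw new AssertionError(s"expected IllegalArgumentException, got $value")
  }

  def main(args: Array[String]): Unit = {
    val msg = messageOfIllegalArgument(codecOrThrow("snappy"))
    // The two properties the PR description says the test asserts.
    assert(msg.contains("snappy is not supported"))
    assert(msg.contains("spark.gluten.sql.columnar.shuffle.codec"))
  }
}
```

Asserting on substrings rather than the full message keeps the check stable if the surrounding wording is adjusted later.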

Verified locally (Spark 4.0, Velox backend): MiscOperatorSuite — 95/95 passed.


Was this patch authored or co-authored using generative AI tooling?

Yes. Claude Code (claude-sonnet-4-6) was used as an AI assistant during development.

github-actions bot added the CORE (works for Gluten Core) and VELOX labels on Jun 22, 2026
@github-actions

Run Gluten Clickhouse CI on x86

@marin-ma

Contributor

@brijrajk Thanks for proposing this change. However, I think we should throw an exception to users instead of using a different compression codec than the one that was configured.

If spark.io.compression.codec is set to none, then the data should be uncompressed.

cc: @FelixYBW Do you have any suggestions?

@FelixYBW commented Jun 23, 2026

Contributor

> @brijrajk Thanks for proposing this change. However, I think we should throw an exception to users instead of using a different compression codec than the one that was configured.
>
> If spark.io.compression.codec is set to none, then the data should be uncompressed.

We have a dedicated config to solve the codec mismatch between Gluten and Spark: spark.gluten.sql.columnar.shuffle.codec, which overrides spark.io.compression.codec. If users only configure spark.io.compression.codec, I think we should raise the exception and let them know they need to configure spark.gluten.sql.columnar.shuffle.codec.

@brijrajk can you refine the error message

java.lang.IllegalArgumentException: The value of spark.io.compression.codec should be one of lz4, zstd, but was snappy

to something like:

Gluten shuffle only supports lz4, zstd. snappy isn't supported yet. You may config spark.gluten.sql.columnar.shuffle.codec to lz4/zstd.

brijrajk force-pushed the fix/11539-shuffle-codec-fallback branch from 0ddc080 to cd5694c on June 24, 2026 01:38
brijrajk changed the title from "[GLUTEN-11539][VL] Fall back to zstd when spark.io.compression.codec is unsupported" to "[GLUTEN-11539][VL] Improve error message for unsupported spark.io.compression.codec in native shuffle" on Jun 24, 2026
@github-actions

Run Gluten Clickhouse CI on x86

@brijrajk

Contributor Author

Thanks @marin-ma and @FelixYBW for the feedback. Updated in the latest push:

  • Removed the silent fallback to zstd — unsupported codecs (including none) now throw IllegalArgumentException
  • Refined the error message to match the suggested format:
    Gluten shuffle only supports lz4, zstd. snappy is not supported. You may configure spark.gluten.sql.columnar.shuffle.codec to lz4 or zstd.
    
  • Updated the regression test in MiscOperatorSuite to assert the exception and message instead of the old fallback behaviour

Regarding spark.io.compression.codec=none (@marin-ma): the current implementation throws for none as well, pointing users to spark.gluten.sql.columnar.shuffle.codec. Supporting uncompressed native shuffle would be a separate follow-up.

@FelixYBW

Contributor

Thank you for the fix.

…pression.codec in native shuffle

When `spark.io.compression.codec` is set to a codec not supported by
the Gluten native shuffle writer (e.g. snappy, none), the previous error
message only listed supported codecs with no guidance on resolution.

Improve the message to tell users which codecs Gluten shuffle supports,
that the configured codec is not supported, and how to fix it by
configuring `spark.gluten.sql.columnar.shuffle.codec` explicitly.

Before:
  The value of spark.io.compression.codec should be one of lz4, zstd,
  but was snappy

After:
  Gluten shuffle only supports lz4, zstd. snappy is not supported.
  You may configure spark.gluten.sql.columnar.shuffle.codec to lz4 or zstd.

The explicit `spark.gluten.sql.columnar.shuffle.codec` path is unchanged.
Update the existing regression test and add two new cases: one for
`none` (explicitly raised by reviewers) and one for the supported-codec
happy path.
brijrajk force-pushed the fix/11539-shuffle-codec-fallback branch from cd5694c to 93a29ef on June 24, 2026 01:47
@github-actions

Run Gluten Clickhouse CI on x86

@marin-ma

Contributor

@brijrajk LGTM. Thanks.

> Supporting uncompressed native shuffle would be a separate follow-up.

The uncompressed codepath is already supported. Setting spark.shuffle.compress=false disables shuffle compression.
https://github.com/apache/gluten/blob/main/backends-velox/src/main/scala/org/apache/gluten/vectorized/ColumnarBatchSerializer.scala#L96-L101
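
As a rough illustration of that point, the sketch below assumes the `spark.shuffle.compress` check happens before any codec is resolved, so an unsupported codec value is never consulted when compression is off. That ordering is an assumption based on the comment above, not a copy of the linked `ColumnarBatchSerializer` lines; the object and method names are hypothetical, a plain `Map` stands in for `SparkConf`, and codec validation is omitted for brevity. The defaults (`true` for `spark.shuffle.compress`, `lz4` for `spark.io.compression.codec`) are Spark's documented defaults.

```scala
// Hypothetical sketch; not the ColumnarBatchSerializer implementation.
object ShuffleCompressSketch {
  // Returns None when shuffle compression is disabled, otherwise the codec name.
  def effectiveCodec(conf: Map[String, String]): Option[String] = {
    // Spark treats spark.shuffle.compress as true when it is unset.
    val compress = conf.get("spark.shuffle.compress").forall(_.toBoolean)
    if (!compress) {
      None // uncompressed shuffle: no codec lookup at all
    } else {
      // The Gluten-specific key overrides the Spark-wide one.
      val sparkCodec = conf.getOrElse("spark.io.compression.codec", "lz4")
      Some(conf.getOrElse("spark.gluten.sql.columnar.shuffle.codec", sparkCodec))
    }
  }
}
```

Under these assumptions, `effectiveCodec(Map("spark.shuffle.compress" -> "false", "spark.io.compression.codec" -> "none"))` yields `None` instead of raising the unsupported-codec error.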

@marin-ma marin-ma merged commit 8eb0413 into apache:main Jun 24, 2026
62 checks passed

Labels

CORE (works for Gluten Core), VELOX

Development

Successfully merging this pull request may close these issues.

[VL] Job failed when the compression.codec was not supported

3 participants