[GLUTEN-11539][VL] Improve error message for unsupported spark.io.compression.codec in native shuffle#12333
Run Gluten Clickhouse CI on x86
@brijrajk Thanks for proposing this change. However, I think we should throw an exception to users instead of using a different compression codec than the one that was configured. If cc: @FelixYBW Do you have any suggestions?
We have a dedicated config to solve the codec mismatch issue between Gluten and Spark: @brijrajk can you refine the error message to something like:
Force-pushed from 0ddc080 to cd5694c
Thanks @marin-ma and @FelixYBW for the feedback. Updated in the latest push:
Regarding

Thank you for the fix.
…pression.codec in native shuffle

When `spark.io.compression.codec` is set to a codec not supported by the Gluten native shuffle writer (e.g. snappy, none), the previous error message only listed supported codecs with no guidance on resolution.

Improve the message to tell users which codecs Gluten shuffle supports, that the configured codec is not supported, and how to fix it by configuring `spark.gluten.sql.columnar.shuffle.codec` explicitly.

Before:
  The value of spark.io.compression.codec should be one of lz4, zstd, but was snappy

After:
  Gluten shuffle only supports lz4, zstd. snappy is not supported. You may configure spark.gluten.sql.columnar.shuffle.codec to lz4 or zstd.

The explicit `spark.gluten.sql.columnar.shuffle.codec` path is unchanged.

Update the existing regression test and add two new cases: one for `none` (explicitly raised by reviewers) and one for the supported-codec happy path.
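The behaviour described in this commit message can be sketched in a few lines of plain Scala. This is a minimal illustration, not the code in `GlutenShuffleUtils.scala`: the object `ShuffleCodecSketch`, the `resolveCodec` helper, the use of a plain `Map` in place of a Spark configuration object, and the wording of the explicit-key error are all assumptions made for the example. Only the two config keys and the "After" message come from the thread. The fallback to `lz4` when `spark.io.compression.codec` is unset mirrors Spark's documented default.

```scala
object ShuffleCodecSketch {
  // Config keys quoted in the PR.
  val SparkCodecKey = "spark.io.compression.codec"
  val GlutenCodecKey = "spark.gluten.sql.columnar.shuffle.codec"

  // Assumption: the supported set is exactly the codecs named in the message.
  val Supported: Seq[String] = Seq("lz4", "zstd")

  // Builds the "After" message quoted in the commit message.
  def unsupportedSparkCodecMessage(codec: String): String = {
    val listed = Supported.mkString(", ")
    val choices = Supported.mkString(" or ")
    s"Gluten shuffle only supports $listed. $codec is not supported. " +
      s"You may configure $GlutenCodecKey to $choices."
  }

  // Hypothetical resolver: the explicit Gluten key wins; otherwise the
  // Spark-wide codec is checked and rejected with the actionable message.
  def resolveCodec(conf: Map[String, String]): String =
    conf.get(GlutenCodecKey) match {
      case Some(explicit) =>
        // The real message for this path is not quoted in the thread;
        // the text below is a placeholder.
        if (!Supported.contains(explicit)) {
          throw new IllegalArgumentException(
            s"(placeholder) unsupported value for $GlutenCodecKey: $explicit")
        }
        explicit
      case None =>
        val sparkCodec = conf.getOrElse(SparkCodecKey, "lz4")
        if (!Supported.contains(sparkCodec)) {
          throw new IllegalArgumentException(unsupportedSparkCodecMessage(sparkCodec))
        }
        sparkCodec
    }
}
```

Under these assumptions, a configuration that sets `spark.io.compression.codec` to `snappy` but also sets `spark.gluten.sql.columnar.shuffle.codec` to `lz4` resolves to `lz4` without an error, which is the resolution the new message points users towards.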
Force-pushed from cd5694c to 93a29ef
@brijrajk LGTM. Thanks.
The uncompressed codepath is already supported. Set
What changes are proposed in this pull request?
Fixes #11539
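When `spark.io.compression.codec` is set to a codec not supported by the Gluten native shuffle writer (e.g. `snappy`, `none`), `GlutenShuffleUtils.getCompressionCodec()` was throwing an `IllegalArgumentException` with a message that only listed the supported codecs but gave no guidance on how to resolve the issue.

This PR improves the error message to clearly tell users:
- which codecs Gluten shuffle supports
- that the configured codec is not supported
- how to fix it by configuring `spark.gluten.sql.columnar.shuffle.codec` explicitly

Before:
  The value of spark.io.compression.codec should be one of lz4, zstd, but was snappy

After:
  Gluten shuffle only supports lz4, zstd. snappy is not supported. You may configure spark.gluten.sql.columnar.shuffle.codec to lz4 or zstd.

The explicit `spark.gluten.sql.columnar.shuffle.codec` path is unchanged: it still validates and throws with its own message when set to an unsupported value.

Files changed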
- `GlutenShuffleUtils.scala`: replaces the `logWarning` + silent fallback in the `case None` branch with a clear `IllegalArgumentException` pointing users to `spark.gluten.sql.columnar.shuffle.codec`
- `MiscOperatorSuite.scala`: updates the GLUTEN-11539 regression test to assert the exception and message instead of the old fallback behaviour

How was this patch tested?
The existing regression test (`GLUTEN-11539: unsupported spark.io.compression.codec throws with actionable message` in `MiscOperatorSuite`) was updated to use `intercept[IllegalArgumentException]` and assert that the message contains `"snappy is not supported"` and the `spark.gluten.sql.columnar.shuffle.codec` key.

Verified locally (Spark 4.0, Velox backend):
`MiscOperatorSuite`: 95/95 passed.

Was this patch authored or co-authored using generative AI tooling?
Yes. Claude Code (claude-sonnet-4-6) was used as an AI assistant during development.
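
For readers who want to see the shape of the checks described under "How was this patch tested?", the following is a rough, dependency-free sketch. It is not the test in `MiscOperatorSuite.scala`: the `intercept` helper here is a hand-written stand-in for ScalaTest's method of the same name, and `codecOrThrow` is a hypothetical stub that only reproduces the error message quoted in this PR. The three cases (`snappy`, `none`, and a supported codec) follow the cases listed in the commit message.

```scala
import scala.reflect.ClassTag

object CodecMessageCheck {
  val GlutenCodecKey = "spark.gluten.sql.columnar.shuffle.codec"

  // Hypothetical stub standing in for the real codec lookup.
  def codecOrThrow(sparkCodec: String): String =
    if (Set("lz4", "zstd").contains(sparkCodec)) sparkCodec
    else throw new IllegalArgumentException(
      s"Gluten shuffle only supports lz4, zstd. $sparkCodec is not supported. " +
        s"You may configure $GlutenCodecKey to lz4 or zstd.")

  // Stand-in for ScalaTest's intercept: returns the thrown exception if it
  // has type T, rethrows any other exception, and fails if nothing is thrown.
  def intercept[T <: Throwable](body: => Any)(implicit ct: ClassTag[T]): T = {
    val thrown: Option[Throwable] =
      try { body; None } catch { case t: Throwable => Some(t) }
    thrown match {
      case Some(t) if ct.runtimeClass.isInstance(t) => t.asInstanceOf[T]
      case Some(t) => throw t
      case None =>
        throw new AssertionError(
          s"expected ${ct.runtimeClass.getName}, but nothing was thrown")
    }
  }

  def runChecks(): Unit = {
    // Unsupported codecs: the message names the codec and the Gluten key.
    for (codec <- Seq("snappy", "none")) {
      val e = intercept[IllegalArgumentException] { codecOrThrow(codec) }
      assert(e.getMessage.contains(s"$codec is not supported"))
      assert(e.getMessage.contains(GlutenCodecKey))
    }
    // Happy path: a supported codec is returned unchanged.
    assert(codecOrThrow("lz4") == "lz4")
  }
}
```

Calling `CodecMessageCheck.runChecks()` completes silently when all three cases behave as described. In the real suite the same assertions would presumably run against a Spark session configured with the codec under test rather than a stub.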