Backend
VL (Velox)
Bug description
Problem
Spark builds the write-side Hadoop configuration with sessionState.newHadoopConfWithOptions(options) before invoking the file writer. This makes configs provided as spark.hadoop.<key> visible to the underlying Parquet writer as <key>.
Gluten Velox native write currently builds native write parameters from write options only, so configs coming from Spark HadoopConf are not propagated.
Reproduction
Enable the Velox native writer and set a Parquet write config through Spark HadoopConf, for example:
spark.conf.set("spark.hadoop.parquet.enable.dictionary", "false")
Expected Behavior
The native writer should respect HadoopConf-backed Parquet write configs in the same way Spark's native file write path does.
For spark.hadoop.parquet.enable.dictionary=false, the written Parquet footer should not contain dictionary encodings such as RLE_DICTIONARY or PLAIN_DICTIONARY.
Actual Behavior
The native writer ignores the spark.hadoop.* config, and the Parquet footer still shows dictionary encoding.
Gluten version
main branch
Spark version
None
Spark configurations
No response
System information
No response
Relevant logs