Summary
Add UniForm Iceberg support for Delta tables on the Velox backend.
Today Gluten documents "Iceberg readers (UniForm)" as "Not tested" in docs/get-started/VeloxDelta.md. This issue tracks enabling and validating the supported Velox path.
Current state
Gluten already has some relevant hooks:
backends-velox/src-delta33/main/scala/org/apache/spark/sql/delta/GlutenOptimisticTransaction.scala
- materializes partition columns for Iceberg compat
- tags AddFile entries with Iceberg compat version
backends-velox/src-delta33/main/scala/org/apache/spark/sql/delta/GlutenDeltaParquetFileFormat.scala
- forces TIMESTAMP_MICROS
- switches to DeltaParquetWriteSupport for IcebergCompatV2
During investigation we also confirmed that Velox already supports explicit Parquet field_id assignment in its native Parquet writer.
Main gaps
- no delta-iceberg dependency/test enablement in Gluten build/test flow
- no end-to-end Velox UniForm test coverage
- no proof that native Delta write passes the nested field IDs required by IcebergCompatV2 into the native Parquet writer
- no explicit validation/fallback for unsupported cases such as active deletion vectors
- no supported-scope documentation beyond "Not tested"
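To make the deletion-vector gap concrete, here is a minimal sketch of the kind of check a validation/fallback step could perform. It is written in Python for brevity and is not Gluten code. The function name files_with_deletion_vectors is hypothetical. The sketch assumes the input is the JSON-lines content of Delta commit files, where an add action may carry an optional deletionVector field. It does not replay the log or reconcile remove actions, so it only approximates "active" deletion vectors.

```python
import json


def files_with_deletion_vectors(commit_lines):
    """Return the paths of add actions that carry a deletionVector descriptor.

    Hypothetical helper: a real check would inspect the reconciled snapshot
    rather than raw commit lines.
    """
    paths = []
    for line in commit_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between actions
        action = json.loads(line)
        add = action.get("add")
        # Only add actions can reference a deletion vector for a live file.
        if add is not None and add.get("deletionVector") is not None:
            paths.append(add["path"])
    return paths
```

A caller could treat a non-empty result as a signal to reject the UniForm path or fall back to the non-native writer. Which of the two is appropriate is left open by this issue.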
Relevant Delta requirements
At minimum, the supported path needs:
- column mapping enabled
- minReaderVersion >= 2
- minWriterVersion >= 7
- delta.enableIcebergCompatV2=true
- delta.universalFormat.enabledFormats=iceberg
- Delta 3.1+ writer
- Hive Metastore-backed Iceberg catalog path
- no active deletion vectors
- partition columns materialized in Parquet
- numRecords populated in new AddFile stats
- timestamp columns written as int64 / micros
- nested array/map field IDs written into the Parquet schema
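The protocol and table-property items in the list above can be expressed as a simple precondition check. The following is a sketch in Python for illustration only, not part of Gluten or Delta. The function name uniform_precondition_failures is hypothetical, and the sketch assumes column mapping is signalled by delta.columnMapping.mode being "name" or "id". It covers only the metadata-level requirements. The file-level items (deletion vectors, materialized partition columns, numRecords, timestamp encoding, nested field IDs) need separate checks.

```python
def uniform_precondition_failures(protocol, properties):
    """Return the reasons a table misses the metadata-level requirements.

    protocol:   dict with minReaderVersion / minWriterVersion (ints)
    properties: dict of Delta table properties (string values)
    An empty list means every checked requirement holds.
    """
    failures = []
    if protocol.get("minReaderVersion", 0) < 2:
        failures.append("minReaderVersion must be >= 2")
    if protocol.get("minWriterVersion", 0) < 7:
        failures.append("minWriterVersion must be >= 7")
    # Assumption: column mapping is enabled when the mode is "name" or "id".
    if properties.get("delta.columnMapping.mode", "none") not in ("name", "id"):
        failures.append("column mapping must be enabled")
    if properties.get("delta.enableIcebergCompatV2", "false").lower() != "true":
        failures.append("delta.enableIcebergCompatV2 must be true")
    formats = [
        f.strip().lower()
        for f in properties.get("delta.universalFormat.enabledFormats", "").split(",")
    ]
    if "iceberg" not in formats:
        failures.append("delta.universalFormat.enabledFormats must include iceberg")
    return failures
```

Returning a list of reasons rather than a boolean makes it easier to surface a useful message when the supported path is rejected.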