[VL][Delta] Add UniForm Iceberg support for Delta tables #12039

@malinjawi

Summary

Add UniForm Iceberg support for Delta tables on the Velox backend.

Today, docs/get-started/VeloxDelta.md lists Iceberg readers (UniForm) as "Not tested". This issue tracks enabling and validating a supported Velox path.

Current state

Gluten already has some relevant hooks:

  • backends-velox/src-delta33/main/scala/org/apache/spark/sql/delta/GlutenOptimisticTransaction.scala
    • materializes partition columns for Iceberg compat
    • tags AddFile entries with Iceberg compat version
  • backends-velox/src-delta33/main/scala/org/apache/spark/sql/delta/GlutenDeltaParquetFileFormat.scala
    • forces TIMESTAMP_MICROS
    • switches to DeltaParquetWriteSupport for IcebergCompatV2

During investigation we also confirmed that Velox already supports explicit Parquet field_id assignment in its native Parquet writer.

Main gaps

  • no delta-iceberg dependency/test enablement in Gluten build/test flow
  • no end-to-end Velox UniForm test coverage
  • no proof that native Delta write passes the nested field IDs required by IcebergCompatV2 into the native Parquet writer
  • no explicit validation/fallback for unsupported cases such as active deletion vectors
  • no supported-scope documentation beyond "Not tested"
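
To illustrate the kind of assertion an end-to-end test could make for the nested field ID gap, here is a minimal Python sketch. It assumes the written Parquet schema has already been dumped into nested dicts with `name`, `field_id`, and `children` keys; that representation and the helper name `missing_field_ids` are hypothetical and are not part of Gluten, Velox, or Delta.

```python
from typing import Iterator


def missing_field_ids(node: dict, path: str = "") -> Iterator[str]:
    """Yield the dotted path of every schema node that has no field_id."""
    here = f"{path}.{node['name']}" if path else node["name"]
    if node.get("field_id") is None:
        yield here
    # Recurse into array elements, map keys/values, and struct members.
    for child in node.get("children", []):
        yield from missing_field_ids(child, here)


# Hypothetical schema dumps (assumed shape, for illustration only).
# `tags` is array<string> whose element lost its field ID on write.
tags = {
    "name": "tags",
    "field_id": 1,
    "children": [{"name": "element", "field_id": None}],
}
# `attrs` is map<string, int> with IDs on both key and value.
attrs = {
    "name": "attrs",
    "field_id": 2,
    "children": [
        {"name": "key", "field_id": 3},
        {"name": "value", "field_id": 4},
    ],
}

for column in (tags, attrs):
    print(column["name"], list(missing_field_ids(column)))
```

A real test would build these dicts from the footer of a file produced by the native writer, then assert that the list is empty for every column.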

Relevant Delta requirements

At minimum, the supported path needs:

  • column mapping enabled
  • minReaderVersion >= 2
  • minWriterVersion >= 7
  • delta.enableIcebergCompatV2=true
  • delta.universalFormat.enabledFormats=iceberg
  • Delta 3.1+ writer
  • Hive Metastore-backed Iceberg catalog path
  • no active deletion vectors
  • partition columns materialized in Parquet
  • numRecords populated in new AddFile stats
  • timestamp columns written as int64 / micros
  • nested array/map field IDs written into the Parquet schema
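
Several of the requirements above can be checked from table metadata alone. The following is a rough Python sketch of such a preflight check, not an actual Gluten or Delta API. It assumes the protocol, table properties, and AddFile entries are available as plain dicts shaped like Delta log actions; the function name `uniform_violations` and the sample data are hypothetical. It does not cover the timestamp encoding, partition-column materialization, nested field IDs, or catalog requirements, which need inspection of the written files or the environment.

```python
import json


def uniform_violations(protocol: dict, props: dict, add_files: list) -> list:
    """Return human-readable reasons a table fails the metadata-level checks."""
    problems = []
    if props.get("delta.columnMapping.mode") not in ("name", "id"):
        problems.append("column mapping not enabled")
    if protocol.get("minReaderVersion", 0) < 2:
        problems.append("minReaderVersion < 2")
    if protocol.get("minWriterVersion", 0) < 7:
        problems.append("minWriterVersion < 7")
    if str(props.get("delta.enableIcebergCompatV2", "")).lower() != "true":
        problems.append("delta.enableIcebergCompatV2 is not true")
    formats = props.get("delta.universalFormat.enabledFormats", "")
    if "iceberg" not in [f.strip() for f in formats.split(",")]:
        problems.append("iceberg not in delta.universalFormat.enabledFormats")
    for f in add_files:
        # A non-null deletionVector on an AddFile means the DV is active.
        if f.get("deletionVector") is not None:
            problems.append(f"{f['path']}: active deletion vector")
        # AddFile stats are stored as a JSON string.
        stats = json.loads(f.get("stats") or "{}")
        if "numRecords" not in stats:
            problems.append(f"{f['path']}: numRecords missing from stats")
    return problems


# Hypothetical sample inputs (assumed values, for illustration only).
protocol = {"minReaderVersion": 2, "minWriterVersion": 7}
props = {
    "delta.columnMapping.mode": "name",
    "delta.enableIcebergCompatV2": "true",
    "delta.universalFormat.enabledFormats": "iceberg",
}
add_files = [
    {"path": "part-0000.parquet", "stats": '{"numRecords": 10}'},
    {"path": "part-0001.parquet", "stats": '{"numRecords": 5}',
     "deletionVector": {"storageType": "u"}},
    {"path": "part-0002.parquet", "stats": None},
]

for problem in uniform_violations(protocol, props, add_files):
    print(problem)
```

A check like this could back the explicit validation/fallback decision: an empty result means the metadata-level requirements hold, and any entry is a reason to fall back or fail with a clear message.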
