What feature are you trying to implement?
Implement SQL UPDATE support for the iceberg-datafusion integration, enabling row-level updates filtered by a WHERE clause. Together with INSERT and the upcoming MERGE INTO support (#2201), this completes the essential DML operations.
See the existing insert_into support for reference.
SQL Example
UPDATE orders
SET status = 'shipped', shipped_date = current_date()
WHERE status = 'pending' AND payment_confirmed = true;
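To pin down the row-level semantics of this statement, here is a minimal Rust sketch over an in-memory slice of rows. `Order`, `update_pending_orders`, and the `today` parameter are hypothetical illustrations, not part of iceberg-rust or DataFusion. The date is an `i32` day count so the sketch needs no external crates.

```rust
/// Hypothetical in-memory row mirroring the columns used in the SQL example.
#[derive(Debug, Clone, PartialEq)]
pub struct Order {
    pub id: u64,
    pub status: String,
    pub payment_confirmed: bool,
    /// Days since the Unix epoch; `None` plays the role of SQL NULL.
    pub shipped_date: Option<i32>,
}

/// Hypothetical row-level semantics of:
///   UPDATE orders SET status = 'shipped', shipped_date = current_date()
///   WHERE status = 'pending' AND payment_confirmed = true;
/// `today` stands in for current_date() and is assumed to be one value
/// for the whole statement. Returns the number of rows updated.
pub fn update_pending_orders(rows: &mut [Order], today: i32) -> u64 {
    let mut updated = 0u64;
    for row in rows.iter_mut() {
        // WHERE clause: only rows satisfying both conditions are touched.
        if row.status == "pending" && row.payment_confirmed {
            // SET assignments: both right-hand sides are constants here,
            // so the order of the two assignments does not matter.
            row.status = "shipped".to_string();
            row.shipped_date = Some(today);
            updated += 1;
        }
    }
    updated
}
```

Passing `today` in, rather than reading a clock per row, reflects the assumption that `current_date()` yields a single value for the entire statement.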
Overall Architecture
UPDATE table SET col1 = val1, col2 = val2 WHERE condition
↓
TableProvider::update() [NEW]
↓
IcebergTableScan (with WHERE filters)
↓
[Partitioned tables only: Project partition values + Repartition + Sort]
↓
IcebergUpdateWriteExec [NEW] - Apply assignments, write new files, track deleted files
↓
CoalescePartitionsExec (reuse existing)
↓
IcebergUpdateCommitExec [NEW] - Commit via RowDelta transaction
↓
RecordBatch(count: UInt64)
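The last three operators exist so that a single transaction sees every file change produced by the parallel write tasks. Below is a minimal sketch of the coalescing step, assuming each write task reports a hypothetical `WriteResult`. The real operator's output schema may differ, and `coalesce` is an illustrative name, not the DataFusion operator's API.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-task output of IcebergUpdateWriteExec (illustrative only).
#[derive(Debug, Default, Clone)]
pub struct WriteResult {
    /// Paths of newly written data files.
    pub added_files: Vec<String>,
    /// Paths of original data files this task replaced.
    pub removed_files: BTreeSet<String>,
    /// Rows whose values were changed by the SET assignments.
    pub updated_rows: u64,
}

/// Models CoalescePartitionsExec feeding a single commit operator: all
/// per-task results are merged so that one transaction sees every change.
pub fn coalesce(results: Vec<WriteResult>) -> WriteResult {
    let mut merged = WriteResult::default();
    for r in results {
        merged.added_files.extend(r.added_files);
        // The same original file may be reported by several tasks (e.g. if
        // repartitioning spread its rows); the set keeps one removal per file.
        merged.removed_files.extend(r.removed_files);
        // This sum is the value the plan emits as RecordBatch(count: UInt64).
        merged.updated_rows += r.updated_rows;
    }
    merged
}
```

Deduplicating removals before the commit matters because removing the same data file twice in one transaction would be an error or a no-op depending on the commit path. A set makes the merged result independent of how rows were distributed across tasks.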
Strategy: Copy-on-Write (COW)
- Scan the table with the WHERE filters to find the data files that contain matching rows
- Apply the UPDATE assignments (evaluate the SET expressions) to the matching rows
- Write the rows of each affected file, updated and unchanged, to new data files
- Mark the original files as deleted
- Commit atomically with RowDelta (add new files + remove old files)
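The strategy above can be sketched with an in-memory model. This is a simplified illustration, assuming a table is a map from data-file path to rows. `Snapshot`, `CowPlan`, `plan_cow_update`, `commit`, and the `.v{n}` file naming are all hypothetical. The base-snapshot check is a deliberately strict stand-in for Iceberg's optimistic-concurrency validation, not a reproduction of RowDelta.

```rust
use std::collections::BTreeMap;

/// Hypothetical row: (order id, status).
pub type Row = (u64, String);

/// Hypothetical table state: data-file path -> rows stored in that file.
#[derive(Debug, Clone)]
pub struct Snapshot {
    pub id: u64,
    pub files: BTreeMap<String, Vec<Row>>,
}

/// Hypothetical result of the scan + write phase, not yet visible to readers.
#[derive(Debug)]
pub struct CowPlan {
    pub base_id: u64,
    pub added: BTreeMap<String, Vec<Row>>,
    pub removed: Vec<String>,
    pub updated: u64,
}

/// Rewrites only files that contain at least one matching row. Every row of
/// such a file is copied, and the assignment is applied to matching rows only.
pub fn plan_cow_update(
    snap: &Snapshot,
    predicate: impl Fn(&Row) -> bool,
    assign: impl Fn(&mut Row),
) -> CowPlan {
    let mut plan = CowPlan { base_id: snap.id, added: BTreeMap::new(), removed: Vec::new(), updated: 0 };
    for (path, rows) in &snap.files {
        if !rows.iter().any(|r| predicate(r)) {
            continue; // untouched file: neither rewritten nor removed
        }
        let mut new_rows = Vec::with_capacity(rows.len());
        for row in rows {
            let mut row = row.clone();
            if predicate(&row) {
                assign(&mut row);
                plan.updated += 1;
            }
            new_rows.push(row); // unchanged rows are carried over too
        }
        plan.added.insert(format!("{path}.v{}", snap.id + 1), new_rows);
        plan.removed.push(path.clone());
    }
    plan
}

/// Publishes the plan as a new snapshot in one step, or rejects it if another
/// commit landed after the scan (a stand-in for optimistic-concurrency checks).
pub fn commit(current: &Snapshot, plan: CowPlan) -> Result<Snapshot, String> {
    if current.id != plan.base_id {
        return Err(format!(
            "plan is based on snapshot {} but the table is at {}; re-plan and retry",
            plan.base_id, current.id
        ));
    }
    let mut files = current.files.clone();
    for path in &plan.removed {
        files.remove(path);
    }
    files.extend(plan.added);
    // The new snapshot is built completely before it is returned, so readers
    // of `current` never observe a half-applied update.
    Ok(Snapshot { id: current.id + 1, files })
}
```

A rewritten file carries its non-matching rows along. Because the whole original file is removed, dropping them would silently delete data. Files with no matching rows are left alone, which keeps copy-on-write cost proportional to the affected files rather than to the whole table.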
The following tasks are already complete on the PoC branch (6 commits). I will raise formal PRs one after another, since the fork repo doesn't support stacked PRs.
Willingness to contribute
I can contribute to this feature independently