Add MERGE INTO support for DataFusion integration #2201

### What's the feature you are trying to implement?

Add support for SQL `MERGE INTO` (UPSERT) operations in the iceberg-datafusion integration. This enables atomic row-level updates and inserts based on join conditions, essential for CDC pipelines, incremental updates, and data synchronization. I already have a [PoC branch](https://github.com/wirybeaver/iceberg-rust/commits/feature/merge-into/).

For reference, see the related issue on [insert_into support](https://github.com/apache/iceberg-rust/issues/1540).

The key optimization I want to introduce is a Spark-style **SPJ** (Storage-Partitioned Join). DataFusion does not yet support `MERGE INTO` SQL parsing or logical planning, so I am contributing `MERGE INTO` to DataFusion as well: https://github.com/apache/datafusion/issues/20746.

**SQL Example:**
```sql
MERGE INTO target_table t
USING source_table s
ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET t.value = s.value
WHEN NOT MATCHED THEN
  INSERT (id, value) VALUES (s.id, s.value)
```
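
To make the intended semantics of the statement above concrete, here is a minimal in-memory sketch of the row classification step, using only the Rust standard library. The names `MergeAction`, `classify`, and `apply` are hypothetical and are not taken from the PoC branch. The sketch assumes each source row matches at most one target row on `id`.

```rust
use std::collections::HashMap;

/// Action assigned to a source row by the MERGE (hypothetical type).
#[derive(Debug, PartialEq)]
enum MergeAction {
    /// WHEN MATCHED THEN UPDATE SET t.value = s.value
    Update { id: i64, value: String },
    /// WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value)
    Insert { id: i64, value: String },
}

/// Classify every source row by probing the target on the join key `id`.
/// Assumption: the target is small enough to hold as a hash map.
fn classify(target: &HashMap<i64, String>, source: &[(i64, String)]) -> Vec<MergeAction> {
    source
        .iter()
        .map(|(id, value)| {
            if target.contains_key(id) {
                MergeAction::Update { id: *id, value: value.clone() }
            } else {
                MergeAction::Insert { id: *id, value: value.clone() }
            }
        })
        .collect()
}

/// Apply the classified actions to the in-memory target.
/// With a map, update and insert both reduce to overwriting the key.
fn apply(target: &mut HashMap<i64, String>, actions: Vec<MergeAction>) {
    for action in actions {
        match action {
            MergeAction::Update { id, value } | MergeAction::Insert { id, value } => {
                target.insert(id, value);
            }
        }
    }
}
```

Target rows with no matching source row are left untouched, which is what the statement above implies since it has no `WHEN NOT MATCHED BY SOURCE` clause.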

The following tasks are already completed on the PoC branch. I will raise formal PRs one at a time, since the fork repo doesn't support stacking PRs.
- [x] https://github.com/apache/iceberg-rust/pull/2203
- [ ] Add IcebergMergeExec with HashJoinExec integration and row classification
- [ ] Add IcebergMergeWriteExec and IcebergMergeCommitExec nodes
- [ ] Implement full MERGE execution logic with file tracking
- [ ] Integrate MERGE INTO into IcebergTableProvider
- [ ] Add comprehensive MERGE INTO integration tests
- [ ] Add partition-aware merge optimization (Spark storage-partitioned join style)
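
As a rough illustration of the partition-aware idea in the last task, the sketch below groups join keys by partition and plans work per partition. That way, rows are only joined against target rows in the same partition, and target partitions with no source rows are skipped. Everything here is an assumption for illustration: the names `PartitionPlan`, `bucket`, and `plan_by_partition` are hypothetical, and the modulo bucketing is a stand-in rather than Iceberg's actual bucket transform.

```rust
use std::collections::BTreeMap;

/// Work planned for a single partition (hypothetical type).
#[derive(Debug, PartialEq)]
enum PartitionPlan {
    /// Both sides have rows here: join only within this partition.
    Join { target_rows: Vec<i64>, source_rows: Vec<i64> },
    /// Only the source has rows here: every row is an insert, no join needed.
    InsertOnly { source_rows: Vec<i64> },
}

/// Stand-in partition function (assumption: plain modulo bucketing).
fn bucket(id: i64, num_buckets: i64) -> i64 {
    id.rem_euclid(num_buckets)
}

/// Group ids by their partition.
fn group(ids: &[i64], num_buckets: i64) -> BTreeMap<i64, Vec<i64>> {
    let mut groups: BTreeMap<i64, Vec<i64>> = BTreeMap::new();
    for &id in ids {
        groups.entry(bucket(id, num_buckets)).or_default().push(id);
    }
    groups
}

/// Plan the merge partition by partition.
/// Target partitions that receive no source rows get no plan entry at all.
fn plan_by_partition(
    target_ids: &[i64],
    source_ids: &[i64],
    num_buckets: i64,
) -> BTreeMap<i64, PartitionPlan> {
    let mut target_groups = group(target_ids, num_buckets);
    let source_groups = group(source_ids, num_buckets);
    let mut plans = BTreeMap::new();
    for (partition, source_rows) in source_groups {
        let plan = match target_groups.remove(&partition) {
            Some(target_rows) => PartitionPlan::Join { target_rows, source_rows },
            None => PartitionPlan::InsertOnly { source_rows },
        };
        plans.insert(partition, plan);
    }
    plans
}
```

In a real execution plan, the per-partition `Join` work would presumably map onto independent join tasks that need no shuffle. The sketch only shows the planning decision.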

### Willingness to contribute

I would be willing to contribute to this feature with guidance from the Iceberg Rust community.
