Fix Delta to Iceberg not working on column mapping enabled Delta source table #766

xr-chen · 2025-12-09T13:53:32Z

Important Read

Close Delta table to Iceberg doesn't work on column mapping enabled Delta source table #765

What is the purpose of the pull request

Currently, a column mapping enabled Delta table with array/map columns can't be converted into an Iceberg table using xTable because:

1. Delta doesn't generate field IDs for the elements of an array column, or for the key and value of a map column.
2. For those elements, keys, and values, the IDs generated by the variable fieldIdTracker are already used elsewhere in the schema, which violates the field ID uniqueness requirement of Iceberg's NestedField type.

For example, the schema of a Delta table with a string column name and an array column scores would look like:

{
    "type": "struct",
    "fields": [
        {
            "name": "name",
            "type": "string",
            "nullable": true,
            "metadata": {
                "delta.columnMapping.id": 1,
                "delta.columnMapping.physicalName": "name"
            }
        },
        {
            "name": "scores",
            "type": {
                "type": "array",
                "elementType": "long",
                "containsNull": true
            },
            "nullable": true,
            "metadata": {
                "delta.columnMapping.id": 2,
                "delta.columnMapping.physicalName": "scores"
            }
        }
    ]
}

In the above schema, there is no delta.columnMapping.id for the elements of the array column. Similarly, map columns don't have field IDs for their key and value:

{
  "type": "struct",
  "fields": [
    {
      "name": "name",
      "type": "string",
      "nullable": true,
      "metadata": {
        "delta.columnMapping.id": 1,
        "delta.columnMapping.physicalName": "name"
      }
    },
    {
      "name": "properties",
      "type": {
        "type": "map",
        "keyType": "string",
        "valueType": "string",
        "valueContainsNull": true
      },
      "nullable": true,
      "metadata": {
        "delta.columnMapping.id": 2,
        "delta.columnMapping.physicalName": "properties"
      }
    }
  ]
}

Brief change log

Advance fieldIdTracker to the latest field ID every time a field ID is taken, either from the source table schema or from fieldIdTracker itself, so that fieldIdTracker never returns an ID that is already in the source table schema.
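The idea in the change log can be sketched roughly as follows. This is a minimal illustration, not the code in this pull request: FieldIdAllocator, observe, next, and assign are hypothetical names, and the two-pass approach (observe every source ID first, then generate the missing ones) is an assumption based on the later commit title about traversing the whole schema before converting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the fieldIdTracker logic; not the actual xTable class. */
final class FieldIdAllocator {
  private int lastId = 0; // highest ID seen in the source schema or handed out so far

  /** Records an ID that is already present in the source schema. */
  void observe(int sourceId) {
    lastId = Math.max(lastId, sourceId);
  }

  /** Returns a fresh ID greater than every ID observed or returned so far. */
  int next() {
    lastId += 1;
    return lastId;
  }

  /**
   * Assigns IDs to fields listed in schema order, where null means "no ID in the source".
   * All existing IDs are observed first so that a generated ID cannot collide with a
   * source ID that appears later in the schema.
   */
  static List<Integer> assign(List<Integer> sourceIds) {
    FieldIdAllocator allocator = new FieldIdAllocator();
    for (Integer id : sourceIds) {
      if (id != null) {
        allocator.observe(id);
      }
    }
    List<Integer> assigned = new ArrayList<>();
    for (Integer id : sourceIds) {
      assigned.add(id != null ? id : allocator.next());
    }
    return assigned;
  }

  public static void main(String[] args) {
    // name=1, scores=2, scores.element has no ID, a trailing field has ID 3
    System.out.println(assign(Arrays.asList(1, 2, null, 3))); // prints [1, 2, 4, 3]
  }
}
```

In this sketch the array element receives ID 4 rather than 3, so the field that follows the array keeps the ID it has in the source schema.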

Verify this pull request

Added testToIcebergWithPartialFieldIdsSet to verify the change.

the-other-tim-brown · 2025-12-10T15:01:42Z

@xr-chen thank you for the bug report and contribution! There is a test case called ITConversionController that converts between the formats. Is it possible to update this test so it will trigger the same issue that you saw?

the-other-tim-brown · 2025-12-10T15:04:21Z

xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergSchemaExtractor.java

-                        : field.getFieldId())
+                field -> {
+                  int id =
+                      field.getFieldId() == null


If the fieldId is set, we need to use that ID. That is how we are able to handle renames in the schema. The reader will lookup the column based on the ID.

It still uses the ID in the schema if the fieldId is set. Here, I just updated the fieldIdTracker so that it won't return any ID that is already in use.
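To illustrate why the source ID has to be preserved, here is a small sketch of ID-based column lookup surviving a rename. It is only a model of the behavior described in this thread: IdBasedLookup, the maps, and the column name givenName are hypothetical and do not come from xTable or Iceberg.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: models a reader that resolves columns through field IDs. */
final class IdBasedLookup {
  /** Returns the stored value for a logical column name, or null if the name is unknown. */
  static Object read(
      Map<String, Integer> schemaNameToId, Map<Integer, Object> fileValuesById, String column) {
    Integer id = schemaNameToId.get(column);
    return id == null ? null : fileValuesById.get(id);
  }

  public static void main(String[] args) {
    Map<Integer, Object> fileValuesById = new HashMap<>();
    fileValuesById.put(1, "Ada"); // written when the column was still called firstName

    Map<String, Integer> renamedSchema = new HashMap<>();
    renamedSchema.put("givenName", 1); // same ID, new logical name

    System.out.println(read(renamedSchema, fileValuesById, "givenName")); // prints Ada
  }
}
```

If the converter replaced ID 1 with a generated ID, the lookup through the renamed schema would no longer reach the data written under the old name.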

the-other-tim-brown · 2025-12-10T15:07:24Z

xtable-core/src/test/java/org/apache/xtable/iceberg/TestIcebergSchemaExtractor.java

+            .name("testRecord")
+            .dataType(InternalType.RECORD)
+            .isNullable(false)
+            .fields(


Let's add another field that comes after the list to ensure the next field ID is chosen properly. For example, the field after the list is going to have ID 3. We want to make sure that this carries through to the Iceberg schema.

Let's make sure the Map case is also tested here to ensure there is no regression in the future.

Does this field to be added come with an ID or not?

Will test the Map case, that's a good point

Map case and more fields after the list were added

xr-chen · 2025-12-10T19:54:28Z

Thanks @the-other-tim-brown for reviewing the changes. You mean adding a new test in that controller for column mapping enabled Delta tables, maybe kind of like this one but for Delta?

incubator-xtable/xtable-core/src/test/java/org/apache/xtable/ITConversionController.java

Line 712 in 64f38b8

public void testIcebergCorruptedSnapshotRecovery() throws Exception {

xr-chen · 2025-12-11T12:09:40Z

The problem is actually more complicated than fixing the fieldId generation; the data file written by a column mapping enabled Delta table doesn't follow the table schema at all. We need to somehow map the columns in the data files to the columns in the schema.

root
 |-- col-4ee2e8c9-35f1-4868-8ff5-46d285fac4b2: integer (nullable = true)
 |-- col-89424a5c-1fbc-4f97-b9dc-c3bf90f9849e: string (nullable = true)
 |-- col-0c385cbd-8f94-4043-bb55-89f07a39a2bb: string (nullable = true)

Mapping info in the Delta table:

[
 {
   "name": "id",
   "type": "integer",
   "nullable": false,
   "metadata": {
     "delta.columnMapping.id": 1,
     "delta.columnMapping.physicalName": "col-4ee2e8c9-35f1-4868-8ff5-46d285fac4b2"
   }
 },
 {
   "name": "firstName",
   "type": "string",
   "nullable": true,
   "metadata": {
     "delta.columnMapping.id": 2,
     "delta.columnMapping.physicalName": "col-89424a5c-1fbc-4f97-b9dc-c3bf90f9849e"
   }
 },
 {
   "name": "lastName",
   "type": "string",
   "nullable": true,
   "metadata": {
     "delta.columnMapping.id": 3,
     "delta.columnMapping.physicalName": "col-0c385cbd-8f94-4043-bb55-89f07a39a2bb"
   }
 }
]
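The mapping that is needed between the data file columns and the schema can be sketched as below. This is a hedged illustration using the values from the example above: MappedColumn and PhysicalNameResolver are hypothetical helper types, not classes from xTable or Delta.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical holder for the parts of a Delta schema field used here. */
final class MappedColumn {
  final String logicalName;
  final int id;
  final String physicalName;

  MappedColumn(String logicalName, int id, String physicalName) {
    this.logicalName = logicalName;
    this.id = id;
    this.physicalName = physicalName;
  }
}

final class PhysicalNameResolver {
  /** Translates data file column names into the logical names from the table schema. */
  static List<String> toLogicalNames(List<String> fileColumns, List<MappedColumn> schema) {
    Map<String, String> physicalToLogical = new HashMap<>();
    for (MappedColumn column : schema) {
      physicalToLogical.put(column.physicalName, column.logicalName);
    }
    List<String> logical = new ArrayList<>();
    for (String fileColumn : fileColumns) {
      // Fall back to the file's own name when the schema has no entry for it.
      logical.add(physicalToLogical.getOrDefault(fileColumn, fileColumn));
    }
    return logical;
  }

  public static void main(String[] args) {
    List<MappedColumn> schema = Arrays.asList(
        new MappedColumn("id", 1, "col-4ee2e8c9-35f1-4868-8ff5-46d285fac4b2"),
        new MappedColumn("firstName", 2, "col-89424a5c-1fbc-4f97-b9dc-c3bf90f9849e"),
        new MappedColumn("lastName", 3, "col-0c385cbd-8f94-4043-bb55-89f07a39a2bb"));
    List<String> fileColumns = Arrays.asList(
        "col-4ee2e8c9-35f1-4868-8ff5-46d285fac4b2",
        "col-89424a5c-1fbc-4f97-b9dc-c3bf90f9849e",
        "col-0c385cbd-8f94-4043-bb55-89f07a39a2bb");
    System.out.println(toLogicalNames(fileColumns, schema)); // prints [id, firstName, lastName]
  }
}
```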

the-other-tim-brown · 2025-12-11T14:32:33Z

xtable-core/src/test/java/org/apache/xtable/ITConversionController.java

  }

+  @Test
+  public void testColumnMappingEnabledDeltaToIceberg() {


@xr-chen to answer your question, yes this is exactly what I was looking for.

Do you think we should also do some minor schema evolution in this case?

@the-other-tim-brown Yes, I think so, but the code can't pass this test case yet, so it probably won't work for any schema change that renames a column. It seems that populating fieldId alone isn't enough: the converted Iceberg table doesn't know which 'physical' column in the data file to read for a 'logical' column name in the table schema, so it returns null values for all columns when we read from the generated Iceberg table. This issue is probably due to:

1. We don't extract delta.columnMapping.physicalName from the Delta table's schema in DeltaSchemaExtractor, so we don't know where the column is actually stored:

incubator-xtable/xtable-core/src/main/java/org/apache/xtable/delta/DeltaSchemaExtractor.java

Line 56 in 64f38b8

public class DeltaSchemaExtractor {

2. The converted Iceberg table doesn't have a name mapping to recognize which Parquet column corresponds to a given Iceberg field ID.

I added Delta's physical column names to Iceberg's schema.name-mapping.default, and the converted table can now read data from the correct place and returns the same content as the original Delta table. But I hit a weird issue during testing:

1. All tests pass with this new test disabled when running mvn verify.
2. This test passes when run on its own.
3. When this test runs together with all the other tests, ITConversionController.testVariousOperations fails.

@the-other-tim-brown, is there anything shared among the test cases?

The filesystem is shared between the tests along with the hadoop and spark configurations

Ahh, it's due to the idToStorageName field I added to IcebergSchemaExtractor. The field wasn't reset before a new sync run, and the schema extractor is used as a singleton, so earlier extraction results were carried over to the next run, which made the test fail. mvn clean verify now passes on my end.

Good catch on this, thanks for digging into the issue. I added a comment on that class.

Now that you have this working, should we add some schema evolution here?

I think we should at least add a second commit as a sanity check that everything works as expected.

Is there an existing function I can use to change the schema of the source Delta table for testing, or should I implement it myself?

By a second commit, you mean inserting more records by insertRows and syncing the table again to make sure it works in incremental sync mode as well?

We already have some helpers for this. Earlier in this test class you will see a test case that uses GenericTable.getInstanceWithAdditionalColumns. This has some helpers for creating the evolved Delta table schema under the hood that you should be able to build off of.

And yes, just inserting or updating some more records is fine. I just want to ensure there isn't some unexpected side-effect when we set this table property multiple times for Iceberg.

vinishjail97 · 2025-12-16T19:30:26Z

xtable-api/src/main/java/org/apache/xtable/model/schema/InternalField.java

  // The id field for the field. This is used to identify the field in the schema even after
  // renames.
  Integer fieldId;
+  @Getter String storageName;


Can you add a comment for this field?

sure, will add

Comment added.

@xr-chen "name mapping" is a delta specific concept. The comment should describe more generally what is happening here. Something like: "The name of the column in the data file used to store this field when it differs from the name in the table's definition."
The comment should also describe whether this will be null when the names are the same or if the string is expected to be populated.

@xr-chen will this value be null if it is not different?

Yes, it will be null

Can you add that detail to the comment?

vinishjail97 · 2025-12-16T19:33:09Z

xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergConversionTarget.java

  }

+  private MappedFields updateNameMapping(MappedFields mapping, Map<Integer, String> updates) {
+    if (mapping == null) {


Can mapping ever be null? If null is returned will NameMapping.of not throw an exception?

It's mainly used to end the recursive call; all nested fields of a given field were processed using this function as well. In that case, having null as a nested field actually makes sense. And for the mapping we want to update, I believe MappingUtil.create won't return us a null
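The recursion described here, with null marking a field that has no nested mapping, can be sketched as follows. This is an assumption-laden illustration rather than the method in this pull request: MappingNode and NameMappingSketch are simplified stand-ins for Iceberg's mapping types, and "col-abc" is a made-up storage name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified stand-in for one name mapping entry: an ID, its accepted names, optional children. */
final class MappingNode {
  final int id;
  final Set<String> names;
  final List<MappingNode> nested; // null when the field has no nested fields

  MappingNode(int id, Set<String> names, List<MappingNode> nested) {
    this.id = id;
    this.names = names;
    this.nested = nested;
  }
}

final class NameMappingSketch {
  /** Returns a copy in which every node whose ID appears in updates also accepts that name. */
  static List<MappingNode> update(List<MappingNode> mapping, Map<Integer, String> updates) {
    if (mapping == null) {
      return null; // end of recursion: this field has nothing nested to update
    }
    List<MappingNode> result = new ArrayList<>();
    for (MappingNode node : mapping) {
      Set<String> names = new LinkedHashSet<>(node.names);
      String storageName = updates.get(node.id);
      if (storageName != null) {
        names.add(storageName);
      }
      result.add(new MappingNode(node.id, names, update(node.nested, updates)));
    }
    return result;
  }

  public static void main(String[] args) {
    MappingNode city = new MappingNode(2, new LinkedHashSet<>(Arrays.asList("city")), null);
    MappingNode address =
        new MappingNode(1, new LinkedHashSet<>(Arrays.asList("address")), Arrays.asList(city));
    Map<Integer, String> updates = new HashMap<>();
    updates.put(2, "col-abc"); // hypothetical storage name for the nested field

    List<MappingNode> updated = update(Arrays.asList(address), updates);
    System.out.println(updated.get(0).nested.get(0).names); // prints [city, col-abc]
  }
}
```

Only the top-level call needs a non-null mapping; the null check exists for the leaf fields reached while recursing.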

the-other-tim-brown · 2025-12-17T14:57:33Z

xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergConversionTarget.java

  public void syncSchema(InternalSchema schema) {
    Schema latestSchema = schemaExtractor.toIceberg(schema);
+    if (!schemaExtractor.getIdToStorageName().isEmpty()) {
+      NameMapping mapping = MappingUtil.create(latestSchema);


The IcebergTableManager is also setting the default name mapping, should we remove that and just rely on this?

I think it makes sense; it's better to update the name mapping in a single place

I attempted moving the code for setting default name mapping in IcebergTableManager to here, but some test cases failed; there might be some reasons the map should be initialized there?

There is no special reason. It is just there from when I first set this up. The mapping needs to be there to handle the case where the field ID is not set in the file schema as you have seen.

I see, I've moved the name mapping code to the schema sync function now

the-other-tim-brown · 2025-12-19T01:58:36Z

xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergSchemaExtractor.java

  private static final String MAP_KEY_FIELD_NAME = "key";
  private static final String MAP_VALUE_FIELD_NAME = "value";
  private static final String LIST_ELEMENT_FIELD_NAME = "element";
+  @Getter private final Map<Integer, String> idToStorageName = new HashMap<>();


@xr-chen This adds state to the class so we have to decide if we want to make an instance of this class per conversion or remove this state.

Removing the state would require you to return the map as part of the response for the toIceberg.

I don't have a strong opinion either way, but I would prefer that over the clear call in the toIceberg method.

Good suggestion, this will avoid unexpected outcomes caused by adding state to a singleton object. I will update the code.
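The trade-off discussed in this thread can be illustrated with a small contrast between a stateful extractor that is shared across runs and a stateless one that returns the map with each result. All types below (SourceField, StatefulExtractor, StatelessExtractor) are hypothetical simplifications, not the xTable classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical field: an ID plus an optional storage name (null when it matches the logical name). */
final class SourceField {
  final int id;
  final String storageName;

  SourceField(int id, String storageName) {
    this.id = id;
    this.storageName = storageName;
  }
}

/** Keeps the map as instance state, so a shared instance carries entries over between runs. */
final class StatefulExtractor {
  private final Map<Integer, String> idToStorageName = new HashMap<>();

  Map<Integer, String> extract(List<SourceField> fields) {
    for (SourceField field : fields) {
      if (field.storageName != null) {
        idToStorageName.put(field.id, field.storageName);
      }
    }
    return idToStorageName;
  }
}

/** Builds a fresh map per call and returns it as part of the result. */
final class StatelessExtractor {
  static Map<Integer, String> extract(List<SourceField> fields) {
    Map<Integer, String> idToStorageName = new HashMap<>();
    for (SourceField field : fields) {
      if (field.storageName != null) {
        idToStorageName.put(field.id, field.storageName);
      }
    }
    return idToStorageName;
  }

  public static void main(String[] args) {
    List<SourceField> tableA = Arrays.asList(new SourceField(1, "col-a"));
    List<SourceField> tableB = Arrays.asList(new SourceField(2, null));

    StatefulExtractor shared = new StatefulExtractor();
    shared.extract(tableA);
    System.out.println(shared.extract(tableB).size()); // prints 1: table A's entry leaked
    System.out.println(extract(tableB).size()); // prints 0
  }
}
```

Creating one extractor instance per conversion would avoid the leak as well; the stateless form simply makes it impossible by construction.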

Fix Iceberg duplicate field id in converted schema

e1e0ed0

xr-chen changed the title ~~Fix duplicate field id in converted Iceberg schema from columnMapping enabled Delta table~~ Fix Delta to Iceberg not working on column mapping enabled Delta source table Dec 10, 2025

the-other-tim-brown reviewed Dec 10, 2025

Traverse the whole schema before converting, add test for the whole conversion process

210ea62

the-other-tim-brown reviewed Dec 11, 2025

update name mapping in the target Iceberg to read values from correct parquet columns

4526881

vinishjail97 reviewed Dec 16, 2025

Add comments for physical field name and add more fields for schema extraction test

2d92688

the-other-tim-brown reviewed Dec 17, 2025

reset id to name mapping before extraction

fc763a9

xr-chen requested review from the-other-tim-brown and vinishjail97 December 18, 2025 02:02

xr-chen added 2 commits December 18, 2025 16:05

Update name mapping in one place

6526ab4

update comments

baf9d33

the-other-tim-brown reviewed Dec 19, 2025

xr-chen added 2 commits December 19, 2025 02:48

Create schema extractor per conversion, add test cases for schema evolution

88ac8c0

Add adding columns case for schema evolution test

07dd0d1
