
@rutb327 (Owner) commented Aug 7, 2025

Closes 2272

Rationale for this change

Implements the validation logic described in issue 2272, so that partition field names that conflict with schema field names are handled the same way as in the Java and Rust implementations.
This mirrors Java's checkAndAddPartitionName() method:
https://github.com/apache/iceberg/blob/4dbc7f578eee7ceb9def35ebfa1a4cc236fb598f/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L392-L416

Identity transforms (sourceColumnID != null): a partition field may share its name with a schema field only if it is sourced from that same schema field.
Non-identity transforms (sourceColumnID == null): a partition field may not share its name with any schema field.

In this PR, isinstance(transform, (IdentityTransform, VoidTransform)) stands in for Java's sourceColumnID check, so void transforms follow the same rule as identity transforms.
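For illustration, here is a minimal, self-contained sketch of the two rules above applied to every field of a partition spec, as would happen when a table is created with both a schema and a spec. It is not the code from this PR: SchemaField, PartitionField, the transform classes and validate_partition_names are hypothetical stand-ins, only top-level field names are looked up, any other checks the Java method performs are left out, and the error messages are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical stand-ins for the real transform types (illustration only).
class Transform:
    pass


class IdentityTransform(Transform):
    pass


class VoidTransform(Transform):
    pass


class BucketTransform(Transform):
    pass


# Hypothetical stand-ins for schema and partition fields.
@dataclass(frozen=True)
class SchemaField:
    field_id: int
    name: str


@dataclass(frozen=True)
class PartitionField:
    source_id: int
    name: str
    transform: Transform


def validate_partition_names(
    schema_fields: List[SchemaField], partition_fields: List[PartitionField]
) -> None:
    """Raise ValueError if a partition field name illegally reuses a schema field name."""
    by_name: Dict[str, SchemaField] = {f.name: f for f in schema_fields}
    for pf in partition_fields:
        schema_field = by_name.get(pf.name)
        if schema_field is None:
            continue  # the name is not used by the schema, so there is no conflict
        if isinstance(pf.transform, (IdentityTransform, VoidTransform)):
            # Identity/void: the shared name is allowed only when the partition
            # field is sourced from the schema field that has that name.
            if schema_field.field_id != pf.source_id:
                raise ValueError(
                    f"Cannot create identity partition sourced from different field in schema: {pf.name}"
                )
        else:
            # Any other transform: sharing a schema field name is rejected.
            raise ValueError(f"Cannot create partition from name that exists in schema: {pf.name}")
```

With a schema containing "id" (field 1), an identity partition named "id" sourced from field 1 passes, while a bucket partition named "id" raises ValueError.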

Are these changes tested?

Yes. All existing tests pass, and a new test covers the validation scenarios.

Are there any user-facing changes?

Yes. Partition fields with non-identity transforms (other than void) can no longer use a schema field's name as their partition field name.

@kevinjqliu

Hey @rutb327, could you open this PR against the apache/iceberg-python repo?

assert spec.fields[i] == expected_partition_fields[i]


@pytest.mark.integration


Thanks for this test! Could you add another one to check that the same validation is applied when creating the table with both a schema and a partition spec?

Something like:

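    from pyiceberg.partitioning import PartitionField, PartitionSpec
    from pyiceberg.schema import Schema
    from pyiceberg.transforms import IdentityTransform
    from pyiceberg.types import LongType, NestedField, StringType, TimestampType

    schema = Schema(
        NestedField(1, "id", LongType(), required=False),
        NestedField(2, "event_ts", TimestampType(), required=False),
        NestedField(3, "another_ts", TimestampType(), required=False),
        NestedField(4, "str", StringType(), required=False),
    )
    # Identity partition named "id" and sourced from schema field 1 ("id"): this name conflict is allowed.
    partition_spec = PartitionSpec(
        PartitionField(source_id=1, field_id=1000, transform=IdentityTransform(), name="id"), spec_id=1
    )
    table = _create_table_with_schema(catalog, schema, "2", partition_spec)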


And perhaps add a test for the case where the partition field already exists and we try to add a new schema field whose name conflicts with it.
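A rough sketch of the reverse check this test would exercise, assuming it should follow the same rule as the PR description: a newly added column gets a fresh field ID, so no existing partition field can be sourced from it, and any name match with an existing partition field is therefore a conflict. The thread does not say where (or whether) pyiceberg enforces this; check_new_column_name and its error message are hypothetical.

```python
from typing import Iterable


def check_new_column_name(new_name: str, partition_field_names: Iterable[str]) -> None:
    """Hypothetical check: reject a new schema column named like an existing partition field.

    A new column has a fresh field ID, so it cannot be the source of any
    existing partition field; under the identity rule, every name match
    is therefore a conflict.
    """
    if new_name in set(partition_field_names):
        raise ValueError(f"Cannot add column {new_name!r}: the name is already used by a partition field")
```

For example, with an existing partition field "event_ts_day", adding a column "new_col" passes, while adding a column "event_ts_day" raises ValueError.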

Owner Author


I have raised the PR against apache/iceberg-python and will add these tests.
