-
Notifications
You must be signed in to change notification settings - Fork 0
disallow non-identity partition fields with schema field names #3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
hey @rutb327 could you open this PR against the apache/iceberg-python repo? |
| assert spec.fields[i] == expected_partition_fields[i] | ||
|
|
||
|
|
||
| @pytest.mark.integration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for this test! could you add another one to check that the same check is applied with creating the table with both schema and partition spec?
something like
schema = Schema(
NestedField(1, "id", LongType(), required=False),
NestedField(2, "event_ts", TimestampType(), required=False),
NestedField(3, "another_ts", TimestampType(), required=False),
NestedField(4, "str", StringType(), required=False),
)
partition_spec = PartitionSpec(
PartitionField(source_id=1, field_id=1000, transform=IdentityTransform(), name="id"), spec_id=1
)
table = _create_table_with_schema(catalog, schema, "2", partition_spec)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and perhaps add a test for when the partition field is already there and we try to add a new schema field which will conflict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have raised the PR against the apache/iceberg-python. I will add these tests
Closes 2272
Rationale for this change
Implements the validation logic described in 2272to match Java and Rust behavior for partition field name conflicts with schema fields.
This mirrors the method in Java checkAndAddPartitionName():
https://github.com/apache/iceberg/blob/4dbc7f578eee7ceb9def35ebfa1a4cc236fb598f/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L392-L416
Identity transforms (
sourceColumnID != null)- Allow schema field name conflicts only when sourced form the same fieldNon-identity (
sourceColumnID == null)- Disallow any schema field name conflicts.In this PR
isinstance(transform, (IdentityTransform, VoidTransform))is used to achieve the same logic as Java’ssourceColumnIDcheck.Are these changes tested?
Yes, all existing tests pass and added a test covering validation scenarios.
Are there any user-facing changes?
Yes. Non-identity transforms can no longer use schema field names as partition field names.