feat(datafusion): Add schema validation for partition projection #2008
Merged
liurenjie1024 merged 1 commit into apache:main on Jan 13, 2026
4e6566f to 62ab07a
///
/// # Returns
/// A new Arrow data type with all metadata removed from nested structures
fn strip_metadata_from_datatype(data_type: &DataType) -> DataType {
Two suggestions:
- Move this part to the arrow module. We plan to move arrow out of the core library, so it would be better to keep all Arrow-related code in the same module.
- Use ArrowSchemaVisitor to do it.
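For illustration, here is a minimal sketch of what a visitor-based metadata strip could look like. It deliberately does not use the real `arrow` crate or the project's `ArrowSchemaVisitor`. `DataType`, `Field`, `TypeVisitor`, `visit`, `StripMetadata`, and `bare` are all hypothetical, simplified stand-ins, so the real trait's method names and signatures may well differ.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for Arrow's DataType and Field.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    List(Box<Field>),
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    metadata: HashMap<String, String>,
}

// Hypothetical visitor: children are visited first and their results are
// handed to the callback for the enclosing type (post-order traversal).
trait TypeVisitor {
    type Out;
    fn primitive(&mut self, data_type: &DataType) -> Self::Out;
    fn list(&mut self, element: &Field, element_out: Self::Out) -> Self::Out;
    fn struct_type(&mut self, fields: &[Field], field_outs: Vec<Self::Out>) -> Self::Out;
}

// Drives the traversal so that visitor implementations contain no recursion.
fn visit<V: TypeVisitor>(data_type: &DataType, visitor: &mut V) -> V::Out {
    match data_type {
        DataType::List(element) => {
            let out = visit(&element.data_type, &mut *visitor);
            visitor.list(element, out)
        }
        DataType::Struct(fields) => {
            let outs: Vec<V::Out> = fields
                .iter()
                .map(|f| visit(&f.data_type, &mut *visitor))
                .collect();
            visitor.struct_type(fields, outs)
        }
        primitive => visitor.primitive(primitive),
    }
}

// Builds a field that carries no metadata.
fn bare(name: &str, data_type: DataType) -> Field {
    Field { name: name.to_string(), data_type, metadata: HashMap::new() }
}

// Visitor that rebuilds each type with metadata dropped from every nested field.
struct StripMetadata;

impl TypeVisitor for StripMetadata {
    type Out = DataType;

    fn primitive(&mut self, data_type: &DataType) -> DataType {
        data_type.clone()
    }

    fn list(&mut self, element: &Field, element_out: DataType) -> DataType {
        DataType::List(Box::new(bare(&element.name, element_out)))
    }

    fn struct_type(&mut self, fields: &[Field], field_outs: Vec<DataType>) -> DataType {
        let stripped = fields
            .iter()
            .zip(field_outs)
            .map(|(field, data_type)| bare(&field.name, data_type))
            .collect();
        DataType::Struct(stripped)
    }
}
```

Calling `visit(&data_type, &mut StripMetadata)` returns a copy of the type with empty metadata at every level. Keeping the traversal in `visit` is what would let other schema transformations reuse the same walk.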
Okay. I will update this today.
d9b253a to a5b4783
Implement schema validation in project_with_partition to ensure the input schema matches the Iceberg table schema before calculating partition values. This prevents subtle bugs from schema mismatches and provides clear error messages when schemas don't match.

Changes:
- Add helper functions to recursively strip metadata from Arrow schemas
- Implement schema validation that compares input schema with expected Iceberg table schema, ignoring metadata differences
- Add comprehensive tests for metadata stripping and schema validation
- Closes apache#1752

The implementation follows the approach suggested in issue apache#1752:
- Recursively visits schema and removes metadata from all fields
- Compares cleaned schemas using Arrow's built-in equality operator
- Returns helpful error messages showing both schemas on mismatch

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
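The three steps described in the commit message (strip metadata, compare with equality, report both schemas on mismatch) can be sketched roughly as follows. This is not the code from this PR. It assumes hypothetical, simplified `Schema`, `Field`, and `DataType` types in place of the Arrow ones, and `strip_field`, `strip_schema`, `validate_schema`, and the error wording are illustrative only.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for the Arrow schema types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    Struct(Vec<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
    metadata: HashMap<String, String>,
}

#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
    metadata: HashMap<String, String>,
}

// Step 1: recursively rebuild a field with all metadata removed.
fn strip_field(field: &Field) -> Field {
    let data_type = match &field.data_type {
        DataType::Struct(children) => {
            DataType::Struct(children.iter().map(strip_field).collect())
        }
        other => other.clone(),
    };
    Field { name: field.name.clone(), data_type, metadata: HashMap::new() }
}

fn strip_schema(schema: &Schema) -> Schema {
    Schema {
        fields: schema.fields.iter().map(strip_field).collect(),
        metadata: HashMap::new(),
    }
}

// Steps 2 and 3: compare the cleaned schemas with `==` and, on mismatch,
// return an error message that shows both of them.
fn validate_schema(input: &Schema, expected: &Schema) -> Result<(), String> {
    let (input, expected) = (strip_schema(input), strip_schema(expected));
    if input == expected {
        Ok(())
    } else {
        Err(format!(
            "Input schema does not match table schema.\nInput: {input:?}\nExpected: {expected:?}"
        ))
    }
}
```

Because both sides are stripped before the comparison, schemas that differ only in metadata pass validation, while a difference in field names, types, or order is reported.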
a5b4783 to d3a1c7a
liurenjie1024 approved these changes on Jan 13, 2026
Thanks @viirya for this fix!
Thanks @liurenjie1024