Support Filters on Top-Level Struct Fields #1832

Merged
merged 1 commit into from
Mar 23, 2025

@srilman srilman commented Mar 22, 2025

Closes #1778.

Rationale for this change

Currently, filters applied to a top-level struct column do not work. For example, given a table with the schema:

table {
  2: id: optional int
  1: data: required string
  3: location: struct<5: latitude: optional float, 6: longitude: optional float>
}

We want to support applying filters to the field location itself, such as location is not null. Note that filters like location == {"latitude": ..., "longitude": ...} won't work yet, but they can be rewritten equivalently as location.latitude == ... and location.longitude == ....

Are these changes tested?

Yes, tests were added for both the schema level and table reads.

Are there any user-facing changes?

Yes: some basic filters, such as null checks, are now supported on top-level struct columns.

@@ -925,7 +926,7 @@ def primitive_fields() -> List[NestedField]:
]


-def test_add_top_level_primitives(primitive_fields: NestedField) -> None:
+def test_add_top_level_primitives(primitive_fields: List[NestedField]) -> None:
Contributor Author

Fixing a typing error I noticed. The type of the fixture was incorrect.

@Fokko Fokko left a comment

Hey @srilman Thanks for fixing this, this looks great to me 👍

@Fokko Fokko merged commit 71cb247 into apache:main Mar 23, 2025
7 checks passed
@Fokko Fokko added this to the PyIceberg 0.9.1 milestone Apr 20, 2025
Fokko pushed a commit that referenced this pull request Apr 25, 2025
Successfully merging this pull request may close these issues.

Applying Filter on Top-Level Struct Columns Throws Error