Skip to content

fix upsert with null values #1861

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Mar 31, 2025

Conversation

kevinjqliu
Copy link
Contributor

@kevinjqliu kevinjqliu commented Mar 30, 2025

Rationale for this change

Closes #1835

Original implementation, != (not_equal) does not account for null values. It emits null when either sides are null
The new implementation, first checks for not_equal. And on null values, returns true only if both sides are null

Similar to apache/iceberg-rust#1045

Are these changes tested?

Yes

Are there any user-facing changes?

No

@kevinjqliu kevinjqliu added this to the PyIceberg 0.9.1 milestone Mar 30, 2025
@Fokko
Copy link
Contributor

Fokko commented Mar 31, 2025

Thanks for looking into this @kevinjqliu, pretty annoying null-handling here :/

@kevinjqliu
Copy link
Contributor Author

kevinjqliu commented Mar 31, 2025

now im wondering if nulls are properly handled when we convert iceberg expressions to pyarrow expressions

def visit_equal(self, term: BoundTerm[Any], literal: Literal[Any]) -> pc.Expression:
return pc.field(term.ref().field.name) == _convert_scalar(literal.value, term.ref().field.field_type)
def visit_not_equal(self, term: BoundTerm[Any], literal: Literal[Any]) -> pc.Expression:
return pc.field(term.ref().field.name) != _convert_scalar(literal.value, term.ref().field.field_type)

https://arrow.apache.org/docs/python/generated/pyarrow.compute.equal.html

Compare values for equality (x == y).

A null on either side emits a null comparison result.

@Fokko
Copy link
Contributor

Fokko commented Mar 31, 2025

now im wondering if nulls are properly handled when we convert iceberg expressions to pyarrow expressions

I think we need to check the Or and And BooleanExpressions, because I think we also want to use the kleene or there 👍

Edit: I think that situation is different since we're comparing two columns in the Upsert logic. For the visitor that you linked, we compare against a literal.

@kevinjqliu kevinjqliu merged commit d69a191 into apache:main Mar 31, 2025
7 checks passed
@kevinjqliu kevinjqliu deleted the kevinjqliu/upsert-with-null branch March 31, 2025 17:33
@kevinjqliu
Copy link
Contributor Author

thanks for the review @Fokko

i opened #1865 to test the scan logic with nulls

Fokko pushed a commit that referenced this pull request Apr 17, 2025
<!--
Thanks for opening a pull request!
-->

<!-- In the case this PR will resolve an issue, please replace
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
<!-- Closes #${GITHUB_ISSUE_ID} -->

# Rationale for this change
Closes #1835

Original implementation, `!=`
([not_equal](https://arrow.apache.org/docs/python/generated/pyarrow.compute.not_equal.html#pyarrow.compute.not_equal))
does not account for `null` values. It emits `null` when either sides
are `null`
The new implementation, first checks for `not_equal`. And on null
values, returns `true` only if both sides are `null`

Similar to apache/iceberg-rust#1045

# Are these changes tested?
Yes

# Are there any user-facing changes?
No

<!-- In the case of user-facing changes, please add the changelog label.
-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upsert with NULL value not working
2 participants