feat: make Table.fill_null() have consistent semantics with .select(), etc #11698
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is breaking.
Before, doing
t.fill_null("a")
filled with the string literal "a". Now it tries to bind against a column "a".It also was impossible to use column references such as
t.fill_null(t.other_col)
ort.fill_null(ibis._.other_col)
. Now, it is consistent with how .select(), .mutate(), etc all work, and those two examples work.This also changes the behavior of the polars backend. Before, Table.fill_null() would fill in NaN. Now it does not, consistent with other backends, and the rest of ibis's Column.fill_null() semantics, including how polars treats Column.fill_null()
This does mean there is now some inconsistency between Table.fill_null() and Value.fill_null(). Value.fill_null("a") interprets the "a" as a literal value, not a column reference. We could adjust that API as well, or at least add a very visible warning to both docstrings. But then what does that mean for the other Value APIs, like Value.isin(["a"])? Is "a" a literal or column reference? Maybe the difference is if the method is on a Table, all strings are treated as column references, but on Values they are treated as literals. To be even more breaking, but what I think would be the most consistent, would be if we made "a" ALWAYS be a literal, and you HAVE to use
_.a
to reference a column. This would be super breaking, but we could probably provide a decent upgrade path by using a heuristic ofis "a" a column in the table?
to give warnings for several ibis versions.As a non-public-facing refactor, this also simplifies the internal representation so that the ops.FillNull always stores a
Mapping[str, <bound ops.Value>]