How can I test if two Ibis expressions are the same? #8754
-
Some context for those familiar with dbt

I'm thinking about ways to make https://github.com/binste/dbt-ibis faster. In a larger dbt deployment with multiple complex Ibis "models", I've noticed that compiling the Ibis expressions to SQL can take a while. A complex expression might take ~2-3 seconds. Currently, dbt-ibis compiles all Ibis expressions to SQL even if an expression has not changed and the SQL it generates is therefore identical to what is already saved. I could store some information, such as hashes, across runs of dbt-ibis to be smarter about which Ibis expressions need to be compiled to SQL.

Question

I'm looking for a way to test if two Ibis expressions will produce the exact same SQL code if they were compiled, without having to actually compile them. Has anyone tried this before? Any ideas are welcome! :)

What I thought about so far:
Here is some example code to play around with:

```python
import ibis
import datetime as dt
import ibis.expr.types as ir


def model() -> ir.Table:
    # Exact code here does not matter. This just represents a function
    # which returns any arbitrary Ibis Table expression.
    t = ibis.table({"a": "string", "b": "int"}, name="table_1")
    t1 = t.mutate(new=t.a.cast("int") + t.b, max=ibis.greatest(t.a.cast("int"), t.b))
    t2 = t1.filter(t1.a == "foo").filter(
        t1.a.cast("timestamp") == dt.datetime(2017, 1, 1)
    )
    t2 = t2.select(["a", "b"])
    return t2


model_1 = model()
model_2 = model()

hash(model_1) == hash(model_2)  # True
repr(model_1) == repr(model_2)  # True
model_1 == model_2  # False
```
Replies: 5 comments 3 replies
-
there is `exprA.equals(exprB)`:

```
[ins] In [1]: import ibis
[ins] In [2]: t = ibis.examples.penguins.fetch()
[ins] In [3]: a = t.group_by(["species", "island"]).agg(count=t.count()).order_by("count")
[ins] In [4]: b = t.group_by(["species", "island"]).agg(count=t.count()).order_by("count")
[ins] In [5]: a.equals(b)
Out[5]: True
[ins] In [6]: a.equals(t)
Out[6]: False
```
-
also are you on |
-
Also see #8446
-
Thank you both! I'll need to up my search game; I didn't find those.

I'm looking for a way to test equality of expressions which are generated in separate Python processes at separate times, i.e. across Python runs. This was not really clear in my original question and the example code is confusing. Here is a more detailed example.

This is some shared code across Python runs:

```python
import ibis
import datetime as dt
import ibis.expr.types as ir


def generate_ibis_expression() -> ir.Table:
    # Exact code here does not matter. This just represents a function
    # which returns any arbitrary Ibis Table expression.
    t = ibis.table({"a": "string", "b": "int"}, name="table_1")
    t1 = t.mutate(new=t.a.cast("int") + t.b, max=ibis.greatest(t.a.cast("int"), t.b))
    t2 = t1.filter(t1.a == "foo").filter(
        t1.a.cast("timestamp") == dt.datetime(2017, 1, 1)
    )
    t2 = t2.select(["a", "b"])
    return t2


def store_some_representation(ibis_expression: ir.Table) -> None:
    # TODO: Figure out how to store a representation of an Ibis expression on disk
    # which can be loaded later and be compared to another one.
    with open("expr_representation.txt", "w") as f:
        # write() needs a string, so the integer hash is converted first.
        f.write(str(hash(ibis_expression)))


def compare_expression_to_stored_representation(ibis_expression: ir.Table) -> bool:
    with open("expr_representation.txt", "r") as f:
        stored_hash = f.read()
    # This does not work as hashes are not stable across runs but just to illustrate
    # what I mean:
    return str(hash(ibis_expression)) == stored_hash
```

The first time the script runs, we just generate the Ibis expression and store some representation (or pickle the whole expression?):

```python
ibis_expression = generate_ibis_expression()
store_some_representation(ibis_expression)
```

The second time it runs, we do this:

```python
ibis_expression = generate_ibis_expression()
compare_expression_to_stored_representation(ibis_expression)
```

As it's across runs, hashes won't be stable, and it seems to me that `equals` can't be used either. Pickling failed for me in an Ibis expression where I'm referencing a built-in SQL function as described here. I'll need to do some more experiments with it.

Any input is welcome, but of course I fully understand if this is a special case you don't want to think about right now :)
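One possible direction, sketched here only as an illustration and not as something Ibis or dbt-ibis provides: Python's built-in `hash()` is salted per process for strings by default, whereas a cryptographic digest of a text representation is identical in every run. The helpers below (`fingerprint` and `matches_stored` are hypothetical names) fingerprint an arbitrary string with `hashlib.sha256` and compare it with the value stored by the previous run. Whether `repr(ibis_expression)` is a suitable input is an assumption: it was equal for two identically built expressions in the first example, but nothing in this thread confirms that it is deterministic across processes or captures everything that affects the generated SQL.

```python
import hashlib
from pathlib import Path


def fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of text; the result is the same in every process."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def matches_stored(text: str, path: Path) -> bool:
    """Compare the fingerprint of text with the one stored at path, then update it.

    Returns False when no fingerprint has been stored yet.
    """
    current = fingerprint(text)
    previous = path.read_text() if path.exists() else None
    # Persist the current fingerprint so the next run compares against it.
    path.write_text(current)
    return current == previous
```

Under that assumption, a run could call `matches_stored(repr(ibis_expression), Path("expr_representation.txt"))` and skip compilation when it returns `True`. An Ibis upgrade may change either the `repr` output or the generated SQL, so a real cache would probably also want the Ibis version in the fingerprinted text.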
-
Right now there are two options:
We've had a few requests for something like this. It might be time to gather the requests and use cases and see if there's something we can implement to support them.