-
Notifications
You must be signed in to change notification settings - Fork 1k
feat(optimizer)!: Annotate parameterized numeric types for Snowflake functions #6230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Let's check whether these rules apply to other operators as well, e.g., |
7946b90 to
0778d24
Compare
georgesittas
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work & research @fivetran-MichaelLee, this is super valuable.
After reviewing this PR, I feel hesitant merging it as-is, mainly because it's uncovered some existing inconsistencies / feature gaps and my hunch is that once we properly address those, then the current solution can be made much simpler.
I suggest pausing for now and thinking this through a bit more.
|
|
||
| if len(expressions) >= 1: | ||
| p_expr = expressions[0] | ||
| if isinstance(p_expr, exp.DataTypeParam) and isinstance(p_expr.this, exp.Literal): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be surprised if non-integers were syntactically/semantically valid. I think it's safe to assume that only integers can be used for the precision and scale parameters.
Additionally, numerics always have default precision and scale if missing, e.g., NUMBER is a synonym of NUMBER(38, 0) (and is parsed as such). So we could also do something like:
precision_value = precision.this.to_py() if isinstance(precision, exp.DataTypeParam) or 38
scale_value = scale.this.to_py() if isinstance(scale, exp.DataTypeParam) or 0And so it should be possible to simplify this whole logic quite a bit.
| if expression.is_number: | ||
| precision, scale = _extract_literal_precision_scale(expression.this) | ||
| return exp.DataType( | ||
| this=exp.DataType.Type.DECIMAL, | ||
| expressions=[ | ||
| exp.DataTypeParam(this=exp.Literal.number(precision)), | ||
| exp.DataTypeParam(this=exp.Literal.number(scale)), | ||
| ], | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should override the Literal annotator for this:
# In typing/snowflake.py
def _annotate_literal(self, expression):
expression = self._annotate_literal(expression)
if expression.is_type(exp.DataType.Type.DECIMAL):
precision, scale = _extract_literal_precision_scale(expression.this)
expression.type.set("expressions", [exp.DataTypeParam(...), exp.DataTypeParam(...)])
return expression| if expression.type.is_type(*exp.DataType.INTEGER_TYPES) and not expression.type.expressions: | ||
| return exp.DataType( | ||
| this=exp.DataType.Type.DECIMAL, | ||
| expressions=[ | ||
| exp.DataTypeParam(this=exp.Literal.number(38)), | ||
| exp.DataTypeParam(this=exp.Literal.number(0)), | ||
| ], | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an interesting change. Indeed, according to the docs, integers are really NUMBER(38, 0).
I wonder if we should hook into _set_type for Snowflake and intercept integer type annotations in order to automatically change them to NUMBER(38, 0) and coercion works as expected... This feels like a much bigger change, though.
(sqlglot) ➜ sqlglot git:(main) snow sql -q "select system\$typeof(cast(1 as int) + cast(2 as number(5, 3)))"
select system$typeof(cast(1 as int) + cast(2 as number(5, 3)))
+---------------------------------------------------------+
| SYSTEM$TYPEOF(CAST(1 AS INT) + CAST(2 AS NUMBER(5, 3))) |
|---------------------------------------------------------|
| NUMBER(38,3)[SB2] |
+---------------------------------------------------------+
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also exposes an inconsistency where at parse time we change NUMBER (or equivalent, e.g. DECIMAL) to NUMBER(38, 0), but we don't do it for these integer types...
| if p1 >= p2 and s1 >= s2: | ||
| return type1.copy() | ||
|
|
||
| if p2 >= p1 and s2 >= s1: | ||
| return type2.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to copy() here, this is taken care of in _set_type -> type.setter -> DataType.build (copies by default)
| return result_type | ||
|
|
||
|
|
||
| T = t.TypeVar("T", bound=exp.Expression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have E in sqlglot._typing and can use that here instead.
| if any(_is_float(e.type) for e in expressions_to_coerce): | ||
| self._set_type(expression, exp.DataType.Type.FLOAT) | ||
| return expression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like something that should be handled by a COERCES_TO override for Snowflake, i.e. NUMBER always coerces to FLOAT:
(sqlglot) ➜ sqlglot git:(main) snow sql -q "select system\$typeof(coalesce(cast(1 as number(4, 2)), cast(1 as float)))"
select system$typeof(coalesce(cast(1 as number(4, 2)), cast(1 as float)))
+--------------------------------------------------------------------+
| SYSTEM$TYPEOF(COALESCE(CAST(1 AS NUMBER(4, 2)), CAST(1 AS FLOAT))) |
|--------------------------------------------------------------------|
| FLOAT[DOUBLE] |
+--------------------------------------------------------------------+
|
Discussed in Slack and concluded that we'll temporarily close this to avoid:
|
Add snowflake specific coercion logic when the arguments are parameterized numbers. This affects the following functions:
The rules for coercing 2 parameterized numbers, type1 and type2, where the precision and scale of the first type is
p1ands2, and the precision and scale of the second type isp2ands2are:When a function can take more than 2 arguments, the final type is calculated by a reduce function using the pair coercion rules stated above.