Skip to content

Conversation

@fivetran-MichaelLee
Copy link
Collaborator

@fivetran-MichaelLee fivetran-MichaelLee commented Oct 31, 2025

Add snowflake specific coercion logic when the arguments are parameterized numbers. This affects the following functions:

  • NULLIF
  • IFNULL
  • IFF
  • COALESCE
  • GREATEST
  • LEAST

The rules for coercing 2 parameterized numbers, type1 and type2, where the precision and scale of the first type is p1 and s2, and the precision and scale of the second type is p2 and s2 are:

  • If p1 >= p2 AND s1 >= s2: return type1
  • If p2 >= p1 AND s2 >= s1: return type2
  • Otherwise: return DECIMAL(min(38, max(p1, p2) + |s2 - s1|), max(s1, s2))

When a function can take more than 2 arguments, the final type is calculated by a reduce function using the pair coercion rules stated above.

@fivetran-MichaelLee fivetran-MichaelLee self-assigned this Oct 31, 2025
@fivetran-MichaelLee fivetran-MichaelLee marked this pull request as ready for review October 31, 2025 16:00
@georgesittas
Copy link
Collaborator

Let's check whether these rules apply to other operators as well, e.g., DECIMAL1 + DECIMAL2, etc. If that's the case, then it may be worth extending / overriding _maybe_coerce to incorporate them for Snowflake, so we don't have to do this for every expression separately.

@fivetran-MichaelLee fivetran-MichaelLee force-pushed the mlee/annotate-snowflake-nullif branch from 7946b90 to 0778d24 Compare November 5, 2025 02:17
@fivetran-MichaelLee fivetran-MichaelLee changed the title feat(optimizer)!: Annotate numeric parameterized types for NULLIF feat(optimizer)!: Annotate parameterized numeric types for Snowflake functions Nov 5, 2025
Copy link
Collaborator

@georgesittas georgesittas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work & research @fivetran-MichaelLee, this is super valuable.

After reviewing this PR, I feel hesitant merging it as-is, mainly because it's uncovered some existing inconsistencies / feature gaps and my hunch is that once we properly address those, then the current solution can be made much simpler.

I suggest pausing for now and thinking this through a bit more.


if len(expressions) >= 1:
p_expr = expressions[0]
if isinstance(p_expr, exp.DataTypeParam) and isinstance(p_expr.this, exp.Literal):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be surprised if non-integers were syntactically/semantically valid. I think it's safe to assume that only integers can be used for the precision and scale parameters.

Additionally, numerics always have default precision and scale if missing, e.g., NUMBER is a synonym of NUMBER(38, 0) (and is parsed as such). So we could also do something like:

precision_value = precision.this.to_py() if isinstance(precision, exp.DataTypeParam) or 38
scale_value = scale.this.to_py() if isinstance(scale, exp.DataTypeParam) or 0

And so it should be possible to simplify this whole logic quite a bit.

Comment on lines +141 to +149
if expression.is_number:
precision, scale = _extract_literal_precision_scale(expression.this)
return exp.DataType(
this=exp.DataType.Type.DECIMAL,
expressions=[
exp.DataTypeParam(this=exp.Literal.number(precision)),
exp.DataTypeParam(this=exp.Literal.number(scale)),
],
)
Copy link
Collaborator

@georgesittas georgesittas Nov 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should override the Literal annotator for this:

# In typing/snowflake.py
def _annotate_literal(self, expression):
    expression = self._annotate_literal(expression)
    if expression.is_type(exp.DataType.Type.DECIMAL):
        precision, scale = _extract_literal_precision_scale(expression.this)
        expression.type.set("expressions", [exp.DataTypeParam(...), exp.DataTypeParam(...)])

    return expression

Comment on lines +151 to +158
if expression.type.is_type(*exp.DataType.INTEGER_TYPES) and not expression.type.expressions:
return exp.DataType(
this=exp.DataType.Type.DECIMAL,
expressions=[
exp.DataTypeParam(this=exp.Literal.number(38)),
exp.DataTypeParam(this=exp.Literal.number(0)),
],
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting change. Indeed, according to the docs, integers are really NUMBER(38, 0).

I wonder if we should hook into _set_type for Snowflake and intercept integer type annotations in order to automatically change them to NUMBER(38, 0) and coercion works as expected... This feels like a much bigger change, though.

(sqlglot) ➜  sqlglot git:(main) snow sql -q "select system\$typeof(cast(1 as int) + cast(2 as number(5, 3)))"
select system$typeof(cast(1 as int) + cast(2 as number(5, 3)))
+---------------------------------------------------------+
| SYSTEM$TYPEOF(CAST(1 AS INT) + CAST(2 AS NUMBER(5, 3))) |
|---------------------------------------------------------|
| NUMBER(38,3)[SB2]                                       |
+---------------------------------------------------------+

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also exposes an inconsistency where at parse time we change NUMBER (or equivalent, e.g. DECIMAL) to NUMBER(38, 0), but we don't do it for these integer types...

Comment on lines +174 to +178
if p1 >= p2 and s1 >= s2:
return type1.copy()

if p2 >= p1 and s2 >= s1:
return type2.copy()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to copy() here, this is taken care of in _set_type -> type.setter -> DataType.build (copies by default)

return result_type


T = t.TypeVar("T", bound=exp.Expression)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have E in sqlglot._typing and can use that here instead.

Comment on lines +236 to +238
if any(_is_float(e.type) for e in expressions_to_coerce):
self._set_type(expression, exp.DataType.Type.FLOAT)
return expression
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like something that should be handled by a COERCES_TO override for Snowflake, i.e. NUMBER always coerces to FLOAT:

(sqlglot) ➜  sqlglot git:(main) snow sql -q "select system\$typeof(coalesce(cast(1 as number(4, 2)), cast(1 as float)))"
select system$typeof(coalesce(cast(1 as number(4, 2)), cast(1 as float)))
+--------------------------------------------------------------------+
| SYSTEM$TYPEOF(COALESCE(CAST(1 AS NUMBER(4, 2)), CAST(1 AS FLOAT))) |
|--------------------------------------------------------------------|
| FLOAT[DOUBLE]                                                      |
+--------------------------------------------------------------------+

@georgesittas
Copy link
Collaborator

Discussed in Slack and concluded that we'll temporarily close this to avoid:

  • Introducing new tech debt that may be unnecessary after we address some existing debt
  • Destabilizing main with new coercions, since there's a non-zero risk associated with these and so needs to be done right

@georgesittas georgesittas deleted the mlee/annotate-snowflake-nullif branch November 5, 2025 13:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants