fix: preserve timestamp precision when coercing mixed time units#22759
Conversation
|
@alamb ptal if you have time! Thank you. |
I think they are related, but not quite the same. Specifically #21908 I think is for the unwrap in cast optimization (e.g. I think this is about coercing types when comparing across unions and comparisons |
alamb
left a comment
There was a problem hiding this comment.
Makes sense to me -- thank you @fengys1996 and @Jefffrey
| } | ||
|
|
||
| fn date_scalar_value_as_i64(&self) -> Option<i64> { | ||
| fn temporal_scalar_value_as_i64(&self) -> Option<i64> { |
There was a problem hiding this comment.
It might make sense to keep treating date and timestamp separately for consistency with the rest of the code. For example keep date_scalar_value_as_i64 and add timestamp_scalar_value_as_i64
But this looks fine too
| fn timeunit_coercion(lhs_unit: &TimeUnit, rhs_unit: &TimeUnit) -> TimeUnit { | ||
| use arrow::datatypes::TimeUnit::*; | ||
| match (lhs_unit, rhs_unit) { | ||
| (Second, Millisecond) => Second, |
There was a problem hiding this comment.
here is an example where the old code would lose data (it would convert millisecond precision to second precision)
Which issue does this PR close?
Rationale for this change
DataFusion coerced Timestamp values with different TimeUnits to the coarser unit which result in a loss of precision and lead to incorrect comparison semantics
What changes are included in this PR?
Are these changes tested?
Yes. Added unit tests for timestamp coercion and overflow checks, plus sqllogictest coverage for comparison, subtraction, UNION, and COALESCE.
Are there any user-facing changes?
Yes. Timestamp operations with mixed time units now coerce to the finer unit. If that conversion would overflow i64, the query returns an overflow error.