[datafusion-spark] Support 2-argument ceil(value, scale)#21710
[datafusion-spark] Support 2-argument ceil(value, scale)#21710diegoQuinas wants to merge 13 commits into
Conversation
There was a problem hiding this comment.
@diegoQuinas
Thanks for the work here.
There are a few inconsistencies between the declared return types and the actual execution paths that could lead to planner/runtime mismatches. I’ve called those out below. Once those are addressed and we have a couple of type-contract tests in place, this should be in a much safer spot.
There was a problem hiding this comment.
@diegoQuinas
Nice follow-up overall. This addresses all the original comments and the refactor looks clean. The type handling is much more consistent now and the SLT coverage is solid. I left a few small suggestions, mostly around the decimal 2-arg behavior and some minor readability and edge-case clarifications.
| // Decimal128 with positive scale (1-arg only). | ||
| ScalarValue::Decimal128(_, _, _) if has_scale => { | ||
| return not_impl_err!( | ||
| "2-argument ceil is not yet supported for decimal inputs" |
There was a problem hiding this comment.
If we aren't supporting this yet we should keep the original issue open instead of marking this PR as closing it
There was a problem hiding this comment.
I'm working on this. Maybe this night I add it.
|
Thank you for opening this pull request! Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch). Details |
- Fix type contract: 2-arg ceil now preserves input type for any scale (incl. scale=0 on floats); decimal 2-arg surfaces NotImplemented - Implement integer ceil at negative scale (ceiling toward +inf, distinct from HALF_UP round); positive scale on integer is a no-op - Extract shared get_scale helper into math/scale.rs and reuse from round - Simplify Signature using TypeSignature::Numeric + Coercible([Numeric, Int32]) instead of hand-enumerated type lists - Move Rust unit tests into the SLT, including type-contract assertions via arrow_typeof for every 2-arg overload
808a7b6 to
bc19931
Compare
The return_type branch for Decimal128 with a scale argument used exec_err! while the two evaluate paths use not_impl_err!. The ceil.slt test expects the "This feature is not implemented" message, so the mismatch failed CI. Make all three paths consistent.
An unrelated submodule bump (13bbae387 -> eccb0e4a4) was accidentally committed alongside the ceil work. Restore it to the value tracked by main so the diff stays scoped to the ceil change.
|
I was checking some outputs against Spark 4.1.2, and it seems they actually operate only on decimal types for the 2-arg variant. See: >>> spark.sql("select ceil(-25::int)").printSchema()
root
|-- CEIL(CAST(-25 AS INT)): long (nullable = true)
>>> spark.sql("select ceil(-25::int, -1)").printSchema()
root
|-- ceil(CAST(-25 AS INT), -1): decimal(11,0) (nullable = true)And see reference code: case class RoundCeil(child: Expression, scale: Expression)
extends RoundBase(child, scale, BigDecimal.RoundingMode.CEILING, "ROUND_CEILING") {
override def inputTypes: Seq[AbstractDataType] = Seq(DecimalType, IntegerType)
override def nodeName: String = "ceil"
override protected def withNewChildrenInternal(
newLeft: Expression, newRight: Expression): RoundCeil =
copy(child = newLeft, scale = newRight)
}
So I think this PR won't align with Spark since we need to operate only on decimals (and subsequently return decimal types)? |
Which issue does this PR close?
Part of #21560
Rationale for this change
The Spark
ceilfunction supports an optionalscaleparameter that controlsthe decimal position to round up to. This was not yet implemented in
datafusion-spark.
What changes are included in this PR?
Signatureto accept 1 or 2 arguments, following the same pattern asSparkRoundreturn_type: floats preserve their type when a scale is provided (instead of returningInt64); scale=0 preserves the original behaviorget_scale()helper to extract the optional scale argument, returningNonefor NULL scale (which produces a NULL result)ceil_float()helper for ceiling floats at arbitrary decimal positionsspark_ceil_scalarandspark_ceil_arrayto apply the scaleAre these changes tested?
Yes, unit tests covering:
Are there any user-facing changes?
Yes —
ceil(expr, scale)is now supported in addition to the existingceil(expr).