fix: implement_try_eval_mode_arithmetic #2073
Conversation
Hello @andygrove, I implemented custom Arrow kernels to perform checked_add, checked_sub, and checked_mul (registered as UDFs), supporting integral types only (matching Spark's behavior). My hope is to reuse this framework for other ops in the future, now that it is established. Please take a look whenever you get a chance and I can make changes (if any) to support TRY eval mode. Thank you very much.
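The null-on-overflow semantics described above can be sketched with Rust's built-in checked arithmetic; `try_add` below is an illustrative name, not the actual kernel in this PR, and plain `Option<i32>` stands in for Arrow's nullable values.

```rust
// Sketch of Spark TRY-mode semantics using std checked arithmetic:
// any null input or any overflow yields NULL (None).
fn try_add(l: Option<i32>, r: Option<i32>) -> Option<i32> {
    match (l, r) {
        (Some(a), Some(b)) => a.checked_add(b), // None on overflow
        _ => None,                              // null propagation
    }
}
```

The real kernels apply the same idea element-wise over Arrow arrays and are exposed to DataFusion as UDFs.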
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##             main    #2073      +/-  ##
============================================
+ Coverage   56.12%   58.45%    +2.33%
- Complexity    976     1253      +277
============================================
  Files         119      143       +24
  Lines       11743    13192     +1449
  Branches     2251     2370      +119
============================================
+ Hits         6591     7712     +1121
- Misses       4012     4256      +244
- Partials     1140     1224       +84
```

☔ View full report in Codecov by Sentry.
@coderfender looks like there are some clippy issues to be resolved
native/core/src/execution/planner.rs (outdated)

```diff
@@ -231,7 +231,7 @@ impl PhysicalPlanner {
     ) -> Result<Arc<dyn PhysicalExpr>, ExecutionError> {
         match spark_expr.expr_struct.as_ref().unwrap() {
             ExprStruct::Add(expr) => {
-                // TODO respect eval mode
+                // TODO respect ANSI eval mode
                 // https://github.com/apache/datafusion-comet/issues/2021
                 // https://github.com/apache/datafusion-comet/issues/536
                 let _eval_mode = from_protobuf_eval_mode(expr.eval_mode)?;
```
Please remove the leading `_` from the variable name now that we are using the variable.
```rust
match op {
    "checked_add" => builder.append_option(l.add_checked(r).ok()),
```
Performing this `match` operation on every row will be expensive. It would be better to invert this: do the `match` once and then have a different `for` loop for each operation, something like this:

```rust
match op {
    "checked_add" =>
        for i in 0..len {
            ...
        }
    "checked_sub" =>
        for i in 0..len {
            ...
        }
}
```
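A runnable sketch of this restructuring, with plain slices standing in for Arrow arrays and `Vec::push` standing in for `builder.append_option` (names and types here are illustrative assumptions, not the PR's actual kernel signature):

```rust
// Match on the operation once, then run a dedicated per-op loop,
// instead of re-matching the op string on every row.
fn checked_op(op: &str, lhs: &[i64], rhs: &[i64]) -> Vec<Option<i64>> {
    let mut out = Vec::with_capacity(lhs.len());
    match op {
        "checked_add" => {
            for (l, r) in lhs.iter().zip(rhs) {
                out.push(l.checked_add(*r)); // None on overflow -> NULL row
            }
        }
        "checked_sub" => {
            for (l, r) in lhs.iter().zip(rhs) {
                out.push(l.checked_sub(*r));
            }
        }
        "checked_mul" => {
            for (l, r) in lhs.iter().zip(rhs) {
                out.push(l.checked_mul(*r));
            }
        }
        _ => unreachable!("unsupported op"),
    }
    out
}
```

This keeps the per-row hot loop free of string comparisons; the branch cost is paid once per batch rather than once per element.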
@coderfender I took a first pass through this, and I think this is looking good 👍
native/core/src/execution/planner.rs (outdated)

```diff
@@ -878,6 +879,7 @@ impl PhysicalPlanner {
         return_type: Option<&spark_expression::DataType>,
         op: DataFusionOperator,
         input_schema: SchemaRef,
+        fail_on_overflow: bool,
```
Maybe consider passing in the eval mode here instead of a boolean? We'll eventually need to support all three modes.
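The suggestion above can be sketched as follows. The enum mirrors Spark's three eval modes; the variant and function names are illustrative assumptions, not the codebase's actual types:

```rust
// A three-valued eval mode carries strictly more information than a
// fail_on_overflow: bool, and covers TRY as well.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EvalMode {
    Legacy, // wrap around on overflow
    Ansi,   // raise an error on overflow
    Try,    // return NULL on overflow
}

fn add_i32(l: i32, r: i32, mode: EvalMode) -> Result<Option<i32>, String> {
    match mode {
        EvalMode::Legacy => Ok(Some(l.wrapping_add(r))),
        EvalMode::Ansi => l
            .checked_add(r)
            .map(Some)
            .ok_or_else(|| "ARITHMETIC_OVERFLOW".to_string()),
        EvalMode::Try => Ok(l.checked_add(r)), // overflow -> None (NULL)
    }
}
```

With the enum, adding ANSI support later is a new match arm rather than a second boolean flag threaded through the planner.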
Sure, that's a great idea!
One of the TPC-H checks failed with a network exception. @andygrove could you please re-trigger that workflow whenever you get a chance?
@andygrove, thank you for restarting the failed job; glad to see that the checks have all passed. Please review once you get a chance and let me know if you think we need further changes. Thank you.
```rust
    (l.to_array_of_size(r.len())?, Arc::clone(r))
}
(ColumnarValue::Array(l), ColumnarValue::Scalar(r)) => {
    (Arc::clone(l), r.to_array_of_size(l.len())?)
```
We may eventually want to have a specialized version of the kernel for the scalar case to avoid the overhead of creating an array from the scalar. This does not need to happen as part of this PR, though.
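The scalar specialization suggested above could look roughly like this sketch, where a plain slice stands in for an Arrow array and the function name is a hypothetical, not an existing API:

```rust
// Scalar fast path: instead of expanding the scalar into a full array
// via to_array_of_size, loop with the constant directly.
fn checked_add_scalar(arr: &[i64], scalar: Option<i64>) -> Vec<Option<i64>> {
    match scalar {
        // A NULL scalar makes every output row NULL under Spark semantics.
        None => vec![None; arr.len()],
        Some(s) => arr.iter().map(|v| v.checked_add(s)).collect(),
    }
}
```

This avoids allocating and filling a second array whose every element is the same value, at the cost of one more code path per kernel.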
Sure! I will create a follow-up enhancement to track changes for a scalar impl. Thank you for the feedback @andygrove.
LGTM. Thanks @coderfender
@coderfender could you fix the conflicts?
Thank you very much for the approval @andygrove. I am going to push a fix shortly after fixing conflicts.
@andygrove, there is a test failure with the below error after rebasing with the main branch. I am currently investigating the failure and will patch a potential fix.
@andygrove the checks have all passed. Thank you for your approval. Please merge once you get a chance.
Which issue does this PR close?
Closes #2021
Rationale for this change
PR to support TRY eval mode in native execution. Unfortunately, neither the DataFusion nor the Arrow crates support null-on-overflow arithmetic, which is the outcome Spark requires when eval mode is set to TRY.

What changes are included in this PR?

A custom `checked_arithmetic` kernel which performs checked_add / checked_subtract / checked_multiply over the operands and wraps overflow in a NULL. (There are no direct DataFusion options or Arrow kernel APIs which provide this functionality out of the box, hence the need for a custom kernel + UDF based solution.)

How are these changes tested?