Conversation


@coderfender coderfender commented Aug 6, 2025

Which issue does this PR close?

Closes #2021

Rationale for this change

PR to support the TRY eval mode in native execution. Unfortunately, neither the DataFusion nor the Arrow crates support returning NULL on overflow, which is the outcome Spark requires when the eval mode is set to TRY.

What changes are included in this PR?

  1. A new UDF called checked_arithmetic that performs checked_add / checked_sub / checked_mul over the operands and turns overflow into a NULL. (There are no DataFusion options or Arrow kernel APIs that provide this functionality out of the box, hence the need for a custom kernel + UDF based solution.)
  2. On the Spark side, the check that falls back to Spark when the eval mode is set to TRY is removed for the above arithmetic ops.
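A minimal sketch of this NULL-on-overflow idea, using plain std Rust in place of the actual Arrow-based kernel (the function name and the Vec<Option<i64>> representation are illustrative, not the PR's actual code):

```rust
// Hedged sketch: TRY-mode addition over nullable operands.
// `checked_add_kernel` is a hypothetical name; the real kernel
// operates on Arrow arrays, not plain slices.

fn checked_add_kernel(lhs: &[Option<i64>], rhs: &[Option<i64>]) -> Vec<Option<i64>> {
    lhs.iter()
        .zip(rhs.iter())
        .map(|(l, r)| match (l, r) {
            // checked_add returns None on overflow, which becomes NULL
            (Some(a), Some(b)) => a.checked_add(*b),
            // a NULL in either operand propagates as NULL
            _ => None,
        })
        .collect()
}

fn main() {
    let l = vec![Some(1), Some(i64::MAX), None];
    let r = vec![Some(2), Some(1), Some(5)];
    // Overflow becomes None (NULL) instead of an error, matching TRY semantics
    println!("{:?}", checked_add_kernel(&l, &r)); // [Some(3), None, None]
}
```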

How are these changes tested?

  1. Implemented unit tests with various overflow edge cases (add, subtract, multiply, divide, etc.)

@coderfender coderfender marked this pull request as draft August 6, 2025 00:38
@coderfender
Contributor Author

Hello @andygrove, I implemented custom Arrow kernels to perform checked_add, checked_sub and checked_mul (registered as UDFs), supporting integral types only (similar to Spark's behavior). My hope is to reuse this for other ops in the future, now that there is a framework established.

Please take a look whenever you get a chance, and I can make changes (if any) to support TRY eval mode. Thank you very much.

@coderfender coderfender marked this pull request as ready for review August 7, 2025 16:37
@codecov-commenter

codecov-commenter commented Aug 7, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 58.45%. Comparing base (f09f8af) to head (bd002fa).
⚠️ Report is 381 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2073      +/-   ##
============================================
+ Coverage     56.12%   58.45%   +2.33%     
- Complexity      976     1253     +277     
============================================
  Files           119      143      +24     
  Lines         11743    13192    +1449     
  Branches       2251     2370     +119     
============================================
+ Hits           6591     7712    +1121     
- Misses         4012     4256     +244     
- Partials       1140     1224      +84     


@coderfender coderfender force-pushed the fix_eval_try_mode_spark branch from 2f735f2 to e539695 Compare August 7, 2025 21:35
@coderfender
Contributor Author

@andygrove ,
Here is the summary of changes :

  1. Spark's TRY eval mode returns NULL when a computation fails. Note that this is only supported/useful when the operands are integer (Int/Long) types, since overflow on Float, Double and Decimal is non-deterministic and/or returns NaN/Inf.
  2. Since neither DataFusion nor the Arrow kernels natively implement Spark's TRY eval mode, I implemented custom UDFs (with custom Arrow kernels) that perform checked_add, checked_sub, checked_mul and checked_div and return None when overflow occurs.
  3. I also verified that div and integer_div work with TRY mode and added tests covering the edge cases.
  4. A fail_on_overflow param is added to the create_physical_expr function to fork the code to call the UDF based on the selected eval option.
  5. Once this PR gets approved/merged, I will continue to use this framework to implement the eval mode for other ops such as cast, while also refactoring the abs and modulo operations.
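For division specifically, Rust's built-in checked_div already maps both failure cases to None, which illustrates why TRY-mode division needs no extra overflow machinery (a sketch; try_div is an illustrative name, not the PR's code):

```rust
// Hedged sketch: i64::checked_div covers the two cases TRY mode must
// turn into NULL — division by zero and the i64::MIN / -1 overflow.

fn try_div(l: i64, r: i64) -> Option<i64> {
    // None when r == 0, or when l == i64::MIN && r == -1 (result overflows)
    l.checked_div(r)
}

fn main() {
    println!(
        "{:?} {:?} {:?}",
        try_div(7, 2),          // Some(3)
        try_div(10, 0),         // None
        try_div(i64::MIN, -1),  // None
    );
}
```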

@andygrove
Member

@coderfender looks like there are some clippy issues to be resolved

@@ -231,7 +231,7 @@ impl PhysicalPlanner {
) -> Result<Arc<dyn PhysicalExpr>, ExecutionError> {
match spark_expr.expr_struct.as_ref().unwrap() {
ExprStruct::Add(expr) => {
// TODO respect eval mode
// TODO respect ANSI eval mode
// https://github.com/apache/datafusion-comet/issues/2021
// https://github.com/apache/datafusion-comet/issues/536
let _eval_mode = from_protobuf_eval_mode(expr.eval_mode)?;
Member

Please remove the leading _ from the variable name now that we are using the variable

Comment on lines 44 to 45
match op {
"checked_add" => builder.append_option(l.add_checked(r).ok()),
Member

Performing this match operation on every row will be expensive. It would be better to invert this, do the match once, and then have a different for loop for each operation, something like this:

match op {
  "checked_add" =>
    for i in 0..len {
      ...
    }
  "checked_sub" =>
    for i in 0..len {
      ...
    }    
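Fleshed out, the reviewer's match-once pattern could look like this (a sketch over plain slices; the op-name strings and signature are illustrative — the real kernel works on Arrow arrays):

```rust
// Hedged sketch: branch on the op name once, then run a tight loop
// per operation, instead of matching per row.

fn checked_binary(op: &str, l: &[i64], r: &[i64]) -> Vec<Option<i64>> {
    let mut out = Vec::with_capacity(l.len());
    match op {
        "checked_add" => {
            for i in 0..l.len() {
                out.push(l[i].checked_add(r[i]));
            }
        }
        "checked_sub" => {
            for i in 0..l.len() {
                out.push(l[i].checked_sub(r[i]));
            }
        }
        "checked_mul" => {
            for i in 0..l.len() {
                out.push(l[i].checked_mul(r[i]));
            }
        }
        _ => unreachable!("unsupported op: {op}"),
    }
    out
}

fn main() {
    println!("{:?}", checked_binary("checked_mul", &[i64::MAX, 3], &[2, 4])); // [None, Some(12)]
}
```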

@andygrove
Member

@coderfender I took a first pass through this, and I think this is looking good 👍

@@ -878,6 +879,7 @@ impl PhysicalPlanner {
return_type: Option<&spark_expression::DataType>,
op: DataFusionOperator,
input_schema: SchemaRef,
fail_on_overflow: bool,
Member

Maybe consider passing in the eval mode here instead of a boolean? We'll eventually need to support all three modes.

Contributor Author

@coderfender coderfender Aug 8, 2025

Sure, that's a great idea!
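The enum-over-boolean suggestion above could be sketched like this (all names and signatures here are hypothetical, not the PR's actual code):

```rust
// Hedged sketch: an eval-mode enum in place of the fail_on_overflow
// boolean, covering all three Spark modes.

#[derive(Debug, Clone, Copy, PartialEq)]
enum EvalMode {
    Legacy, // Spark default: overflow wraps around
    Try,    // overflow produces NULL
    Ansi,   // overflow raises an error
}

fn add_i64(l: i64, r: i64, mode: EvalMode) -> Result<Option<i64>, String> {
    match mode {
        EvalMode::Legacy => Ok(Some(l.wrapping_add(r))),
        EvalMode::Try => Ok(l.checked_add(r)),
        EvalMode::Ansi => l
            .checked_add(r)
            .map(Some)
            .ok_or_else(|| "arithmetic overflow".to_string()),
    }
}

fn main() {
    println!("{:?}", add_i64(i64::MAX, 1, EvalMode::Try)); // Ok(None)
}
```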

@coderfender
Contributor Author

One of the TPC-H checks failed with a network exception. @andygrove, could you please re-trigger that workflow whenever you get a chance?
Thank you

@coderfender
Contributor Author

@andygrove, thank you for restarting the failed job; glad to see that the checks have all passed. Please review once you get a chance and let me know if you think we need further changes.

Thank you

(l.to_array_of_size(r.len())?, Arc::clone(r))
}
(ColumnarValue::Array(l), ColumnarValue::Scalar(r)) => {
(Arc::clone(l), r.to_array_of_size(l.len())?)
Member

We may eventually want to have a specialized version of the kernel for the scalar case to avoid the overhead of creating an array from the scalar. This does not need to happen as part of this PR, though.
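A scalar-specialized path like the one suggested could be sketched as follows (plain std Rust standing in for the Arrow arrays; all names are hypothetical):

```rust
// Hedged sketch: handle the scalar operand directly instead of
// materializing it into a full array via to_array_of_size.

fn checked_add_array_scalar(arr: &[Option<i64>], scalar: Option<i64>) -> Vec<Option<i64>> {
    match scalar {
        // NULL scalar: the whole result is NULL without touching the array
        None => vec![None; arr.len()],
        // non-NULL scalar: one checked_add per element, None on overflow
        Some(s) => arr
            .iter()
            .map(|v| v.and_then(|a| a.checked_add(s)))
            .collect(),
    }
}

fn main() {
    let arr = vec![Some(1), Some(i64::MAX), None];
    println!("{:?}", checked_add_array_scalar(&arr, Some(1))); // [Some(2), None, None]
}
```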

Contributor Author

Sure! I will create a follow-up enhancement to track changes for a scalar impl. Thank you for the feedback @andygrove.

Member

@andygrove andygrove left a comment

LGTM. Thanks @coderfender

@andygrove
Member

@coderfender could you fix the conflicts?

@coderfender
Contributor Author

Thank you very much for the approval @andygrove. I will push an update shortly after resolving the conflicts.

@coderfender
Contributor Author

coderfender commented Aug 12, 2025

@andygrove, there is a test failure with the below error after rebasing on the main branch. I am currently investigating the failure and will patch a potential fix.

@coderfender coderfender force-pushed the fix_eval_try_mode_spark branch from cfba67a to bd002fa Compare August 12, 2025 19:42
@coderfender
Contributor Author

@andygrove the checks have all passed. Thank you for your approval. Please merge once you get a chance.

@andygrove andygrove merged commit 7976b94 into apache:main Aug 12, 2025
91 checks passed
@coderfender
Contributor Author

Thank you very much for merging the feature branch, Andy. I created a new issue to extend these changes and support ANSI mode for the above arithmetic operations: #2137 (and raised a WIP PR #2136).

Development

Successfully merging this pull request may close these issues.

try_ arithmetic functions return incorrect results