
Conversation

@coderfender coderfender commented Aug 13, 2025

Which issue does this PR close?

Support ANSI mode for arithmetic operations in Spark #2137

Closes #2137.

Rationale for this change

Introduce ANSI support for arithmetic operations (for integer and float types).
Now that TRY eval mode is supported in Comet, this PR takes it a step further and supports ANSI mode.

To achieve this, the following changes are made:

  1. QueryPlanSerde is updated to no longer fall back to Spark when the eval mode is ANSI (for arithmetic operations only)
  2. planner.rs now respects the eval mode when creating a binary arithmetic function
  3. checked_arithmetic.rs now raises a Spark-equivalent exception on overflow / division by zero when ANSI mode is enabled
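The eval-mode-aware behavior in items 2 and 3 can be sketched as follows. This is a hedged illustration, not the actual Comet code: the `EvalMode` enum and `div_i32` function are hypothetical stand-ins for the plumbing in planner.rs and the kernels in checked_arithmetic.rs, and the error strings only approximate Spark's wording.

```rust
// Hypothetical sketch of eval-mode-aware integer division (not Comet source).
#[derive(Debug, Clone, Copy, PartialEq)]
enum EvalMode {
    Legacy, // pre-ANSI Spark behavior: NULL on error
    Try,    // try_* functions: NULL on error
    Ansi,   // ANSI mode: raise a Spark-like exception
}

fn div_i32(l: i32, r: i32, mode: EvalMode) -> Result<Option<i32>, String> {
    if r == 0 {
        return match mode {
            // Error text approximates Spark's; the PR matches it exactly.
            EvalMode::Ansi => Err("[DIVIDE_BY_ZERO] Division by zero".to_string()),
            _ => Ok(None), // Legacy/Try: division by zero yields NULL
        };
    }
    match l.checked_div(r) {
        Some(v) => Ok(Some(v)),
        // Only i32::MIN / -1 overflows once r == 0 is handled above.
        None => match mode {
            EvalMode::Ansi => Err("[ARITHMETIC_OVERFLOW] integer overflow".to_string()),
            _ => Ok(None),
        },
    }
}

fn main() {
    assert_eq!(div_i32(10, 2, EvalMode::Ansi), Ok(Some(5)));
    assert!(div_i32(1, 0, EvalMode::Ansi).is_err());
    assert_eq!(div_i32(1, 0, EvalMode::Try), Ok(None));
    assert_eq!(div_i32(i32::MIN, -1, EvalMode::Legacy), Ok(None));
    println!("ok");
}
```

The same three-way split applies to addition, subtraction, and multiplication via their `checked_*` counterparts.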

What changes are included in this PR?

How are these changes tested?

  1. Unit tests in CometExpressionSuite.scala
  2. Spark tests


@coderfender coderfender marked this pull request as draft August 13, 2025 01:08

coderfender commented Aug 14, 2025

Added test cases for addition, subtraction, and multiplication. Given that Spark inherently converts decimal operands to doubles (and to decimals in the case of integral division in Comet), I have yet to make the code changes needed to test those cases.

@coderfender coderfender marked this pull request as ready for review August 14, 2025 22:36

codecov-commenter commented Aug 14, 2025

Codecov Report

❌ Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 57.65%. Comparing base (f09f8af) to head (b64eb6d).
⚠️ Report is 482 commits behind head on main.

Files with missing lines | Patch % | Lines
...main/scala/org/apache/comet/serde/arithmetic.scala | 83.33% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2136      +/-   ##
============================================
+ Coverage     56.12%   57.65%   +1.52%     
- Complexity      976     1297     +321     
============================================
  Files           119      147      +28     
  Lines         11743    13487    +1744     
  Branches       2251     2386     +135     
============================================
+ Hits           6591     7776    +1185     
- Misses         4012     4446     +434     
- Partials       1140     1265     +125     

☔ View full report in Codecov by Sentry.

@coderfender coderfender changed the title feat: init_ansi_mode_enabled feat: implement_ansi_mode_arithmetic Aug 15, 2025
@coderfender coderfender changed the title feat: implement_ansi_mode_arithmetic feat: implement_ansi_eval_mode_arithmetic Aug 15, 2025

coderfender commented Aug 15, 2025

@andygrove , @kazuyukitanimura
There is a test failure in Spark 4.0 where a division operation fails (as expected in ANSI mode), but the Spark tests expect the error message "cannot divide by zero"; now that the division op is sent to Comet, the error message is different. We have two options going forward:

  1. Change the error message to exactly match Spark's
  2. Update the test in Spark and create a new diff file to pass the test

I am inclined towards option 1 given the effort-to-reward ratio


kazuyukitanimura commented Aug 15, 2025

Change the error message to match exactly with Spark

@coderfender I agree. Do you plan to change native/spark-expr/src/error.rs?
Hopefully we can come up with a message that satisfies multiple Spark versions.
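One way to keep a single message that multiple Spark versions accept is to center it on the stable fragment the tests grep for (e.g. "Division by zero"). A hedged Rust sketch, assuming a simple error enum; this is not the actual native/spark-expr/src/error.rs, and the exact strings are illustrative:

```rust
use std::fmt;

// Hypothetical error type; Comet's real error enum differs.
#[derive(Debug)]
enum SparkError {
    DivideByZero,
    ArithmeticOverflow { op: &'static str },
}

impl fmt::Display for SparkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Keep the stable fragment that Spark tests match across versions.
            SparkError::DivideByZero => write!(f, "[DIVIDE_BY_ZERO] Division by zero"),
            SparkError::ArithmeticOverflow { op } => {
                write!(f, "[ARITHMETIC_OVERFLOW] {op} caused overflow")
            }
        }
    }
}

fn main() {
    assert!(SparkError::DivideByZero.to_string().contains("Division by zero"));
    println!("{}", SparkError::DivideByZero);
}
```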

@coderfender

After some debugging, early exception raising (from the DataFusion custom ANSI kernel) seems to be the issue. Below is the exact test that causes a failure in Spark 4.0:

  test("fix ANSI support for divide") {
    val data = Seq((Integer.MIN_VALUE, 0))
    withSQLConf(
      SQLConf.ANSI_ENABLED.key -> "true",
      CometConf.COMET_ANSI_MODE_ENABLED.key -> "true",
      "spark.comet.explainFallback.enabled" -> "true") {
      withParquetTable(data, "tbl") {
        spark.sql("CREATE TABLE conditional_t USING PARQUET AS SELECT c1, c2 FROM VALUES(1d, 0),(2d, 1),(null, 1),(CAST('NaN' AS DOUBLE), 0) AS t(c1, c2);")
        val res = spark.sql("""
                              |SELECT coalesce(c2, 1/0) from conditional_t;
                              |  """.stripMargin)

        res.explain()
        checkSparkAnswerAndOperator(res)
      }
    }
  }


coderfender commented Aug 17, 2025

@kazuyukitanimura , @andygrove

I updated the exception message to exactly match Spark's. However, it seems the issue is with coalesce, which in Spark ignores the divide-by-zero exception, while Comet cannot do the same (the exception is raised on the native side). I believe the options we have are:

  1. Update the diff file to expect an exception (Spark 4+)
  2. Update the code to handle native exceptions on the Scala side (I am not sure we have an existing mechanism for that, and I am less inclined to go this route)

Would love to know your thoughts, and to catch up in case I am missing something.

Code to reproduce the above error:

test("fix ANSI support coalesce") {
    val data = Seq((Integer.MIN_VALUE, 0))
    withSQLConf(
      SQLConf.ANSI_ENABLED.key -> "true",
      CometConf.COMET_ANSI_MODE_ENABLED.key -> "true") {
      withParquetTable(data, "tbl") {
        spark.sql(
          "CREATE TABLE conditional_t USING PARQUET AS SELECT c1, c2 FROM VALUES(1d, 0),(2d, 1),(null, 1),(CAST('NaN' AS DOUBLE), 0) AS t(c1, c2);")
        val res = spark.sql("""
                              |SELECT coalesce(c2, 1/0) from conditional_t;
                              |  """.stripMargin)
        
        checkSparkMaybeThrows(res) match {
          case (Some(sparkExc), Some(cometExc)) =>
            val cometErrorPattern =
              """org.apache.comet.CometNativeException"""
            assert(cometExc.getMessage.contains(cometErrorPattern))
            assert(sparkExc.getMessage.contains("Division by zero"))
          case _ => fail("Exception should be thrown")
        }

      }
    }
  }
  


coderfender commented Aug 20, 2025

Investigating new test failures

@parthchandra

There is a test failure in Spark 4.0 where a division operation fails (as expected in ANSI mode), but the Spark tests expect the error message "cannot divide by zero"; now that the division op is sent to Comet, the error message is different. We have two options going forward:

  1. Change the error message to exactly match Spark's
  2. Update the test in Spark and create a new diff file to pass the test

I am inclined towards option 1 given the effort-to-reward ratio

@mbutrovich I seem to recall that you had a solution for this kind of test failure


coderfender commented Aug 23, 2025

@parthchandra, thank you for the review. coalesce in Spark behaves differently than in native Comet (Spark lazily evaluates the operands, naturally suppressing exceptions). I was able to fix that, with help from @kazuyukitanimura, by disabling Comet coalesce when ANSI mode is enabled. However, there is a new Spark test failure (apart from the PR build failure, which I fixed by generating a new set of golden files), and I am having trouble finding the exact test that failed. Once that is fixed, this PR should be ready for further review.

Failing test logs : https://github.com/apache/datafusion-comet/actions/runs/17112443091/job/48536768203?pr=2136

Thank you

@coderfender coderfender force-pushed the support_ansi_mode_arithmetic_functions branch from 922d7fc to 1022595 Compare August 23, 2025 21:21
@coderfender

I was able to comb through the failed tests and fixed the failing int-check case match statement.

@coderfender

Summary of changes

Native side:

  1. Implemented ANSI mode to throw exceptions for arithmetic operations
  2. Changes to planner.rs to pass the eval mode all the way down to the scalar function UDF
  3. Updated checked_arithmetic.rs to return NULL or throw a Spark-like exception on overflow, depending on the eval mode
  4. Added support for float data types (in addition to int data types)

Scala side:

  1. Disable replacing 0s with NULLs when ANSI mode is enabled for arithmetic operations
  2. Disable Comet coalesce when ANSI mode is enabled, because coalesce in Comet throws an exception while Spark suppresses it
    (this required generating new golden files on the Comet side, given that the plans changed)

cc : @andygrove , @kazuyukitanimura

@andygrove

2. Disable comet coalesce when ANSI mode is enabled. This is because coalesce in comet throws an exception while spark suppresses it

To clarify this a bit more, Spark performs lazy evaluation on the inputs to coalesce and will stop evaluating as soon as it finds the first non-null input. Comet (and DataFusion) currently evaluate all of the inputs. This is why we see an error in Comet but not in Spark.
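The lazy-versus-eager difference can be sketched as below. This is a hedged illustration, not Comet or DataFusion code: the closures stand in for column expressions, and `ansi_div` is a hypothetical ANSI division.

```rust
// Hypothetical column expressions: each closure evaluates one coalesce input.
type Input = Box<dyn Fn() -> Result<Option<i32>, String>>;

fn ansi_div(l: i32, r: i32) -> Result<i32, String> {
    if r == 0 {
        return Err("[DIVIDE_BY_ZERO] Division by zero".to_string());
    }
    Ok(l / r)
}

// Spark-style: evaluate inputs left to right, stop at the first non-null value.
fn coalesce_lazy(inputs: &[Input]) -> Result<Option<i32>, String> {
    for f in inputs {
        if let Some(v) = f()? {
            return Ok(Some(v));
        }
    }
    Ok(None)
}

// Comet/DataFusion-style (eager): evaluate every input up front, so an ANSI
// error in any input propagates even when its value is never needed.
fn coalesce_eager(inputs: &[Input]) -> Result<Option<i32>, String> {
    let evaluated: Vec<Option<i32>> = inputs
        .iter()
        .map(|f| f())
        .collect::<Result<Vec<_>, String>>()?;
    Ok(evaluated.into_iter().flatten().next())
}

fn main() {
    // Mirrors `coalesce(c2, 1/0)` where c2 is non-null.
    let inputs: Vec<Input> = vec![
        Box::new(|| Ok(Some(7))),
        Box::new(|| ansi_div(1, 0).map(Some)),
    ];
    assert_eq!(coalesce_lazy(&inputs), Ok(Some(7))); // Spark never computes 1/0
    assert!(coalesce_eager(&inputs).is_err()); // eager evaluation surfaces the error
    println!("ok");
}
```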

@andygrove

I filed #2231 to track this on the Comet side

@coderfender coderfender force-pushed the support_ansi_mode_arithmetic_functions branch 4 times, most recently from d3ee468 to a740ee3 Compare September 4, 2025 18:39
@coderfender

Rebased on the main branch after the lazy evaluation changes (coalesce) were merged

@coderfender coderfender force-pushed the support_ansi_mode_arithmetic_functions branch from 4d825d8 to 58a1bee Compare September 6, 2025 06:57
@coderfender

Resolved issues with failing tests caused by incorrect diff file generation.

@coderfender coderfender force-pushed the support_ansi_mode_arithmetic_functions branch from da41f9d to b64eb6d Compare September 9, 2025 20:52
Successfully merging this pull request may close these issues.

Enable ANSI support for arithmetic operations