Skip to content

Spaceship operator (<=>) not supported #14098

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ion-elgreco opened this issue Jan 12, 2025 · 13 comments · Fixed by #14187
Closed

Spaceship operator (<=>) not supported #14098

ion-elgreco opened this issue Jan 12, 2025 · 13 comments · Fixed by #14187
Labels
enhancement New feature or request

Comments

@ion-elgreco
Copy link

ion-elgreco commented Jan 12, 2025

Is your feature request related to a problem or challenge?

I am trying to add support for Generated Columns in Delta-rs. For this to be functional, we need the spaceship operator to be implemented in datafusion SQL. Currently it's not supported as we can see here:

DeltaError: Generic DeltaTable error: This feature is not implemented: Unsupported SQL binary operator Spaceship

Describe the solution you'd like

Add support for parsing <=> if it exists in a sql statement.

Describe alternatives you've considered

No response

Additional context

No response

@ion-elgreco ion-elgreco added the enhancement New feature or request label Jan 12, 2025
@alamb
Copy link
Contributor

alamb commented Jan 13, 2025

It looks to me like the <=> operator is the same logically as the existing Operator IS NOT DISTINCT FROM:

https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Operator.html#variant.IsNotDistinctFrom

I also double checked that sqlparser already supports spaceship: https://docs.rs/sqlparser/latest/sqlparser/ast/enum.BinaryOperator.html#variant.Spaceship

Thus, implementing this feature would be a relatively simple matter

  1. Add planning support to plan <=> into DataFusion's existing IsNotDistinctFrom operator
  2. Add some tests

@alamb
Copy link
Contributor

alamb commented Jan 13, 2025

Planning support might be as simple as adding the appropriate translation here:

pub(crate) fn parse_sql_binary_op(&self, op: BinaryOperator) -> Result<Operator> {

@ion-elgreco
Copy link
Author

@alamb thanks for the pointers! I will take a look over the weekend to add this :D

@Spaarsh
Copy link
Contributor

Spaarsh commented Jan 15, 2025

@ion-elgreco @alamb if it is fine by you, can I work on this? I am just getting familiar with the codebase with a few commits here and there :D

@ion-elgreco
Copy link
Author

@Spaarsh go ahead! :)

@Spaarsh
Copy link
Contributor

Spaarsh commented Jan 18, 2025

Apologies for the delay! @ion-elgreco is this the output you are expecting? If so, I shall make a PR including some tests as well!
Image

@ion-elgreco
Copy link
Author

Yess, you could add also a sanity test:

format!("{Value1} = {value2} OR (value1} IS NULL AND {value2} IS NULL)"),

@Spaarsh
Copy link
Contributor

Spaarsh commented Jan 18, 2025

@ion-elgreco sure! I am still trying to figure how tests are written but as soon as I figure that out, I'll include your suggestion!

@alamb
Copy link
Contributor

alamb commented Jan 18, 2025

@ion-elgreco sure! I am still trying to figure how tests are written but as soon as I figure that out, I'll include your suggestion!

I think this document describes how to write tests pretty well: https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/README.md

Thanks @Spaarsh

@alamb
Copy link
Contributor

alamb commented Jan 18, 2025

@Spaarsh
Copy link
Contributor

Spaarsh commented Jan 18, 2025

@ion-elgreco based on your suggestion, I added a sanity test as well. It is as follows:

SELECT 
  (column1 <=> column2) = 
  (IFNULL(column1, false) = IFNULL(column2, false)) AS comparison_result
FROM (VALUES 
  (1, 1),      -- equal values
  (1, 2),      -- different values
  (NULL, NULL), -- both NULL
  (1, NULL),    -- first not NULL, second NULL
  (NULL, 1)     -- first NULL, second not NULL
) AS t(column1, column2);

My previous sanity test was:

SELECT 
  (column1 <=> column2) = 
  (column1 = column2 OR (column1 IS NULL AND column2 IS NULL))
FROM (VALUES 
  (1, 1),      -- equal values
  (1, 2),      -- different values
  (NULL, NULL), -- both NULL
  (1, NULL),    -- first not NULL, second NULL
  (NULL, 1)     -- first NULL, second not NULL
) as t;

But it failed in the last two cases since the right-hand side of the queries reduced to:
(1=NULL OR (false AND true) --> (NULL OR false) --> NULL
and (NULL=1 OR (true AND false) --> (NULL OR false) --> NULL while the left-hand side ie., the <=> operator returned false.

Hence, in order to address this, the test needed to be changed.

Please correct me if this is problematic!
Thanks!

@ion-elgreco
Copy link
Author

@Spaarsh actually I missed the null propagation part ; ) So I would discard my suggestion

@Spaarsh
Copy link
Contributor

Spaarsh commented Jan 18, 2025

@ion-elgreco no issues. Should I discard the sanity test then? It does work logically.

If both are NULL values, they both are assigned the false value, returning a true value.

If neither value is NULL, it works as a normal equal to sign.

If either of them is NULL, a number is checked for equality against false, returning a false value.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants