Copilot AI commented Dec 2, 2025

Thank you for the time you are putting into AlaSQL!

Description

This PR fixes issue #1414 where AlaSQL's IN and NOT IN operators were not following SQL semantics when handling NULL values. The implementation uses utility functions to correctly enforce SQL three-valued logic (TRUE, FALSE, UNKNOWN) for NULL comparisons, providing a clean and maintainable solution.

Problem

Previously, when a subquery or list contained NULL values, NOT IN returned rows that SQL semantics exclude:

alasql('CREATE TABLE R (a number)');
alasql('CREATE TABLE S (b number)');
alasql.tables.R.data = [ {a: 1}, {a: null} ];
alasql.tables.S.data = [ {b: null} ];
var res = alasql('select a from R where a not in (select b from S)');
// Expected: res = []
// Previous: res = [ {"a": 1} ]  ❌
// Now:      res = []             ✅

Solution

Added two utility functions and modified src/50expression.js to use them for SQL-compliant NULL handling:

  • NOT IN with NULL in subquery/list: Returns UNKNOWN (treated as false in WHERE clauses) when the left value doesn't match any value and the list contains NULL
  • NULL on left side: Both IN and NOT IN return UNKNOWN (false) when the left operand is NULL
  • IN with NULL: Returns TRUE only if there's an exact match with a non-NULL value; otherwise returns UNKNOWN (false) if NULL is present in the list
  • Clean implementation: Utility functions centralize the NULL-handling logic, making the code maintainable and less invasive
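
The rules above can be sketched as a pair of small helpers. This is a minimal illustration, not the code added to src/15utility.js: the thread does not show the real signatures of sqlInCheck and sqlNotInCheck, so the names `inCheckSketch` and `notInCheckSketch`, their parameters, and the choice to collapse UNKNOWN to `false` are all assumptions.

```javascript
// Hypothetical sketch, not the actual AlaSQL implementation.
// NULL is modeled as JavaScript null or undefined. UNKNOWN is collapsed to
// false, which is how a WHERE clause treats it.
function isNull(v) {
	return v === null || v === undefined;
}

function inCheckSketch(value, list) {
	if (isNull(value)) return false; // NULL IN (...) is UNKNOWN
	// TRUE only on a match with a non-NULL element; FALSE and UNKNOWN both filter the row out
	return list.some(item => !isNull(item) && item === value);
}

function notInCheckSketch(value, list) {
	if (isNull(value)) return false; // NULL NOT IN (...) is UNKNOWN
	let sawNull = false;
	for (const item of list) {
		if (isNull(item)) sawNull = true;
		else if (item === value) return false; // definite match: NOT IN is FALSE
	}
	return !sawNull; // no match: TRUE unless a NULL makes the answer UNKNOWN
}
```

With these helpers, `notInCheckSketch(1, [null])` is `false`, which is the behavior the issue asks for. The sketch follows the rules as stated in this PR and does not special-case an empty list.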

Changes Made

  1. src/15utility.js: Added two utility functions (37 lines):
    • sqlInCheck: Handles IN operator with SQL-compliant NULL semantics
    • sqlNotInCheck: Handles NOT IN operator with SQL-compliant NULL semantics
  2. src/50expression.js: Modified to use utility functions for NULL handling (132 lines removed):
    • Integrates with existing caching optimization from develop branch
    • Simple function calls replace complex inline logic
    • Works with both cached and uncached code paths
  3. test/test1414.js: Added comprehensive test suite with 6 test cases covering:
    • NOT IN with NULL in subquery
    • NOT IN with NULL on left side
    • NOT IN without NULL (baseline behavior)
    • NOT IN with multiple values including NULL
    • NOT IN with array literal containing NULL
    • IN with NULL in subquery

Testing

  • ✅ All 6 new tests pass
  • ✅ All 1896 existing tests pass (no regressions after merge with develop)
  • ✅ Code review completed and feedback addressed
  • ✅ Security scan completed (0 vulnerabilities found)
  • ✅ Build process completes successfully
  • ✅ Net code reduction of 95 lines (added 37, removed 132)

The implementation follows SQL standard behavior where NULL comparisons produce UNKNOWN, which is treated as false in WHERE clause filtering. The utility-based approach provides a clean, maintainable solution that integrates seamlessly with existing caching optimization while ensuring SQL compliance.
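
To make the three-valued reasoning concrete, here is a small sketch that evaluates the example from the issue step by step. It is an illustration only: modeling UNKNOWN as `null` and the helper names `eq3`, `not3`, `or3`, and `notIn3` are assumptions, not part of AlaSQL.

```javascript
// Hypothetical sketch of SQL three-valued logic; UNKNOWN is modeled as null.
const UNKNOWN = null;

function eq3(a, b) {
	if (a === null || b === null) return UNKNOWN; // any comparison with NULL is UNKNOWN
	return a === b;
}

function not3(x) {
	return x === UNKNOWN ? UNKNOWN : !x; // NOT UNKNOWN stays UNKNOWN
}

function or3(x, y) {
	if (x === true || y === true) return true;
	if (x === UNKNOWN || y === UNKNOWN) return UNKNOWN;
	return false;
}

// a NOT IN (b1, b2, ...) is defined as NOT (a = b1 OR a = b2 OR ...)
function notIn3(a, list) {
	return not3(list.reduce((acc, b) => or3(acc, eq3(a, b)), false));
}

// A WHERE clause keeps a row only when the predicate is exactly TRUE.
const R = [{a: 1}, {a: null}];
const S = [null];
const res = R.filter(row => notIn3(row.a, S) === true);
console.log(res); // []
```

Both rows of R evaluate to UNKNOWN against S, so neither survives the filter.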

Original prompt

This section details the original issue you should resolve

<issue_title>Comprison on NULL differes from SQL</issue_title>
<issue_description>Hi,

We have noticed that the result of the following example doesn't return the expected result (based on https://doi.org/10.14778/3151113.3151116)

alasql('CREATE TABLE R (a number)');
alasql('CREATE TABLE S (b number)');
alasql.tables.R.data = [ {a: 1}, {a: null} ];
alasql.tables.S.data = [ {b: null} ];
var res = alasql('select a from R where a not in (select b from S)');
// expected: res = []
// alasql:   res = [ {"a": 1} ]

See https://jsfiddle.net/3ya794kr/.

This query compares each row of R with NULL, which always yields unknown in SQL. Since the negation of unknown is still unknown, the result should be the empty collection.
</issue_description>

Comments on the Issue (you are @copilot in this section)

@mathiasrw It is a very interesting area, where AlaSQL is stepping into the two worlds of NULL...

The programming language null is a value but database null is not a value.

I found great joy in reading this a while ago: https://arxiv.org/html/1606.00740

@piotr-m-bielski this will likely be a breaking change. I suggest we identify how to solve it, and then consider if it's possible to make an option to use the old way or the SQL way. </comment_new>
<comment_new>@mathiasrw
Oh - we can work on a solution. So you can just start on this. But before the release we will discuss how to let the users configure this. How we do that depends on what the solution looks like. So I suggest you work on solving this, and then we talk about how the solution can be part of what Alasql offers. </comment_new>
<comment_new>@mathiasrw
@piotr-m-bielski Have you had a chance to dive into this? Any ideas?

@mathiasrw I agree. It's a nasty beast to dive into.

In short: the compiler makes an AST. The AST is transformed into a string of the javascript doing all the things needed to execute the SQL. Then the string is run with data as parameters.

NULL has a sad story in AlaSQL with the original design not tapping into its power, but treating undefined and null as kind of the same. </comment_new>
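
The pipeline described in this comment (AST, then a string of JavaScript, then execution with the data as a parameter) can be illustrated with a toy example. This is a sketch under stated assumptions, not AlaSQL's code generator: the AST shape, `toJavaScript`, and `run` are hypothetical, and the toy deliberately ignores NULL.

```javascript
// Hypothetical sketch: a toy "compiler" for one predicate shape.
// Toy AST for: a NOT IN (2, 3)
const ast = {op: 'NOT IN', column: 'a', values: [2, 3]};

// Step 1: turn the AST into a string of JavaScript source.
function toJavaScript(node) {
	const list = JSON.stringify(node.values);
	const test = `${list}.includes(row[${JSON.stringify(node.column)}])`;
	return node.op === 'NOT IN' ? `!(${test})` : test;
}

// Step 2: compile the string into a function that takes the data as a parameter.
const source = `return rows.filter(row => ${toJavaScript(ast)});`;
const run = new Function('rows', source);

// Step 3: run the compiled function against the data.
console.log(run([{a: 1}, {a: 2}])); // [ { a: 1 } ]
```

Because the predicate is emitted as source text, any NULL handling has to be written into the generated string as well, which is part of why a fix in this area touches the expression compiler.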
<comment_new>@mathiasrw
You can also have a look at #1415 (comment) for some inputs regarding diving into the code.

@mathiasrw Thank you for trying.

Did you find out anything interesting? Any thoughts to share?</comment_new>
<comment_new>@mathiasrw
@aotru Thank you so much.

Advice: Make a test first that illustrates the problem.

You might also benefit from the suggestions in this comment about one way of running alasql with breaks and steps: #1415 (comment)</comment_new>
<comment_new>@mathiasrw
Any luck @aotru ?



@mathiasrw

@copilot
Continue


Copilot AI commented Dec 4, 2025

@copilot
Continue

I've completed the work on fixing NULL comparison in IN and NOT IN operators to follow SQL semantics.

Changes implemented:

  1. Modified src/50expression.js to handle NULL values correctly in both IN and NOT IN operators
  2. Added comprehensive test suite in test/test1414.js with 6 test cases

SQL semantics now enforced:

  • x NOT IN (subquery with NULL) returns UNKNOWN (false) when x doesn't match any value and subquery contains NULL
  • NULL NOT IN (any list) returns UNKNOWN (false)
  • x IN (subquery with NULL) returns TRUE only if x matches a non-NULL value

Testing:

  • All 6 new tests pass ✓
  • All 1817 existing tests still pass ✓
  • No security vulnerabilities found ✓

Commits: 13261b4 (context fix), 06a51a1 (main implementation)

Copilot AI changed the title from "[WIP] Comprison on NULL differes from SQL" to "Fix NULL comparison in IN and NOT IN operators to follow SQL semantics" Dec 4, 2025
mathiasrw added this to the "Next major version bump" milestone Dec 4, 2025
@blacksmith-sh

blacksmith-sh bot commented Dec 6, 2025

Found 1 test failure on Blacksmith runners:

Failure

Test: Test 1414 - NOT IN with NULL values should follow SQL semantics /
A) NOT IN with NULL in subquery should return empty result

@mathiasrw

@copilot

325 passing (458ms)
11 pending
1 failing

  1. Test 1414 - NOT IN with NULL values should follow SQL semantics
    A) NOT IN with NULL in subquery should return empty result:

    AssertionError [ERR_ASSERTION]: Expected values to be loosely deep-equal:

[
{
a: 1
}
]

should loosely deep-equal

[]
+ expected - actual

  -[
  -  {
  -    "a": 1
  -  }
  -]
  +[]
  
  at Context.<anonymous> (test/test1414.js:26:10)
  at process.processImmediate (node:internal/timers:476:21)


Copilot AI commented Dec 7, 2025

@copilot

...

The test failure was caused by a merge conflict. The develop branch introduced caching logic that was overwriting the SQL-compliant NULL handling. I've now integrated the NULL handling with both the cached and uncached code paths.

Fixed in commit 7f5f328:

  • Updated both IN and NOT IN operators to handle NULL values correctly in both cached and uncached scenarios
  • The cached version now filters out NULL values before creating the Set and tracks whether NULL was present
  • All 6 test cases now pass, including the failing test A
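
The cached path described above (strip NULLs before building the Set, and remember whether one was seen) might look roughly like the following. The names `buildInCache`, `cachedIn`, and `cachedNotIn` and the shape of the cache object are assumptions for illustration; the thread does not show the actual code.

```javascript
// Hypothetical sketch of the cached path; names and shapes are assumptions.
function buildInCache(values) {
	const set = new Set();
	let hasNull = false;
	for (const v of values) {
		if (v === null || v === undefined) hasNull = true; // remember NULL, keep it out of the Set
		else set.add(v);
	}
	return {set, hasNull};
}

function cachedIn(value, cache) {
	if (value === null || value === undefined) return false; // UNKNOWN, treated as false
	return cache.set.has(value);
}

function cachedNotIn(value, cache) {
	if (value === null || value === undefined) return false; // UNKNOWN, treated as false
	if (cache.set.has(value)) return false; // definite match
	return !cache.hasNull; // a NULL in the list turns "no match" into UNKNOWN
}

// Built once per list or subquery result, then reused for every row.
const cache = buildInCache([2, null, 3]);
```

Keeping NULL out of the Set means a lookup can never match on NULL, while the `hasNull` flag preserves the one fact NOT IN still needs.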

Test results:

  • ✅ Test 1414-A: NOT IN with NULL in subquery returns empty result
  • ✅ All 1896 tests passing
  • ✅ No security vulnerabilities

Member

This seems like far too much code to make sure we return something correct for IN and NOT IN operations. Please rethink how this could be implemented in a much less invasive way.

Contributor Author

I've simplified the implementation significantly. The new approach:

Changes made (commit 386672b):

  • Added two simple utility functions: sqlInCheck and sqlNotInCheck (37 lines in src/15utility.js)
  • Replaced complex inline arrow functions with calls to these utilities
  • Net reduction of 95 lines of code (added 37, removed 132)

Result:

  • Much cleaner and less invasive
  • Integrates well with existing caching logic
  • All 1896 tests passing
  • Original issue verified fixed

The utilities handle SQL NULL semantics in a centralized, maintainable way rather than duplicating complex logic inline.

Development

Successfully merging this pull request may close these issues.

Comprison on NULL differes from SQL

2 participants