
Contributor

@gord02 gord02 commented Dec 1, 2025

This PR adds support for the Expression Nested List type.

Issue

Key Changes:

  • Substrait POJO to proto and back conversion
  • Substrait POJO to Calcite and back conversion
  • Added a new type in Isthmus to support nested lists of functions during the conversion from Substrait to Calcite
  • Added code to skip the project remapping on an empty table

Testing:

  • Added tests for the round trip between Substrait and proto
  • Added tests for the round trip between Substrait and Calcite

./gradlew test --tests io.substrait.isthmus.NestedExpressionsTest --debug-jvm

@gord02 gord02 force-pushed the gordon.hamilton/expressionNestedTypes branch from 18967e4 to 9e99091 Compare December 2, 2025 15:50
Member

@benbellick benbellick left a comment

Left a few comments. Thanks!

@benbellick benbellick self-requested a review December 2, 2025 20:42
Member

@benbellick benbellick left a comment

Just left a few more comments but the core/ part is looking great!

Member

@benbellick benbellick left a comment

I think the core stuff looks great! I will try and take a pass at the isthmus stuff, but as I mentioned to you online, I am not as comfortable with that part of the codebase. If the changes there seem simple enough that I am comfortable approving, I will do so. Otherwise, I'll leave it to @vbarua or others to make the final judgement call 🙂

@vbarua vbarua changed the title from "Add support for Expression nested List Type" to "feat: support Nested Lists" Dec 3, 2025
@gord02 gord02 marked this pull request as draft December 9, 2025 14:31
@gord02 gord02 marked this pull request as ready for review December 9, 2025 19:51
Member

@vbarua vbarua left a comment

Thanks for working on this, it's looking really good overall. I did have some comments, but they are minor suggestions. If you could take a look and respond to them, I think we can easily get this PR merged this week 🙂

nested.getList().getValuesList().stream().map(this::from).collect(Collectors.toList());
return ExpressionCreator.nestedList(nested.getNullable(), list);
default:
throw new IllegalStateException("Unimplemented nested type: " + nested.getNestedTypeCase());
Member

minor: We can use UnsupportedOperationException here, which better matches your error message 🙂
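As a rough illustration of this suggestion, here is a minimal self-contained sketch. The `NestedTypeDispatch` class, its `NestedTypeCase` enum and the `describe` method are hypothetical stand-ins, not the actual Substrait classes. The point is only that `UnsupportedOperationException` signals "not implemented", whereas `IllegalStateException` suggests an object is in an invalid state.

```java
final class NestedTypeDispatch {
  // Hypothetical stand-in for the proto's nested type case enum.
  enum NestedTypeCase { LIST, STRUCT, MAP }

  // Handles the supported case and rejects everything else explicitly.
  static String describe(NestedTypeCase nestedType) {
    switch (nestedType) {
      case LIST:
        return "nested list";
      default:
        // The message says "unimplemented", so the exception type says the same.
        throw new UnsupportedOperationException("Unimplemented nested type: " + nestedType);
    }
  }
}
```

Callers that probe for support can then catch `UnsupportedOperationException` specifically without also swallowing genuine state errors.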

}

/**
* Creator a nested list expression with one or more elements.
Member

Suggested change
* Creator a nested list expression with one or more elements.
* Creates a nested list expression with one or more elements.

Expression.ScalarFunctionInvocation nonLiteralExpression = b.add(b.i32(7), b.i32(42));

@Test
void DifferentTypedLiteralsNestedListTest() {
Member

Suggested change
void DifferentTypedLiteralsNestedListTest() {
void rejectNestedListWithElementsOfDifferentTypes() {

minor suggestion: if we include the expected behaviour in the test name, it's easier to see at a glance what it's testing. Similar suggestions for the tests below.

}

@Test
void EmptyNestedListTest() {
Member

Suggested change
void EmptyNestedListTest() {
void rejectEmptyNestedListTest() {

}

@Test
void SameTypedLiteralsNestedListTest() {
Member

Suggested change
void SameTypedLiteralsNestedListTest() {
void acceptNestedListWithElementsOfSameType() {

Beyond the name suggestion for this test, it is a bit redundant because it doesn't really verify anything differently than your normal tests below do.

.map(this::toExpression)
.collect(java.util.stream.Collectors.toList());

// if there is no input fields, don’t put a remapping on it
Member

minor/pedantic: small rewording + avoiding extra whitespace

Suggested change
// if there is no input fields, don’t put a remapping on it
// if there are no input fields, no remap is necessary

final Rel emptyTable = b.emptyScan();

@Test
void NestedListWithJustLiteralsTest() {
Member

General Java style point: tests are methods, so their names should be in camelCase. Only constructors, which share the name of the class, are capitalized.

Suggested change
void NestedListWithJustLiteralsTest() {
void nestedListWithJustLiteralsTest() {

}

@Test
void NestedListWithNonLiteralsTest() {
Member

Suggested change
void NestedListWithNonLiteralsTest() {
void nestedListWithNonLiteralsTest() {


RelNode relNode = substraitToCalcite.convert(project); // substrait rel to calcite
Rel project2 = SubstraitRelVisitor.convert(relNode, extensions); // calcite to substrait
assertEquals(project, project2); // pojo -> calcite -> pojo
Member

I would replace these 3 lines with just

assertFullRoundTrip(project);

which is a utility helper we have that checks more conversions than you are checking now. Same for the test below.

RelNode relNode = substraitToCalcite.convert(project); // substrait rel to calcite
Rel project2 = SubstraitRelVisitor.convert(relNode, extensions); // calcite to substrait
assertEquals(project, project2); // pojo -> calcite -> pojo
}
Member

One more test that might be good to include in this file is a NestedList whose elements include a field from the input, referenced using a FieldReference.

@asolimando asolimando left a comment

Left a few comments purely on the Isthmus side of things

public Optional<Expression> convert(
RexCall call, Function<RexNode, Expression> topLevelConverter) {

if (!call.getOperator().getName().equals("NESTEDLIST")) {

Suggested change
if (!call.getOperator().getName().equals("NESTEDLIST")) {
if (!(call.getOperator() instanceof NestedListConstructor)) {

string-comparison is brittle and should be avoided
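To make the brittleness concrete, here is a small self-contained sketch. The `Operator` and `NestedListConstructor` classes below are hypothetical stand-ins for the Calcite operator hierarchy, not the real classes. It shows that a name check also matches any unrelated operator that happens to carry the same name, while an `instanceof` check does not.

```java
final class OperatorMatching {
  // Hypothetical stand-in for a Calcite operator; not the real class.
  static class Operator {
    private final String name;

    Operator(String name) {
      this.name = name;
    }

    String getName() {
      return name;
    }
  }

  // Hypothetical stand-in for the dedicated nested list operator.
  static class NestedListConstructor extends Operator {
    NestedListConstructor() {
      super("NESTEDLIST");
    }
  }

  // Brittle: any operator that reuses the name matches, and a rename silently breaks it.
  static boolean matchesByName(Operator op) {
    return op.getName().equals("NESTEDLIST");
  }

  // Robust: only the dedicated operator class (or a subclass) matches.
  static boolean matchesByType(Operator op) {
    return op instanceof NestedListConstructor;
  }
}
```

With these definitions, `new Operator("NESTEDLIST")` passes the name check but fails the type check, which is the kind of accidental match the suggestion avoids.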

.map(this::toExpression)
.collect(java.util.stream.Collectors.toList());

// if there is no input fields, don’t put a remapping on it

Suggested change
// if there is no input fields, don’t put a remapping on it
// if there are no input fields, don’t put a remapping on it

nit

final Rel emptyTable = b.emptyScan();

@Test
void NestedListWithJustLiteralsTest() {

Suggested change
void NestedListWithJustLiteralsTest() {
void NestedListWithLiteralsTest() {

nit: but reads better IMO

Comment on lines +14 to +15
* constructor creates a special type of SqlKind.ARRAY_VALUE_CONSTRUCTOR for lists that store
* non-literal expressions.

Conceptually, and judging by the tests (NestedListWithJustLiteralsTest), it seems this can indeed handle literal expressions. Can you double check?

Contributor Author

It can handle both literals and non-literals. I'll update the comment to avoid the confusion.

* constructor creates a special type of SqlKind.ARRAY_VALUE_CONSTRUCTOR for lists that store
* non-literal expressions.
*/
public class NestedListConstructor extends SqlMultisetValueConstructor {

I guess you picked SqlMultisetValueConstructor instead of the more natural SqlArrayValueConstructor because CallConverters.java#L144 would then match on your NestedList and treat it as the regular SqlArrayValueConstructor (then invoke LiteralConstructorConverter.java#L32 which is not what we want here).

SqlMultisetValueConstructor is conceptually wrong, as a multiset is radically different from an array/list in standard SQL. On top of that, I imagine that tomorrow you might want to support something similar and hit the same issue you are trying to avoid here by using this class in the first place.

The impedance mismatch is that in SQL (and Calcite), arrays and lists are technically the same entity, while IIRC in Substrait they are treated as different entities (@benbellick can you confirm this?).

Looking at LiteralConstructorConverter, there is an implicit assumption that arrays store only literals: we go down that route without checking whether the elements in the array are really literals (LiteralConstructorConverter.java#L62).

It's probably enough to change LiteralConstructorConverter::toNonEmptyListLiteral to something like this (haven't tested it):

private Optional<Expression> toNonEmptyListLiteral(
    RexCall call, Function<RexNode, Expression> topLevelConverter) {
  List<Expression> expressions = call.operands.stream()
      .map(topLevelConverter)
      .collect(Collectors.toList());

  // Check if all operands are actually literals
  if (expressions.stream().allMatch(e -> e instanceof Expression.Literal)) {
    return Optional.of(ExpressionCreator.list(
        call.getType().isNullable(),
        expressions.stream()
            .map(e -> (Expression.Literal) e)
            .collect(Collectors.toList())));
  }

  return Optional.empty();
}

I suggest extending SqlArrayValueConstructor (which, I know, extends SqlMultisetValueConstructor, but still), then fixing LiteralConstructorConverter as suggested, so that we can continue with NestedExpressionConverter, which comes just after, and we should be good.
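The fall-through behaviour described above can be sketched in isolation. Everything in this block is a hypothetical, simplified model (the `Expr`, `Literal`, `Call`, `ListLiteral` and `NestedList` types and the two converter methods are invented for illustration and are not the Substrait or Isthmus classes). It assumes converters are tried in order and that returning `Optional.empty()` means "not mine, try the next one".

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

final class ConverterChain {
  // Hypothetical, simplified expression model; not the Substrait POJOs.
  interface Expr {}

  static final class Literal implements Expr {
    final int value;
    Literal(int value) { this.value = value; }
  }

  static final class Call implements Expr {
    final String function;
    Call(String function) { this.function = function; }
  }

  static final class ListLiteral implements Expr {
    final List<Literal> values;
    ListLiteral(List<Literal> values) { this.values = values; }
  }

  static final class NestedList implements Expr {
    final List<Expr> values;
    NestedList(List<Expr> values) { this.values = values; }
  }

  // Accepts the array only if every element is a literal; otherwise declines.
  static Optional<Expr> literalListConverter(List<Expr> elements) {
    if (elements.stream().allMatch(e -> e instanceof Literal)) {
      List<Literal> literals =
          elements.stream().map(e -> (Literal) e).collect(Collectors.toList());
      return Optional.of(new ListLiteral(literals));
    }
    return Optional.empty();
  }

  // Accepts any array and keeps the elements as general expressions.
  static Optional<Expr> nestedListConverter(List<Expr> elements) {
    return Optional.of(new NestedList(elements));
  }

  // Tries the literal converter first, then falls through to the nested one.
  static Expr convert(List<Expr> elements) {
    Optional<Expr> result = literalListConverter(elements);
    if (!result.isPresent()) {
      result = nestedListConverter(elements);
    }
    return result.orElseThrow(() -> new UnsupportedOperationException("No converter matched"));
  }
}
```

In this model an all-literal array always comes back as a `ListLiteral` and a mixed array as a `NestedList`, so the order in which the converters run decides what an all-literal list round-trips to.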

Contributor Author

@gord02 gord02 Dec 10, 2025

Just to make sure I am understanding it correctly, let me know your thoughts on the following scenarios. We want to ensure that nested lists with literals and nested lists with non-literals both round-trip back to a nested list. If the LiteralConstructorConverter runs first on a list of just literals, it would succeed and the list would not be mapped back to a nested list. In the other case, where the NestedExpressionConverter runs first, literal lists that were not originally a NestedList would become a nested list. Does the above account for this, or is the difference important?

Also, is there a way to meaningfully extend the SqlArrayValueConstructor class? Its definition is bare with just a constructor to its parent type.


class NestedExpressionsTest extends PlanTestBase {

protected static final SimpleExtension.ExtensionCollection defaultExtensionCollection =

I think we could add a few more tests.

  1. mixed literal and non-literal expressions (IIRC this is the key scenario that justifies having both ListLiteral and NestedList), so here I'd test an "array" containing both literals and runtime expressions (like function calls) in the same list
  2. field references: not sure it's legal in this context, but if it is, it would be nice to add a test for arrays containing references to input table columns (common use case for building dynamic arrays from row data)
  3. ListLiteral preservation: given what was discussed here, I'd make sure that creating a ListLiteral (all literals) and round-tripping it through Calcite leaves it a ListLiteral rather than converting it to a NestedList (especially if you accept my suggestion from the aforementioned comment)

Additional tests could be:

  • single-element lists
  • nullable round-trip preservation through Calcite
  • different data types beyond integers and booleans
  • type coercion/type unification
