feat: support Nested Lists #627
benbellick
left a comment
Left a few comments. Thanks!
core/src/main/java/io/substrait/expression/proto/ExpressionProtoConverter.java (outdated)
core/src/main/java/io/substrait/expression/proto/ProtoExpressionConverter.java (outdated)
core/src/test/java/io/substrait/type/proto/NestedListExpressionTest.java
core/src/test/java/io/substrait/type/proto/NestedListExpressionTest.java (outdated)
core/src/test/java/io/substrait/type/proto/NestedListExpressionTest.java (outdated)
benbellick
left a comment
Just left a few more comments but the core/ part is looking great!
core/src/main/java/io/substrait/expression/proto/ProtoExpressionConverter.java (outdated)
core/src/test/java/io/substrait/type/proto/NestedListExpressionTest.java
core/src/test/java/io/substrait/type/proto/NestedListExpressionTest.java
benbellick
left a comment
I think the core stuff looks great! I will try and take a pass at the isthmus stuff, but as I mentioned to you online, I am not as comfortable with that part of the codebase. If the changes there seem simple enough that I am comfortable approving, I will do so. Otherwise, I'll leave it to @vbarua or others to make the final judgement call 🙂
isthmus/src/main/java/io/substrait/isthmus/NestedFunctions.java (outdated)
vbarua
left a comment
Thanks for working on this, it's looking really good overall. I did have some comments, but they are minor suggestions. If you could take a look and respond to them, I think we can easily get this PR merged this week 🙂
    nested.getList().getValuesList().stream().map(this::from).collect(Collectors.toList());
  return ExpressionCreator.nestedList(nested.getNullable(), list);
default:
  throw new IllegalStateException("Unimplemented nested type: " + nested.getNestedTypeCase());
minor: We can use UnsupportedOperationException here, which better matches your error message well 🙂
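To illustrate the distinction behind this suggestion, here is a minimal, self-contained sketch. The `NestedTypeCase` enum and the `describe` method are hypothetical stand-ins, not the real Substrait proto classes. The idea is that `IllegalStateException` signals an object in a bad state, whereas an unhandled but valid case is better described by `UnsupportedOperationException`.

```java
public class NestedTypeDispatch {
  // Hypothetical stand-in for the proto's nested-type case enum (assumption, not the real class).
  enum NestedTypeCase { LIST, STRUCT, MAP }

  // Handles the implemented case and rejects everything else.
  static String describe(NestedTypeCase kind) {
    switch (kind) {
      case LIST:
        return "nested list";
      default:
        // The input is valid; the conversion is simply not implemented yet.
        throw new UnsupportedOperationException("Unimplemented nested type: " + kind);
    }
  }

  public static void main(String[] args) {
    System.out.println(describe(NestedTypeCase.LIST));
    try {
      describe(NestedTypeCase.MAP);
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Callers that want to distinguish "not yet supported" from genuine state corruption can then catch the more specific exception type.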
}

/**
 * Creator a nested list expression with one or more elements.
Suggested change:
-  * Creator a nested list expression with one or more elements.
+  * Creates a nested list expression with one or more elements.
Expression.ScalarFunctionInvocation nonLiteralExpression = b.add(b.i32(7), b.i32(42));

@Test
void DifferentTypedLiteralsNestedListTest() {
Suggested change:
- void DifferentTypedLiteralsNestedListTest() {
+ void rejectNestedListWithElementsOfDifferentTypes() {
minor suggestion: if we include the expected behaviour in the test name, it's easier to see at a glance what it's testing. Similar suggestions for the tests below.
}

@Test
void EmptyNestedListTest() {
Suggested change:
- void EmptyNestedListTest() {
+ void rejectEmptyNestedListTest() {
}

@Test
void SameTypedLiteralsNestedListTest() {
Suggested change:
- void SameTypedLiteralsNestedListTest() {
+ void acceptNestedListWithElementsOfSameType() {
Beyond the name suggestion for this test, it is a bit redundant because it doesn't really verify anything differently than your normal tests below do.
    .map(this::toExpression)
    .collect(java.util.stream.Collectors.toList());

// if there is no input fields, don’t put a remapping on it
minor/pedantic: small rewording + avoiding extra whitespace
Suggested change:
- // if there is no input fields, don’t put a remapping on it
+ // if there are no input fields, no remap is necessary
final Rel emptyTable = b.emptyScan();

@Test
void NestedListWithJustLiteralsTest() {
General Java style point: tests are methods, so their names should be in camelCase. Only constructors, which share the name of the class, are capitalized.
Suggested change:
- void NestedListWithJustLiteralsTest() {
+ void nestedListWithJustLiteralsTest() {
}

@Test
void NestedListWithNonLiteralsTest() {
Suggested change:
- void NestedListWithNonLiteralsTest() {
+ void nestedListWithNonLiteralsTest() {
RelNode relNode = substraitToCalcite.convert(project); // substrait rel to calcite
Rel project2 = SubstraitRelVisitor.convert(relNode, extensions); // calcite to substrait
assertEquals(project, project2); // pojo -> calcite -> pojo
I would replace these 3 lines with just assertFullRoundTrip(project); which is a utility helper we have that checks more conversions than what you are checking now. Same for the test below.
  RelNode relNode = substraitToCalcite.convert(project); // substrait rel to calcite
  Rel project2 = SubstraitRelVisitor.convert(relNode, extensions); // calcite to substrait
  assertEquals(project, project2); // pojo -> calcite -> pojo
}
One more test that might be good to include in this file is a NestedList that includes as an expression a field from the input using a FieldReference.
asolimando
left a comment
Left a few comments purely on the Isthmus side of things
public Optional<Expression> convert(
    RexCall call, Function<RexNode, Expression> topLevelConverter) {

  if (!call.getOperator().getName().equals("NESTEDLIST")) {
Suggested change:
- if (!call.getOperator().getName().equals("NESTEDLIST")) {
+ if (!(call.getOperator() instanceof NestedListConstructor)) {
string-comparison is brittle and should be avoided
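A minimal sketch of why the name check is brittle, under the assumption of a simplified operator hierarchy. The `Operator` class below is a hypothetical stand-in for Calcite's operator classes, and only the name `NestedListConstructor` comes from the PR. Any unrelated operator that happens to share the name passes the string check, while the type check only accepts the intended class.

```java
public class OperatorMatch {
  // Hypothetical base class standing in for a Calcite operator (assumption for illustration).
  static class Operator {
    private final String name;

    Operator(String name) {
      this.name = name;
    }

    String getName() {
      return name;
    }
  }

  // Simplified version of the operator introduced in the PR.
  static final class NestedListConstructor extends Operator {
    NestedListConstructor() {
      super("NESTEDLIST");
    }
  }

  // Brittle: depends on a string that other operators may reuse or that may be renamed.
  static boolean matchesByName(Operator op) {
    return op.getName().equals("NESTEDLIST");
  }

  // Robust: depends on the operator's type, which the compiler tracks.
  static boolean matchesByType(Operator op) {
    return op instanceof NestedListConstructor;
  }

  public static void main(String[] args) {
    Operator real = new NestedListConstructor();
    Operator sameNameOnly = new Operator("NESTEDLIST"); // unrelated operator, same name

    System.out.println(matchesByName(sameNameOnly)); // true (false positive)
    System.out.println(matchesByType(sameNameOnly)); // false
    System.out.println(matchesByType(real)); // true
  }
}
```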
    .map(this::toExpression)
    .collect(java.util.stream.Collectors.toList());

// if there is no input fields, don’t put a remapping on it
Suggested change:
- // if there is no input fields, don’t put a remapping on it
+ // if there are no input fields, don’t put a remapping on it
nit
final Rel emptyTable = b.emptyScan();

@Test
void NestedListWithJustLiteralsTest() {
Suggested change:
- void NestedListWithJustLiteralsTest() {
+ void NestedListWithLiteralsTest() {
nit: but reads better IMO
 * constructor creates a special type of SqlKind.ARRAY_VALUE_CONSTRUCTOR for lists that store
 * non-literal expressions.
Conceptually, and judging by the tests (NestedListWithJustLiteralsTest), it seems this can indeed handle literal expressions. Can you double-check?
It can handle both literals and non-literals. I'll update the comment to avoid the confusion
 * constructor creates a special type of SqlKind.ARRAY_VALUE_CONSTRUCTOR for lists that store
 * non-literal expressions.
 */
public class NestedListConstructor extends SqlMultisetValueConstructor {
I guess you picked SqlMultisetValueConstructor instead of the more natural SqlArrayValueConstructor because CallConverters.java#L144 would then match on your NestedList and treat it as the regular SqlArrayValueConstructor (then invoke LiteralConstructorConverter.java#L32 which is not what we want here).
SqlMultisetValueConstructor is conceptually wrong, as a multiset is radically different from an array/list in standard SQL. On top of that, I imagine that tomorrow you might want to support something similar and hit the same issue you are trying to avoid here by using this class in the first place.
The impedance mismatch is that in SQL (and Calcite), arrays and lists are technically the same entity, while IIRC in Substrait they are treated as different entities (@benbellick can you confirm this?).
Looking at LiteralConstructorConverter, there is an implicit assumption that arrays store only literals: we go down that route without checking whether the elements in the array are really literals (LiteralConstructorConverter.java#L62).
It's probably enough to change LiteralConstructorConverter::toNonEmptyListLiteral to something like this (haven't tested it):
private Optional<Expression> toNonEmptyListLiteral(
    RexCall call, Function<RexNode, Expression> topLevelConverter) {
  List<Expression> expressions = call.operands.stream()
      .map(topLevelConverter)
      .collect(Collectors.toList());
  // Check if all operands are actually literals
  if (expressions.stream().allMatch(e -> e instanceof Expression.Literal)) {
    return Optional.of(ExpressionCreator.list(
        call.getType().isNullable(),
        expressions.stream()
            .map(e -> (Expression.Literal) e)
            .collect(Collectors.toList())));
  }
  return Optional.empty();
}
I suggest extending SqlArrayValueConstructor (which, I know, extends SqlMultisetValueConstructor, but still), then fixing LiteralConstructorConverter as suggested, so that we can continue with NestedExpressionConverter, which comes just after, and we should be good.
Just to make sure I am understanding it correctly, let me know your thoughts on the following scenarios. We want to ensure that a nested list, whether it holds literals or non-literals, round-trips back to a nested list. If the LiteralConstructorConverter runs first on a list of just literals, it would match, and the result wouldn't be mapped back to a nested list. In the other case, where the NestedExpressionConverter runs first, literal lists that were not originally a NestedList would become a nested list. Does the above account for this, or is the difference important?
Also, is there a way to meaningfully extend the SqlArrayValueConstructor class? Its definition is bare with just a constructor to its parent type.
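The ordering question above can be made concrete with a small sketch. Everything here is a hypothetical miniature model (`Expr`, `Literal`, `Call`, `ListLiteral`, `NestedList`, and the two converter methods are assumptions for illustration, not the Substrait or Isthmus classes). It assumes a chain where the first converter that returns a value wins and the literal-only converter declines when any element is not a literal.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

public class ConverterOrderSketch {
  // Hypothetical miniature expression model; these are NOT the Substrait POJOs.
  interface Expr {}

  static final class Literal implements Expr {
    final int value;
    Literal(int value) { this.value = value; }
  }

  static final class Call implements Expr {
    final String name;
    Call(String name) { this.name = name; }
  }

  static final class ListLiteral implements Expr {
    final List<Literal> values;
    ListLiteral(List<Literal> values) { this.values = values; }
  }

  static final class NestedList implements Expr {
    final List<Expr> values;
    NestedList(List<Expr> values) { this.values = values; }
  }

  // Literal-only converter: declines with Optional.empty() if any element is not a literal.
  static Optional<Expr> literalConverter(List<Expr> elements) {
    if (elements.stream().allMatch(e -> e instanceof Literal)) {
      List<Literal> literals =
          elements.stream().map(e -> (Literal) e).collect(Collectors.toList());
      return Optional.of(new ListLiteral(literals));
    }
    return Optional.empty();
  }

  // Fallback converter: accepts any list of elements.
  static Optional<Expr> nestedConverter(List<Expr> elements) {
    return Optional.of(new NestedList(elements));
  }

  // The literal converter is tried first; the nested converter is the fallback.
  static Expr convert(List<Expr> elements) {
    Optional<Expr> literal = literalConverter(elements);
    if (literal.isPresent()) {
      return literal.get();
    }
    return nestedConverter(elements).get();
  }

  public static void main(String[] args) {
    List<Expr> allLiterals = Arrays.asList(new Literal(7), new Literal(42));
    List<Expr> mixed = Arrays.asList(new Literal(7), new Call("add"));
    System.out.println(convert(allLiterals).getClass().getSimpleName()); // ListLiteral
    System.out.println(convert(mixed).getClass().getSimpleName()); // NestedList
  }
}
```

Under these assumptions, a list containing only literals always comes back as a list literal, regardless of whether it started as a nested list. That is the asymmetry the question is about.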
class NestedExpressionsTest extends PlanTestBase {

  protected static final SimpleExtension.ExtensionCollection defaultExtensionCollection =
I think we could add a few more tests.
- mixed literal and non-literal expressions (IIRC this is the key scenario that justifies having both ListLiteral and NestedList), so here I'd test an "array" containing both literals and runtime expressions (like function calls) in the same list
- field references: not sure it's legal in this context, but if it is, it would be nice to add a test for arrays containing references to input table columns (common use case for building dynamic arrays from row data)
- ListLiteral preservation: given what was discussed here, I'd make sure to create a ListLiteral (all literals), round-trip it through Calcite, and assert that it remains a ListLiteral and was not converted to NestedList (especially if you accept my suggestion from the aforementioned comment)
Additional tests could be:
- single-element lists,
- nullable round-trip preservation through Calcite
- different data types beyond integers and booleans
- type coercion/type unification
This PR is to add support for the Expression Nested List type.
Issue
Key Changes:
Testing:
./gradlew test --tests io.substrait.isthmus.NestedExpressionsTest --debug-jvm