-
Notifications
You must be signed in to change notification settings - Fork 180
Support mvzip eval function
#4805
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dai-chen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QQ: can SQL's ARRAYS_ZIP be used for this?
Thanks for the question! I considered ARRAYS_ZIP but it's not suitable for mvzip due to semantic differences:
|
Does the |
For |
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVZipFunctionImpl.java
Outdated
Show resolved
Hide resolved
📝 WalkthroughSummary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings. WalkthroughAdds MVZIP: a new multivalue-array built-in that zips two arrays element-wise with an optional delimiter. Changes include core implementation, operator registration, PPL lexer/parser updates, tests, and documentation. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (5)
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java (1)
848-856: Anonymization test for mvzip looks good; consider also covering 2‑arg formThis test mirrors the existing mvjoin/mvappend/mvindex anonymization patterns and validates the 3‑argument (custom delimiter) variant. Consider adding a second test for
mvzip(array(...), array(...))without the delimiter argument to ensure the anonymizer also handles the default‑delimiter form correctly.ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java (1)
294-429: Solid mvzip Calcite coverage; consider asserting results for empty/edge casesThe new tests cover the core mvzip scenarios well (basic, custom delimiter, nested), validating both the logical plan and Spark SQL translation.
For the empty‑array variants (
testMvzipWithEmptyLeftArray,testMvzipWithEmptyRightArray,testMvzipWithBothEmptyArrays), you currently only verify the plan/SQL. To guard the runtime semantics promised in the implementation (empty arrays produce[]rather thannull), consider also addingverifyResult(...)assertions for these cases. If mvzip is expected to handle null or scalar inputs (treating scalars as single‑element arrays), adding one or two focused tests for those would further lock in the contract.Based on learnings, this is the main place to exercise Calcite SQL generation and behavior for the new function.
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java (1)
493-560: Consider adding a test for null input handling.The PR objectives state that mvzip "returns null if either input is null," but this integration test suite doesn't verify that behavior. Adding a test like
testMvzipWithNullInputwould ensure the documented null semantics are validated end-to-end.@Test public void testMvzipWithNullInput() throws IOException { // When one input is null, result should be null JSONObject actual = executeQuery( String.format( "source=%s | eval result = mvzip(null, array('a', 'b')) | head 1 | fields result", TEST_INDEX_BANK)); verifySchema(actual, schema("result", "array")); verifyDataRows(actual, rows((Object) null)); }core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVZipFunctionImpl.java (2)
6-6: Package name should use lowercase.Java convention is to use lowercase for package names.
CollectionUDFshould becollectionudfor restructured ascollection.udf.
112-127: Consider pre-sizing ArrayList for efficiency.Since
minLengthis computed before the loop, the ArrayList could be initialized with the known capacity to avoid potential resizing during iteration.- List<Object> result = new ArrayList<>(); + List<Object> result = new ArrayList<>(minLength);
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java(1 hunks)core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVZipFunctionImpl.java(1 hunks)core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java(2 hunks)core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java(2 hunks)docs/user/ppl/functions/collection.rst(1 hunks)integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java(1 hunks)ppl/src/main/antlr/OpenSearchPPLLexer.g4(1 hunks)ppl/src/main/antlr/OpenSearchPPLParser.g4(1 hunks)ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java(1 hunks)ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.java(1 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.javacore/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.javacore/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVZipFunctionImpl.javacore/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
⚙️ CodeRabbit configuration file
**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure methods are under 20 lines with single responsibility
- Verify proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.javacore/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.javacore/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.javacore/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVZipFunctionImpl.javacore/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java
**/*Test.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*Test.java: All new business logic requires unit tests
Name unit tests with*Test.javasuffix in OpenSearch SQL
Files:
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java
**/test/**/*.java
⚙️ CodeRabbit configuration file
**/test/**/*.java: - Verify test coverage for new business logic
- Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
- Ensure tests are independent and don't rely on execution order
- Validate meaningful test data that reflects real-world scenarios
- Check for proper cleanup of test resources
Files:
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java
**/ppl/**/*.java
⚙️ CodeRabbit configuration file
**/ppl/**/*.java: - For PPL parser changes, verify grammar tests with positive/negative cases
- Check AST generation for new syntax
- Ensure corresponding AST builder classes are updated
- Validate edge cases and boundary conditions
Files:
ppl/src/test/java/org/opensearch/sql/ppl/utils/PPLQueryDataAnonymizerTest.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java
integ-test/**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
End-to-end scenarios need integration tests in
integ-test/module
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java
⚙️ CodeRabbit configuration file
integ-test/**/*IT.java: - Verify integration tests are in correct module (integ-test/)
- Check tests can be run with ./gradlew :integ-test:integTest
- Ensure proper test data setup and teardown
- Validate end-to-end scenario coverage
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java
**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
Name integration tests with
*IT.javasuffix in OpenSearch SQL
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Verify SQL generation and optimization paths
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java
🧠 Learnings (3)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.javappl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL
Applied to files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*.java : Document Calcite-specific workarounds in code
Applied to files:
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVZipFunctionImpl.java
🧬 Code graph analysis (3)
core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java (1)
PPLBuiltinOperators(108-508)
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVZipFunctionImpl.java (1)
core/src/main/java/org/opensearch/sql/expression/function/ImplementorUDF.java (1)
ImplementorUDF(17-40)
core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java (1)
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVZipFunctionImpl.java (1)
MVZipFunctionImpl(45-128)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (27)
- GitHub Check: security-it-linux (21)
- GitHub Check: security-it-linux (25)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (25, doc)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (21, integration)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (21, doc)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (14)
core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java (1)
50-51: MVZIP UDF wiring is consistent with existing collection operatorsThe MVZipFunctionImpl import and MVZIP SqlOperator registration follow the same pattern as MVAPPEND and other collection UDFs; nothing else appears needed here.
Also applies to: 395-396
ppl/src/main/antlr/OpenSearchPPLLexer.g4 (1)
441-448: MVZIP lexer token is correctly addedRegistering
MVZIPalongside the other collection function tokens is consistent and should work fine with the case‑insensitive lexer.ppl/src/main/antlr/OpenSearchPPLParser.g4 (1)
1094-1107: Parser integration of MVZIP is correctAdding
MVZIPtocollectionFunctionNamecleanly plugs mvzip intoevalFunctionNameandsearchableKeyWordwithout extra grammar changes. This matches how other MV* collection functions are modeled.core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java (1)
153-158: MVZIP registration in the function implementation table is consistentWiring
BuiltinFunctionName.MVZIPtoPPLBuiltinOperators.MVZIPviaregisterOperatormatches the existing pattern forMVAPPENDandMVDEDUP, and will reuse the UDF’s operand metadata for type checking. No additional special handling seems necessary.Also applies to: 1005-1009
core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java (1)
68-85: MVZIP enum entry is properly added to built‑in function namesDefining
MVZIP(FunctionName.of("mvzip"))alongside the other collection functions is consistent and ensuresBuiltinFunctionName.of("mvzip")will resolve correctly, without incorrectly treating it as an aggregation or window function.integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java (4)
494-504: LGTM!The basic mvzip test correctly validates default delimiter behavior with host/port combination, matching the documented specification.
507-518: LGTM!Good test coverage for custom delimiter usage with the
|character.
520-533: LGTM!The nested mvzip test validates an important use case from the specification. The expected output
["a|c|e", "b|d|f"]correctly demonstrates chained zipping.
535-560: LGTM!Good edge case coverage for empty arrays - both single empty input and both empty inputs returning empty results rather than null.
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVZipFunctionImpl.java (5)
30-50: LGTM!Good JavaDoc documentation explaining the function behavior, null handling semantics, and zip behavior with unequal-length arrays. The class structure correctly extends
ImplementorUDFwithNullPolicy.ANY.
52-62: LGTM!Return type inference correctly produces a nullable array of nullable VARCHAR elements, matching the documented behavior.
64-73: LGTM!Operand metadata correctly constrains the first two arguments to arrays and the optional third argument to a character type.
75-101: LGTM!The implementor correctly handles both 2-argument and 3-argument cases, injecting the default comma delimiter for the 2-argument form. The
IllegalArgumentExceptionfor unexpected argument counts provides a clear error message.
113-115: Null check may be redundant with NullPolicy.ANY.
NullPolicy.ANYin the constructor (line 49) already handles null propagation at the Calcite layer. This explicit null check provides defense-in-depth but creates slight redundancy. If intentional for safety when the method is called outside Calcite's runtime, consider adding a brief comment explaining why.
|
|
||
| MVZIP | ||
| ----- | ||
|
|
||
| Description | ||
| >>>>>>>>>>> | ||
|
|
||
| Usage: mvzip(mv_left, mv_right, [delim]) combines the values in two multivalue fields. The delimiter is used to specify a delimiting character to join the two values. This is similar to the Python zip command. | ||
|
|
||
| The values are stitched together combining the first value of mv_left with the first value of mv_right, then the second with the second, and so on. The function stops at the length of the shorter field. | ||
|
|
||
| The delimiter is optional. When specified, it must be enclosed in quotation marks. The default delimiter is a comma ( , ). | ||
|
|
||
| Argument type: mv_left: ANY, mv_right: ANY, delim: STRING (optional) | ||
|
|
||
| Return type: ARRAY | ||
|
|
||
| Example:: | ||
|
|
||
| os> source=people | eval hosts = array('host1', 'host2'), ports = array(80, 443), nserver = mvzip(hosts, ports) | fields nserver | head 1 | ||
| fetched rows / total rows = 1/1 | ||
| +----------------------+ | ||
| | nserver | | ||
| |----------------------| | ||
| | [host1,80,host2,443] | | ||
| +----------------------+ | ||
|
|
||
| os> source=people | eval arr1 = array('a', 'b', 'c'), arr2 = array('x', 'y', 'z'), result = mvzip(arr1, arr2, '|') | fields result | head 1 | ||
| fetched rows / total rows = 1/1 | ||
| +---------------+ | ||
| | result | | ||
| |---------------| | ||
| | [a|x,b|y,c|z] | | ||
| +---------------+ | ||
|
|
||
| os> source=people | eval arr1 = array(1, 2, 3), arr2 = array('a', 'b'), result = mvzip(arr1, arr2) | fields result | head 1 | ||
| fetched rows / total rows = 1/1 | ||
| +-----------+ | ||
| | result | | ||
| |-----------| | ||
| | [1,a,2,b] | | ||
| +-----------+ | ||
|
|
||
| os> source=people | eval arr1 = array(1, 2, 3), arr2 = array('a', 'b', 'c'), arr3 = array('x', 'y', 'z'), result = mvzip(mvzip(arr1, arr2), arr3) | fields result | head 1 | ||
| fetched rows / total rows = 1/1 | ||
| +---------------------+ | ||
| | result | | ||
| |---------------------| | ||
| | [1,a,x,2,b,y,3,c,z] | | ||
| +---------------------+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Align MVZIP docs with implementation (types, return, and behavior)
The current text doesn’t quite match the function contract:
MVZipFunctionImpl.getOperandMetadata()requires the first two arguments to be arrays, not genericANY.- The return type is
ARRAYofSTRING(concatenated values), not an untyped array. - Implementation returns
nullif either input is null and an empty array when either array is empty.
Consider updating this section along these lines:
-Usage: mvzip(mv_left, mv_right, [delim]) combines the values in two multivalue fields. The delimiter is used to specify a delimiting character to join the two values. This is similar to the Python zip command.
+Usage: mvzip(mv_left, mv_right, [delim]) combines the values in two multivalue fields. The delimiter is used to specify a delimiting character to join the two values. This is similar to the Python zip command, but returns an array of concatenated strings.
-Argument type: mv_left: ANY, mv_right: ANY, delim: STRING (optional)
-
-Return type: ARRAY
+Argument type: mv_left: ARRAY, mv_right: ARRAY, delim: STRING (optional)
+
+Return type: ARRAY of STRING
+
+If either input is null, mvzip returns null. If either array is empty, mvzip returns an empty array.Optionally, you might also tweak the first example result to make the pairing clearer, e.g. | [host1,80, host2,443] | or use a non-comma delimiter to visually distinguish the joined pairs from element separators.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| MVZIP | |
| ----- | |
| Description | |
| >>>>>>>>>>> | |
| Usage: mvzip(mv_left, mv_right, [delim]) combines the values in two multivalue fields. The delimiter is used to specify a delimiting character to join the two values. This is similar to the Python zip command. | |
| The values are stitched together combining the first value of mv_left with the first value of mv_right, then the second with the second, and so on. The function stops at the length of the shorter field. | |
| The delimiter is optional. When specified, it must be enclosed in quotation marks. The default delimiter is a comma ( , ). | |
| Argument type: mv_left: ANY, mv_right: ANY, delim: STRING (optional) | |
| Return type: ARRAY | |
| Example:: | |
| os> source=people | eval hosts = array('host1', 'host2'), ports = array(80, 443), nserver = mvzip(hosts, ports) | fields nserver | head 1 | |
| fetched rows / total rows = 1/1 | |
| +----------------------+ | |
| | nserver | | |
| |----------------------| | |
| | [host1,80,host2,443] | | |
| +----------------------+ | |
| os> source=people | eval arr1 = array('a', 'b', 'c'), arr2 = array('x', 'y', 'z'), result = mvzip(arr1, arr2, '|') | fields result | head 1 | |
| fetched rows / total rows = 1/1 | |
| +---------------+ | |
| | result | | |
| |---------------| | |
| | [a|x,b|y,c|z] | | |
| +---------------+ | |
| os> source=people | eval arr1 = array(1, 2, 3), arr2 = array('a', 'b'), result = mvzip(arr1, arr2) | fields result | head 1 | |
| fetched rows / total rows = 1/1 | |
| +-----------+ | |
| | result | | |
| |-----------| | |
| | [1,a,2,b] | | |
| +-----------+ | |
| os> source=people | eval arr1 = array(1, 2, 3), arr2 = array('a', 'b', 'c'), arr3 = array('x', 'y', 'z'), result = mvzip(mvzip(arr1, arr2), arr3) | fields result | head 1 | |
| fetched rows / total rows = 1/1 | |
| +---------------------+ | |
| | result | | |
| |---------------------| | |
| | [1,a,x,2,b,y,3,c,z] | | |
| +---------------------+ | |
| MVZIP | |
| ----- | |
| Description | |
| >>>>>>>>>>> | |
| Usage: mvzip(mv_left, mv_right, [delim]) combines the values in two multivalue fields. The delimiter is used to specify a delimiting character to join the two values. This is similar to the Python zip command, but returns an array of concatenated strings. | |
| The values are stitched together combining the first value of mv_left with the first value of mv_right, then the second with the second, and so on. The function stops at the length of the shorter field. | |
| The delimiter is optional. When specified, it must be enclosed in quotation marks. The default delimiter is a comma ( , ). | |
| Argument type: mv_left: ARRAY, mv_right: ARRAY, delim: STRING (optional) | |
| Return type: ARRAY of STRING | |
| If either input is null, mvzip returns null. If either array is empty, mvzip returns an empty array. | |
| Example:: | |
| os> source=people | eval hosts = array('host1', 'host2'), ports = array(80, 443), nserver = mvzip(hosts, ports) | fields nserver | head 1 | |
| fetched rows / total rows = 1/1 | |
| ----------------------+ | |
| | nserver | | |
| |----------------------| | |
| | [host1,80,host2,443] | | |
| ----------------------+ | |
| os> source=people | eval arr1 = array('a', 'b', 'c'), arr2 = array('x', 'y', 'z'), result = mvzip(arr1, arr2, '|') | fields result | head 1 | |
| fetched rows / total rows = 1/1 | |
| ---------------+ | |
| | result | | |
| |---------------| | |
| | [a|x,b|y,c|z] | | |
| ---------------+ | |
| os> source=people | eval arr1 = array(1, 2, 3), arr2 = array('a', 'b'), result = mvzip(arr1, arr2) | fields result | head 1 | |
| fetched rows / total rows = 1/1 | |
| -----------+ | |
| | result | | |
| |-----------| | |
| | [1,a,2,b] | | |
| -----------+ | |
| os> source=people | eval arr1 = array(1, 2, 3), arr2 = array('a', 'b', 'c'), arr3 = array('x', 'y', 'z'), result = mvzip(mvzip(arr1, arr2), arr3) | fields result | head 1 | |
| fetched rows / total rows = 1/1 | |
| ---------------------+ | |
| | result | | |
| |---------------------| | |
| | [1,a,x,2,b,y,3,c,z] | | |
| ---------------------+ |
🤖 Prompt for AI Agents
In docs/user/ppl/functions/collection.rst around lines 397 to 446, the MVZIP
documentation is inconsistent with the implementation: update the argument types
to require two ARRAY arguments (not ANY), specify the return as ARRAY of STRING
(concatenated values), and document null/empty behavior (function returns null
if either input is null and returns an empty array if either input array is
empty). Also adjust examples to reflect string concatenation (e.g., show joined
pairs as strings or use a non-comma delimiter) so outputs match the actual
implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
|
Can you resolve conflict, and address comment from CodeRabbit. |
Signed-off-by: Kai Huang <[email protected]> # Conflicts: # core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java # core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java # integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java # ppl/src/main/antlr/OpenSearchPPLLexer.g4 # ppl/src/main/antlr/OpenSearchPPLParser.g4 # ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java # Conflicts: # core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java # docs/user/ppl/functions/collection.rst # ppl/src/main/antlr/OpenSearchPPLParser.g4 # ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
Signed-off-by: Kai Huang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java (1)
493-560: Excellent test coverage following established patterns.The five new tests comprehensively validate mvzip functionality:
- Basic usage with default delimiter
- Custom delimiter usage
- Nested mvzip calls
- Empty array edge cases
All tests follow the file's conventions and properly verify both schema and data rows.
Consider adding these optional tests for even more comprehensive coverage:
- Different-length arrays: Explicitly test zip behavior when arrays have different lengths to demonstrate the "shorter array" semantics mentioned in the PR description.
@Test public void testMvzipWithDifferentLengths() throws IOException { JSONObject actual = executeQuery( String.format( "source=%s | eval arr1 = array('a', 'b', 'c'), arr2 = array('x', 'y'), result" + " = mvzip(arr1, arr2) | head 1 | fields result", TEST_INDEX_BANK)); verifySchema(actual, schema("result", "array")); verifyDataRows(actual, rows(List.of("a,x", "b,y"))); }
- Real field usage: Similar to testMvjoinWithArrayFromRealFields, test mvzip with actual document fields to validate end-to-end integration.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java(1 hunks)core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVZipFunctionImpl.java(1 hunks)core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java(2 hunks)core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java(2 hunks)docs/user/ppl/functions/collection.md(1 hunks)integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java(1 hunks)ppl/src/main/antlr/OpenSearchPPLLexer.g4(1 hunks)ppl/src/main/antlr/OpenSearchPPLParser.g4(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/user/ppl/functions/collection.md
🚧 Files skipped from review as they are similar to previous changes (4)
- ppl/src/main/antlr/OpenSearchPPLLexer.g4
- ppl/src/main/antlr/OpenSearchPPLParser.g4
- core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java
- core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVZipFunctionImpl.java
🧰 Additional context used
📓 Path-based instructions (5)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.javacore/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java
⚙️ CodeRabbit configuration file
**/*.java: - Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure methods are under 20 lines with single responsibility
- Verify proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.javacore/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.javainteg-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java
integ-test/**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
End-to-end scenarios need integration tests in
integ-test/module
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java
⚙️ CodeRabbit configuration file
integ-test/**/*IT.java: - Verify integration tests are in correct module (integ-test/)
- Check tests can be run with ./gradlew :integ-test:integTest
- Ensure proper test data setup and teardown
- Validate end-to-end scenario coverage
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java
**/*IT.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
Name integration tests with
*IT.javasuffix in OpenSearch SQL
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java
**/test/**/*.java
⚙️ CodeRabbit configuration file
**/test/**/*.java: - Verify test coverage for new business logic
- Check test naming follows conventions (*Test.java for unit, *IT.java for integration)
- Ensure tests are independent and don't rely on execution order
- Validate meaningful test data that reflects real-world scenarios
- Check for proper cleanup of test resources
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java
**/calcite/**/*.java
⚙️ CodeRabbit configuration file
**/calcite/**/*.java: - Follow existing patterns in CalciteRelNodeVisitor and CalciteRexNodeVisitor
- Verify SQL generation and optimization paths
- Document any Calcite-specific workarounds
- Test compatibility with Calcite version constraints
Files:
integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java
🧬 Code graph analysis (1)
core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java (1)
core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/MVZipFunctionImpl.java (1)
MVZipFunctionImpl(45-128)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: bwc-tests-full-restart (21)
- GitHub Check: build-linux (25, unit)
- GitHub Check: build-linux (25, integration)
- GitHub Check: build-linux (21, doc)
- GitHub Check: build-linux (21, integration)
- GitHub Check: bwc-tests-full-restart (25)
- GitHub Check: build-linux (21, unit)
- GitHub Check: build-linux (25, doc)
- GitHub Check: bwc-tests-rolling-upgrade (21)
- GitHub Check: bwc-tests-rolling-upgrade (25)
- GitHub Check: security-it-linux (25)
- GitHub Check: security-it-linux (21)
- GitHub Check: build-windows-macos (macos-14, 25, integration)
- GitHub Check: build-windows-macos (macos-14, 21, doc)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, unit)
- GitHub Check: build-windows-macos (macos-14, 25, unit)
- GitHub Check: build-windows-macos (macos-14, 25, doc)
- GitHub Check: build-windows-macos (windows-latest, 25, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, integration)
- GitHub Check: build-windows-macos (macos-14, 21, unit)
- GitHub Check: build-windows-macos (macos-14, 21, integration)
- GitHub Check: build-windows-macos (windows-latest, 21, -PbuildPlatform=windows, unit)
- GitHub Check: security-it-windows-macos (macos-14, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 25)
- GitHub Check: security-it-windows-macos (windows-latest, 21)
- GitHub Check: security-it-windows-macos (macos-14, 21)
- GitHub Check: test-sql-cli-integration (21)
- GitHub Check: CodeQL-Scan (java)
🔇 Additional comments (2)
core/src/main/java/org/opensearch/sql/expression/function/PPLBuiltinOperators.java (1)
50-50: LGTM! Clean integration following established patterns.The import and operator registration for MVZIP are correctly implemented following the existing pattern used for other collection functions like MVAPPEND.
Also applies to: 396-396
core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java (1)
78-78: LGTM! Proper enum registration.The MVZIP constant is correctly added to the Collection functions section and will be automatically registered in ALL_NATIVE_FUNCTIONS through the static initializer.
Signed-off-by: Kai Huang <[email protected]>
* support mvzip eval function Signed-off-by: Kai Huang <[email protected]> # Conflicts: # core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java # core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java # integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java # ppl/src/main/antlr/OpenSearchPPLLexer.g4 # ppl/src/main/antlr/OpenSearchPPLParser.g4 # ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java # Conflicts: # core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java # docs/user/ppl/functions/collection.rst # ppl/src/main/antlr/OpenSearchPPLParser.g4 # ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java * add anonymizer test Signed-off-by: Kai Huang <[email protected]> * fix UT Signed-off-by: Kai Huang <[email protected]> * updates Signed-off-by: Kai Huang <[email protected]> * update to use List.class Signed-off-by: Kai Huang <[email protected]> * Update md doc Signed-off-by: Kai Huang <[email protected]> * Addressing comments from coderabbit Signed-off-by: Kai Huang <[email protected]> --------- Signed-off-by: Kai Huang <[email protected]> (cherry picked from commit 52a691a) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* support mvzip eval function # Conflicts: # core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java # core/src/main/java/org/opensearch/sql/expression/function/PPLFuncImpTable.java # integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteArrayFunctionIT.java # ppl/src/main/antlr/OpenSearchPPLLexer.g4 # ppl/src/main/antlr/OpenSearchPPLParser.g4 # ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java # Conflicts: # core/src/main/java/org/opensearch/sql/expression/function/BuiltinFunctionName.java # docs/user/ppl/functions/collection.rst # ppl/src/main/antlr/OpenSearchPPLParser.g4 # ppl/src/test/java/org/opensearch/sql/ppl/calcite/CalcitePPLArrayFunctionTest.java * add anonymizer test * fix UT * updates * update to use List.class * Update md doc * Addressing comments from coderabbit --------- (cherry picked from commit 52a691a) Signed-off-by: Kai Huang <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Description
The
mvzipfunction combines values from two multivalue fields pairwise with a delimiter.It stitches together corresponding elements from each field, similar to Python’s
zip()function.The function supports two modes of operation:
Default delimiter:
Combines fields using a comma (
,) as the default delimiter.Custom delimiter:
Combines fields using the specified delimiter.
Key Features
zip()behavior).nullif either input isnull.,) when delimiter is not specified.Usage Examples
Basic Usage with Default Delimiter
Custom Delimiter
Different Length Arrays
Nested mvzip Calls
Null Handling
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.