From e5ab39e9737c2131a7a6cef8e800983a32986f2e Mon Sep 17 00:00:00 2001 From: Alexander Taepper Date: Wed, 25 Feb 2026 13:01:09 +0100 Subject: [PATCH 1/4] feat(silo)!: change query interface to saneql --- .clang-tidy | 4 +- .../GroupByLineageInvalidOrderBy.json | 11 +- ...stRecentCommonAncestor_invalidBoolean.json | 16 +- ...ostRecentCommonAncestor_invalidColumn.json | 14 +- .../test/invalidQueries/OffsetNegative.json | 24 +- .../invalidQueries/Subtree_invalidColumn.json | 14 +- .../aa_mutations_no_proportion.json | 11 +- .../insertionContains_empty.json | 11 +- .../insertionContains_invalidPattern.json | 11 +- .../insertionContains_invalidPattern2.json | 11 +- .../insertionsAAseparation.json | 10 +- .../insertionsInvalidSequence.json | 10 +- .../test/invalidQueries/invalidAction.json | 14 +- .../invalidMutationsMinProportion.json | 13 +- .../nuc_mutations_no_proportion.json | 11 +- .../phyloDescendantOf_invalidNode.json | 11 +- .../invalidQueries/sequencePos0Filter.json | 11 +- .../stringSearch_nonExistingColumn.json | 11 +- .../stringSearch_nonStringColumn.json | 11 +- .../stringSearch_withInvalidRegex.json | 11 +- .../test/queries/AASymbolEquals.json | 12 +- endToEndTests/test/queries/And.json | 22 +- .../test/queries/DetailsOrderBy.json | 23 +- .../test/queries/DetailsOrderByLimit.json | 25 +- endToEndTests/test/queries/Exact.json | 14 +- .../test/queries/GroupByDivision.json | 11 +- .../test/queries/GroupByLineage.json | 11 +- .../GroupByLineageOrderByCountLimit.json | 17 +- endToEndTests/test/queries/HasAAMutation.json | 11 +- .../test/queries/LimitLargerThanTable.json | 24 +- endToEndTests/test/queries/Maybe.json | 14 +- .../MostRecentCommonAncestor_SimpleQuery.json | 23 +- ...RecentCommonAncestor_onlyMissingNodes.json | 24 +- ...tRecentCommonAncestor_withMissingNode.json | 39 +- endToEndTests/test/queries/N_notIndexed.json | 11 +- endToEndTests/test/queries/Not.json | 14 +- .../queries/NotUnindexedStringEquals.json | 14 +- 
endToEndTests/test/queries/Offset0.json | 24 +- endToEndTests/test/queries/OffsetFull.json | 12 +- .../test/queries/OffsetLargerThanTable.json | 24 +- .../test/queries/OffsetLimitOverlap.json | 12 +- endToEndTests/test/queries/Or.json | 23 +- endToEndTests/test/queries/OrderByAge.json | 16 +- endToEndTests/test/queries/OrderByFloat.json | 11 +- .../test/queries/OrderByFloatDesc.json | 16 +- .../test/queries/OrderByFloatFiltered.json | 14 +- .../test/queries/PangoLineageAlias.json | 16 +- .../test/queries/PhyloDescendantOf.json | 18 +- .../queries/Subtree_onlyMissingNodes.json | 29 +- .../test/queries/Subtree_simpleQuery.json | 24 +- ...tree_simple_query_without_unary_nodes.json | 24 +- .../test/queries/Subtree_withMissingNode.json | 34 +- .../test/queries/aaInsertionsAction.json | 10 +- .../queries/aaInsertionsActionAndFilter.json | 13 +- .../aaInsertionsActionOneSequence.json | 11 +- .../test/queries/aaInsertionsContains.json | 14 +- .../test/queries/aaMutDistribution.json | 11 +- .../test/queries/aaMutDistribution_all.json | 11 +- .../test/queries/aaMutDistribution_min0.json | 12 +- .../queries/aaMutDistribution_multiple.json | 11 +- .../queries/aaMutDistribution_very_low.json | 12 +- endToEndTests/test/queries/booleanEquals.json | 11 +- .../test/queries/booleanEquals_And.json | 22 +- .../test/queries/booleanEquals_Or.json | 22 +- .../test/queries/boolean_Details.json | 62 +- endToEndTests/test/queries/complexQuery.json | 110 +- endToEndTests/test/queries/dateBetween.json | 12 +- .../test/queries/dateBetween_noBounds.json | 12 +- .../queries/dateBetween_null_excluded.json | 12 +- .../test/queries/dateBetween_openFrom1.json | 12 +- .../test/queries/dateBetween_openFrom2.json | 12 +- .../test/queries/dateBetween_openFrom3.json | 12 +- .../test/queries/dateBetween_openTo1.json | 12 +- .../test/queries/dateBetween_openTo2.json | 12 +- .../test/queries/dateBetween_openTo3.json | 12 +- .../test/queries/detailsLimitAscending5.json | 17 +- 
.../queries/detailsLimitDescending10.json | 17 +- .../queries/detailsLimitDescending15.json | 17 +- .../test/queries/divisionFilter.json | 17 +- .../test/queries/explicitDefaultSequence.json | 12 +- endToEndTests/test/queries/fastaAligned.json | 511 +++++++-- .../test/queries/fastaAligned_multiple.json | 14 +- .../test/queries/fasta_allTestSequences.json | 506 +++++++-- .../test/queries/fasta_manySequences.json | 13 +- .../fasta_oneRowTwoUnalignedSequences.json | 12 +- .../queries/fasta_oneSequenceUnaligned.json | 12 +- endToEndTests/test/queries/floatBetween.json | 12 +- .../test/queries/floatBetween_noBound.json | 12 +- .../test/queries/floatBetween_openFrom.json | 12 +- .../test/queries/floatBetween_openTo.json | 12 +- endToEndTests/test/queries/floatEquals.json | 11 +- .../queries/insertionContainsStopCodon.json | 25 +- .../test/queries/insertionContains_exact.json | 11 +- .../queries/insertionContains_noSeqCol.json | 22 +- .../queries/insertionContains_not_exact1.json | 11 +- .../queries/insertionContains_not_exact2.json | 11 +- .../queries/insertionContains_not_exact3.json | 11 +- .../queries/insertionContains_not_exact4.json | 11 +- .../test/queries/insertionsAction.json | 10 +- .../queries/insertionsActionAndFilter.json | 12 +- endToEndTests/test/queries/intBetween.json | 12 +- .../test/queries/intBetween_noBounds.json | 12 +- .../test/queries/intBetween_openFrom.json | 12 +- .../test/queries/intBetween_openTo.json | 12 +- endToEndTests/test/queries/intEquals.json | 11 +- endToEndTests/test/queries/matchAll.json | 9 +- .../test/queries/mutations_False.json | 10 +- .../test/queries/nOf_2of3_aggregated.json | 28 +- .../test/queries/nOf_2of3_aggregated2.json | 28 +- .../test/queries/nOf_2of3_details.json | 31 +- .../queries/nOf_2of3_details_selection.json | 30 +- .../test/queries/nOf_2of3_mutations.json | 29 +- .../test/queries/notUnsortedDateBetween.json | 15 +- .../pangoLIneageIncludingSublineages.json | 12 +- .../pangoLIneageWithoutSublineages.json | 12 +- 
.../test/queries/recombinantLineage.json | 12 +- .../queries/recombinantLineageWithAlias.json | 12 +- .../test/queries/secondSequence.json | 12 +- .../queries/secondSequenceHasMutation.json | 11 +- .../test/queries/sequenceEndFilter.json | 11 +- .../queries/sequenceStartEndMutations.json | 22 +- .../test/queries/sequenceStartFilter.json | 11 +- endToEndTests/test/queries/stringEquals.json | 11 +- .../stringEqualsOnUnindexedColumn.json | 11 +- .../queries/stringSearch_basic_regex.json | 14 +- .../queries/stringSearch_digitAmount.json | 11 +- .../queries/stringSearch_justAString.json | 11 +- .../test/queries/stringSearch_prefix.json | 11 +- .../symbolEquals/testSeqPos2SymbolA.json | 12 +- .../symbolEquals/testSeqPos2SymbolC.json | 12 +- .../symbolEquals/testSeqPos2SymbolExactA.json | 15 +- .../symbolEquals/testSeqPos2SymbolExactC.json | 15 +- .../symbolEquals/testSeqPos2SymbolExactG.json | 15 +- .../testSeqPos2SymbolExactGAP.json | 15 +- .../symbolEquals/testSeqPos2SymbolExactN.json | 15 +- .../symbolEquals/testSeqPos2SymbolExactR.json | 15 +- .../symbolEquals/testSeqPos2SymbolExactT.json | 15 +- .../symbolEquals/testSeqPos2SymbolExactY.json | 15 +- .../symbolEquals/testSeqPos2SymbolG.json | 12 +- .../symbolEquals/testSeqPos2SymbolGAP.json | 12 +- .../symbolEquals/testSeqPos2SymbolMaybeA.json | 15 +- .../symbolEquals/testSeqPos2SymbolMaybeC.json | 15 +- .../symbolEquals/testSeqPos2SymbolMaybeG.json | 15 +- .../testSeqPos2SymbolMaybeGAP.json | 15 +- .../symbolEquals/testSeqPos2SymbolMaybeN.json | 15 +- .../symbolEquals/testSeqPos2SymbolMaybeR.json | 15 +- .../symbolEquals/testSeqPos2SymbolMaybeT.json | 15 +- .../symbolEquals/testSeqPos2SymbolMaybeY.json | 15 +- .../symbolEquals/testSeqPos2SymbolN.json | 12 +- .../symbolEquals/testSeqPos2SymbolR.json | 12 +- .../symbolEquals/testSeqPos2SymbolT.json | 12 +- .../symbolEquals/testSeqPos2SymbolY.json | 12 +- .../test/queries/unsortedDateBetween.json | 12 +- endToEndTests/test/query.test.js | 56 +- 
endToEndTests/test/requestId.test.js | 6 +- performance/many_short_read_filters.cpp | 30 +- performance/many_string_equals.cpp | 41 +- performance/mutation_benchmark.cpp | 52 +- python/silodb/database.pxd | 2 +- python/silodb/database.pyx | 35 +- python/tests/test_database.py | 101 +- saneql.examples | 250 +++++ src/silo/api/query_handler.cpp | 11 +- src/silo/database.cpp | 70 +- src/silo/database.h | 6 +- src/silo/database.test.cpp | 40 +- src/silo/preprocessing/preprocessing.test.cpp | 165 +-- src/silo/query_engine/action_query.cpp | 37 - src/silo/query_engine/action_query.h | 35 - src/silo/query_engine/actions/action.cpp | 131 --- src/silo/query_engine/actions/action.h | 64 -- src/silo/query_engine/actions/aggregated.cpp | 58 -- src/silo/query_engine/actions/aggregated.h | 29 - .../query_engine/actions/aggregated.test.cpp | 458 --------- src/silo/query_engine/actions/details.cpp | 19 - src/silo/query_engine/actions/details.h | 24 - .../query_engine/actions/details.test.cpp | 450 -------- src/silo/query_engine/actions/fasta.cpp | 61 -- src/silo/query_engine/actions/fasta.h | 30 - .../query_engine/actions/fasta_aligned.cpp | 69 -- src/silo/query_engine/actions/fasta_aligned.h | 27 - .../actions/fasta_aligned.test.cpp | 468 --------- src/silo/query_engine/actions/insertions.cpp | 82 -- src/silo/query_engine/actions/insertions.h | 42 - .../query_engine/actions/insertions.test.cpp | 215 ---- .../actions/most_recent_common_ancestor.cpp | 50 - .../actions/most_recent_common_ancestor.h | 26 - src/silo/query_engine/actions/mutations.cpp | 142 --- src/silo/query_engine/actions/mutations.h | 63 -- .../query_engine/actions/mutations.test.cpp | 336 ------ .../query_engine/actions/phylo_subtree.cpp | 59 -- src/silo/query_engine/actions/phylo_subtree.h | 27 - src/silo/query_engine/binder.cpp | 416 -------- src/silo/query_engine/binder.h | 16 - .../query_engine/filter/expressions/and.cpp | 13 - .../query_engine/filter/expressions/and.h | 5 - .../filter/expressions/and.test.cpp 
| 42 +- .../filter/expressions/bool_equals.cpp | 24 - .../filter/expressions/bool_equals.h | 5 - .../filter/expressions/date_between.cpp | 42 - .../filter/expressions/date_between.h | 5 - .../filter/expressions/date_equals.cpp | 30 - .../filter/expressions/date_equals.h | 5 - .../query_engine/filter/expressions/exact.cpp | 8 - .../query_engine/filter/expressions/exact.h | 5 - .../filter/expressions/expression.cpp | 74 -- .../filter/expressions/expression.h | 5 - .../query_engine/filter/expressions/false.cpp | 5 - .../query_engine/filter/expressions/false.h | 5 - .../filter/expressions/float_between.cpp | 34 - .../filter/expressions/float_between.h | 5 - .../filter/expressions/float_equals.cpp | 24 - .../filter/expressions/float_equals.h | 5 - .../filter/expressions/has_mutation.cpp | 37 - .../filter/expressions/has_mutation.h | 6 - .../filter/expressions/insertion_contains.cpp | 44 - .../filter/expressions/insertion_contains.h | 6 - .../filter/expressions/int_between.cpp | 33 - .../filter/expressions/int_between.h | 3 - .../filter/expressions/int_equals.cpp | 24 - .../filter/expressions/int_equals.h | 3 - .../filter/expressions/is_null.cpp | 13 - .../query_engine/filter/expressions/is_null.h | 5 - .../filter/expressions/is_null.test.cpp | 196 +--- .../filter/expressions/lineage_filter.cpp | 78 -- .../filter/expressions/lineage_filter.h | 3 - .../expressions/lineage_filter.test.cpp | 81 +- .../query_engine/filter/expressions/maybe.cpp | 7 - .../query_engine/filter/expressions/maybe.h | 3 - .../filter/expressions/negation.cpp | 7 - .../filter/expressions/negation.h | 3 - .../query_engine/filter/expressions/nof.cpp | 32 - .../query_engine/filter/expressions/nof.h | 5 - .../query_engine/filter/expressions/or.cpp | 13 - src/silo/query_engine/filter/expressions/or.h | 5 - .../filter/expressions/or.test.cpp | 154 +-- .../filter/expressions/phylo_child_filter.cpp | 23 - .../filter/expressions/phylo_child_filter.h | 5 - .../filter/expressions/string_equals.cpp | 26 - 
.../filter/expressions/string_equals.h | 5 - .../filter/expressions/string_equals.test.cpp | 94 +- .../filter/expressions/string_in_set.cpp | 28 - .../filter/expressions/string_in_set.h | 5 - .../filter/expressions/string_in_set.test.cpp | 223 +--- .../filter/expressions/string_search.cpp | 30 - .../filter/expressions/string_search.h | 3 - .../filter/expressions/symbol_equals.cpp | 62 -- .../filter/expressions/symbol_equals.h | 6 - .../filter/expressions/symbol_in_set.h | 2 - .../query_engine/filter/expressions/true.cpp | 5 - .../query_engine/filter/expressions/true.h | 5 - .../query_engine/operators/aggregate_node.cpp | 75 +- .../query_engine/operators/aggregate_node.h | 17 +- .../query_engine/operators/filter_node.cpp | 24 + src/silo/query_engine/operators/filter_node.h | 33 + .../most_recent_common_ancestor_node.cpp | 29 +- .../most_recent_common_ancestor_node.h | 9 + .../query_engine/operators/mutations_node.h | 10 + .../query_engine/operators/order_by_node.cpp | 22 + .../operators/phylo_subtree_node.cpp | 31 +- .../operators/phylo_subtree_node.h | 10 + .../query_engine/operators/project_node.cpp | 40 + .../query_engine/operators/project_node.h | 32 + src/silo/query_engine/operators/scan_node.cpp | 30 + src/silo/query_engine/operators/scan_node.h | 33 + .../operators/unresolved_insertions_node.h | 33 + ...esolved_most_recent_common_ancestor_node.h | 39 + .../operators/unresolved_mutations_node.h | 42 + .../operators/unresolved_phylo_subtree_node.h | 40 + src/silo/query_engine/planner.cpp | 390 ++++++- src/silo/query_engine/planner.h | 14 + src/silo/query_engine/query_plan.h | 2 - src/silo/query_engine/saneql/ast.cpp | 253 +++++ src/silo/query_engine/saneql/ast.h | 139 +++ src/silo/query_engine/saneql/ast_to_query.cpp | 973 ++++++++++++++++++ src/silo/query_engine/saneql/ast_to_query.h | 32 + .../query_engine/saneql/function_registry.cpp | 163 +++ .../query_engine/saneql/function_registry.h | 118 +++ src/silo/query_engine/saneql/lexer.cpp | 319 ++++++ 
src/silo/query_engine/saneql/lexer.h | 38 + src/silo/query_engine/saneql/lexer.test.cpp | 264 +++++ .../query_engine/saneql/parse_exception.cpp | 11 + .../query_engine/saneql/parse_exception.h | 34 + src/silo/query_engine/saneql/parser.cpp | 382 +++++++ src/silo/query_engine/saneql/parser.h | 47 + src/silo/query_engine/saneql/parser.test.cpp | 229 +++++ .../query_engine/saneql/source_location.h | 22 + src/silo/query_engine/saneql/token.cpp | 98 ++ src/silo/query_engine/saneql/token.h | 55 + .../amino_acid_insertion_contains.test.cpp | 37 +- .../test/amino_acid_symbol_equals.test.cpp | 23 +- src/silo/test/date_between.test.cpp | 66 +- src/silo/test/date_equals.test.cpp | 70 +- src/silo/test/default_sequence.test.cpp | 68 +- src/silo/test/fasta.test.cpp | 146 +-- .../test/float_equals_and_between.test.cpp | 70 +- src/silo/test/has_mutation.test.cpp | 43 +- src/silo/test/insertion_contains.test.cpp | 46 +- src/silo/test/int_equals_and_between.test.cpp | 258 +++-- .../test/nucleotide_symbol_equals.test.cpp | 21 +- src/silo/test/query_fixture.test.cpp | 26 +- src/silo/test/query_fixture.test.h | 43 +- src/silo/test/randomize.test.cpp | 68 +- src/silo/test/string_search.test.cpp | 43 +- 304 files changed, 5983 insertions(+), 8488 deletions(-) create mode 100644 saneql.examples delete mode 100644 src/silo/query_engine/action_query.cpp delete mode 100644 src/silo/query_engine/action_query.h delete mode 100644 src/silo/query_engine/actions/action.cpp delete mode 100644 src/silo/query_engine/actions/action.h delete mode 100644 src/silo/query_engine/actions/aggregated.cpp delete mode 100644 src/silo/query_engine/actions/aggregated.h delete mode 100644 src/silo/query_engine/actions/aggregated.test.cpp delete mode 100644 src/silo/query_engine/actions/details.cpp delete mode 100644 src/silo/query_engine/actions/details.h delete mode 100644 src/silo/query_engine/actions/details.test.cpp delete mode 100644 src/silo/query_engine/actions/fasta.cpp delete mode 100644 
src/silo/query_engine/actions/fasta.h delete mode 100644 src/silo/query_engine/actions/fasta_aligned.cpp delete mode 100644 src/silo/query_engine/actions/fasta_aligned.h delete mode 100644 src/silo/query_engine/actions/fasta_aligned.test.cpp delete mode 100644 src/silo/query_engine/actions/insertions.cpp delete mode 100644 src/silo/query_engine/actions/insertions.h delete mode 100644 src/silo/query_engine/actions/insertions.test.cpp delete mode 100644 src/silo/query_engine/actions/most_recent_common_ancestor.cpp delete mode 100644 src/silo/query_engine/actions/most_recent_common_ancestor.h delete mode 100644 src/silo/query_engine/actions/mutations.cpp delete mode 100644 src/silo/query_engine/actions/mutations.h delete mode 100644 src/silo/query_engine/actions/mutations.test.cpp delete mode 100644 src/silo/query_engine/actions/phylo_subtree.cpp delete mode 100644 src/silo/query_engine/actions/phylo_subtree.h delete mode 100644 src/silo/query_engine/binder.cpp delete mode 100644 src/silo/query_engine/binder.h create mode 100644 src/silo/query_engine/operators/filter_node.cpp create mode 100644 src/silo/query_engine/operators/filter_node.h create mode 100644 src/silo/query_engine/operators/project_node.cpp create mode 100644 src/silo/query_engine/operators/project_node.h create mode 100644 src/silo/query_engine/operators/scan_node.cpp create mode 100644 src/silo/query_engine/operators/scan_node.h create mode 100644 src/silo/query_engine/operators/unresolved_insertions_node.h create mode 100644 src/silo/query_engine/operators/unresolved_most_recent_common_ancestor_node.h create mode 100644 src/silo/query_engine/operators/unresolved_mutations_node.h create mode 100644 src/silo/query_engine/operators/unresolved_phylo_subtree_node.h create mode 100644 src/silo/query_engine/saneql/ast.cpp create mode 100644 src/silo/query_engine/saneql/ast.h create mode 100644 src/silo/query_engine/saneql/ast_to_query.cpp create mode 100644 src/silo/query_engine/saneql/ast_to_query.h 
create mode 100644 src/silo/query_engine/saneql/function_registry.cpp create mode 100644 src/silo/query_engine/saneql/function_registry.h create mode 100644 src/silo/query_engine/saneql/lexer.cpp create mode 100644 src/silo/query_engine/saneql/lexer.h create mode 100644 src/silo/query_engine/saneql/lexer.test.cpp create mode 100644 src/silo/query_engine/saneql/parse_exception.cpp create mode 100644 src/silo/query_engine/saneql/parse_exception.h create mode 100644 src/silo/query_engine/saneql/parser.cpp create mode 100644 src/silo/query_engine/saneql/parser.h create mode 100644 src/silo/query_engine/saneql/parser.test.cpp create mode 100644 src/silo/query_engine/saneql/source_location.h create mode 100644 src/silo/query_engine/saneql/token.cpp create mode 100644 src/silo/query_engine/saneql/token.h diff --git a/.clang-tidy b/.clang-tidy index 7e635f05b..6597f7c6b 100644 --- a/.clang-tidy +++ b/.clang-tidy @@ -34,9 +34,9 @@ Checks: >- # positives for overridden 'T& operator=(T&)'. CheckOptions: - key: readability-identifier-length.IgnoredVariableNames - value: '_|to' + value: '_|to|it|op' - key: readability-identifier-length.IgnoredParameterNames - value: '_|to' + value: '_|to|it|op' - key: readability-identifier-naming.NamespaceCase value: lower_case - key: readability-identifier-naming.ClassCase diff --git a/endToEndTests/test/invalidQueries/GroupByLineageInvalidOrderBy.json b/endToEndTests/test/invalidQueries/GroupByLineageInvalidOrderBy.json index 86e4c8342..915bd405a 100644 --- a/endToEndTests/test/invalidQueries/GroupByLineageInvalidOrderBy.json +++ b/endToEndTests/test/invalidQueries/GroupByLineageInvalidOrderBy.json @@ -1,15 +1,6 @@ { "testCaseName": "An invalid orderByField for the Aggregated Action", - "query": { - "action": { - "type": "Aggregated", - "groupByFields": ["pango_lineage"], - "orderByFields": ["age"] - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.groupBy({count:=count()},{pango_lineage}).orderBy({age})", 
"expectedError": { "error": "Bad request", "message": "OrderByField age is not contained in the result of this operation. Allowed values are pango_lineage, count." diff --git a/endToEndTests/test/invalidQueries/MostRecentCommonAncestor_invalidBoolean.json b/endToEndTests/test/invalidQueries/MostRecentCommonAncestor_invalidBoolean.json index db2b104ee..18037d90f 100644 --- a/endToEndTests/test/invalidQueries/MostRecentCommonAncestor_invalidBoolean.json +++ b/endToEndTests/test/invalidQueries/MostRecentCommonAncestor_invalidBoolean.json @@ -1,20 +1,8 @@ { "testCaseName": "MostRecentCommonAncestor action with invalid printNodesNotInTree value ", - "query": { - "action": { - "type": "MostRecentCommonAncestor", - "columnName": "primary_key", - "orderByFields": ["mrcaNode"], - "printNodesNotInTree": "T" - }, - "filterExpression": { - "type": "StringEquals", - "column": "country", - "value": "Switzerland" - } - }, + "query": "default.filter(country = 'Switzerland').mostRecentCommonAncestor('primary_key', printNodesNotInTree:='T').orderBy({mrcaNode})", "expectedError": { "error": "Bad request", - "message": "error: 'printNodesNotInTree' field in MostRecentCommonAncestor action must be a boolean" + "message": "expected boolean literal at 1:102" } } diff --git a/endToEndTests/test/invalidQueries/MostRecentCommonAncestor_invalidColumn.json b/endToEndTests/test/invalidQueries/MostRecentCommonAncestor_invalidColumn.json index 9486dbec8..53ebbfb87 100644 --- a/endToEndTests/test/invalidQueries/MostRecentCommonAncestor_invalidColumn.json +++ b/endToEndTests/test/invalidQueries/MostRecentCommonAncestor_invalidColumn.json @@ -1,18 +1,6 @@ { "testCaseName": "MostRecentCommonAncestor action on column not labelled as isPhyloTreeField", - "query": { - "action": { - "type": "MostRecentCommonAncestor", - "columnName": "country", - "orderByFields": ["mrcaNode"], - "printNodesNotInTree": true - }, - "filterExpression": { - "type": "StringEquals", - "column": "country", - "value": 
"Switzerland" - } - }, + "query": "default.filter(country = 'Switzerland').mostRecentCommonAncestor('country', printNodesNotInTree:=true).orderBy({mrcaNode})", "expectedError": { "error": "Bad request", "message": "MostRecentCommonAncestor action cannot be called on Column 'country' as it does not have a phylogenetic tree associated with it" diff --git a/endToEndTests/test/invalidQueries/OffsetNegative.json b/endToEndTests/test/invalidQueries/OffsetNegative.json index c0b13c80e..9d3bd40e7 100644 --- a/endToEndTests/test/invalidQueries/OffsetNegative.json +++ b/endToEndTests/test/invalidQueries/OffsetNegative.json @@ -1,28 +1,6 @@ { "testCaseName": "Details action with negative offset", - "query": { - "action": { - "type": "Details", - "orderByFields": ["primary_key"], - "offset": -1231 - }, - "filterExpression": { - "type": "And", - "children": [ - { - "type": "StringEquals", - "column": "country", - "value": "Switzerland" - }, - { - "type": "Lineage", - "column": "pango_lineage", - "value": "B.1.1.7", - "includeSublineages": true - } - ] - } - }, + "query": "default.filter((country = 'Switzerland') && (pango_lineage.lineage('B.1.1.7', includeSublineages:=true))).orderBy({primary_key}).offset(-1231)", "expectedError": { "error": "Bad request", "message": "If the action contains an offset, it must be a non-negative number" diff --git a/endToEndTests/test/invalidQueries/Subtree_invalidColumn.json b/endToEndTests/test/invalidQueries/Subtree_invalidColumn.json index f66bcb747..0c7975882 100644 --- a/endToEndTests/test/invalidQueries/Subtree_invalidColumn.json +++ b/endToEndTests/test/invalidQueries/Subtree_invalidColumn.json @@ -1,18 +1,6 @@ { "testCaseName": "PhyloSubtree action on column not labelled as isPhyloTreeField", - "query": { - "action": { - "type": "PhyloSubtree", - "columnName": "country", - "orderByFields": ["subtreeNewick"], - "printNodesNotInTree": true - }, - "filterExpression": { - "type": "StringEquals", - "column": "country", - "value": 
"Switzerland" - } - }, + "query": "default.filter(country = 'Switzerland').phyloSubtree('country', printNodesNotInTree:=true).orderBy({subtreeNewick})", "expectedError": { "error": "Bad request", "message": "PhyloSubtree action cannot be called on Column 'country' as it does not have a phylogenetic tree associated with it" diff --git a/endToEndTests/test/invalidQueries/aa_mutations_no_proportion.json b/endToEndTests/test/invalidQueries/aa_mutations_no_proportion.json index 31eac73b8..d7850dc16 100644 --- a/endToEndTests/test/invalidQueries/aa_mutations_no_proportion.json +++ b/endToEndTests/test/invalidQueries/aa_mutations_no_proportion.json @@ -1,15 +1,8 @@ { "testCaseName": "AminoAcidMutations action without minProportion", - "query": { - "action": { - "type": "AminoAcidMutations" - }, - "filterExpression": { - "type": "False" - } - }, + "query": "default.filter(false).aminoAcidMutations()", "expectedError": { "error": "Bad request", - "message": "Mutations action must contain the field minProportion of type number with limits [0.0, 1.0]. 
Only mutations are returned if the proportion of sequences having this mutation, is at least minProportion" + "message": "aminoAcidMutations() requires argument 'minProportion'" } } diff --git a/endToEndTests/test/invalidQueries/insertionContains_empty.json b/endToEndTests/test/invalidQueries/insertionContains_empty.json index f931d86de..8c0dbe633 100644 --- a/endToEndTests/test/invalidQueries/insertionContains_empty.json +++ b/endToEndTests/test/invalidQueries/insertionContains_empty.json @@ -1,15 +1,6 @@ { "testCaseName": "Insertion Contains empty", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "InsertionContains", - "position": 25701, - "value": "" - } - }, + "query": "default.filter(insertionContains(position:=25701, value:='')).groupBy({count:=count()})", "expectedError": { "error": "Bad request", "message": "The field 'value' in an InsertionContains expression must not be an empty string" diff --git a/endToEndTests/test/invalidQueries/insertionContains_invalidPattern.json b/endToEndTests/test/invalidQueries/insertionContains_invalidPattern.json index efbf963bb..9483e033b 100644 --- a/endToEndTests/test/invalidQueries/insertionContains_invalidPattern.json +++ b/endToEndTests/test/invalidQueries/insertionContains_invalidPattern.json @@ -1,15 +1,6 @@ { "testCaseName": "Insertion Contains with invalid pattern CC+++", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "InsertionContains", - "position": 25701, - "value": "CC+++" - } - }, + "query": "default.filter(insertionContains(position:=25701, value:='CC+++')).groupBy({count:=count()})", "expectedError": { "error": "Bad request", "message": "The field 'value' in the InsertionContains expression does not contain a valid regex pattern: \"CC+++\". It must only consist of nucleotide symbols and the regex symbol '.*'. Also note that the stop codon * must be escaped correctly with a \\ in amino acid queries." 
diff --git a/endToEndTests/test/invalidQueries/insertionContains_invalidPattern2.json b/endToEndTests/test/invalidQueries/insertionContains_invalidPattern2.json index a431b13fa..802dd39e0 100644 --- a/endToEndTests/test/invalidQueries/insertionContains_invalidPattern2.json +++ b/endToEndTests/test/invalidQueries/insertionContains_invalidPattern2.json @@ -1,15 +1,6 @@ { "testCaseName": "Insertion Contains with invalid pattern CC..*", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "InsertionContains", - "position": 25701, - "value": "CC..*" - } - }, + "query": "default.filter(insertionContains(position:=25701, value:='CC..*')).groupBy({count:=count()})", "expectedError": { "error": "Bad request", "message": "The field 'value' in the InsertionContains expression does not contain a valid regex pattern: \"CC..*\". It must only consist of nucleotide symbols and the regex symbol '.*'. Also note that the stop codon * must be escaped correctly with a \\ in amino acid queries." 
diff --git a/endToEndTests/test/invalidQueries/insertionsAAseparation.json b/endToEndTests/test/invalidQueries/insertionsAAseparation.json index a85227067..933ae82bf 100644 --- a/endToEndTests/test/invalidQueries/insertionsAAseparation.json +++ b/endToEndTests/test/invalidQueries/insertionsAAseparation.json @@ -1,14 +1,6 @@ { "testCaseName": "The insertions action with an amino acid sequence name", - "query": { - "action": { - "type": "Insertions", - "sequenceNames": ["S"] - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.insertions(sequenceNames:={S})", "expectedError": { "error": "Bad request", "message": "The database does not contain the Nucleotide sequence 'S'" diff --git a/endToEndTests/test/invalidQueries/insertionsInvalidSequence.json b/endToEndTests/test/invalidQueries/insertionsInvalidSequence.json index 77c6e7aae..3486e9d08 100644 --- a/endToEndTests/test/invalidQueries/insertionsInvalidSequence.json +++ b/endToEndTests/test/invalidQueries/insertionsInvalidSequence.json @@ -1,14 +1,6 @@ { "testCaseName": "The insertions action with an invalid sequence", - "query": { - "action": { - "type": "Insertions", - "sequenceNames": ["notAValidSequence"] - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.insertions(sequenceNames:={notAValidSequence})", "expectedError": { "error": "Bad request", "message": "The database does not contain the Nucleotide sequence 'notAValidSequence'" diff --git a/endToEndTests/test/invalidQueries/invalidAction.json b/endToEndTests/test/invalidQueries/invalidAction.json index bfa6cff9e..db3c416e0 100644 --- a/endToEndTests/test/invalidQueries/invalidAction.json +++ b/endToEndTests/test/invalidQueries/invalidAction.json @@ -1,18 +1,8 @@ { "testCaseName": "query with invalid action", - "query": { - "action": { - "type": "invalid action" - }, - "filterExpression": { - "type": "N-Of", - "numberOfMatchers": 2, - "matchExactly": false, - "children": [] - } - }, + "query": 
"default.filter(true).invalidAction()", "expectedError": { "error": "Bad request", - "message": "invalid action is not a valid action" + "message": "unknown function 'invalidAction' at 1:22" } } diff --git a/endToEndTests/test/invalidQueries/invalidMutationsMinProportion.json b/endToEndTests/test/invalidQueries/invalidMutationsMinProportion.json index f5bbf98ab..df0f16fd1 100644 --- a/endToEndTests/test/invalidQueries/invalidMutationsMinProportion.json +++ b/endToEndTests/test/invalidQueries/invalidMutationsMinProportion.json @@ -1,17 +1,6 @@ { "testCaseName": "query with invalid minProportions in Mutations action", - "query": { - "action": { - "type": "Mutations", - "minProportion": -0.5 - }, - "filterExpression": { - "type": "N-Of", - "numberOfMatchers": 2, - "matchExactly": false, - "children": [] - } - }, + "query": "default.filter(true).mutations(minProportion:=-0.5)", "expectedError": { "error": "Bad request", "message": "Invalid proportion: minProportion must be in interval [0.0, 1.0]" diff --git a/endToEndTests/test/invalidQueries/nuc_mutations_no_proportion.json b/endToEndTests/test/invalidQueries/nuc_mutations_no_proportion.json index 5b281ee82..96d03e4da 100644 --- a/endToEndTests/test/invalidQueries/nuc_mutations_no_proportion.json +++ b/endToEndTests/test/invalidQueries/nuc_mutations_no_proportion.json @@ -1,15 +1,8 @@ { "testCaseName": "Mutations action without minProportion", - "query": { - "action": { - "type": "Mutations" - }, - "filterExpression": { - "type": "False" - } - }, + "query": "default.filter(false).mutations()", "expectedError": { "error": "Bad request", - "message": "Mutations action must contain the field minProportion of type number with limits [0.0, 1.0]. 
Only mutations are returned if the proportion of sequences having this mutation, is at least minProportion" + "message": "mutations() requires argument 'minProportion'" } } diff --git a/endToEndTests/test/invalidQueries/phyloDescendantOf_invalidNode.json b/endToEndTests/test/invalidQueries/phyloDescendantOf_invalidNode.json index aa08a6edc..c07cd39bd 100644 --- a/endToEndTests/test/invalidQueries/phyloDescendantOf_invalidNode.json +++ b/endToEndTests/test/invalidQueries/phyloDescendantOf_invalidNode.json @@ -1,15 +1,6 @@ { "testCaseName": "PhyloDescendantOf called on aa node that does not exist in the tree", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "PhyloDescendantOf", - "column": "usherTree", - "internalNode": "NON_EXISTING_NODE" - } - }, + "query": "default.filter(usherTree.phyloDescendantOf('NON_EXISTING_NODE')).groupBy({count:=count()})", "expectedError": { "error": "Bad request", "message": "The node 'NON_EXISTING_NODE' does not exist in the phylogenetic tree of column 'usherTree'" diff --git a/endToEndTests/test/invalidQueries/sequencePos0Filter.json b/endToEndTests/test/invalidQueries/sequencePos0Filter.json index c2da51d1f..737edfb04 100644 --- a/endToEndTests/test/invalidQueries/sequencePos0Filter.json +++ b/endToEndTests/test/invalidQueries/sequencePos0Filter.json @@ -1,15 +1,6 @@ { "testCaseName": "Filtering for the position 0", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "position": 0, - "symbol": "-", - "type": "NucleotideEquals" - } - }, + "query": "default.filter(nucleotideEquals(position:=0, symbol:='-')).groupBy({count:=count()})", "expectedError": { "error": "Bad request", "message": "The field 'position' is 1-indexed. Value of 0 not allowed." 
diff --git a/endToEndTests/test/invalidQueries/stringSearch_nonExistingColumn.json b/endToEndTests/test/invalidQueries/stringSearch_nonExistingColumn.json index a58e4508e..a6ebd430b 100644 --- a/endToEndTests/test/invalidQueries/stringSearch_nonExistingColumn.json +++ b/endToEndTests/test/invalidQueries/stringSearch_nonExistingColumn.json @@ -1,15 +1,6 @@ { "testCaseName": "StringSearch that wants to match a non-existing column", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "StringSearch", - "column": "this_column_does_not_exist", - "searchExpression": "test" - } - }, + "query": "default.filter(this_column_does_not_exist.like('test')).groupBy({count:=count()})", "expectedError": { "error": "Bad request", "message": "The database does not contain the string column 'this_column_does_not_exist'" diff --git a/endToEndTests/test/invalidQueries/stringSearch_nonStringColumn.json b/endToEndTests/test/invalidQueries/stringSearch_nonStringColumn.json index f308661d3..ef59e9c24 100644 --- a/endToEndTests/test/invalidQueries/stringSearch_nonStringColumn.json +++ b/endToEndTests/test/invalidQueries/stringSearch_nonStringColumn.json @@ -1,15 +1,6 @@ { "testCaseName": "StringSearch that wants to match a non-string column", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "StringSearch", - "column": "age", - "searchExpression": "test" - } - }, + "query": "default.filter(age.like('test')).groupBy({count:=count()})", "expectedError": { "error": "Bad request", "message": "The database does not contain the string column 'age'" diff --git a/endToEndTests/test/invalidQueries/stringSearch_withInvalidRegex.json b/endToEndTests/test/invalidQueries/stringSearch_withInvalidRegex.json index 84d4790bf..f44a2d23a 100644 --- a/endToEndTests/test/invalidQueries/stringSearch_withInvalidRegex.json +++ b/endToEndTests/test/invalidQueries/stringSearch_withInvalidRegex.json @@ -1,15 +1,6 @@ { "testCaseName": 
"StringSearch that contains an invalid regex", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "StringSearch", - "column": "primary_key", - "searchExpression": "\\" - } - }, + "query": "default.filter(primary_key.like('\\\\')).groupBy({count:=count()})", "expectedError": { "error": "Bad request", "message": "Invalid Regular Expression. The parsing of the regular expression failed with the error 'trailing \\'. See https://github.com/google/re2/wiki/Syntax for a Syntax specification." diff --git a/endToEndTests/test/queries/AASymbolEquals.json b/endToEndTests/test/queries/AASymbolEquals.json index 28d2d57cc..f21fa8be8 100644 --- a/endToEndTests/test/queries/AASymbolEquals.json +++ b/endToEndTests/test/queries/AASymbolEquals.json @@ -1,16 +1,6 @@ { "testCaseName": "Amino acid equals gene E", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "AminoAcidEquals", - "position": 2, - "symbol": "Y", - "sequenceName": "E" - } - }, + "query": "default.filter(aminoAcidEquals(position:=2, symbol:='Y', sequenceName:='E')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 99 diff --git a/endToEndTests/test/queries/And.json b/endToEndTests/test/queries/And.json index 131762cf5..fb7a52610 100644 --- a/endToEndTests/test/queries/And.json +++ b/endToEndTests/test/queries/And.json @@ -1,26 +1,6 @@ { "testCaseName": "And Query with two children", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "And", - "children": [ - { - "type": "StringEquals", - "column": "country", - "value": "Switzerland" - }, - { - "type": "Lineage", - "column": "pango_lineage", - "value": "B.1.1.7", - "includeSublineages": true - } - ] - } - }, + "query": "default.filter((country = 'Switzerland') && (pango_lineage.lineage('B.1.1.7', includeSublineages:=true))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 51 diff --git 
a/endToEndTests/test/queries/DetailsOrderBy.json b/endToEndTests/test/queries/DetailsOrderBy.json index cbc9a04e4..0c032a0ff 100644 --- a/endToEndTests/test/queries/DetailsOrderBy.json +++ b/endToEndTests/test/queries/DetailsOrderBy.json @@ -1,27 +1,6 @@ { "testCaseName": "Details action with order by field EPI_ISL", - "query": { - "action": { - "type": "Details", - "orderByFields": ["primary_key"] - }, - "filterExpression": { - "type": "And", - "children": [ - { - "type": "StringEquals", - "column": "country", - "value": "Switzerland" - }, - { - "type": "Lineage", - "column": "pango_lineage", - "value": "B.1.1.7", - "includeSublineages": true - } - ] - } - }, + "query": "default.filter((country = 'Switzerland') && (pango_lineage.lineage('B.1.1.7', includeSublineages:=true))).orderBy({primary_key}).project({age, country, date, division, primary_key, pango_lineage, qc_value, region, test_boolean_column, unsorted_date, usherTree})", "expectedQueryResult": [ { "age": 57, diff --git a/endToEndTests/test/queries/DetailsOrderByLimit.json b/endToEndTests/test/queries/DetailsOrderByLimit.json index 4034e3f94..103b32ce2 100644 --- a/endToEndTests/test/queries/DetailsOrderByLimit.json +++ b/endToEndTests/test/queries/DetailsOrderByLimit.json @@ -1,29 +1,6 @@ { "testCaseName": "Details action with order by, limit and offset", - "query": { - "action": { - "type": "Details", - "orderByFields": ["primary_key"], - "offset": 9, - "limit": 2 - }, - "filterExpression": { - "type": "And", - "children": [ - { - "type": "StringEquals", - "column": "country", - "value": "Switzerland" - }, - { - "type": "Lineage", - "column": "pango_lineage", - "value": "B.1.1.7", - "includeSublineages": true - } - ] - } - }, + "query": "default.filter((country = 'Switzerland') && (pango_lineage.lineage('B.1.1.7', includeSublineages:=true))).orderBy({primary_key}).offset(9).limit(2).project({age, country, date, division, primary_key, pango_lineage, qc_value, region, test_boolean_column, unsorted_date, 
usherTree})", "expectedQueryResult": [ { "age": 4, diff --git a/endToEndTests/test/queries/Exact.json b/endToEndTests/test/queries/Exact.json index 3c1ff1312..23e1f14b6 100644 --- a/endToEndTests/test/queries/Exact.json +++ b/endToEndTests/test/queries/Exact.json @@ -1,18 +1,6 @@ { "testCaseName": "Exact Query", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Exact", - "child": { - "type": "NucleotideEquals", - "position": 122, - "symbol": "A" - } - } - }, + "query": "default.filter(exact(nucleotideEquals(position:=122, symbol:='A'))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 93 diff --git a/endToEndTests/test/queries/GroupByDivision.json b/endToEndTests/test/queries/GroupByDivision.json index cb9807b5e..41c90f573 100644 --- a/endToEndTests/test/queries/GroupByDivision.json +++ b/endToEndTests/test/queries/GroupByDivision.json @@ -1,15 +1,6 @@ { "testCaseName": "Group all sequences by division", - "query": { - "action": { - "type": "Aggregated", - "groupByFields": ["division"], - "orderByFields": ["division", "count"] - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.groupBy({count:=count()},{division}).orderBy({division, count})", "expectedQueryResult": [ { "count": 2, diff --git a/endToEndTests/test/queries/GroupByLineage.json b/endToEndTests/test/queries/GroupByLineage.json index 1bdb12626..70a2b2d96 100644 --- a/endToEndTests/test/queries/GroupByLineage.json +++ b/endToEndTests/test/queries/GroupByLineage.json @@ -1,15 +1,6 @@ { "testCaseName": "Group all sequences by lineage", - "query": { - "action": { - "type": "Aggregated", - "groupByFields": ["pango_lineage"], - "orderByFields": ["pango_lineage"] - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.groupBy({count:=count()},{pango_lineage}).orderBy({pango_lineage})", "expectedQueryResult": [ { "count": 1, diff --git a/endToEndTests/test/queries/GroupByLineageOrderByCountLimit.json 
b/endToEndTests/test/queries/GroupByLineageOrderByCountLimit.json index 1e84ef33c..0f6542e9a 100644 --- a/endToEndTests/test/queries/GroupByLineageOrderByCountLimit.json +++ b/endToEndTests/test/queries/GroupByLineageOrderByCountLimit.json @@ -1,21 +1,6 @@ { "testCaseName": "Group all sequences by lineage and order by count descending, with limit 4", - "query": { - "action": { - "type": "Aggregated", - "groupByFields": ["pango_lineage"], - "orderByFields": [ - { - "field": "count", - "order": "descending" - } - ], - "limit": 4 - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.groupBy({count:=count()},{pango_lineage}).orderBy({count.desc()}).limit(4)", "expectedQueryResult": [ { "count": 48, diff --git a/endToEndTests/test/queries/HasAAMutation.json b/endToEndTests/test/queries/HasAAMutation.json index 032f60e7b..cadf0b38f 100644 --- a/endToEndTests/test/queries/HasAAMutation.json +++ b/endToEndTests/test/queries/HasAAMutation.json @@ -1,15 +1,6 @@ { "testCaseName": "Amino acid mutation gene S", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "HasAminoAcidMutation", - "position": 28, - "sequenceName": "S" - } - }, + "query": "default.filter(hasAAMutation(position:=28, sequenceName:='S')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 1 diff --git a/endToEndTests/test/queries/LimitLargerThanTable.json b/endToEndTests/test/queries/LimitLargerThanTable.json index ff2bfd384..393f1287f 100644 --- a/endToEndTests/test/queries/LimitLargerThanTable.json +++ b/endToEndTests/test/queries/LimitLargerThanTable.json @@ -1,28 +1,6 @@ { "testCaseName": "Limit larger than table", - "query": { - "action": { - "type": "Details", - "orderByFields": ["primary_key"], - "limit": 1231241 - }, - "filterExpression": { - "type": "And", - "children": [ - { - "type": "StringEquals", - "column": "country", - "value": "Switzerland" - }, - { - "type": "Lineage", - "column": "pango_lineage", - "value": "B.1.1.7", - 
"includeSublineages": true - } - ] - } - }, + "query": "default.filter((country = 'Switzerland') && (pango_lineage.lineage('B.1.1.7', includeSublineages:=true))).orderBy({primary_key}).limit(1231241).project({age, country, date, division, primary_key, pango_lineage, qc_value, region, test_boolean_column, unsorted_date, usherTree})", "expectedQueryResult": [ { "age": 57, diff --git a/endToEndTests/test/queries/Maybe.json b/endToEndTests/test/queries/Maybe.json index d36a8f95b..ebd864972 100644 --- a/endToEndTests/test/queries/Maybe.json +++ b/endToEndTests/test/queries/Maybe.json @@ -1,18 +1,6 @@ { "testCaseName": "Maybe Query", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Maybe", - "child": { - "type": "NucleotideEquals", - "position": 122, - "symbol": "A" - } - } - }, + "query": "default.filter(maybe(nucleotideEquals(position:=122, symbol:='A'))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 96 diff --git a/endToEndTests/test/queries/MostRecentCommonAncestor_SimpleQuery.json b/endToEndTests/test/queries/MostRecentCommonAncestor_SimpleQuery.json index 73f94043b..ba0630037 100644 --- a/endToEndTests/test/queries/MostRecentCommonAncestor_SimpleQuery.json +++ b/endToEndTests/test/queries/MostRecentCommonAncestor_SimpleQuery.json @@ -1,27 +1,6 @@ { "testCaseName": "MostRecentCommonAncestor query returns correct node", - "query": { - "action": { - "type": "MostRecentCommonAncestor", - "columnName": "usherTree", - "orderByFields": ["mrcaNode"] - }, - "filterExpression": { - "type": "Or", - "children": [ - { - "type": "StringEquals", - "column": "primary_key", - "value": "key_11" - }, - { - "type": "StringEquals", - "column": "primary_key", - "value": "key_22" - } - ] - } - }, + "query": "default.filter((primary_key = 'key_11') || (primary_key = 'key_22')).mostRecentCommonAncestor('usherTree').orderBy({mrcaNode})", "expectedQueryResult": [ { "mrcaNode": "NODE_0000072", diff --git 
a/endToEndTests/test/queries/MostRecentCommonAncestor_onlyMissingNodes.json b/endToEndTests/test/queries/MostRecentCommonAncestor_onlyMissingNodes.json index 4e5acf6e8..779edf9e0 100644 --- a/endToEndTests/test/queries/MostRecentCommonAncestor_onlyMissingNodes.json +++ b/endToEndTests/test/queries/MostRecentCommonAncestor_onlyMissingNodes.json @@ -1,28 +1,6 @@ { "testCaseName": "MostRecentCommonAncestor query returns correct node with only missing nodes", - "query": { - "action": { - "type": "MostRecentCommonAncestor", - "columnName": "usherTree", - "orderByFields": ["mrcaNode"], - "printNodesNotInTree": true - }, - "filterExpression": { - "type": "Or", - "children": [ - { - "type": "StringEquals", - "column": "primary_key", - "value": "key_1" - }, - { - "type": "StringEquals", - "column": "primary_key", - "value": "key_35" - } - ] - } - }, + "query": "default.filter((primary_key = 'key_1') || (primary_key = 'key_35')).mostRecentCommonAncestor('usherTree', printNodesNotInTree:=true).orderBy({mrcaNode})", "expectedQueryResult": [ { "mrcaNode": null, diff --git a/endToEndTests/test/queries/MostRecentCommonAncestor_withMissingNode.json b/endToEndTests/test/queries/MostRecentCommonAncestor_withMissingNode.json index a7684add2..a6b1516d1 100644 --- a/endToEndTests/test/queries/MostRecentCommonAncestor_withMissingNode.json +++ b/endToEndTests/test/queries/MostRecentCommonAncestor_withMissingNode.json @@ -1,43 +1,6 @@ { "testCaseName": "MRCA query returns correct node with missingNode", - "query": { - "action": { - "type": "MostRecentCommonAncestor", - "columnName": "usherTree", - "orderByFields": ["mrcaNode"], - "printNodesNotInTree": true - }, - "filterExpression": { - "type": "Or", - "children": [ - { - "type": "StringEquals", - "column": "primary_key", - "value": "key_1" - }, - { - "type": "StringEquals", - "column": "primary_key", - "value": "key_12" - }, - { - "type": "StringEquals", - "column": "primary_key", - "value": "key_7" - }, - { - "type": "StringEquals", - 
"column": "primary_key", - "value": "key_35" - }, - { - "type": "StringEquals", - "column": "primary_key", - "value": "key_29" - } - ] - } - }, + "query": "default.filter((primary_key = 'key_1') || (primary_key = 'key_12') || (primary_key = 'key_7') || (primary_key = 'key_35') || (primary_key = 'key_29')).mostRecentCommonAncestor('usherTree', printNodesNotInTree:=true).orderBy({mrcaNode})", "expectedQueryResult": [ { "mrcaNode": "NODE_0000096", diff --git a/endToEndTests/test/queries/N_notIndexed.json b/endToEndTests/test/queries/N_notIndexed.json index c7807837f..3483cb96d 100644 --- a/endToEndTests/test/queries/N_notIndexed.json +++ b/endToEndTests/test/queries/N_notIndexed.json @@ -1,15 +1,6 @@ { "testCaseName": "Nucleotide equals query for symbol N", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "NucleotideEquals", - "position": 122, - "symbol": "N" - } - }, + "query": "default.filter(nucleotideEquals(position:=122, symbol:='N')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 3 diff --git a/endToEndTests/test/queries/Not.json b/endToEndTests/test/queries/Not.json index 106bf805e..5dbd1cf80 100644 --- a/endToEndTests/test/queries/Not.json +++ b/endToEndTests/test/queries/Not.json @@ -1,18 +1,6 @@ { "testCaseName": "Not Query", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Not", - "child": { - "type": "StringEquals", - "column": "country", - "value": "Switzerland" - } - } - }, + "query": "default.filter(!(country = 'Switzerland')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 0 diff --git a/endToEndTests/test/queries/NotUnindexedStringEquals.json b/endToEndTests/test/queries/NotUnindexedStringEquals.json index f151ce995..dfde8cdba 100644 --- a/endToEndTests/test/queries/NotUnindexedStringEquals.json +++ b/endToEndTests/test/queries/NotUnindexedStringEquals.json @@ -1,18 +1,6 @@ { "testCaseName": "Not Query on unindexed string column", - 
"query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Not", - "child": { - "type": "StringEquals", - "column": "primary_key", - "value": "key_41" - } - } - }, + "query": "default.filter(!(primary_key = 'key_41')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 99 diff --git a/endToEndTests/test/queries/Offset0.json b/endToEndTests/test/queries/Offset0.json index 8604248c1..f268b52a1 100644 --- a/endToEndTests/test/queries/Offset0.json +++ b/endToEndTests/test/queries/Offset0.json @@ -1,28 +1,6 @@ { "testCaseName": "Offset by 0", - "query": { - "action": { - "type": "Details", - "orderByFields": ["primary_key"], - "offset": 0 - }, - "filterExpression": { - "type": "And", - "children": [ - { - "type": "StringEquals", - "column": "country", - "value": "Switzerland" - }, - { - "type": "Lineage", - "column": "pango_lineage", - "value": "B.1.1.7", - "includeSublineages": true - } - ] - } - }, + "query": "default.filter((country = 'Switzerland') && (pango_lineage.lineage('B.1.1.7', includeSublineages:=true))).orderBy({primary_key}).offset(0).project({age, country, date, division, primary_key, pango_lineage, qc_value, region, test_boolean_column, unsorted_date, usherTree})", "expectedQueryResult": [ { "age": 57, diff --git a/endToEndTests/test/queries/OffsetFull.json b/endToEndTests/test/queries/OffsetFull.json index 53c03fa35..f8250fd2d 100644 --- a/endToEndTests/test/queries/OffsetFull.json +++ b/endToEndTests/test/queries/OffsetFull.json @@ -1,15 +1,5 @@ { "testCaseName": "Offset by exactly table size", - "query": { - "action": { - "type": "Details", - "orderByFields": ["primary_key"], - "offset": 100, - "limit": 90 - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.orderBy({primary_key}).offset(100).limit(90)", "expectedQueryResult": [] } diff --git a/endToEndTests/test/queries/OffsetLargerThanTable.json b/endToEndTests/test/queries/OffsetLargerThanTable.json index edaaeb32b..829bc2ac1 100644 --- 
a/endToEndTests/test/queries/OffsetLargerThanTable.json +++ b/endToEndTests/test/queries/OffsetLargerThanTable.json @@ -1,27 +1,5 @@ { "testCaseName": "Offset is larger than the table", - "query": { - "action": { - "type": "Details", - "orderByFields": ["primary_key"], - "offset": 1231241 - }, - "filterExpression": { - "type": "And", - "children": [ - { - "type": "StringEquals", - "column": "country", - "value": "Switzerland" - }, - { - "type": "Lineage", - "column": "pango_lineage", - "value": "B.1.1.7", - "includeSublineages": true - } - ] - } - }, + "query": "default.filter((country = 'Switzerland') && (pango_lineage.lineage('B.1.1.7', includeSublineages:=true))).orderBy({primary_key}).offset(1231241)", "expectedQueryResult": [] } diff --git a/endToEndTests/test/queries/OffsetLimitOverlap.json b/endToEndTests/test/queries/OffsetLimitOverlap.json index 23b6951d2..55cd173a2 100644 --- a/endToEndTests/test/queries/OffsetLimitOverlap.json +++ b/endToEndTests/test/queries/OffsetLimitOverlap.json @@ -1,16 +1,6 @@ { "testCaseName": "Offset overlaps with results to show", - "query": { - "action": { - "type": "Details", - "orderByFields": ["primary_key"], - "offset": 90, - "limit": 90 - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.orderBy({primary_key}).offset(90).limit(90).project({age, country, date, division, primary_key, pango_lineage, qc_value, region, test_boolean_column, unsorted_date, usherTree})", "expectedQueryResult": [ { "age": 50, diff --git a/endToEndTests/test/queries/Or.json b/endToEndTests/test/queries/Or.json index e03222a9b..cf9e6f521 100644 --- a/endToEndTests/test/queries/Or.json +++ b/endToEndTests/test/queries/Or.json @@ -1,27 +1,6 @@ { "testCaseName": "Or Query with two children", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Or", - "children": [ - { - "type": "Lineage", - "column": "pango_lineage", - "value": "B.1.1.7", - "includeSublineages": false - }, - { - "type": 
"Lineage", - "column": "pango_lineage", - "value": "B.1.1.7", - "includeSublineages": true - } - ] - } - }, + "query": "default.filter((pango_lineage.lineage('B.1.1.7')) || (pango_lineage.lineage('B.1.1.7', includeSublineages:=true))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 51 diff --git a/endToEndTests/test/queries/OrderByAge.json b/endToEndTests/test/queries/OrderByAge.json index e03d9fc9d..9c8749f85 100644 --- a/endToEndTests/test/queries/OrderByAge.json +++ b/endToEndTests/test/queries/OrderByAge.json @@ -1,20 +1,6 @@ { "testCaseName": "Order By age column ascending", - "query": { - "action": { - "type": "Aggregated", - "groupByFields": ["age"], - "orderByFields": [ - { - "field": "age", - "order": "ascending" - } - ] - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.groupBy({count:=count()},{age}).orderBy({age})", "expectedQueryResult": [ { "age": null, diff --git a/endToEndTests/test/queries/OrderByFloat.json b/endToEndTests/test/queries/OrderByFloat.json index ae558c168..e7752a7dd 100644 --- a/endToEndTests/test/queries/OrderByFloat.json +++ b/endToEndTests/test/queries/OrderByFloat.json @@ -1,15 +1,6 @@ { "testCaseName": "Order By QC float column ascending", - "query": { - "action": { - "type": "Aggregated", - "groupByFields": ["qc_value"], - "orderByFields": ["qc_value"] - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.groupBy({count:=count()},{qc_value}).orderBy({qc_value})", "expectedQueryResult": [ { "count": 2, diff --git a/endToEndTests/test/queries/OrderByFloatDesc.json b/endToEndTests/test/queries/OrderByFloatDesc.json index ade799987..890bf4afa 100644 --- a/endToEndTests/test/queries/OrderByFloatDesc.json +++ b/endToEndTests/test/queries/OrderByFloatDesc.json @@ -1,20 +1,6 @@ { "testCaseName": "Order By QC float column descending", - "query": { - "action": { - "type": "Aggregated", - "groupByFields": ["qc_value"], - "orderByFields": [ - { - "field": "qc_value", - "order": 
"descending" - } - ] - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.groupBy({count:=count()},{qc_value}).orderBy({qc_value.desc()})", "expectedQueryResult": [ { "count": 10, diff --git a/endToEndTests/test/queries/OrderByFloatFiltered.json b/endToEndTests/test/queries/OrderByFloatFiltered.json index 26d355e40..9c07409c4 100644 --- a/endToEndTests/test/queries/OrderByFloatFiltered.json +++ b/endToEndTests/test/queries/OrderByFloatFiltered.json @@ -1,18 +1,6 @@ { "testCaseName": "Order By QC float column ascending after filtering", - "query": { - "action": { - "type": "Aggregated", - "groupByFields": ["qc_value"], - "orderByFields": ["qc_value"] - }, - "filterExpression": { - "from": 0.1, - "to": 2121.1, - "column": "qc_value", - "type": "FloatBetween" - } - }, + "query": "default.filter(qc_value.between(0.1, 2121.1)).groupBy({count:=count()},{qc_value}).orderBy({qc_value})", "expectedQueryResult": [ { "count": 10, diff --git a/endToEndTests/test/queries/PangoLineageAlias.json b/endToEndTests/test/queries/PangoLineageAlias.json index f0cd1c8fe..024312136 100644 --- a/endToEndTests/test/queries/PangoLineageAlias.json +++ b/endToEndTests/test/queries/PangoLineageAlias.json @@ -1,20 +1,6 @@ { "testCaseName": "Pango lineage should return only aliased", - "query": { - "action": { - "type": "Aggregated", - "groupByFields": ["pango_lineage"], - "orderByFields": [ - { - "field": "pango_lineage", - "order": "ascending" - } - ] - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.groupBy({count:=count()},{pango_lineage}).orderBy({pango_lineage})", "expectedQueryResult": [ { "count": 1, diff --git a/endToEndTests/test/queries/PhyloDescendantOf.json b/endToEndTests/test/queries/PhyloDescendantOf.json index bb411c2d3..ef588c0cd 100644 --- a/endToEndTests/test/queries/PhyloDescendantOf.json +++ b/endToEndTests/test/queries/PhyloDescendantOf.json @@ -1,22 +1,6 @@ { "testCaseName": "PhyloDescendantOf should return only descendent 
nodes of a given node", - "query": { - "action": { - "type": "Aggregated", - "groupByFields": ["usherTree"], - "orderByFields": [ - { - "field": "usherTree", - "order": "ascending" - } - ] - }, - "filterExpression": { - "type": "PhyloDescendantOf", - "column": "usherTree", - "internalNode": "NODE_0000072" - } - }, + "query": "default.filter(usherTree.phyloDescendantOf('NODE_0000072')).groupBy({count:=count()},{usherTree}).orderBy({usherTree})", "expectedQueryResult": [ { "count": 1, diff --git a/endToEndTests/test/queries/Subtree_onlyMissingNodes.json b/endToEndTests/test/queries/Subtree_onlyMissingNodes.json index 8f7c087a3..db8d2cacc 100644 --- a/endToEndTests/test/queries/Subtree_onlyMissingNodes.json +++ b/endToEndTests/test/queries/Subtree_onlyMissingNodes.json @@ -1,33 +1,6 @@ { "testCaseName": "PhyloSubtree query returns correct nwk with only missing nodes", - "query": { - "action": { - "type": "PhyloSubtree", - "columnName": "usherTree", - "orderByFields": ["subtreeNewick"], - "printNodesNotInTree": true - }, - "filterExpression": { - "type": "Or", - "children": [ - { - "type": "StringEquals", - "column": "primary_key", - "value": "key_1" - }, - { - "type": "StringEquals", - "column": "primary_key", - "value": "key_35" - }, - { - "type": "StringEquals", - "column": "primary_key", - "value": "key_29" - } - ] - } - }, + "query": "default.filter((primary_key = 'key_1') || (primary_key = 'key_35') || (primary_key = 'key_29')).phyloSubtree('usherTree', printNodesNotInTree:=true).orderBy({subtreeNewick})", "expectedQueryResult": [ { "subtreeNewick": "", diff --git a/endToEndTests/test/queries/Subtree_simpleQuery.json b/endToEndTests/test/queries/Subtree_simpleQuery.json index d08f415fa..d243b9aa9 100644 --- a/endToEndTests/test/queries/Subtree_simpleQuery.json +++ b/endToEndTests/test/queries/Subtree_simpleQuery.json @@ -1,28 +1,6 @@ { "testCaseName": "PhyloSubtree query returns correct nwk", - "query": { - "action": { - "type": "PhyloSubtree", - "columnName": 
"usherTree", - "orderByFields": ["subtreeNewick"], - "contractUnaryNodes": false - }, - "filterExpression": { - "type": "Or", - "children": [ - { - "type": "StringEquals", - "column": "primary_key", - "value": "key_83" - }, - { - "type": "StringEquals", - "column": "primary_key", - "value": "key_87" - } - ] - } - }, + "query": "default.filter((primary_key = 'key_83') || (primary_key = 'key_87')).phyloSubtree('usherTree').orderBy({subtreeNewick})", "expectedQueryResult": [ { "subtreeNewick": "((key_83:0.00027051)NODE_0000077:3.291e-05,((((key_87:1e-06)NODE_0000082:0.00013487)NODE_0000081:3.368e-05)NODE_0000080:6.689e-05)NODE_0000079:1e-06)NODE_0000076;", diff --git a/endToEndTests/test/queries/Subtree_simple_query_without_unary_nodes.json b/endToEndTests/test/queries/Subtree_simple_query_without_unary_nodes.json index aeb273707..62f65a38f 100644 --- a/endToEndTests/test/queries/Subtree_simple_query_without_unary_nodes.json +++ b/endToEndTests/test/queries/Subtree_simple_query_without_unary_nodes.json @@ -1,28 +1,6 @@ { "testCaseName": "PhyloSubtree query returns correct nwk without unary nodes", - "query": { - "action": { - "type": "PhyloSubtree", - "columnName": "usherTree", - "orderByFields": ["subtreeNewick"], - "contractUnaryNodes": true - }, - "filterExpression": { - "type": "Or", - "children": [ - { - "type": "StringEquals", - "column": "primary_key", - "value": "key_83" - }, - { - "type": "StringEquals", - "column": "primary_key", - "value": "key_87" - } - ] - } - }, + "query": "default.filter((primary_key = 'key_83') || (primary_key = 'key_87')).phyloSubtree('usherTree', contractUnaryNodes:=true).orderBy({subtreeNewick})", "expectedQueryResult": [ { "subtreeNewick": "(key_83:0.00030342,key_87:0.00023744)NODE_0000076;", diff --git a/endToEndTests/test/queries/Subtree_withMissingNode.json b/endToEndTests/test/queries/Subtree_withMissingNode.json index b98383c3a..6529fffa7 100644 --- a/endToEndTests/test/queries/Subtree_withMissingNode.json +++ 
b/endToEndTests/test/queries/Subtree_withMissingNode.json @@ -1,38 +1,6 @@ { "testCaseName": "PhyloSubtree query returns correct nwk with missingNode", - "query": { - "action": { - "type": "PhyloSubtree", - "columnName": "usherTree", - "orderByFields": ["subtreeNewick"], - "printNodesNotInTree": true - }, - "filterExpression": { - "type": "Or", - "children": [ - { - "type": "StringEquals", - "column": "usherTree", - "value": "key_1" - }, - { - "type": "StringEquals", - "column": "usherTree", - "value": "key_12" - }, - { - "type": "StringEquals", - "column": "usherTree", - "value": "key_7" - }, - { - "type": "StringEquals", - "column": "usherTree", - "value": "key_35" - } - ] - } - }, + "query": "default.filter((usherTree = 'key_1') || (usherTree = 'key_12') || (usherTree = 'key_7') || (usherTree = 'key_35')).phyloSubtree('usherTree', printNodesNotInTree:=true).orderBy({subtreeNewick})", "expectedQueryResult": [ { "subtreeNewick": "(key_7:0.00010761,key_12:0.00013378)NODE_0000096;", diff --git a/endToEndTests/test/queries/aaInsertionsAction.json b/endToEndTests/test/queries/aaInsertionsAction.json index 6151e03b5..3291141e4 100644 --- a/endToEndTests/test/queries/aaInsertionsAction.json +++ b/endToEndTests/test/queries/aaInsertionsAction.json @@ -1,14 +1,6 @@ { "testCaseName": "amino acid insertions action", - "query": { - "action": { - "type": "AminoAcidInsertions", - "orderByFields": ["insertion", "position"] - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.aminoAcidInsertions().orderBy({insertion, position})", "expectedQueryResult": [ { "count": 1, diff --git a/endToEndTests/test/queries/aaInsertionsActionAndFilter.json b/endToEndTests/test/queries/aaInsertionsActionAndFilter.json index 39282662c..48a7c558c 100644 --- a/endToEndTests/test/queries/aaInsertionsActionAndFilter.json +++ b/endToEndTests/test/queries/aaInsertionsActionAndFilter.json @@ -1,17 +1,6 @@ { "testCaseName": "amino acid insertions action and insertion contains 
filter", - "query": { - "action": { - "type": "AminoAcidInsertions", - "orderByFields": ["insertedSymbols", "position"] - }, - "filterExpression": { - "type": "AminoAcidInsertionContains", - "sequenceName": "S", - "value": ".*PE", - "position": 214 - } - }, + "query": "default.filter(aminoAcidInsertionContains(position:=214, value:='.*PE', sequenceName:='S')).aminoAcidInsertions().orderBy({insertedSymbols, position})", "expectedQueryResult": [ { "count": 1, diff --git a/endToEndTests/test/queries/aaInsertionsActionOneSequence.json b/endToEndTests/test/queries/aaInsertionsActionOneSequence.json index 2cf676970..7d51b3a76 100644 --- a/endToEndTests/test/queries/aaInsertionsActionOneSequence.json +++ b/endToEndTests/test/queries/aaInsertionsActionOneSequence.json @@ -1,15 +1,6 @@ { "testCaseName": "amino acid insertions action only one sequence", - "query": { - "action": { - "type": "AminoAcidInsertions", - "orderByFields": ["insertedSymbols", "position"], - "sequenceNames": ["S"] - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.aminoAcidInsertions(sequenceNames:={S}).orderBy({insertedSymbols, position})", "expectedQueryResult": [ { "count": 1, diff --git a/endToEndTests/test/queries/aaInsertionsContains.json b/endToEndTests/test/queries/aaInsertionsContains.json index 8cbf17cdd..ede5a1aaa 100644 --- a/endToEndTests/test/queries/aaInsertionsContains.json +++ b/endToEndTests/test/queries/aaInsertionsContains.json @@ -1,18 +1,6 @@ { "testCaseName": "amino acid insertions contains filter", - "query": { - "action": { - "type": "Details", - "fields": ["primary_key"], - "orderByFields": ["primary_key"] - }, - "filterExpression": { - "type": "AminoAcidInsertionContains", - "sequenceName": "S", - "value": "E.*E", - "position": 214 - } - }, + "query": "default.filter(aminoAcidInsertionContains(position:=214, value:='E.*E', sequenceName:='S')).project(primary_key).orderBy({primary_key})", "expectedQueryResult": [ { "primary_key": "key_29" diff --git 
a/endToEndTests/test/queries/aaMutDistribution.json b/endToEndTests/test/queries/aaMutDistribution.json index 83e324f99..c6d74388e 100644 --- a/endToEndTests/test/queries/aaMutDistribution.json +++ b/endToEndTests/test/queries/aaMutDistribution.json @@ -1,15 +1,6 @@ { "testCaseName": "The distribution of Amino Acid Mutations action", - "query": { - "action": { - "type": "AminoAcidMutations", - "sequenceNames": ["S"], - "minProportion": 0.3 - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.aminoAcidMutations(minProportion:=0.3, sequenceNames:={S})", "expectedQueryResult": [ { "count": 37, diff --git a/endToEndTests/test/queries/aaMutDistribution_all.json b/endToEndTests/test/queries/aaMutDistribution_all.json index 1505cd365..4b63083a3 100644 --- a/endToEndTests/test/queries/aaMutDistribution_all.json +++ b/endToEndTests/test/queries/aaMutDistribution_all.json @@ -1,15 +1,6 @@ { "testCaseName": "The distribution of Amino Acid Mutations action for all sequences", - "query": { - "action": { - "type": "AminoAcidMutations", - "minProportion": 0.4, - "orderByFields": ["position", "mutationFrom"] - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.aminoAcidMutations(minProportion:=0.4).orderBy({position, mutationFrom})", "expectedQueryResult": [ { "count": 37, diff --git a/endToEndTests/test/queries/aaMutDistribution_min0.json b/endToEndTests/test/queries/aaMutDistribution_min0.json index 297a61c1e..4ea31fba5 100644 --- a/endToEndTests/test/queries/aaMutDistribution_min0.json +++ b/endToEndTests/test/queries/aaMutDistribution_min0.json @@ -1,16 +1,6 @@ { "testCaseName": "The distribution of Amino Acid Mutations action with minProportion 0", - "query": { - "action": { - "type": "AminoAcidMutations", - "sequenceNames": ["E"], - "minProportion": 0.0, - "orderByFields": ["mutation"] - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.aminoAcidMutations(minProportion:=0.0, 
sequenceNames:={E}).orderBy({mutation})", "expectedQueryResult": [ { "count": 1, diff --git a/endToEndTests/test/queries/aaMutDistribution_multiple.json b/endToEndTests/test/queries/aaMutDistribution_multiple.json index 5cd7cafd1..3bb2b121b 100644 --- a/endToEndTests/test/queries/aaMutDistribution_multiple.json +++ b/endToEndTests/test/queries/aaMutDistribution_multiple.json @@ -1,15 +1,6 @@ { "testCaseName": "The distribution of Amino Acid Mutations action for multiple sequences", - "query": { - "action": { - "type": "AminoAcidMutations", - "sequenceNames": ["S", "N"], - "minProportion": 0.3 - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.aminoAcidMutations(minProportion:=0.3, sequenceNames:={S, N})", "expectedQueryResult": [ { "count": 37, diff --git a/endToEndTests/test/queries/aaMutDistribution_very_low.json b/endToEndTests/test/queries/aaMutDistribution_very_low.json index 0c7f5ee61..f45a0fc18 100644 --- a/endToEndTests/test/queries/aaMutDistribution_very_low.json +++ b/endToEndTests/test/queries/aaMutDistribution_very_low.json @@ -1,16 +1,6 @@ { "testCaseName": "The distribution of Amino Acid Mutations action with minProportion 0.0001", - "query": { - "action": { - "type": "AminoAcidMutations", - "sequenceNames": ["E"], - "minProportion": 0.0001, - "orderByFields": ["mutation"] - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.aminoAcidMutations(minProportion:=0.0001, sequenceNames:={E}).orderBy({mutation})", "expectedQueryResult": [ { "count": 1, diff --git a/endToEndTests/test/queries/booleanEquals.json b/endToEndTests/test/queries/booleanEquals.json index d23bd241b..483a2de9e 100644 --- a/endToEndTests/test/queries/booleanEquals.json +++ b/endToEndTests/test/queries/booleanEquals.json @@ -1,15 +1,6 @@ { "testCaseName": "BooleanEquals for test_boolean_column", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "BooleanEquals", - "column": "test_boolean_column", - 
"value": true - } - }, + "query": "default.filter(test_boolean_column = true).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 41 diff --git a/endToEndTests/test/queries/booleanEquals_And.json b/endToEndTests/test/queries/booleanEquals_And.json index d1dbb0acf..4bd5f636e 100644 --- a/endToEndTests/test/queries/booleanEquals_And.json +++ b/endToEndTests/test/queries/booleanEquals_And.json @@ -1,26 +1,6 @@ { "testCaseName": "BooleanEquals with And", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "And", - "children": [ - { - "type": "BooleanEquals", - "column": "test_boolean_column", - "value": false - }, - { - "type": "Lineage", - "column": "pango_lineage", - "value": "B.1", - "includeSublineages": true - } - ] - } - }, + "query": "default.filter((test_boolean_column = false) && (pango_lineage.lineage('B.1', includeSublineages:=true))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 37 diff --git a/endToEndTests/test/queries/booleanEquals_Or.json b/endToEndTests/test/queries/booleanEquals_Or.json index d7290e2b9..57e4a1199 100644 --- a/endToEndTests/test/queries/booleanEquals_Or.json +++ b/endToEndTests/test/queries/booleanEquals_Or.json @@ -1,26 +1,6 @@ { "testCaseName": "BooleanEquals with Or", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Or", - "children": [ - { - "type": "BooleanEquals", - "column": "test_boolean_column", - "value": null - }, - { - "type": "Lineage", - "column": "pango_lineage", - "value": "B.1.1", - "includeSublineages": true - } - ] - } - }, + "query": "default.filter((test_boolean_column.isNull()) || (pango_lineage.lineage('B.1.1', includeSublineages:=true))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 68 diff --git a/endToEndTests/test/queries/boolean_Details.json b/endToEndTests/test/queries/boolean_Details.json index 9e91cf8dc..dc019b70c 100644 --- a/endToEndTests/test/queries/boolean_Details.json +++ 
b/endToEndTests/test/queries/boolean_Details.json @@ -1,26 +1,46 @@ { "testCaseName": "boolean Details", - "query": { - "action": { - "type": "Details", - "fields": ["test_boolean_column", "primary_key"], - "orderByFields": ["primary_key"], - "limit": 10 - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.project({test_boolean_column, primary_key}).orderBy({primary_key}).limit(10)", "expectedQueryResult": [ - { "test_boolean_column": true, "primary_key": "key_1" }, - { "test_boolean_column": null, "primary_key": "key_10" }, - { "test_boolean_column": true, "primary_key": "key_100" }, - { "test_boolean_column": false, "primary_key": "key_11" }, - { "test_boolean_column": false, "primary_key": "key_12" }, - { "test_boolean_column": null, "primary_key": "key_13" }, - { "test_boolean_column": true, "primary_key": "key_14" }, - { "test_boolean_column": false, "primary_key": "key_15" }, - { "test_boolean_column": true, "primary_key": "key_16" }, - { "test_boolean_column": false, "primary_key": "key_17" } + { + "test_boolean_column": true, + "primary_key": "key_1" + }, + { + "test_boolean_column": null, + "primary_key": "key_10" + }, + { + "test_boolean_column": true, + "primary_key": "key_100" + }, + { + "test_boolean_column": false, + "primary_key": "key_11" + }, + { + "test_boolean_column": false, + "primary_key": "key_12" + }, + { + "test_boolean_column": null, + "primary_key": "key_13" + }, + { + "test_boolean_column": true, + "primary_key": "key_14" + }, + { + "test_boolean_column": false, + "primary_key": "key_15" + }, + { + "test_boolean_column": true, + "primary_key": "key_16" + }, + { + "test_boolean_column": false, + "primary_key": "key_17" + } ] } diff --git a/endToEndTests/test/queries/complexQuery.json b/endToEndTests/test/queries/complexQuery.json index bdb209a82..06e1bd466 100644 --- a/endToEndTests/test/queries/complexQuery.json +++ b/endToEndTests/test/queries/complexQuery.json @@ -1,114 +1,6 @@ { "testCaseName": "Some complex 
filter query", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "children": [ - { - "children": [ - { - "children": [ - { - "children": [ - { - "children": [ - { - "children": [ - { - "position": 300, - "symbol": "G", - "type": "NucleotideEquals" - }, - { - "children": [ - { - "position": 400, - "symbol": "-", - "type": "NucleotideEquals" - }, - { - "position": 500, - "symbol": "B", - "type": "NucleotideEquals" - } - ], - "type": "Or" - } - ], - "type": "And" - }, - { - "child": { - "position": 600, - "symbol": "-", - "type": "NucleotideEquals" - }, - "type": "Not" - } - ], - "type": "And" - }, - { - "child": { - "children": [ - { - "position": 700, - "symbol": "B", - "type": "NucleotideEquals" - }, - { - "position": 800, - "symbol": "-", - "type": "NucleotideEquals" - } - ], - "type": "Or" - }, - "type": "Maybe" - } - ], - "type": "And" - }, - { - "numberOfMatchers": 3, - "matchExactly": false, - "children": [ - { - "position": 123, - "symbol": "A", - "type": "NucleotideEquals" - }, - { - "position": 234, - "symbol": "T", - "type": "NucleotideEquals" - }, - { - "position": 345, - "symbol": "G", - "type": "NucleotideEquals" - } - ], - "type": "N-Of" - } - ], - "type": "And" - }, - { - "column": "pango_lineage", - "value": "B", - "includeSublineages": true, - "type": "Lineage" - } - ], - "type": "And" - } - ], - "type": "And" - } - }, + "query": "default.filter(((((((nucleotideEquals(position:=300, symbol:='G')) && ((nucleotideEquals(position:=400, symbol:='-')) || (nucleotideEquals(position:=500, symbol:='B')))) && (!(nucleotideEquals(position:=600, symbol:='-')))) && (maybe((nucleotideEquals(position:=700, symbol:='B')) || (nucleotideEquals(position:=800, symbol:='-'))))) && (nOf(3, {nucleotideEquals(position:=123, symbol:='A'), nucleotideEquals(position:=234, symbol:='T'), nucleotideEquals(position:=345, symbol:='G')}))) && (pango_lineage.lineage('B', includeSublineages:=true)))).groupBy({count:=count()})", "expectedQueryResult": [ 
{ "count": 0 diff --git a/endToEndTests/test/queries/dateBetween.json b/endToEndTests/test/queries/dateBetween.json index ca839a16e..3ff3fa4dc 100644 --- a/endToEndTests/test/queries/dateBetween.json +++ b/endToEndTests/test/queries/dateBetween.json @@ -1,16 +1,6 @@ { "testCaseName": "DateBetween Query with 'from' and 'to' value", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "DateBetween", - "column": "date", - "from": "2021-03-18", - "to": "2021-03-18" - } - }, + "query": "default.filter(date.between('2021-03-18'::date, '2021-03-18'::date)).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 1 diff --git a/endToEndTests/test/queries/dateBetween_noBounds.json b/endToEndTests/test/queries/dateBetween_noBounds.json index 60372e6b0..8aa8b8ab6 100644 --- a/endToEndTests/test/queries/dateBetween_noBounds.json +++ b/endToEndTests/test/queries/dateBetween_noBounds.json @@ -1,16 +1,6 @@ { "testCaseName": "DateBetween Query without bounds", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "DateBetween", - "column": "date", - "from": null, - "to": null - } - }, + "query": "default.filter(date.isNotNull()).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 99 diff --git a/endToEndTests/test/queries/dateBetween_null_excluded.json b/endToEndTests/test/queries/dateBetween_null_excluded.json index 43d2171cf..ee525f5e3 100644 --- a/endToEndTests/test/queries/dateBetween_null_excluded.json +++ b/endToEndTests/test/queries/dateBetween_null_excluded.json @@ -1,16 +1,6 @@ { "testCaseName": "DateBetween Query from an early date with unbounded 'to'", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "DateBetween", - "column": "date", - "from": "2012-03-18", - "to": null - } - }, + "query": "default.filter(date >= '2012-03-18'::date).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 99 diff --git 
a/endToEndTests/test/queries/dateBetween_openFrom1.json b/endToEndTests/test/queries/dateBetween_openFrom1.json index 36a0d818c..8d9771a88 100644 --- a/endToEndTests/test/queries/dateBetween_openFrom1.json +++ b/endToEndTests/test/queries/dateBetween_openFrom1.json @@ -1,16 +1,6 @@ { "testCaseName": "DateBetween Query with open 'from' range 1", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "DateBetween", - "column": "date", - "from": null, - "to": "2021-03-17" - } - }, + "query": "default.filter(date <= '2021-03-17'::date).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 51 diff --git a/endToEndTests/test/queries/dateBetween_openFrom2.json b/endToEndTests/test/queries/dateBetween_openFrom2.json index 2c1da304b..cc5271d10 100644 --- a/endToEndTests/test/queries/dateBetween_openFrom2.json +++ b/endToEndTests/test/queries/dateBetween_openFrom2.json @@ -1,16 +1,6 @@ { "testCaseName": "DateBetween Query with open 'from' range 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "DateBetween", - "column": "date", - "from": null, - "to": "2021-03-18" - } - }, + "query": "default.filter(date <= '2021-03-18'::date).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 52 diff --git a/endToEndTests/test/queries/dateBetween_openFrom3.json b/endToEndTests/test/queries/dateBetween_openFrom3.json index 656616b83..5a66d155d 100644 --- a/endToEndTests/test/queries/dateBetween_openFrom3.json +++ b/endToEndTests/test/queries/dateBetween_openFrom3.json @@ -1,16 +1,6 @@ { "testCaseName": "DateBetween Query with open 'from' range 3", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "DateBetween", - "column": "date", - "from": null, - "to": "2021-03-19" - } - }, + "query": "default.filter(date <= '2021-03-19'::date).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 53 diff --git 
a/endToEndTests/test/queries/dateBetween_openTo1.json b/endToEndTests/test/queries/dateBetween_openTo1.json index 83af8bde5..4d50bb62c 100644 --- a/endToEndTests/test/queries/dateBetween_openTo1.json +++ b/endToEndTests/test/queries/dateBetween_openTo1.json @@ -1,16 +1,6 @@ { "testCaseName": "DateBetween Query with open 'to' range 1", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "DateBetween", - "column": "date", - "from": "2021-03-17", - "to": null - } - }, + "query": "default.filter(date >= '2021-03-17'::date).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 48 diff --git a/endToEndTests/test/queries/dateBetween_openTo2.json b/endToEndTests/test/queries/dateBetween_openTo2.json index 3c0e182f6..5242e2359 100644 --- a/endToEndTests/test/queries/dateBetween_openTo2.json +++ b/endToEndTests/test/queries/dateBetween_openTo2.json @@ -1,16 +1,6 @@ { "testCaseName": "DateBetween Query with open 'to' range 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "DateBetween", - "column": "date", - "from": "2021-03-18", - "to": null - } - }, + "query": "default.filter(date >= '2021-03-18'::date).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 48 diff --git a/endToEndTests/test/queries/dateBetween_openTo3.json b/endToEndTests/test/queries/dateBetween_openTo3.json index a16146f57..13c5b8698 100644 --- a/endToEndTests/test/queries/dateBetween_openTo3.json +++ b/endToEndTests/test/queries/dateBetween_openTo3.json @@ -1,16 +1,6 @@ { "testCaseName": "DateBetween Query with open 'to' range 3", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "DateBetween", - "column": "date", - "from": "2021-03-19", - "to": null - } - }, + "query": "default.filter(date >= '2021-03-19'::date).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 47 diff --git a/endToEndTests/test/queries/detailsLimitAscending5.json 
b/endToEndTests/test/queries/detailsLimitAscending5.json index 1648ea6f4..29baceb85 100644 --- a/endToEndTests/test/queries/detailsLimitAscending5.json +++ b/endToEndTests/test/queries/detailsLimitAscending5.json @@ -1,21 +1,6 @@ { "testCaseName": "Details action ordered by division ascending with limit 5", - "query": { - "action": { - "type": "Details", - "fields": ["division"], - "orderByFields": [ - { - "field": "division", - "order": "ascending" - } - ], - "limit": 5 - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.project(division).orderBy({division}).limit(5)", "expectedQueryResult": [ { "division": null diff --git a/endToEndTests/test/queries/detailsLimitDescending10.json b/endToEndTests/test/queries/detailsLimitDescending10.json index e1fb46e6a..5ea7efc7c 100644 --- a/endToEndTests/test/queries/detailsLimitDescending10.json +++ b/endToEndTests/test/queries/detailsLimitDescending10.json @@ -1,21 +1,6 @@ { "testCaseName": "Details action ordered by division descending with limit 10", - "query": { - "action": { - "type": "Details", - "fields": ["division"], - "orderByFields": [ - { - "field": "division", - "order": "descending" - } - ], - "limit": 10 - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.project(division).orderBy({division.desc()}).limit(10)", "expectedQueryResult": [ { "division": "Zürich" diff --git a/endToEndTests/test/queries/detailsLimitDescending15.json b/endToEndTests/test/queries/detailsLimitDescending15.json index 4dfa73868..6d0970c70 100644 --- a/endToEndTests/test/queries/detailsLimitDescending15.json +++ b/endToEndTests/test/queries/detailsLimitDescending15.json @@ -1,21 +1,6 @@ { "testCaseName": "Details action ordered by division descending with limit 15", - "query": { - "action": { - "type": "Details", - "fields": ["division"], - "orderByFields": [ - { - "field": "division", - "order": "descending" - } - ], - "limit": 15 - }, - "filterExpression": { - "type": "True" - } - }, +
"query": "default.project(division).orderBy({division.desc()}).limit(15)", "expectedQueryResult": [ { "division": "Zürich" diff --git a/endToEndTests/test/queries/divisionFilter.json b/endToEndTests/test/queries/divisionFilter.json index 0e467e9d9..cbab35c49 100644 --- a/endToEndTests/test/queries/divisionFilter.json +++ b/endToEndTests/test/queries/divisionFilter.json @@ -1,21 +1,6 @@ { "testCaseName": "Filter by division then aggregate", - "query": { - "action": { - "limit": 100, - "type": "Aggregated" - }, - "filterExpression": { - "children": [ - { - "column": "division", - "value": "Aargau", - "type": "StringEquals" - } - ], - "type": "And" - } - }, + "query": "default.filter((division = 'Aargau')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 6 diff --git a/endToEndTests/test/queries/explicitDefaultSequence.json b/endToEndTests/test/queries/explicitDefaultSequence.json index 9ce9ab63a..d07dd4e51 100644 --- a/endToEndTests/test/queries/explicitDefaultSequence.json +++ b/endToEndTests/test/queries/explicitDefaultSequence.json @@ -1,16 +1,6 @@ { "testCaseName": "Explicit default sequence", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "NucleotideEquals", - "position": 13, - "symbol": "T", - "sequenceName": "main" - } - }, + "query": "default.filter(nucleotideEquals(position:=13, symbol:='T', sequenceName:='main')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 12 diff --git a/endToEndTests/test/queries/fastaAligned.json b/endToEndTests/test/queries/fastaAligned.json index 489748da1..7cb46530d 100644 --- a/endToEndTests/test/queries/fastaAligned.json +++ b/endToEndTests/test/queries/fastaAligned.json @@ -1,115 +1,406 @@ { "testCaseName": "FastaAligned action with one sequence", - "query": { - "action": { - "type": "FastaAligned", - "sequenceNames": ["testSecondSequence"], - "orderByFields": ["primary_key"] - }, - "filterExpression": { - "type": "True" - } - }, + "query":
"default.project({primary_key, testSecondSequence}).orderBy({primary_key})", "expectedQueryResult": [ - { "primary_key": "key_1", "testSecondSequence": "ACGT" }, - { "primary_key": "key_10", "testSecondSequence": "ACGT" }, - { "primary_key": "key_100", "testSecondSequence": "ACGT" }, - { "primary_key": "key_11", "testSecondSequence": "ACGT" }, - { "primary_key": "key_12", "testSecondSequence": "ARGT" }, - { "primary_key": "key_13", "testSecondSequence": "ACGT" }, - { "primary_key": "key_14", "testSecondSequence": "ACGT" }, - { "primary_key": "key_15", "testSecondSequence": "ATGT" }, - { "primary_key": "key_16", "testSecondSequence": "ACGT" }, - { "primary_key": "key_17", "testSecondSequence": "ACGN" }, - { "primary_key": "key_18", "testSecondSequence": "ACGT" }, - { "primary_key": "key_19", "testSecondSequence": "ACGT" }, - { "primary_key": "key_2", "testSecondSequence": "ARGT" }, - { "primary_key": "key_20", "testSecondSequence": "AYGT" }, - { "primary_key": "key_21", "testSecondSequence": "ACGT" }, - { "primary_key": "key_22", "testSecondSequence": "ACGT" }, - { "primary_key": "key_23", "testSecondSequence": "ANGT" }, - { "primary_key": "key_24", "testSecondSequence": "ACGT" }, - { "primary_key": "key_25", "testSecondSequence": "ACGT" }, - { "primary_key": "key_26", "testSecondSequence": "ACGT" }, - { "primary_key": "key_27", "testSecondSequence": "ACGT" }, - { "primary_key": "key_28", "testSecondSequence": "ACGT" }, - { "primary_key": "key_29", "testSecondSequence": "ACGT" }, - { "primary_key": "key_3", "testSecondSequence": "ACGT" }, - { "primary_key": "key_30", "testSecondSequence": "ACGT" }, - { "primary_key": "key_31", "testSecondSequence": "ACGT" }, - { "primary_key": "key_32", "testSecondSequence": "ACGT" }, - { "primary_key": "key_33", "testSecondSequence": "ACGT" }, - { "primary_key": "key_34", "testSecondSequence": "ACGT" }, - { "primary_key": "key_35", "testSecondSequence": "ACGT" }, - { "primary_key": "key_36", "testSecondSequence": "ACGT" }, - { 
"primary_key": "key_37", "testSecondSequence": "ACGT" }, - { "primary_key": "key_38", "testSecondSequence": "ACGT" }, - { "primary_key": "key_39", "testSecondSequence": "ACGT" }, - { "primary_key": "key_4", "testSecondSequence": "ACGN" }, - { "primary_key": "key_40", "testSecondSequence": "ACGT" }, - { "primary_key": "key_41", "testSecondSequence": "AAGN" }, - { "primary_key": "key_42", "testSecondSequence": "ACGT" }, - { "primary_key": "key_43", "testSecondSequence": "ACGT" }, - { "primary_key": "key_44", "testSecondSequence": "ACGT" }, - { "primary_key": "key_45", "testSecondSequence": "ACGT" }, - { "primary_key": "key_46", "testSecondSequence": "ACGN" }, - { "primary_key": "key_47", "testSecondSequence": "ACGT" }, - { "primary_key": "key_48", "testSecondSequence": "ANGT" }, - { "primary_key": "key_49", "testSecondSequence": "A-GT" }, - { "primary_key": "key_5", "testSecondSequence": "ACGT" }, - { "primary_key": "key_50", "testSecondSequence": "ACGT" }, - { "primary_key": "key_51", "testSecondSequence": "ACGT" }, - { "primary_key": "key_52", "testSecondSequence": "ACGT" }, - { "primary_key": "key_53", "testSecondSequence": "ACGT" }, - { "primary_key": "key_54", "testSecondSequence": "ACGT" }, - { "primary_key": "key_55", "testSecondSequence": "ACGT" }, - { "primary_key": "key_56", "testSecondSequence": "ACGT" }, - { "primary_key": "key_57", "testSecondSequence": "ACGT" }, - { "primary_key": "key_58", "testSecondSequence": "ACGT" }, - { "primary_key": "key_59", "testSecondSequence": null }, - { "primary_key": "key_6", "testSecondSequence": "ACGT" }, - { "primary_key": "key_60", "testSecondSequence": "ACGT" }, - { "primary_key": "key_61", "testSecondSequence": "ACGT" }, - { "primary_key": "key_62", "testSecondSequence": null }, - { "primary_key": "key_63", "testSecondSequence": "ACGT" }, - { "primary_key": "key_64", "testSecondSequence": "ACGT" }, - { "primary_key": "key_65", "testSecondSequence": "NCGT" }, - { "primary_key": "key_66", "testSecondSequence": "ACGT" 
}, - { "primary_key": "key_67", "testSecondSequence": "ACGT" }, - { "primary_key": "key_68", "testSecondSequence": "ACGT" }, - { "primary_key": "key_69", "testSecondSequence": "ACGT" }, - { "primary_key": "key_7", "testSecondSequence": "ACGT" }, - { "primary_key": "key_70", "testSecondSequence": "ACGT" }, - { "primary_key": "key_71", "testSecondSequence": "ACGT" }, - { "primary_key": "key_72", "testSecondSequence": "ACGT" }, - { "primary_key": "key_73", "testSecondSequence": "AAGT" }, - { "primary_key": "key_74", "testSecondSequence": "ACGT" }, - { "primary_key": "key_75", "testSecondSequence": "ACGT" }, - { "primary_key": "key_76", "testSecondSequence": "ACGT" }, - { "primary_key": "key_77", "testSecondSequence": "ACGT" }, - { "primary_key": "key_78", "testSecondSequence": "ACGT" }, - { "primary_key": "key_79", "testSecondSequence": "ACGT" }, - { "primary_key": "key_8", "testSecondSequence": "ACGT" }, - { "primary_key": "key_80", "testSecondSequence": "ACGT" }, - { "primary_key": "key_81", "testSecondSequence": "ACGT" }, - { "primary_key": "key_82", "testSecondSequence": "ACGT" }, - { "primary_key": "key_83", "testSecondSequence": null }, - { "primary_key": "key_84", "testSecondSequence": "ACGT" }, - { "primary_key": "key_85", "testSecondSequence": "ACGT" }, - { "primary_key": "key_86", "testSecondSequence": "ACGT" }, - { "primary_key": "key_87", "testSecondSequence": "ACGT" }, - { "primary_key": "key_88", "testSecondSequence": "ACGT" }, - { "primary_key": "key_89", "testSecondSequence": "ACGT" }, - { "primary_key": "key_9", "testSecondSequence": "ACGT" }, - { "primary_key": "key_90", "testSecondSequence": "ACGT" }, - { "primary_key": "key_91", "testSecondSequence": "ACGT" }, - { "primary_key": "key_92", "testSecondSequence": "ACGT" }, - { "primary_key": "key_93", "testSecondSequence": "ACGT" }, - { "primary_key": "key_94", "testSecondSequence": "ACGT" }, - { "primary_key": "key_95", "testSecondSequence": "ACGT" }, - { "primary_key": "key_96", 
"testSecondSequence": "ACGT" }, - { "primary_key": "key_97", "testSecondSequence": "ACGT" }, - { "primary_key": "key_98", "testSecondSequence": "ACGT" }, - { "primary_key": "key_99", "testSecondSequence": "ACGT" } + { + "primary_key": "key_1", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_10", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_100", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_11", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_12", + "testSecondSequence": "ARGT" + }, + { + "primary_key": "key_13", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_14", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_15", + "testSecondSequence": "ATGT" + }, + { + "primary_key": "key_16", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_17", + "testSecondSequence": "ACGN" + }, + { + "primary_key": "key_18", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_19", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_2", + "testSecondSequence": "ARGT" + }, + { + "primary_key": "key_20", + "testSecondSequence": "AYGT" + }, + { + "primary_key": "key_21", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_22", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_23", + "testSecondSequence": "ANGT" + }, + { + "primary_key": "key_24", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_25", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_26", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_27", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_28", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_29", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_3", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_30", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_31", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_32", + 
"testSecondSequence": "ACGT" + }, + { + "primary_key": "key_33", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_34", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_35", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_36", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_37", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_38", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_39", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_4", + "testSecondSequence": "ACGN" + }, + { + "primary_key": "key_40", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_41", + "testSecondSequence": "AAGN" + }, + { + "primary_key": "key_42", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_43", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_44", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_45", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_46", + "testSecondSequence": "ACGN" + }, + { + "primary_key": "key_47", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_48", + "testSecondSequence": "ANGT" + }, + { + "primary_key": "key_49", + "testSecondSequence": "A-GT" + }, + { + "primary_key": "key_5", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_50", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_51", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_52", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_53", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_54", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_55", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_56", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_57", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_58", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_59", + "testSecondSequence": null + }, + { + "primary_key": 
"key_6", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_60", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_61", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_62", + "testSecondSequence": null + }, + { + "primary_key": "key_63", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_64", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_65", + "testSecondSequence": "NCGT" + }, + { + "primary_key": "key_66", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_67", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_68", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_69", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_7", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_70", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_71", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_72", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_73", + "testSecondSequence": "AAGT" + }, + { + "primary_key": "key_74", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_75", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_76", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_77", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_78", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_79", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_8", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_80", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_81", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_82", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_83", + "testSecondSequence": null + }, + { + "primary_key": "key_84", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_85", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_86", + "testSecondSequence": "ACGT" + }, + { + 
"primary_key": "key_87", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_88", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_89", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_9", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_90", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_91", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_92", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_93", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_94", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_95", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_96", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_97", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_98", + "testSecondSequence": "ACGT" + }, + { + "primary_key": "key_99", + "testSecondSequence": "ACGT" + } ] } diff --git a/endToEndTests/test/queries/fastaAligned_multiple.json b/endToEndTests/test/queries/fastaAligned_multiple.json index 48dbef7fd..2c07bef6f 100644 --- a/endToEndTests/test/queries/fastaAligned_multiple.json +++ b/endToEndTests/test/queries/fastaAligned_multiple.json @@ -1,18 +1,6 @@ { "testCaseName": "FastaAligned action with multiple sequences", - "query": { - "action": { - "type": "FastaAligned", - "sequenceNames": ["testSecondSequence", "S"], - "orderByFields": ["primary_key"] - }, - "filterExpression": { - "type": "IntBetween", - "column": "age", - "from": null, - "to": 30 - } - }, + "query": "default.filter(age <= 30).project({primary_key, testSecondSequence, S}).orderBy({primary_key})", "expectedQueryResult": [ { "S": 
"MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAI--SGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLGV-YHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTYGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFGRDIDDTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQGVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSHRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPINFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILARLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTHNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT*", diff --git a/endToEndTests/test/queries/fasta_allTestSequences.json b/endToEndTests/test/queries/fasta_allTestSequences.json index babed6806..42955c97f 100644 --- a/endToEndTests/test/queries/fasta_allTestSequences.json +++ b/endToEndTests/test/queries/fasta_allTestSequences.json @@ -1,118 +1,406 @@ { "testCaseName": "Get the unaligned fasta for all test sequences", - "query": { - "action": { - "type": "Fasta", - "sequenceNames": ["unaligned_testSecondSequence"], - "orderByFields": ["primary_key"] - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.project({primary_key, unaligned_testSecondSequence}).orderBy({primary_key})", "expectedQueryResult": [ - { "primary_key": "key_1", 
"unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_10", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_100", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_11", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_12", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_13", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_14", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_15", "unaligned_testSecondSequence": "ATGT" }, - { "primary_key": "key_16", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_17", "unaligned_testSecondSequence": "ACGN" }, - { "primary_key": "key_18", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_19", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_2", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_20", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_21", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_22", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_23", "unaligned_testSecondSequence": "ANGT" }, - { "primary_key": "key_24", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_25", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_26", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_27", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_28", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_29", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_3", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_30", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_31", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_32", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_33", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_34", 
"unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_35", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_36", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_37", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_38", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_39", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_4", "unaligned_testSecondSequence": "ACGN" }, - { "primary_key": "key_40", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_41", "unaligned_testSecondSequence": "AAGN" }, - { "primary_key": "key_42", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_43", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_44", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_45", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_46", "unaligned_testSecondSequence": "ACGN" }, - { "primary_key": "key_47", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_48", "unaligned_testSecondSequence": "ANGT" }, - { "primary_key": "key_49", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_5", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_50", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_51", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_52", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_53", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_54", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_55", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_56", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_57", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_58", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_59", "unaligned_testSecondSequence": null }, - { "primary_key": "key_6", 
"unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_60", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_61", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_62", "unaligned_testSecondSequence": null }, - { "primary_key": "key_63", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_64", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_65", "unaligned_testSecondSequence": "NCGT" }, - { "primary_key": "key_66", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_67", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_68", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_69", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_7", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_70", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_71", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_72", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_73", "unaligned_testSecondSequence": "AAGT" }, - { "primary_key": "key_74", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_75", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_76", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_77", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_78", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_79", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_8", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_80", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_81", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_82", "unaligned_testSecondSequence": "ACGTACGT" }, - { "primary_key": "key_83", "unaligned_testSecondSequence": null }, + { + "primary_key": "key_1", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_10", + 
"unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_100", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_11", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_12", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_13", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_14", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_15", + "unaligned_testSecondSequence": "ATGT" + }, + { + "primary_key": "key_16", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_17", + "unaligned_testSecondSequence": "ACGN" + }, + { + "primary_key": "key_18", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_19", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_2", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_20", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_21", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_22", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_23", + "unaligned_testSecondSequence": "ANGT" + }, + { + "primary_key": "key_24", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_25", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_26", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_27", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_28", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_29", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_3", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_30", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_31", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_32", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_33", 
+ "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_34", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_35", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_36", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_37", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_38", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_39", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_4", + "unaligned_testSecondSequence": "ACGN" + }, + { + "primary_key": "key_40", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_41", + "unaligned_testSecondSequence": "AAGN" + }, + { + "primary_key": "key_42", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_43", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_44", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_45", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_46", + "unaligned_testSecondSequence": "ACGN" + }, + { + "primary_key": "key_47", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_48", + "unaligned_testSecondSequence": "ANGT" + }, + { + "primary_key": "key_49", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_5", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_50", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_51", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_52", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_53", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_54", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_55", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_56", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_57", 
+ "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_58", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_59", + "unaligned_testSecondSequence": null + }, + { + "primary_key": "key_6", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_60", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_61", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_62", + "unaligned_testSecondSequence": null + }, + { + "primary_key": "key_63", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_64", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_65", + "unaligned_testSecondSequence": "NCGT" + }, + { + "primary_key": "key_66", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_67", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_68", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_69", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_7", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_70", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_71", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_72", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_73", + "unaligned_testSecondSequence": "AAGT" + }, + { + "primary_key": "key_74", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_75", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_76", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_77", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_78", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_79", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_8", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_80", + 
"unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_81", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_82", + "unaligned_testSecondSequence": "ACGTACGT" + }, + { + "primary_key": "key_83", + "unaligned_testSecondSequence": null + }, { "primary_key": "key_84", "unaligned_testSecondSequence": "JRZFHVKQIQGIVPUNJZCDKLOPDFTWZWXEXKZIHLGFWZNIGUAAPJBXPQCJBFUYHHIOPNDMTMHAFPHMZRCNUGIBRZCNKAJZMWXMBMPQRTZQUHTIFSOBXAQWMESDRWVJQWRE" }, - { "primary_key": "key_85", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_86", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_87", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_88", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_89", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_9", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_90", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_91", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_92", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_93", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_94", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_95", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_96", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_97", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_98", "unaligned_testSecondSequence": "ACGT" }, - { "primary_key": "key_99", "unaligned_testSecondSequence": "ACGT" } + { + "primary_key": "key_85", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_86", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_87", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_88", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_89", + "unaligned_testSecondSequence": "ACGT" + }, + { + 
"primary_key": "key_9", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_90", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_91", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_92", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_93", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_94", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_95", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_96", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_97", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_98", + "unaligned_testSecondSequence": "ACGT" + }, + { + "primary_key": "key_99", + "unaligned_testSecondSequence": "ACGT" + } ] } diff --git a/endToEndTests/test/queries/fasta_manySequences.json b/endToEndTests/test/queries/fasta_manySequences.json index d03b35d32..077adc1ab 100644 --- a/endToEndTests/test/queries/fasta_manySequences.json +++ b/endToEndTests/test/queries/fasta_manySequences.json @@ -1,17 +1,6 @@ { "testCaseName": "Get the unaligned fasta for many sequences", - "query": { - "action": { - "type": "Fasta", - "sequenceNames": ["unaligned_testSecondSequence"], - "orderByFields": ["primary_key"] - }, - "filterExpression": { - "type": "StringEquals", - "column": "division", - "value": "Vaud" - } - }, + "query": "default.filter(division = 'Vaud').project({primary_key, unaligned_testSecondSequence}).orderBy({primary_key})", "expectedQueryResult": [ { "primary_key": "key_1", diff --git a/endToEndTests/test/queries/fasta_oneRowTwoUnalignedSequences.json b/endToEndTests/test/queries/fasta_oneRowTwoUnalignedSequences.json index d94709af0..9688288f9 100644 --- a/endToEndTests/test/queries/fasta_oneRowTwoUnalignedSequences.json +++ b/endToEndTests/test/queries/fasta_oneRowTwoUnalignedSequences.json @@ -1,16 +1,6 @@ { "testCaseName": "Get two unaligned fastas for 
one row", - "query": { - "action": { - "type": "Fasta", - "sequenceNames": ["unaligned_main", "unaligned_testSecondSequence"] - }, - "filterExpression": { - "type": "StringEquals", - "column": "primary_key", - "value": "key_41" - } - }, + "query": "default.filter(primary_key = 'key_41').project({primary_key, unaligned_main, unaligned_testSecondSequence})", "expectedQueryResult": [ { "primary_key": "key_41", diff --git a/endToEndTests/test/queries/fasta_oneSequenceUnaligned.json b/endToEndTests/test/queries/fasta_oneSequenceUnaligned.json index 8dbc7d58e..bc2e98463 100644 --- a/endToEndTests/test/queries/fasta_oneSequenceUnaligned.json +++ b/endToEndTests/test/queries/fasta_oneSequenceUnaligned.json @@ -1,16 +1,6 @@ { "testCaseName": "Get the unaligned fasta for one sequence", - "query": { - "action": { - "type": "Fasta", - "sequenceNames": ["unaligned_main"] - }, - "filterExpression": { - "type": "StringEquals", - "column": "primary_key", - "value": "key_41" - } - }, + "query": "default.filter(primary_key = 'key_41').project({primary_key, unaligned_main})", "expectedQueryResult": [ { "primary_key": "key_41", diff --git a/endToEndTests/test/queries/floatBetween.json b/endToEndTests/test/queries/floatBetween.json index 48d24af71..60dc0aee9 100644 --- a/endToEndTests/test/queries/floatBetween.json +++ b/endToEndTests/test/queries/floatBetween.json @@ -1,16 +1,6 @@ { "testCaseName": "FloatBetween for column", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "FloatBetween", - "column": "qc_value", - "from": 0.905, - "to": 0.935 - } - }, + "query": "default.filter(qc_value.between(0.905, 0.935)).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 30 diff --git a/endToEndTests/test/queries/floatBetween_noBound.json b/endToEndTests/test/queries/floatBetween_noBound.json index 309bcf703..69d653b4c 100644 --- a/endToEndTests/test/queries/floatBetween_noBound.json +++ b/endToEndTests/test/queries/floatBetween_noBound.json @@ 
-1,16 +1,6 @@ { "testCaseName": "FloatBetween for column without bounds returns all non null values", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "FloatBetween", - "column": "qc_value", - "from": null, - "to": null - } - }, + "query": "default.filter(qc_value.isNotNull()).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 98 diff --git a/endToEndTests/test/queries/floatBetween_openFrom.json b/endToEndTests/test/queries/floatBetween_openFrom.json index 12ec48259..5bfe55b82 100644 --- a/endToEndTests/test/queries/floatBetween_openFrom.json +++ b/endToEndTests/test/queries/floatBetween_openFrom.json @@ -1,16 +1,6 @@ { "testCaseName": "FloatBetween for column with open lower bound", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "FloatBetween", - "column": "qc_value", - "from": null, - "to": 0.935 - } - }, + "query": "default.filter(qc_value <= 0.935).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 50 diff --git a/endToEndTests/test/queries/floatBetween_openTo.json b/endToEndTests/test/queries/floatBetween_openTo.json index ff8a872aa..74b7a0672 100644 --- a/endToEndTests/test/queries/floatBetween_openTo.json +++ b/endToEndTests/test/queries/floatBetween_openTo.json @@ -1,16 +1,6 @@ { "testCaseName": "FloatBetween for column with open upper bound", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "FloatBetween", - "column": "qc_value", - "from": 0.905, - "to": null - } - }, + "query": "default.filter(qc_value >= 0.905).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 78 diff --git a/endToEndTests/test/queries/floatEquals.json b/endToEndTests/test/queries/floatEquals.json index 5b24df45a..3fcfe2312 100644 --- a/endToEndTests/test/queries/floatEquals.json +++ b/endToEndTests/test/queries/floatEquals.json @@ -1,15 +1,6 @@ { "testCaseName": "FloatEquals for column", - "query": { - "action": { - "type": 
"Aggregated" - }, - "filterExpression": { - "type": "FloatEquals", - "column": "qc_value", - "value": 0.9 - } - }, + "query": "default.filter(qc_value = 0.9).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 10 diff --git a/endToEndTests/test/queries/insertionContainsStopCodon.json b/endToEndTests/test/queries/insertionContainsStopCodon.json index ed37bf013..bad828fdb 100644 --- a/endToEndTests/test/queries/insertionContainsStopCodon.json +++ b/endToEndTests/test/queries/insertionContainsStopCodon.json @@ -1,23 +1,10 @@ { "testCaseName": "insertionContains with a StopCodon", - "query": { - "action": { - "groupByFields": ["date"], - "orderByFields": [ - { - "field": "date", - "order": "ascending" - } - ], - "randomize": false, - "type": "Aggregated" - }, - "filterExpression": { - "position": 214, - "value": "\\*EPE", - "sequenceName": "S", - "type": "AminoAcidInsertionContains" + "query": "default.filter(aminoAcidInsertionContains(position:=214, value:='\\\\*EPE', sequenceName:='S')).groupBy({count:=count()},{date}).orderBy({date})", + "expectedQueryResult": [ + { + "count": 1, + "date": "2021-01-25" } - }, - "expectedQueryResult": [{ "count": 1, "date": "2021-01-25" }] + ] } diff --git a/endToEndTests/test/queries/insertionContains_exact.json b/endToEndTests/test/queries/insertionContains_exact.json index 9b231b53b..424f495ff 100644 --- a/endToEndTests/test/queries/insertionContains_exact.json +++ b/endToEndTests/test/queries/insertionContains_exact.json @@ -1,15 +1,6 @@ { "testCaseName": "Insertion Contains with exact match CCC", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "InsertionContains", - "position": 25701, - "value": "CCC" - } - }, + "query": "default.filter(insertionContains(position:=25701, value:='CCC')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 17 diff --git a/endToEndTests/test/queries/insertionContains_noSeqCol.json 
b/endToEndTests/test/queries/insertionContains_noSeqCol.json index 5f4bd3bce..380df3940 100644 --- a/endToEndTests/test/queries/insertionContains_noSeqCol.json +++ b/endToEndTests/test/queries/insertionContains_noSeqCol.json @@ -1,25 +1,5 @@ { "testCaseName": "Insertion Contains without sequence or column specified", - "query": { - "action": { - "type": "Mutations", - "minProportion": 1 - }, - "filterExpression": { - "type": "And", - "children": [ - { - "type": "InsertionContains", - "position": 25701, - "value": "CC.*" - }, - { - "type": "InsertionContains", - "position": 22339, - "value": ".*C.*G.*" - } - ] - } - }, + "query": "default.filter((insertionContains(position:=25701, value:='CC.*')) && (insertionContains(position:=22339, value:='.*C.*G.*'))).mutations(minProportion:=1)", "expectedQueryResult": [] } diff --git a/endToEndTests/test/queries/insertionContains_not_exact1.json b/endToEndTests/test/queries/insertionContains_not_exact1.json index f00d651ef..312e5871a 100644 --- a/endToEndTests/test/queries/insertionContains_not_exact1.json +++ b/endToEndTests/test/queries/insertionContains_not_exact1.json @@ -1,15 +1,6 @@ { "testCaseName": "Insertion Contains with non-exact match .*GCT.*GGT.*", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "InsertionContains", - "position": 22339, - "value": ".*GCT.*GGT.*" - } - }, + "query": "default.filter(insertionContains(position:=22339, value:='.*GCT.*GGT.*')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 1 diff --git a/endToEndTests/test/queries/insertionContains_not_exact2.json b/endToEndTests/test/queries/insertionContains_not_exact2.json index cbf5275c7..bf36ee77b 100644 --- a/endToEndTests/test/queries/insertionContains_not_exact2.json +++ b/endToEndTests/test/queries/insertionContains_not_exact2.json @@ -1,15 +1,6 @@ { "testCaseName": "Insertion Contains with non-exact match CAG.*AA", - "query": { - "action": { - "type": "Aggregated" - }, - 
"filterExpression": { - "type": "InsertionContains", - "position": 22204, - "value": "CAG.*AA" - } - }, + "query": "default.filter(insertionContains(position:=22204, value:='CAG.*AA')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 1 diff --git a/endToEndTests/test/queries/insertionContains_not_exact3.json b/endToEndTests/test/queries/insertionContains_not_exact3.json index b42be92b5..9272f5291 100644 --- a/endToEndTests/test/queries/insertionContains_not_exact3.json +++ b/endToEndTests/test/queries/insertionContains_not_exact3.json @@ -1,15 +1,6 @@ { "testCaseName": "Insertion Contains with non-exact match TCAG.*AA", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "InsertionContains", - "position": 25701, - "value": "TCAG.*AA" - } - }, + "query": "default.filter(insertionContains(position:=25701, value:='TCAG.*AA')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 0 diff --git a/endToEndTests/test/queries/insertionContains_not_exact4.json b/endToEndTests/test/queries/insertionContains_not_exact4.json index ced833fca..f9853b22c 100644 --- a/endToEndTests/test/queries/insertionContains_not_exact4.json +++ b/endToEndTests/test/queries/insertionContains_not_exact4.json @@ -1,15 +1,6 @@ { "testCaseName": "Insertion Contains with non-exact match CC.*", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "InsertionContains", - "position": 25701, - "value": "CC.*" - } - }, + "query": "default.filter(insertionContains(position:=25701, value:='CC.*')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 17 diff --git a/endToEndTests/test/queries/insertionsAction.json b/endToEndTests/test/queries/insertionsAction.json index bdcf267c2..e8c5afbed 100644 --- a/endToEndTests/test/queries/insertionsAction.json +++ b/endToEndTests/test/queries/insertionsAction.json @@ -1,14 +1,6 @@ { "testCaseName": "The insertions action", - "query": { - "action": { - "type": 
"Insertions", - "orderByFields": ["insertion"] - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.insertions().orderBy({insertion})", "expectedQueryResult": [ { "count": 1, diff --git a/endToEndTests/test/queries/insertionsActionAndFilter.json b/endToEndTests/test/queries/insertionsActionAndFilter.json index cf047d508..679c6e021 100644 --- a/endToEndTests/test/queries/insertionsActionAndFilter.json +++ b/endToEndTests/test/queries/insertionsActionAndFilter.json @@ -1,16 +1,6 @@ { "testCaseName": "The insertions action and insertions contains filter", - "query": { - "action": { - "type": "Insertions", - "sequenceNames": ["main"] - }, - "filterExpression": { - "type": "InsertionContains", - "position": 22339, - "value": ".*C.*G.*" - } - }, + "query": "default.filter(insertionContains(position:=22339, value:='.*C.*G.*')).insertions(sequenceNames:={main})", "expectedQueryResult": [ { "count": 1, diff --git a/endToEndTests/test/queries/intBetween.json b/endToEndTests/test/queries/intBetween.json index 4b203fb02..21de85ca9 100644 --- a/endToEndTests/test/queries/intBetween.json +++ b/endToEndTests/test/queries/intBetween.json @@ -1,16 +1,6 @@ { "testCaseName": "IntBetween Query with 'from' and 'to' value", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "IntBetween", - "column": "age", - "from": 52, - "to": 55 - } - }, + "query": "default.filter(age.between(52, 55)).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 33 diff --git a/endToEndTests/test/queries/intBetween_noBounds.json b/endToEndTests/test/queries/intBetween_noBounds.json index b8a507988..b7eeec849 100644 --- a/endToEndTests/test/queries/intBetween_noBounds.json +++ b/endToEndTests/test/queries/intBetween_noBounds.json @@ -1,16 +1,6 @@ { "testCaseName": "IntBetween Query without bounds", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "IntBetween", - "column": "age", - "from": null, - "to": 
null - } - }, + "query": "default.filter(age.isNotNull()).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 98 diff --git a/endToEndTests/test/queries/intBetween_openFrom.json b/endToEndTests/test/queries/intBetween_openFrom.json index 45ee971a9..5a1f8723a 100644 --- a/endToEndTests/test/queries/intBetween_openFrom.json +++ b/endToEndTests/test/queries/intBetween_openFrom.json @@ -1,16 +1,6 @@ { "testCaseName": "IntBetween Query with open 'from'", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "IntBetween", - "column": "age", - "from": null, - "to": 52 - } - }, + "query": "default.filter(age <= 52).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 36 diff --git a/endToEndTests/test/queries/intBetween_openTo.json b/endToEndTests/test/queries/intBetween_openTo.json index 52d7e31f4..72b72fb25 100644 --- a/endToEndTests/test/queries/intBetween_openTo.json +++ b/endToEndTests/test/queries/intBetween_openTo.json @@ -1,16 +1,6 @@ { "testCaseName": "IntBetween Query open 'to' value", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "IntBetween", - "column": "age", - "from": 55, - "to": null - } - }, + "query": "default.filter(age >= 55).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 45 diff --git a/endToEndTests/test/queries/intEquals.json b/endToEndTests/test/queries/intEquals.json index ffcf93e35..b73ed5b02 100644 --- a/endToEndTests/test/queries/intEquals.json +++ b/endToEndTests/test/queries/intEquals.json @@ -1,15 +1,6 @@ { "testCaseName": "IntEquals for column", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "IntEquals", - "column": "age", - "value": 55 - } - }, + "query": "default.filter(age = 55).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 8 diff --git a/endToEndTests/test/queries/matchAll.json b/endToEndTests/test/queries/matchAll.json index 5f0f3d415..a39e8f302 100644 --- 
a/endToEndTests/test/queries/matchAll.json +++ b/endToEndTests/test/queries/matchAll.json @@ -1,13 +1,6 @@ { "testCaseName": "MatchAll query requesting all entries", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "True" - } - }, + "query": "default.groupBy({count:=count()})", "expectedQueryResult": [ { "count": 100 diff --git a/endToEndTests/test/queries/mutations_False.json b/endToEndTests/test/queries/mutations_False.json index 0fdf935e4..91b15a9d3 100644 --- a/endToEndTests/test/queries/mutations_False.json +++ b/endToEndTests/test/queries/mutations_False.json @@ -1,13 +1,5 @@ { "testCaseName": "Mutations action without sequence and empty result", - "query": { - "action": { - "type": "Mutations", - "minProportion": 1 - }, - "filterExpression": { - "type": "False" - } - }, + "query": "default.filter(false).mutations(minProportion:=1)", "expectedQueryResult": [] } diff --git a/endToEndTests/test/queries/nOf_2of3_aggregated.json b/endToEndTests/test/queries/nOf_2of3_aggregated.json index c04057ad6..fdd3e49ba 100644 --- a/endToEndTests/test/queries/nOf_2of3_aggregated.json +++ b/endToEndTests/test/queries/nOf_2of3_aggregated.json @@ -1,32 +1,6 @@ { "testCaseName": "N-Of query requesting 2 of 3 mutations with aggregated action", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "N-Of", - "numberOfMatchers": 2, - "matchExactly": false, - "children": [ - { - "type": "NucleotideEquals", - "position": 241, - "symbol": "T" - }, - { - "type": "NucleotideEquals", - "position": 29734, - "symbol": "T" - }, - { - "type": "NucleotideEquals", - "position": 28330, - "symbol": "G" - } - ] - } - }, + "query": "default.filter(nOf(2, {nucleotideEquals(position:=241, symbol:='T'), nucleotideEquals(position:=29734, symbol:='T'), nucleotideEquals(position:=28330, symbol:='G')})).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 0 diff --git 
a/endToEndTests/test/queries/nOf_2of3_aggregated2.json b/endToEndTests/test/queries/nOf_2of3_aggregated2.json index f43fb677d..d55d6bef9 100644 --- a/endToEndTests/test/queries/nOf_2of3_aggregated2.json +++ b/endToEndTests/test/queries/nOf_2of3_aggregated2.json @@ -1,32 +1,6 @@ { "testCaseName": "2nd N-Of query requesting 2 of 3 mutations with aggregated action", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "N-Of", - "numberOfMatchers": 2, - "matchExactly": false, - "children": [ - { - "type": "NucleotideEquals", - "position": 1, - "symbol": "-" - }, - { - "type": "NucleotideEquals", - "position": 2, - "symbol": "T" - }, - { - "type": "NucleotideEquals", - "position": 27542, - "symbol": "N" - } - ] - } - }, + "query": "default.filter(nOf(2, {nucleotideEquals(position:=1, symbol:='-'), nucleotideEquals(position:=2, symbol:='T'), nucleotideEquals(position:=27542, symbol:='N')})).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 4 diff --git a/endToEndTests/test/queries/nOf_2of3_details.json b/endToEndTests/test/queries/nOf_2of3_details.json index 592cbacd4..dae710bed 100644 --- a/endToEndTests/test/queries/nOf_2of3_details.json +++ b/endToEndTests/test/queries/nOf_2of3_details.json @@ -1,35 +1,6 @@ { "testCaseName": "N-Of query requesting 2 of 3 mutations with details action", - "query": { - "action": { - "type": "Details", - "randomize": { - "seed": 321 - } - }, - "filterExpression": { - "type": "N-Of", - "numberOfMatchers": 2, - "matchExactly": false, - "children": [ - { - "type": "NucleotideEquals", - "position": 1, - "symbol": "-" - }, - { - "type": "NucleotideEquals", - "position": 2, - "symbol": "T" - }, - { - "type": "NucleotideEquals", - "position": 27542, - "symbol": "N" - } - ] - } - }, + "query": "default.filter(nOf(2, {nucleotideEquals(position:=1, symbol:='-'), nucleotideEquals(position:=2, symbol:='T'), nucleotideEquals(position:=27542, symbol:='N')})).randomize(seed:=321).project({age, country, 
date, division, primary_key, pango_lineage, qc_value, region, test_boolean_column, unsorted_date, usherTree})", "expectedQueryResult": [ { "age": 58, diff --git a/endToEndTests/test/queries/nOf_2of3_details_selection.json b/endToEndTests/test/queries/nOf_2of3_details_selection.json index 96f735564..4f3826251 100644 --- a/endToEndTests/test/queries/nOf_2of3_details_selection.json +++ b/endToEndTests/test/queries/nOf_2of3_details_selection.json @@ -1,34 +1,6 @@ { "testCaseName": "N-Of query requesting 2 of 3 mutations with details action where 2 fields are selected", - "query": { - "action": { - "type": "Details", - "fields": ["age", "pango_lineage"], - "orderByFields": ["age", "pango_lineage"] - }, - "filterExpression": { - "type": "N-Of", - "numberOfMatchers": 2, - "matchExactly": false, - "children": [ - { - "type": "NucleotideEquals", - "position": 1, - "symbol": "-" - }, - { - "type": "NucleotideEquals", - "position": 2, - "symbol": "T" - }, - { - "type": "NucleotideEquals", - "position": 27542, - "symbol": "N" - } - ] - } - }, + "query": "default.filter(nOf(2, {nucleotideEquals(position:=1, symbol:='-'), nucleotideEquals(position:=2, symbol:='T'), nucleotideEquals(position:=27542, symbol:='N')})).project({age, pango_lineage}).orderBy({age, pango_lineage})", "expectedQueryResult": [ { "age": 50, diff --git a/endToEndTests/test/queries/nOf_2of3_mutations.json b/endToEndTests/test/queries/nOf_2of3_mutations.json index 4f7b63f71..afd3b7c0e 100644 --- a/endToEndTests/test/queries/nOf_2of3_mutations.json +++ b/endToEndTests/test/queries/nOf_2of3_mutations.json @@ -1,33 +1,6 @@ { "testCaseName": "N-Of query requesting 2 of 3 mutations with mutations action", - "query": { - "action": { - "type": "Mutations", - "minProportion": 0.7 - }, - "filterExpression": { - "type": "N-Of", - "numberOfMatchers": 2, - "matchExactly": false, - "children": [ - { - "type": "NucleotideEquals", - "position": 2, - "symbol": "N" - }, - { - "type": "NucleotideEquals", - "position": 86, - 
"symbol": "G" - }, - { - "type": "NucleotideEquals", - "position": 27342, - "symbol": "N" - } - ] - } - }, + "query": "default.filter(nOf(2, {nucleotideEquals(position:=2, symbol:='N'), nucleotideEquals(position:=86, symbol:='G'), nucleotideEquals(position:=27342, symbol:='N')})).mutations(minProportion:=0.7)", "expectedQueryResult": [ { "count": 1, diff --git a/endToEndTests/test/queries/notUnsortedDateBetween.json b/endToEndTests/test/queries/notUnsortedDateBetween.json index 5a52459b1..875e4ac0c 100644 --- a/endToEndTests/test/queries/notUnsortedDateBetween.json +++ b/endToEndTests/test/queries/notUnsortedDateBetween.json @@ -1,19 +1,6 @@ { "testCaseName": "Not DateBetween Query for date column that is not sorted", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Not", - "child": { - "type": "DateBetween", - "column": "unsorted_date", - "from": "2021-03-18", - "to": "2021-03-20" - } - } - }, + "query": "default.filter(!(unsorted_date.between('2021-03-18'::date, '2021-03-20'::date))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 98 diff --git a/endToEndTests/test/queries/pangoLIneageIncludingSublineages.json b/endToEndTests/test/queries/pangoLIneageIncludingSublineages.json index 1d8801429..d7d5b8a0e 100644 --- a/endToEndTests/test/queries/pangoLIneageIncludingSublineages.json +++ b/endToEndTests/test/queries/pangoLIneageIncludingSublineages.json @@ -1,16 +1,6 @@ { "testCaseName": "pango lineage B.1.1.7 including sublineages", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Lineage", - "column": "pango_lineage", - "value": "B.1.1.7", - "includeSublineages": true - } - }, + "query": "default.filter(pango_lineage.lineage('B.1.1.7', includeSublineages:=true)).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 51 diff --git a/endToEndTests/test/queries/pangoLIneageWithoutSublineages.json b/endToEndTests/test/queries/pangoLIneageWithoutSublineages.json 
index e194f553b..175a709d4 100644 --- a/endToEndTests/test/queries/pangoLIneageWithoutSublineages.json +++ b/endToEndTests/test/queries/pangoLIneageWithoutSublineages.json @@ -1,16 +1,6 @@ { "testCaseName": "pango lineage B.1.1.7 without sublineages", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Lineage", - "column": "pango_lineage", - "value": "B.1.1.7", - "includeSublineages": false - } - }, + "query": "default.filter(pango_lineage.lineage('B.1.1.7')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 48 diff --git a/endToEndTests/test/queries/recombinantLineage.json b/endToEndTests/test/queries/recombinantLineage.json index b7d36cfe5..f1cede4bb 100644 --- a/endToEndTests/test/queries/recombinantLineage.json +++ b/endToEndTests/test/queries/recombinantLineage.json @@ -1,16 +1,6 @@ { "testCaseName": "Recombinant lineage XBB including sublineages", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Lineage", - "column": "pango_lineage", - "value": "XBB", - "includeSublineages": true - } - }, + "query": "default.filter(pango_lineage.lineage('XBB', includeSublineages:=true)).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 1 diff --git a/endToEndTests/test/queries/recombinantLineageWithAlias.json b/endToEndTests/test/queries/recombinantLineageWithAlias.json index 6d11e8bb7..1de40bd11 100644 --- a/endToEndTests/test/queries/recombinantLineageWithAlias.json +++ b/endToEndTests/test/queries/recombinantLineageWithAlias.json @@ -1,16 +1,6 @@ { "testCaseName": "Recombinant lineage GD with unaliasing", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Lineage", - "column": "pango_lineage", - "value": "GD", - "includeSublineages": true - } - }, + "query": "default.filter(pango_lineage.lineage('GD', includeSublineages:=true)).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 1 diff --git 
a/endToEndTests/test/queries/secondSequence.json b/endToEndTests/test/queries/secondSequence.json index b03a37099..f4ea096c4 100644 --- a/endToEndTests/test/queries/secondSequence.json +++ b/endToEndTests/test/queries/secondSequence.json @@ -1,16 +1,6 @@ { "testCaseName": "Access on second sequence - NucleotideEquals", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "NucleotideEquals", - "position": 1, - "symbol": "A", - "sequenceName": "testSecondSequence" - } - }, + "query": "default.filter(nucleotideEquals(position:=1, symbol:='A', sequenceName:='testSecondSequence')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 96 diff --git a/endToEndTests/test/queries/secondSequenceHasMutation.json b/endToEndTests/test/queries/secondSequenceHasMutation.json index 56b4a9020..ff4fbb8e0 100644 --- a/endToEndTests/test/queries/secondSequenceHasMutation.json +++ b/endToEndTests/test/queries/secondSequenceHasMutation.json @@ -1,15 +1,6 @@ { "testCaseName": "Access on second sequence - HasNucleotideMutation", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "HasNucleotideMutation", - "position": 2, - "sequenceName": "testSecondSequence" - } - }, + "query": "default.filter(hasMutation(position:=2, sequenceName:='testSecondSequence')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 6 diff --git a/endToEndTests/test/queries/sequenceEndFilter.json b/endToEndTests/test/queries/sequenceEndFilter.json index ade4dbb49..2f4a05fb8 100644 --- a/endToEndTests/test/queries/sequenceEndFilter.json +++ b/endToEndTests/test/queries/sequenceEndFilter.json @@ -1,15 +1,6 @@ { "testCaseName": "Filtering for the last genome position", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "position": 29903, - "symbol": "-", - "type": "NucleotideEquals" - } - }, + "query": "default.filter(nucleotideEquals(position:=29903, symbol:='-')).groupBy({count:=count()})", 
"expectedQueryResult": [ { "count": 45 diff --git a/endToEndTests/test/queries/sequenceStartEndMutations.json b/endToEndTests/test/queries/sequenceStartEndMutations.json index e988ff19b..87b533ed3 100644 --- a/endToEndTests/test/queries/sequenceStartEndMutations.json +++ b/endToEndTests/test/queries/sequenceStartEndMutations.json @@ -1,26 +1,6 @@ { "testCaseName": "Getting a mutation distribution that contains the first and last genome position", - "query": { - "action": { - "type": "Mutations", - "minProportion": 1 - }, - "filterExpression": { - "children": [ - { - "position": 1, - "symbol": "-", - "type": "NucleotideEquals" - }, - { - "position": 29903, - "symbol": "-", - "type": "NucleotideEquals" - } - ], - "type": "And" - } - }, + "query": "default.filter((nucleotideEquals(position:=1, symbol:='-')) && (nucleotideEquals(position:=29903, symbol:='-'))).mutations(minProportion:=1)", "expectedQueryResult": [ { "count": 42, diff --git a/endToEndTests/test/queries/sequenceStartFilter.json b/endToEndTests/test/queries/sequenceStartFilter.json index 1854fc573..06f37ee42 100644 --- a/endToEndTests/test/queries/sequenceStartFilter.json +++ b/endToEndTests/test/queries/sequenceStartFilter.json @@ -1,15 +1,6 @@ { "testCaseName": "Filtering for the first genome position", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "position": 1, - "symbol": "-", - "type": "NucleotideEquals" - } - }, + "query": "default.filter(nucleotideEquals(position:=1, symbol:='-')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 45 diff --git a/endToEndTests/test/queries/stringEquals.json b/endToEndTests/test/queries/stringEquals.json index 6093cb8c1..b2c61802d 100644 --- a/endToEndTests/test/queries/stringEquals.json +++ b/endToEndTests/test/queries/stringEquals.json @@ -1,15 +1,6 @@ { "testCaseName": "StringEquals for region", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "StringEquals", - "column": 
"country", - "value": "Switzerland" - } - }, + "query": "default.filter(country = 'Switzerland').groupBy({count:=count()})", "expectedQueryResult": [ { "count": 100 diff --git a/endToEndTests/test/queries/stringEqualsOnUnindexedColumn.json b/endToEndTests/test/queries/stringEqualsOnUnindexedColumn.json index c91558f0a..3340b0bea 100644 --- a/endToEndTests/test/queries/stringEqualsOnUnindexedColumn.json +++ b/endToEndTests/test/queries/stringEqualsOnUnindexedColumn.json @@ -1,15 +1,6 @@ { "testCaseName": "StringEquals for unindexed column primary_key", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "StringEquals", - "column": "primary_key", - "value": "key_41" - } - }, + "query": "default.filter(primary_key = 'key_41').groupBy({count:=count()})", "expectedQueryResult": [ { "count": 1 diff --git a/endToEndTests/test/queries/stringSearch_basic_regex.json b/endToEndTests/test/queries/stringSearch_basic_regex.json index 76a11b807..4555eb2ea 100644 --- a/endToEndTests/test/queries/stringSearch_basic_regex.json +++ b/endToEndTests/test/queries/stringSearch_basic_regex.json @@ -1,18 +1,6 @@ { "testCaseName": "StringSearch with a basic regex", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Not", - "child": { - "type": "StringSearch", - "column": "primary_key", - "searchExpression": "key" - } - } - }, + "query": "default.filter(!(primary_key.like('key'))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 0 diff --git a/endToEndTests/test/queries/stringSearch_digitAmount.json b/endToEndTests/test/queries/stringSearch_digitAmount.json index 59bbc5ae1..d4d9edfa8 100644 --- a/endToEndTests/test/queries/stringSearch_digitAmount.json +++ b/endToEndTests/test/queries/stringSearch_digitAmount.json @@ -1,15 +1,6 @@ { "testCaseName": "StringSearch that matches the primary key to end with exactly six digits", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - 
"type": "StringSearch", - "column": "primary_key", - "searchExpression": "^\\D*\\d{1}$" - } - }, + "query": "default.filter(primary_key.like('^\\\\D*\\\\d{1}$')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 9 diff --git a/endToEndTests/test/queries/stringSearch_justAString.json b/endToEndTests/test/queries/stringSearch_justAString.json index 049127b32..66889a819 100644 --- a/endToEndTests/test/queries/stringSearch_justAString.json +++ b/endToEndTests/test/queries/stringSearch_justAString.json @@ -1,15 +1,6 @@ { "testCaseName": "StringSearch that matches exactly a string", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "StringSearch", - "column": "division", - "searchExpression": "^Aargau$" - } - }, + "query": "default.filter(division.like('^Aargau$')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 6 diff --git a/endToEndTests/test/queries/stringSearch_prefix.json b/endToEndTests/test/queries/stringSearch_prefix.json index ca52e4769..012eff09c 100644 --- a/endToEndTests/test/queries/stringSearch_prefix.json +++ b/endToEndTests/test/queries/stringSearch_prefix.json @@ -1,15 +1,6 @@ { "testCaseName": "StringSearch with a regex matching the prefix", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "StringSearch", - "column": "primary_key", - "searchExpression": "^key_" - } - }, + "query": "default.filter(primary_key.like('^key_')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 100 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolA.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolA.json index f4ef24e66..e6cb02f91 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolA.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolA.json @@ -1,16 +1,6 @@ { "testCaseName": "Test sequence has symbol A at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - 
"filterExpression": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "A", - "sequenceName": "testSecondSequence" - } - }, + "query": "default.filter(nucleotideEquals(position:=2, symbol:='A', sequenceName:='testSecondSequence')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 2 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolC.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolC.json index 9ba9cc252..1f9f57d5f 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolC.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolC.json @@ -1,16 +1,6 @@ { "testCaseName": "Test sequence has symbol C at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "C", - "sequenceName": "testSecondSequence" - } - }, + "query": "default.filter(nucleotideEquals(position:=2, symbol:='C', sequenceName:='testSecondSequence')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 88 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactA.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactA.json index 0572e83d4..f6c11d15f 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactA.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactA.json @@ -1,19 +1,6 @@ { "testCaseName": "Test sequence has exactly symbol A at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Exact", - "child": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "A", - "sequenceName": "testSecondSequence" - } - } - }, + "query": "default.filter(exact(nucleotideEquals(position:=2, symbol:='A', sequenceName:='testSecondSequence'))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 2 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactC.json 
b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactC.json index 411a4512c..40885d398 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactC.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactC.json @@ -1,19 +1,6 @@ { "testCaseName": "Test sequence has exactly symbol C at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Exact", - "child": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "C", - "sequenceName": "testSecondSequence" - } - } - }, + "query": "default.filter(exact(nucleotideEquals(position:=2, symbol:='C', sequenceName:='testSecondSequence'))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 88 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactG.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactG.json index f239b129f..cab26cb67 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactG.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactG.json @@ -1,19 +1,6 @@ { "testCaseName": "Test sequence has exactly symbol G at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Exact", - "child": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "G", - "sequenceName": "testSecondSequence" - } - } - }, + "query": "default.filter(exact(nucleotideEquals(position:=2, symbol:='G', sequenceName:='testSecondSequence'))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 0 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactGAP.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactGAP.json index 55dd7bdde..8f0266904 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactGAP.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactGAP.json @@ -1,19 +1,6 @@ { "testCaseName": "Test sequence has exactly symbol GAP 
at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Exact", - "child": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "-", - "sequenceName": "testSecondSequence" - } - } - }, + "query": "default.filter(exact(nucleotideEquals(position:=2, symbol:='-', sequenceName:='testSecondSequence'))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 1 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactN.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactN.json index f2b0cbc80..33edcd9d0 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactN.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactN.json @@ -1,19 +1,6 @@ { "testCaseName": "Test sequence has exactly symbol N at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Exact", - "child": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "N", - "sequenceName": "testSecondSequence" - } - } - }, + "query": "default.filter(exact(nucleotideEquals(position:=2, symbol:='N', sequenceName:='testSecondSequence'))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 5 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactR.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactR.json index 311ddc1fa..df5d39c72 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactR.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactR.json @@ -1,19 +1,6 @@ { "testCaseName": "Test sequence has exactly symbol R at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Exact", - "child": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "R", - "sequenceName": "testSecondSequence" - } - } - }, + "query": "default.filter(exact(nucleotideEquals(position:=2, symbol:='R', 
sequenceName:='testSecondSequence'))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 2 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactT.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactT.json index eabab2da6..381daca0d 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactT.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactT.json @@ -1,19 +1,6 @@ { "testCaseName": "Test sequence has exactly symbol T at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Exact", - "child": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "T", - "sequenceName": "testSecondSequence" - } - } - }, + "query": "default.filter(exact(nucleotideEquals(position:=2, symbol:='T', sequenceName:='testSecondSequence'))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 1 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactY.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactY.json index e43631c1d..f5cc0bbec 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactY.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolExactY.json @@ -1,19 +1,6 @@ { "testCaseName": "Test sequence has exactly symbol Y at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Exact", - "child": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "Y", - "sequenceName": "testSecondSequence" - } - } - }, + "query": "default.filter(exact(nucleotideEquals(position:=2, symbol:='Y', sequenceName:='testSecondSequence'))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 1 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolG.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolG.json index 6f91d6cfc..694a3704c 100644 --- 
a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolG.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolG.json @@ -1,16 +1,6 @@ { "testCaseName": "Test sequence has symbol G at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "G", - "sequenceName": "testSecondSequence" - } - }, + "query": "default.filter(nucleotideEquals(position:=2, symbol:='G', sequenceName:='testSecondSequence')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 0 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolGAP.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolGAP.json index 1bcc64460..d19e5def7 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolGAP.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolGAP.json @@ -1,16 +1,6 @@ { "testCaseName": "Test sequence has symbol GAP at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "-", - "sequenceName": "testSecondSequence" - } - }, + "query": "default.filter(nucleotideEquals(position:=2, symbol:='-', sequenceName:='testSecondSequence')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 1 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeA.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeA.json index 32fd9f3a8..bbe02c858 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeA.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeA.json @@ -1,19 +1,6 @@ { "testCaseName": "Test sequence has maybe symbol A at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Maybe", - "child": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "A", - "sequenceName": "testSecondSequence" - } - } - }, + 
"query": "default.filter(maybe(nucleotideEquals(position:=2, symbol:='A', sequenceName:='testSecondSequence'))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 9 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeC.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeC.json index 85cd84ae8..903d17b97 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeC.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeC.json @@ -1,19 +1,6 @@ { "testCaseName": "Test sequence has maybe symbol C at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Maybe", - "child": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "C", - "sequenceName": "testSecondSequence" - } - } - }, + "query": "default.filter(maybe(nucleotideEquals(position:=2, symbol:='C', sequenceName:='testSecondSequence'))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 94 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeG.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeG.json index bcbef2645..16754f882 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeG.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeG.json @@ -1,19 +1,6 @@ { "testCaseName": "Test sequence has maybe symbol G at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Maybe", - "child": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "G", - "sequenceName": "testSecondSequence" - } - } - }, + "query": "default.filter(maybe(nucleotideEquals(position:=2, symbol:='G', sequenceName:='testSecondSequence'))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 7 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeGAP.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeGAP.json index 
f2a478e9c..ab40f04da 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeGAP.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeGAP.json @@ -1,19 +1,6 @@ { "testCaseName": "Test sequence has maybe symbol GAP at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Maybe", - "child": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "-", - "sequenceName": "testSecondSequence" - } - } - }, + "query": "default.filter(maybe(nucleotideEquals(position:=2, symbol:='-', sequenceName:='testSecondSequence'))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 6 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeN.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeN.json index 50c4b463a..523dcd846 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeN.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeN.json @@ -1,19 +1,6 @@ { "testCaseName": "Test sequence has maybe symbol N at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Maybe", - "child": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "N", - "sequenceName": "testSecondSequence" - } - } - }, + "query": "default.filter(maybe(nucleotideEquals(position:=2, symbol:='N', sequenceName:='testSecondSequence'))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 5 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeR.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeR.json index 5560c0826..8fea2bb0b 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeR.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeR.json @@ -1,19 +1,6 @@ { "testCaseName": "Test sequence has maybe symbol R at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - 
"filterExpression": { - "type": "Maybe", - "child": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "R", - "sequenceName": "testSecondSequence" - } - } - }, + "query": "default.filter(maybe(nucleotideEquals(position:=2, symbol:='R', sequenceName:='testSecondSequence'))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 7 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeT.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeT.json index 3d8baaceb..eabf044f0 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeT.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeT.json @@ -1,19 +1,6 @@ { "testCaseName": "Test sequence has maybe symbol T at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Maybe", - "child": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "T", - "sequenceName": "testSecondSequence" - } - } - }, + "query": "default.filter(maybe(nucleotideEquals(position:=2, symbol:='T', sequenceName:='testSecondSequence'))).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 7 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeY.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeY.json index 9b07bedbd..b8cff6efd 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeY.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolMaybeY.json @@ -1,19 +1,6 @@ { "testCaseName": "Test sequence has maybe symbol Y at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "Maybe", - "child": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "Y", - "sequenceName": "testSecondSequence" - } - } - }, + "query": "default.filter(maybe(nucleotideEquals(position:=2, symbol:='Y', sequenceName:='testSecondSequence'))).groupBy({count:=count()})", "expectedQueryResult": [ { 
"count": 6 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolN.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolN.json index 16596312e..e54cec79f 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolN.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolN.json @@ -1,16 +1,6 @@ { "testCaseName": "Test sequence has symbol N at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "N", - "sequenceName": "testSecondSequence" - } - }, + "query": "default.filter(nucleotideEquals(position:=2, symbol:='N', sequenceName:='testSecondSequence')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 5 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolR.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolR.json index fb78a171c..97612dd7a 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolR.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolR.json @@ -1,16 +1,6 @@ { "testCaseName": "Test sequence has symbol R at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "R", - "sequenceName": "testSecondSequence" - } - }, + "query": "default.filter(nucleotideEquals(position:=2, symbol:='R', sequenceName:='testSecondSequence')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 2 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolT.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolT.json index 2ea371f14..7c6919158 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolT.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolT.json @@ -1,16 +1,6 @@ { "testCaseName": "Test sequence has symbol T at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - 
"filterExpression": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "T", - "sequenceName": "testSecondSequence" - } - }, + "query": "default.filter(nucleotideEquals(position:=2, symbol:='T', sequenceName:='testSecondSequence')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 1 diff --git a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolY.json b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolY.json index 20fc5425a..6e057d7f9 100644 --- a/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolY.json +++ b/endToEndTests/test/queries/symbolEquals/testSeqPos2SymbolY.json @@ -1,16 +1,6 @@ { "testCaseName": "Test sequence has symbol Y at position 2", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "NucleotideEquals", - "position": 2, - "symbol": "Y", - "sequenceName": "testSecondSequence" - } - }, + "query": "default.filter(nucleotideEquals(position:=2, symbol:='Y', sequenceName:='testSecondSequence')).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 1 diff --git a/endToEndTests/test/queries/unsortedDateBetween.json b/endToEndTests/test/queries/unsortedDateBetween.json index dd340a054..062f215c9 100644 --- a/endToEndTests/test/queries/unsortedDateBetween.json +++ b/endToEndTests/test/queries/unsortedDateBetween.json @@ -1,16 +1,6 @@ { "testCaseName": "DateBetween Query for date column that is not sorted", - "query": { - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "DateBetween", - "column": "unsorted_date", - "from": "2021-03-18", - "to": "2021-03-20" - } - }, + "query": "default.filter(unsorted_date.between('2021-03-18'::date, '2021-03-20'::date)).groupBy({count:=count()})", "expectedQueryResult": [ { "count": 2 diff --git a/endToEndTests/test/query.test.js b/endToEndTests/test/query.test.js index 256381677..65b8c1550 100644 --- a/endToEndTests/test/query.test.js +++ b/endToEndTests/test/query.test.js @@ -62,13 +62,19 @@ function 
parseNdjsonResponse(response) { const formats = [ { name: 'NDJSON', - request: query => server.post('/query').send(query), + request: query => server.post('/query').set('Content-Type', 'text/plain').send(query), expectedContentType: 'application/x-ndjson', parseResponse: response => parseNdjsonResponse(response), }, { name: 'Arrow IPC', - request: query => server.post('/query').set('Accept', ARROW_MIME).send(query).responseType('blob'), + request: query => + server + .post('/query') + .set('Content-Type', 'text/plain') + .set('Accept', ARROW_MIME) + .send(query) + .responseType('blob'), expectedContentType: ARROW_MIME, parseResponse: response => arrowTableToObjects(tableFromIPC(response.body)), }, @@ -118,7 +124,7 @@ describe('The /query endpoint', () => { ); invalidQueryTestCases.forEach(testCase => it('should return the expected error for the test case ' + testCase.testCaseName, async () => { - const response = await server.post('/query').send(testCase.query); + const response = await server.post('/query').set('Content-Type', 'text/plain').send(testCase.query); const errorMessage = 'Actual result is:\n' + response.text + '\n'; expect(response.status, errorMessage).to.equal(400); @@ -143,49 +149,21 @@ describe('The /query endpoint', () => { }); }); - it('should return a bad request response when POSTing an invalid JSON', async () => { + it('should return a bad request response when POSTing invalid SaneQL', async () => { await server .post('/query') - .send('{ not a valid json') + .set('Content-Type', 'text/plain') + .send('this is not valid saneql !!!') .expect(400) - .expect('Content-Type', 'application/json') - .expect({ - error: 'Bad request', - message: - 'The query was not a valid JSON: [json.exception.parse_error.101] ' + - 'parse error at line 1, column 4: syntax error while parsing object key - invalid literal; ' + - "last read: '{ no'; expected string literal", - }); + .expect('Content-Type', 'application/json'); }); - it('should return a bad request 
response when POSTing a JSON without filter and action', async () => { + it('should return a bad request response when POSTing an empty query', async () => { await server .post('/query') - .send({ someJson: 'but missing expected properties' }) + .set('Content-Type', 'text/plain') + .send('') .expect(400) - .expect('Content-Type', 'application/json') - .expect({ - error: 'Bad request', - message: 'Query json must contain filterExpression and action.', - }); - }); - - it('should return a bad request response when POSTing an invalid filter type', async () => { - await server - .post('/query') - .send({ - action: { - type: 'invalid action', - }, - filterExpression: { - type: 'invalid filter type', - }, - }) - .expect(400) - .expect('Content-Type', 'application/json') - .expect({ - error: 'Bad request', - message: "Unknown object filter type 'invalid filter type'", - }); + .expect('Content-Type', 'application/json'); }); }); diff --git a/endToEndTests/test/requestId.test.js b/endToEndTests/test/requestId.test.js index e3afc003d..946eb8f33 100644 --- a/endToEndTests/test/requestId.test.js +++ b/endToEndTests/test/requestId.test.js @@ -11,7 +11,8 @@ describe('The request id', () => { await server .post('/query') .set(X_REQUEST_ID, requestID) - .send({ action: { type: 'Aggregated' }, filterExpression: { type: 'True' } }) + .set('Content-Type', 'text/plain') + .send('default.groupBy({count:=count()})') .expect(200) .expect(X_REQUEST_ID, requestID); }); @@ -19,7 +20,8 @@ describe('The request id', () => { it('should be generated when none is specified', async () => { await server .post('/query') - .send({ action: { type: 'Aggregated' }, filterExpression: { type: 'True' } }) + .set('Content-Type', 'text/plain') + .send('default.groupBy({count:=count()})') .expect(200) .expect(response => { const headers = response.headers; diff --git a/performance/many_short_read_filters.cpp b/performance/many_short_read_filters.cpp index f925c0222..47016d6d4 100644 --- 
a/performance/many_short_read_filters.cpp +++ b/performance/many_short_read_filters.cpp @@ -12,15 +12,11 @@ #include "silo/append/database_inserter.h" #include "silo/append/ndjson_line_reader.h" #include "silo/initialize/initializer.h" -#include "silo/query_engine/action_query.h" -#include "silo/query_engine/planner.h" -#include "silo/query_engine/binder.h" #include "silo/query_engine/exec_node/ndjson_sink.h" +#include "silo/query_engine/planner.h" #include "silo/storage/reference_genomes.h" -using silo::query_engine::ActionQuery; using silo::query_engine::Planner; -using silo::query_engine::Binder; using silo::Database; namespace { @@ -313,7 +309,16 @@ class QueryGenerator { if (use_all_symbols) { // Query all 5 symbols (A, C, G, T, -) at the same position in an OR return fmt::format( - R"({{"action":{{"type":"Aggregated"}},"filterExpression":{{"children":[{{"children":[{{"children":[{{"column":"locationName","value":"generated","type":"StringEquals"}}],"type":"Or"}},{{"column":"samplingDate","from":"2024-01-01","to":"2024-01-07","type":"DateBetween"}}],"type":"And"}},{{"children":[{{"position":{0},"symbol":"A","type":"NucleotideEquals"}},{{"position":{0},"symbol":"C","type":"NucleotideEquals"}},{{"position":{0},"symbol":"G","type":"NucleotideEquals"}},{{"position":{0},"symbol":"T","type":"NucleotideEquals"}},{{"position":{0},"symbol":"-","type":"NucleotideEquals"}}],"type":"Or"}},{{"column":"samplingDate","from":"2024-01-01","to":"2024-01-07","type":"DateBetween"}}],"type":"And"}}}})", + "default.filter(" + "locationName = 'generated' && " + "samplingDate.between('2024-01-01'::date, '2024-01-07'::date) && " + "(nucleotideEquals(position:={0}, symbol:='A') || " + "nucleotideEquals(position:={0}, symbol:='C') || " + "nucleotideEquals(position:={0}, symbol:='G') || " + "nucleotideEquals(position:={0}, symbol:='T') || " + "nucleotideEquals(position:={0}, symbol:='-')) && " + "samplingDate.between('2024-01-01'::date, '2024-01-07'::date)" + 
").groupBy({{count:=count()}})", position ); } @@ -321,7 +326,12 @@ class QueryGenerator { std::uniform_int_distribution sym_dist(0, SYMBOLS.size() - 1); char symbol = SYMBOLS[sym_dist(rng)]; return fmt::format( - R"({{"action":{{"type":"Aggregated"}},"filterExpression":{{"children":[{{"children":[{{"children":[{{"column":"locationName","value":"generated","type":"StringEquals"}}],"type":"Or"}},{{"column":"samplingDate","from":"2024-01-01","to":"2024-01-07","type":"DateBetween"}}],"type":"And"}},{{"position":{},"symbol":"{}","type":"NucleotideEquals"}},{{"column":"samplingDate","from":"2024-01-01","to":"2024-01-07","type":"DateBetween"}}],"type":"And"}}}})", + "default.filter(" + "locationName = 'generated' && " + "samplingDate.between('2024-01-01'::date, '2024-01-07'::date) && " + "nucleotideEquals(position:={}, symbol:='{}') && " + "samplingDate.between('2024-01-01'::date, '2024-01-07'::date)" + ").groupBy({{count:=count()}})", position, symbol ); @@ -340,10 +350,8 @@ void executeAllQueries( SPDLOG_INFO("Executing query number {}", query_num); } std::string query_string = query_gen.generateQuery(); - auto query = ActionQuery::parseQuery(query_string); - - auto bound_query = Binder::bindQuery(std::move(query), database->tables); - auto query_plan = Planner::planQuery(std::move(bound_query), database->tables, {}, "test_query"); std::stringstream result; + auto query_plan = + Planner::planSaneqlQuery(query_string, database->tables, {}, "test_query"); std::ofstream null_output("/dev/null"); silo::query_engine::exec_node::NdjsonSink sink{&null_output, query_plan.results_schema}; diff --git a/performance/many_string_equals.cpp b/performance/many_string_equals.cpp index 07c6bfdf1..20d65c25f 100644 --- a/performance/many_string_equals.cpp +++ b/performance/many_string_equals.cpp @@ -10,30 +10,30 @@ #include "silo/append/database_inserter.h" #include "silo/append/ndjson_line_reader.h" #include "silo/initialize/initializer.h" -#include "silo/query_engine/action_query.h" 
-#include "silo/query_engine/actions/aggregated.h" #include "silo/query_engine/exec_node/ndjson_sink.h" -#include "silo/query_engine/planner.h" -#include "silo/query_engine/binder.h" #include "silo/query_engine/filter/expressions/or.h" #include "silo/query_engine/filter/expressions/string_equals.h" #include "silo/query_engine/filter/expressions/string_in_set.h" -#include "silo/query_engine/filter/expressions/true.h" +#include "silo/query_engine/operators/aggregate_node.h" +#include "silo/query_engine/operators/filter_node.h" #include "silo/query_engine/operators/query_node.h" +#include "silo/query_engine/operators/scan_node.h" +#include "silo/query_engine/planner.h" namespace { using silo::Database; -using silo::query_engine::actions::Aggregated; -using silo::query_engine::ActionQuery; using silo::query_engine::Planner; -using silo::query_engine::Binder; using silo::query_engine::filter::expressions::Expression; using silo::query_engine::filter::expressions::ExpressionVector; using silo::query_engine::filter::expressions::Or; using silo::query_engine::filter::expressions::StringEquals; using silo::query_engine::filter::expressions::StringInSet; -using silo::query_engine::filter::expressions::True; +using silo::query_engine::operators::AggregateDefinition; +using silo::query_engine::operators::AggregateFunction; +using silo::query_engine::operators::AggregateNode; +using silo::query_engine::operators::FilterNode; +using silo::query_engine::operators::ScanNode; std::shared_ptr initializeDatabase() { auto database_config = silo::config::DatabaseConfig::getValidatedConfig(R"( @@ -128,9 +128,22 @@ std::unique_ptr buildStringInSet( return std::make_unique(column, std::move(value_set)); } -void executeAggregatedQuery(const std::shared_ptr& database, ActionQuery& query) { - auto query_tree = Binder::bindQuery(std::move(query), database->tables); - auto query_plan = Planner::planQuery(std::move(query_tree), database->tables, {}, "benchmark_query"); +void 
executeCountWithFilter( + const std::shared_ptr& database, + std::unique_ptr filter +) { + const auto& table_name = silo::schema::TableName::getDefault(); + auto table = database->tables.at(table_name); + auto scan = std::make_unique(table_name, table->schema->getColumnIdentifiers()); + auto filter_node = std::make_unique(std::move(scan), std::move(filter)); + std::vector aggregates{ + {.output_name = "count", .function = AggregateFunction::COUNT, .source_column = std::nullopt} + }; + auto root = std::make_unique( + std::move(filter_node), std::vector{}, std::move(aggregates) + ); + auto query_plan = + Planner::planQuery(std::move(root), database->tables, {}, "benchmark_query"); std::stringstream result; silo::query_engine::exec_node::NdjsonSink sink{&result, query_plan.results_schema}; query_plan.executeAndWrite(sink, /*timeout_in_seconds=*/60); @@ -154,11 +167,9 @@ BenchmarkResult runBenchmark( for (int i = 0; i < iterations; ++i) { // Build a fresh filter for each iteration auto filter = build_filter(); - auto action = std::make_unique(std::vector{}); - ActionQuery query{std::move(filter), std::move(action)}; auto start = std::chrono::high_resolution_clock::now(); - executeAggregatedQuery(database, query); + executeCountWithFilter(database, std::move(filter)); auto end = std::chrono::high_resolution_clock::now(); auto duration = std::chrono::duration_cast(end - start).count(); durations.push_back(duration); diff --git a/performance/mutation_benchmark.cpp b/performance/mutation_benchmark.cpp index 22c287bb3..f62c1efcc 100644 --- a/performance/mutation_benchmark.cpp +++ b/performance/mutation_benchmark.cpp @@ -3,26 +3,12 @@ #include "silo/append/database_inserter.h" #include "silo/append/ndjson_line_reader.h" #include "silo/initialize/initializer.h" -#include "silo/query_engine/action_query.h" -#include "silo/query_engine/planner.h" -#include "silo/query_engine/binder.h" -#include "silo/query_engine/actions/mutations.h" #include 
"silo/query_engine/exec_node/ndjson_sink.h" -#include "silo/query_engine/filter/expressions/negation.h" -#include "silo/query_engine/filter/expressions/string_equals.h" -#include "silo/query_engine/filter/expressions/true.h" -#include "silo/query_engine/operators/query_node.h" +#include "silo/query_engine/planner.h" namespace { -using silo::query_engine::ActionQuery; using silo::query_engine::Planner; -using silo::query_engine::Binder; -using silo::query_engine::filter::expressions::True; -using silo::query_engine::filter::expressions::Negation; -using silo::query_engine::filter::expressions::StringEquals; -using silo::Nucleotide; -using silo::query_engine::actions::Mutations; using silo::Database; std::shared_ptr initializeDatabaseWithSingleReference(std::string reference){ @@ -115,17 +101,12 @@ void printClipped(const std::string& output){ } void executeMutationsAllQuery(const std::shared_ptr& database){ - using silo::query_engine::actions::Mutations; - - ActionQuery query{ - std::make_unique(), - std::make_unique>( - std::vector{"main"}, 0.05, std::vector{} - ) - }; - - auto bound_query = Binder::bindQuery(std::move(query), database->tables); - auto query_plan = Planner::planQuery(std::move(bound_query), database->tables, {}, "test_query"); + auto query_plan = Planner::planSaneqlQuery( + "default.mutations(minProportion:=0.05, sequenceNames:={main})", + database->tables, + {}, + "test_query" + ); std::stringstream result; silo::query_engine::exec_node::NdjsonSink sink{&result, query_plan.results_schema}; query_plan.executeAndWrite(sink, /*timeout_in_seconds=*/3); @@ -133,18 +114,13 @@ void executeMutationsAllQuery(const std::shared_ptr& database){ } void executeMutationsAlmostAllQuery(const std::shared_ptr& database){ - using silo::query_engine::actions::Mutations; - - - ActionQuery query{ - std::make_unique(std::make_unique("key", "3")), - std::make_unique>( - std::vector{"main"}, 0.05, std::vector{} - ) - }; - - auto bound_query = 
Binder::bindQuery(std::move(query), database->tables); - auto query_plan = Planner::planQuery(std::move(bound_query), database->tables, {}, "test_query"); std::stringstream result; + auto query_plan = Planner::planSaneqlQuery( + "default.filter(!(key = '3')).mutations(minProportion:=0.05, sequenceNames:={main})", + database->tables, + {}, + "test_query" + ); + std::stringstream result; silo::query_engine::exec_node::NdjsonSink sink{&result, query_plan.results_schema}; query_plan.executeAndWrite(sink, /*timeout_in_seconds=*/3); printClipped(result.str()); diff --git a/python/silodb/database.pxd b/python/silodb/database.pxd index cb117c6be..e5b316157 100644 --- a/python/silodb/database.pxd +++ b/python/silodb/database.pxd @@ -28,7 +28,7 @@ cdef extern from "silo/database.h" namespace "silo": vector[pair[uint64_t, string]] getPrevalentAminoAcidMutations(string table_name, string sequence_name, double prevalence_threshold, string filter) except +handle_silo_exception Roaring getFilteredBitmap(string table_name, string filter) except +handle_silo_exception void saveDatabaseState(string save_directory) except + - string executeQueryAsArrowIpc(string table_name, string query_json) except +handle_silo_exception + string executeQueryAsArrowIpc(string query_string) except +handle_silo_exception string getTablesAsArrowIpc() except + @staticmethod diff --git a/python/silodb/database.pyx b/python/silodb/database.pyx index da1b691f3..d97fed5dd 100644 --- a/python/silodb/database.pyx +++ b/python/silodb/database.pyx @@ -346,7 +346,7 @@ cdef class PyDatabase: raise ValueError("prevalence_threshold must be between 0.0 and 1.0") # Default to True filter (returns all rows) if no filter specified if filter_expression is None or filter_expression == "": - filter_expression = '{"type":"True"}' + filter_expression = 'true' cpp_table_name = table_name.encode('utf-8') cpp_sequence_name = sequence_name.encode('utf-8') @@ -404,7 +404,7 @@ cdef class PyDatabase: raise 
ValueError("prevalence_threshold must be between 0.0 and 1.0") # Default to True filter (returns all rows) if no filter specified if filter_expression is None or filter_expression == "": - filter_expression = '{"type":"True"}' + filter_expression = 'true' cpp_table_name = table_name.encode('utf-8') cpp_sequence_name = sequence_name.encode('utf-8') @@ -437,7 +437,7 @@ cdef class PyDatabase: table_name : str Name of the table filter_expression : str, optional - Filter expression in JSON format (default: '{"type":"True"}' which matches all rows) + SaneQL filter expression (default: 'true' which matches all rows) Returns ------- @@ -455,7 +455,7 @@ cdef class PyDatabase: # Default to True filter (returns all rows) if no filter specified if filter_expression is None or filter_expression == "": - filter_expression = '{"type":"True"}' + filter_expression = 'true' cpp_table_name = table_name.encode('utf-8') cpp_filter = filter_expression.encode('utf-8') @@ -483,17 +483,15 @@ cdef class PyDatabase: except Exception as e: raise RuntimeError(f"Failed to get filtered bitmap: {e}") - def execute_query(self, str table_name, str query_json): + def execute_query(self, str query_string): """ Execute a query and return results as a PyArrow Table Parameters ---------- - table_name : str - Name of the table to query - query_json : str - Query in JSON format. Must contain 'filterExpression' and 'action' fields. - Example: '{"filterExpression": {"type": "True"}, "action": {"type": "Details"}}' + query_string : str + SaneQL query string. The leading identifier is the table name. 
+ Example: 'sequences.filter(true)' or 'sequences.mutations(minProportion:=0.05)' Returns ------- @@ -503,25 +501,20 @@ cdef class PyDatabase: Example ------- >>> db = PyDatabase("path/to/database") - >>> query = '{"filterExpression": {"type": "True"}, "action": {"type": "Details"}}' - >>> table = db.execute_query("my_table", query) + >>> table = db.execute_query("my_table.filter(true)") >>> print(table.schema) >>> df = table.to_pandas() # Convert to pandas DataFrame """ - cdef string cpp_table_name - cdef string cpp_query_json + cdef string cpp_query_string cdef string ipc_buffer - if not table_name or not table_name.strip(): - raise ValueError("table_name cannot be empty") - if not query_json or not query_json.strip(): - raise ValueError("query_json cannot be empty") + if not query_string or not query_string.strip(): + raise ValueError("query_string cannot be empty") - cpp_table_name = table_name.encode('utf-8') - cpp_query_json = query_json.encode('utf-8') + cpp_query_string = query_string.encode('utf-8') try: - ipc_buffer = self.c_database.executeQueryAsArrowIpc(cpp_table_name, cpp_query_json) + ipc_buffer = self.c_database.executeQueryAsArrowIpc(cpp_query_string) # Convert IPC buffer to PyArrow Table buffer_reader = pa.BufferReader(ipc_buffer) diff --git a/python/tests/test_database.py b/python/tests/test_database.py index 55ff844a2..3a6665407 100644 --- a/python/tests/test_database.py +++ b/python/tests/test_database.py @@ -179,7 +179,7 @@ def test_get_filtered_bitmap_true_filter(self, empty_database, main_reference_se empty_database.append_data_from_file("sequences", INPUT_FILE) # True filter should return all rows - bitmap = empty_database.get_filtered_bitmap("sequences", '{"type":"True"}') + bitmap = empty_database.get_filtered_bitmap("sequences", 'true') assert isinstance(bitmap, pyroaring.BitMap) assert len(bitmap) > 0 # Should have at least one row from test data @@ -193,7 +193,7 @@ def test_get_filtered_bitmap_returns_bitmap(self, empty_database, 
main_reference ) empty_database.append_data_from_file("sequences", INPUT_FILE) - bitmap = empty_database.get_filtered_bitmap("sequences", '{"type":"True"}') + bitmap = empty_database.get_filtered_bitmap("sequences", 'true') assert isinstance(bitmap, pyroaring.BitMap) # Can iterate over bitmap to get indices indices = list(bitmap) @@ -237,7 +237,7 @@ def test_get_filtered_bitmap_supports_set_operations(self, empty_database, main_ ) empty_database.append_data_from_file("sequences", INPUT_FILE) - bitmap = empty_database.get_filtered_bitmap("sequences", '{"type":"True"}') + bitmap = empty_database.get_filtered_bitmap("sequences", 'true') # Test union, intersection operations other_bitmap = pyroaring.BitMap([0, 1, 2]) @@ -265,7 +265,7 @@ def test_get_prevalent_mutations_basic(self, empty_database, main_reference_sequ table_name="sequences", sequence_name="main", prevalence_threshold=0.5, - filter_expression='{"type":"True"}' + filter_expression='true' ) assert isinstance(mutations, list) @@ -283,7 +283,7 @@ def test_get_prevalent_mutations_returns_tuples(self, empty_database, main_refer table_name="sequences", sequence_name="main", prevalence_threshold=0.0, # Get all mutations - filter_expression='{"type":"True"}' + filter_expression='true' ) for mutation in mutations: @@ -303,10 +303,10 @@ def test_get_prevalent_mutations_threshold_filtering(self, empty_database, main_ empty_database.append_data_from_file("sequences", INPUT_FILE) low_threshold = empty_database.get_prevalent_nucleotide_mutations( - "sequences", "main", 0.1, '{"type":"True"}' + "sequences", "main", 0.1, 'true' ) high_threshold = empty_database.get_prevalent_nucleotide_mutations( - "sequences", "main", 0.9, '{"type":"True"}' + "sequences", "main", 0.9, 'true' ) # Higher threshold should return same or fewer mutations @@ -370,9 +370,9 @@ def test_checkpoint_preserves_data(self, empty_database, main_reference_sequence empty_database.append_data_from_file("sequences", INPUT_FILE) # Get data before save - 
bitmap_before = empty_database.get_filtered_bitmap("sequences", '{"type":"True"}') + bitmap_before = empty_database.get_filtered_bitmap("sequences", 'true') mutations_before = empty_database.get_prevalent_nucleotide_mutations( - "sequences", "main", 0.5, '{"type":"True"}' + "sequences", "main", 0.5, 'true' ) # Save and reload @@ -382,9 +382,9 @@ def test_checkpoint_preserves_data(self, empty_database, main_reference_sequence loaded_db = Database(save_path) # Compare with loaded data - bitmap_after = loaded_db.get_filtered_bitmap("sequences", '{"type":"True"}') + bitmap_after = loaded_db.get_filtered_bitmap("sequences", 'true') mutations_after = loaded_db.get_prevalent_nucleotide_mutations( - "sequences", "main", 0.5, '{"type":"True"}' + "sequences", "main", 0.5, 'true' ) assert bitmap_before == bitmap_after @@ -560,8 +560,8 @@ def test_extra_columns_accept_data(self): db.append_data_from_string("test", '{"id": "s1", "seq": {"sequence": "AAAA", "insertions": []}, "country": "USA", "lineage": "BA.1"}') db.append_data_from_string("test", '{"id": "s2", "seq": {"sequence": "CCCC", "insertions": []}, "country": "UK", "lineage": "BA.2"}') - query = '{"filterExpression": {"type": "True"}, "action": {"type": "Details"}}' - result = db.execute_query("test", query) + query = 'test' + result = db.execute_query(query) assert "country" in result.column_names assert "lineage" in result.column_names @@ -583,8 +583,8 @@ def test_extra_columns_default_empty(self): ) db.append_data_from_string("test", '{"id": "s1", "seq": {"sequence": "AAAA", "insertions": []}}') - query = '{"filterExpression": {"type": "True"}, "action": {"type": "Details"}}' - result = db.execute_query("test", query) + query = 'test' + result = db.execute_query(query) assert result.num_rows == 1 def test_extra_columns_with_none(self): @@ -601,8 +601,8 @@ def test_extra_columns_with_none(self): ) db.append_data_from_string("test", '{"id": "s1", "seq": {"sequence": "AAAA", "insertions": []}}') - query = 
'{"filterExpression": {"type": "True"}, "action": {"type": "Details"}}' - result = db.execute_query("test", query) + query = 'test' + result = db.execute_query(query) assert result.num_rows == 1 def test_extra_columns_invalid_type_raises(self): @@ -697,8 +697,8 @@ def test_execute_query_returns_pyarrow_table(self, empty_database, main_referenc ) empty_database.append_data_from_file("sequences", INPUT_FILE) - query = '{"filterExpression": {"type": "True"}, "action": {"type": "Details"}}' - result = empty_database.execute_query("sequences", query) + query = 'sequences' + result = empty_database.execute_query(query) assert isinstance(result, pa.Table) @@ -712,8 +712,8 @@ def test_execute_query_has_correct_schema(self, empty_database, main_reference_s ) empty_database.append_data_from_file("sequences", INPUT_FILE) - query = '{"filterExpression": {"type": "True"}, "action": {"type": "Details"}}' - result = empty_database.execute_query("sequences", query) + query = 'sequences' + result = empty_database.execute_query(query) # Should have at least the primary key column assert "primary_key" in result.column_names @@ -728,8 +728,8 @@ def test_execute_query_returns_data(self, empty_database, main_reference_sequenc ) empty_database.append_data_from_file("sequences", INPUT_FILE) - query = '{"filterExpression": {"type": "True"}, "action": {"type": "Details"}}' - result = empty_database.execute_query("sequences", query) + query = 'sequences' + result = empty_database.execute_query(query) assert result.num_rows > 0 @@ -744,12 +744,12 @@ def test_execute_query_with_filter(self, empty_database, main_reference_sequence empty_database.append_data_from_file("sequences", INPUT_FILE) # Get all rows first - all_query = '{"filterExpression": {"type": "True"}, "action": {"type": "Details"}}' - all_result = empty_database.execute_query("sequences", all_query) + all_query = 'sequences' + all_result = empty_database.execute_query(all_query) # Get filtered rows (False filter should return 0 
rows) - filtered_query = '{"filterExpression": {"type": "False"}, "action": {"type": "Details"}}' - filtered_result = empty_database.execute_query("sequences", filtered_query) + filtered_query = 'sequences.filter(false)' + filtered_result = empty_database.execute_query(filtered_query) assert filtered_result.num_rows == 0 assert all_result.num_rows > filtered_result.num_rows @@ -764,8 +764,8 @@ def test_execute_query_to_batches(self, empty_database, main_reference_sequence) ) empty_database.append_data_from_file("sequences", INPUT_FILE) - query = '{"filterExpression": {"type": "True"}, "action": {"type": "Details"}}' - result = empty_database.execute_query("sequences", query) + query = 'sequences' + result = empty_database.execute_query(query) batches = result.to_batches() assert isinstance(batches, list) @@ -782,19 +782,14 @@ def test_execute_query_to_pydict(self, empty_database, main_reference_sequence): ) empty_database.append_data_from_file("sequences", INPUT_FILE) - query = '{"filterExpression": {"type": "True"}, "action": {"type": "Details"}}' - result = empty_database.execute_query("sequences", query) + query = 'sequences' + result = empty_database.execute_query(query) data = result.to_pydict() assert isinstance(data, dict) assert "primary_key" in data assert isinstance(data["primary_key"], list) - def test_execute_query_empty_table_name_raises(self, empty_database): - """Test that empty table name raises ValueError.""" - with pytest.raises(ValueError, match="table_name cannot be empty"): - empty_database.execute_query("", '{"filterExpression": {"type": "True"}, "action": {"type": "Details"}}') - def test_execute_query_empty_query_raises(self, empty_database, main_reference_sequence): """Test that empty query raises ValueError.""" empty_database.create_nucleotide_sequence_table( @@ -804,23 +799,11 @@ def test_execute_query_empty_query_raises(self, empty_database, main_reference_s reference_sequence=main_reference_sequence ) - with pytest.raises(ValueError, 
match="query_json cannot be empty"): - empty_database.execute_query("sequences", "") - - def test_execute_query_invalid_json_raises(self, empty_database, main_reference_sequence): - """Test that invalid JSON raises an error.""" - empty_database.create_nucleotide_sequence_table( - table_name="sequences", - primary_key_name="primary_key", - sequence_name="main", - reference_sequence=main_reference_sequence - ) - - with pytest.raises(ValueError, match="not a valid JSON"): - empty_database.execute_query("sequences", "not valid json") + with pytest.raises(ValueError, match="query_string cannot be empty"): + empty_database.execute_query("") - def test_execute_query_missing_action_raises(self, empty_database, main_reference_sequence): - """Test that query without action raises an error.""" + def test_execute_query_invalid_query_raises(self, empty_database, main_reference_sequence): + """Test that invalid SaneQL raises an error.""" empty_database.create_nucleotide_sequence_table( table_name="sequences", primary_key_name="primary_key", @@ -828,8 +811,8 @@ def test_execute_query_missing_action_raises(self, empty_database, main_referenc reference_sequence=main_reference_sequence ) - with pytest.raises(ValueError, match="must contain filterExpression and action"): - empty_database.execute_query("sequences", '{"filterExpression": {"type": "True"}}') + with pytest.raises(RuntimeError): + empty_database.execute_query("not valid saneql !") def test_execute_query_simple_database(self): """Test execute_query with a simple in-memory database.""" @@ -845,8 +828,8 @@ def test_execute_query_simple_database(self): db.append_data_from_string("test", '{"id": "sample1", "seq": {"sequence": "AAAA", "insertions": []}}') db.append_data_from_string("test", '{"id": "sample2", "seq": {"sequence": "CCCC", "insertions": []}}') - query = '{"filterExpression": {"type": "True"}, "action": {"type": "Details"}}' - result = db.execute_query("test", query) + query = 'test' + result = 
db.execute_query(query)
 assert isinstance(result, pa.Table)
 assert result.num_rows == 2
@@ -870,8 +853,8 @@ def test_execute_query_preserves_data_after_checkpoint(self, empty_database, mai
 empty_database.append_data_from_file("sequences", INPUT_FILE)
 # Query before checkpoint
- query = '{"filterExpression": {"type": "True"}, "action": {"type": "Details"}}'
- result_before = empty_database.execute_query("sequences", query)
+ query = 'sequences'
+ result_before = empty_database.execute_query(query)
 # Save and reload
 save_path = os.path.join(temp_dir, "checkpoint")
@@ -879,7 +862,7 @@ def test_execute_query_preserves_data_after_checkpoint(self, empty_database, mai
 loaded_db = Database(save_path)
 # Query after checkpoint
- result_after = loaded_db.execute_query("sequences", query)
+ result_after = loaded_db.execute_query(query)
 # Results should match
 assert result_before.num_rows == result_after.num_rows
diff --git a/saneql.examples b/saneql.examples
new file mode 100644
index 000000000..b74b67b22
--- /dev/null
+++ b/saneql.examples
@@ -0,0 +1,250 @@
+# SaneQL examples
+
+lineitem
+ .filter(l_shipdate <= '1998-12-01'::date - '90 days'::interval)
+ .groupby({l_returnflag, l_linestatus},
+ {sum_qty:=sum(l_quantity),
+ sum_base_price:=sum(l_extendedprice),
+ sum_disc_price:=sum(l_extendedprice * (1 - l_discount)),
+ sum_charge:=sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)),
+ avg_qty:=avg(l_quantity),
+ avg_price:=avg(l_extendedprice),
+ avg_disc:=avg(l_discount),
+ count_order:=count()
+ })
+ .orderby({l_returnflag, l_linestatus})
+
+
+
+let min_supplycost_for_part(p_partkey) :=
+ partsupp
+ .filter(ps_partkey = p_partkey)
+ .join(supplier, s_suppkey=ps_suppkey)
+ .join(nation, s_nationkey=n_nationkey)
+ .join(region.filter(r_name='EUROPE'), n_regionkey=r_regionkey).aggregate(min(ps_supplycost)),
+ part
+ .filter(condition:=p_size = 15 && p_type.like('%BRASS'))
+ .join(partsupp, p_partkey = ps_partkey)
+ .join(supplier, s_suppkey = ps_suppkey)
+ .join(nation, s_nationkey = n_nationkey)
+ .join(region.filter(r_name='EUROPE'), n_regionkey=r_regionkey)
+ .filter(ps_supplycost = min_supplycost_for_part(p_partkey))
+ .orderby({s_acctbal.desc(), n_name, s_name, p_partkey}, limit:=100)
+ .project({s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment})
+
+
+
+ customer
+ .filter(c_mktsegment = 'BUILDING')
+ .join(orders.filter(o_orderdate < '1995-03-15'::date), c_custkey = o_custkey)
+ .join(lineitem.filter(l_shipdate > '1995-03-15'::date), l_orderkey = o_orderkey)
+ .groupby({l_orderkey,o_orderdate,o_shippriority},{revenue:=sum(l_extendedprice * (1 - l_discount))})
+ .orderby({revenue.desc(), o_orderdate}, limit:=10)
+ .project({l_orderkey, revenue, o_orderdate, o_shippriority})
+
+
+
+ orders
+ .filter(o_orderdate >= '1993-07-01'::date && o_orderdate < '1993-07-01'::date + '3 month'::interval)
+ .join(lineitem.filter(l_commitdate < l_receiptdate), l_orderkey = o_orderkey, type:=exists)
+ .groupby({o_orderpriority}, {order_count:=count()})
+ .orderby(o_orderpriority)
+
+
+
+ customer
+ .join(orders.filter(o_orderdate >= '1994-01-01'::date && o_orderdate < '1994-01-01'::date + '1 year'::interval), c_custkey=o_custkey)
+ .join(lineitem, l_orderkey=o_orderkey)
+ .join(supplier, l_suppkey=s_suppkey)
+ .join(nation, s_nationkey=n_nationkey)
+ .join(region.filter(r_name='ASIA'), n_regionkey=r_regionkey)
+ .groupby({n_name}, {revenue:=sum(l_extendedprice * (1 - l_discount))})
+ .orderby({revenue.desc()})
+ .project({n_name, revenue})
+
+
+
+ lineitem
+ .filter(l_shipdate >= '1994-01-01'::date && l_shipdate < '1994-01-01'::date + '1 year'::interval && l_discount.between(0.06 - 0.01, 0.06 + 0.01) && l_quantity<24)
+ .aggregate(sum(l_extendedprice * l_discount))
+
+
+
+ supplier
+ .join(lineitem.filter(l_shipdate.between('1995-01-01'::date, '1996-12-31'::date)), s_suppkey=l_suppkey)
+ .join(orders, o_orderkey=l_orderkey)
+ .join(customer, c_custkey=o_custkey)
+ .join(nation.as(n1), s_nationkey=n1.n_nationkey)
+ .join(nation.as(n2), c_nationkey=n2.n_nationkey)
+ .filter((n1.n_name = 'FRANCE' && n2.n_name = 'GERMANY') || (n1.n_name = 'GERMANY' && n2.n_name = 'FRANCE'))
+ .map({supp_nation:=n1.n_name, cust_nation:=n2.n_name, l_year:=l_shipdate.extract(year), volume:=l_extendedprice * (1 - l_discount)})
+ .groupby({supp_nation, cust_nation, l_year}, {revenue:=sum(volume)})
+ .orderby({supp_nation, cust_nation, l_year})
+
+
+
+ part
+ .filter(p_type = 'ECONOMY ANODIZED STEEL')
+ .join(lineitem, p_partkey=l_partkey)
+ .join(supplier, s_suppkey=l_suppkey)
+ .join(orders.filter(o_orderdate.between('1995-01-01'::date, '1996-12-31'::date)), l_orderkey=o_orderkey)
+ .join(customer, o_custkey=c_custkey)
+ .join(nation.as(n1), c_nationkey=n1.n_nationkey)
+ .join(nation.as(n2), s_nationkey=n2.n_nationkey)
+ .join(region.filter(r_name='AMERICA'), n1.n_regionkey=r_regionkey)
+ .map({o_year:=o_orderdate.extract(year), volume:=l_extendedprice * (1 - l_discount), nation:=n2.n_name})
+ .groupby({o_year}, {mkt_share:=sum(case({nation='BRAZIL' => volume}, else:=0))/sum(volume)})
+ .orderby({o_year})
+
+
+
+ part
+ .filter(p_name.like('%green%'))
+ .join(lineitem, p_partkey=l_partkey)
+ .join(supplier, s_suppkey=l_suppkey)
+ .join(partsupp, ps_suppkey=l_suppkey && ps_partkey=l_partkey)
+ .join(orders, o_orderkey=l_orderkey)
+ .join(nation, s_nationkey=n_nationkey)
+ .map({nation:=n_name, o_year:=o_orderdate.extract(year), amount:=l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity})
+ .groupby({nation, o_year}, {sum_profit:=sum(amount)})
+ .orderby({nation, o_year.desc()})
+
+
+
+ let base := '1993-10-01'::date,
+ orders
+ .filter(o_orderdate >= base && o_orderdate < base + '3 month'::interval)
+ .join(customer, c_custkey=o_custkey)
+ .join(lineitem.filter(l_returnflag='R'), l_orderkey=o_orderkey)
+ .join(nation, c_nationkey=n_nationkey)
+ .groupby({c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment}, {revenue:=sum(l_extendedprice * (1 - l_discount))})
+ .orderby({revenue.desc()}, limit:=20)
+
+
+
+ let partsupp_germany := partsupp
+ .join(supplier, ps_suppkey=s_suppkey)
+ .join(nation.filter(n_name='GERMANY'), s_nationkey=n_nationkey),
+partsupp_germany
+.groupby(ps_partkey, {value:=sum(ps_supplycost * ps_availqty)})
+.filter(value>partsupp_germany.aggregate(sum(ps_supplycost*ps_availqty))*0.0001)
+.orderby(value.desc())
+
+
+
+let base := '1994-01-01'::date,
+lineitem
+.filter(l_commitdate < l_receiptdate && l_shipdate < l_commitdate && l_receiptdate >= base && l_receiptdate < base + '1 year'::interval && l_shipmode.in({'MAIL', 'SHIP'}))
+.join(orders, o_orderkey=l_orderkey)
+.groupby(l_shipmode, {high_line_count:=sum(case({o_orderpriority = '1-URGENT' || o_orderpriority = '2-HIGH' => 1}, else:=0)), low_line_count:=sum(case({o_orderpriority <> '1-URGENT' && o_orderpriority <> '2-HIGH' => 1}, else:=0))})
+.orderby(l_shipmode)
+
+
+
+customer
+.join(orders.filter(!o_comment.like('%special%requests%')), c_custkey=o_custkey, type:=leftouter)
+.groupby({c_custkey}, {c_count:=count(o_orderkey)})
+.groupby({c_count}, {custdist:=count()})
+.orderby({custdist.desc(), c_count.desc()})
+
+
+
+let base:='1995-09-01'::date,
+lineitem
+.filter(l_shipdate >= base && l_shipdate < base + '1 month'::interval)
+.join(part, l_partkey=p_partkey)
+.aggregate(100.00*sum(case({p_type.like('PROMO%') => l_extendedprice * (1 - l_discount)}, else:=0)) / sum(l_extendedprice * (1 - l_discount)))
+
+
+
+let base := '1996-01-01'::date,
+let revenue:=
+ lineitem
+ .filter(l_shipdate >= base && l_shipdate < base + '3 month'::interval)
+ .groupby(l_suppkey, {total_revenue:=sum(l_extendedprice * (1 - l_discount))})
+ .project({supplier_no:=l_suppkey, total_revenue}),
+supplier
+.join(revenue, s_suppkey = supplier_no)
+.filter(total_revenue=revenue.aggregate(max(total_revenue)))
+.orderby({s_suppkey})
+.project({s_suppkey, s_name, s_address, s_phone, total_revenue})
+
+
+
+part
+.filter(p_brand <> 'Brand#45' && !p_type.like('MEDIUM POLISHED%') && p_size.in({49, 14, 23, 45, 19, 3, 36, 9}))
+.join(partsupp, p_partkey=ps_partkey)
+.join(supplier.filter(s_comment.like('%Customer%Complaints%')), ps_suppkey=s_suppkey, type:=leftanti)
+.groupby({p_brand, p_type, p_size}, {supplier_cnt:=count(ps_suppkey, distinct:=true)})
+.orderby({supplier_cnt.desc(), p_brand, p_type, p_size})
+
+
+
+let avg_for_part(p_partkey) :=
+lineitem.filter(l_partkey=p_partkey).aggregate(0.2*avg(l_quantity)),
+part
+.filter(p_brand = 'Brand#23' && p_container = 'MED BOX')
+.join(lineitem, p_partkey=l_partkey)
+.filter(l_quantity < avg_for_part(p_partkey))
+
+
+
+customer
+.join(orders, c_custkey=o_custkey)
+.join(lineitem.groupby({l_orderkey}, {s:=sum(l_quantity)}).filter(s>300), o_orderkey=l_orderkey, type:=leftsemi)
+.join(lineitem, o_orderkey=l_orderkey)
+.groupby({c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice}, {s:=sum(l_quantity)})
+.orderby({o_totalprice.desc(), o_orderdate}, limit:=100)
+
+
+
+lineitem
+.filter(l_shipmode.in({'AIR', 'AIR REG'}) && l_shipinstruct = 'DELIVER IN PERSON')
+.join(part, p_partkey=l_partkey)
+.filter(
+ (p_brand = 'Brand#12' && p_container.in({'SM CASE', 'SM BOX', 'SM PACK', 'SM PKG'}) && l_quantity.between(1,1+10) && p_size.between(1,5))
+|| (p_brand = 'Brand#23' && p_container.in({'MED BAG', 'MED BOX', 'MED PKG', 'MED PACK'}) && l_quantity.between(10,10+10) && p_size.between(1,10))
+|| (p_brand = 'Brand#34' && p_container.in({'LG CASE', 'LG BOX', 'LG PACK', 'LG PKG'}) && l_quantity.between(20,20+10) && p_size.between(1,15)))
+.aggregate(sum(l_extendedprice* (1 - l_discount)))
+
+
+
+let base := '1994-01-01'::date,
+let qty_per_ps(ps_partkey, ps_suppkey) :=
+ lineitem
+ .filter(l_partkey = ps_partkey && l_suppkey = ps_suppkey && l_shipdate >= base && l_shipdate < base + '1 year'::interval)
+ .aggregate(sum(l_quantity)),
+let avail :=
+ partsupp
+ .join(part.filter(p_name.like('forest%')), ps_partkey=p_partkey, type:=leftsemi)
+ .filter(ps_availqty > 0.5*qty_per_ps(ps_partkey, ps_suppkey))
+ .project(ps_suppkey),
+supplier
+.join(nation.filter(n_name='CANADA'), s_nationkey=n_nationkey)
+.join(avail, s_suppkey=ps_suppkey, type:=leftsemi)
+.orderby({s_name})
+.project({s_name, s_address})
+
+
+
+supplier
+.join(lineitem.filter(l_receiptdate>l_commitdate).as(l1), s_suppkey=l1.l_suppkey)
+.join(orders.filter(o_orderstatus = 'F'), o_orderkey = l1.l_orderkey)
+.join(nation.filter(n_name = 'SAUDI ARABIA'), s_nationkey = n_nationkey)
+.join(lineitem.as(l2), l2.l_orderkey = l1.l_orderkey && l2.l_suppkey <> l1.l_suppkey, type:=leftsemi)
+.join(lineitem.as(l3), l3.l_orderkey = l1.l_orderkey && l3.l_suppkey <> l1.l_suppkey && l3.l_receiptdate > l3.l_commitdate, type:=leftanti)
+.groupby({s_name}, {numwait:=count()})
+.orderby({numwait.desc(), s_name}, limit:=100)
+
+
+
+let avg_for_selected :=
+customer
+.filter(c_acctbal > 0.00 && c_phone.substr(1,2).in({'13', '31', '23', '29', '30', '18', '17'}))
+.aggregate(avg(c_acctbal)),
+customer
+.map({cntrycode:=c_phone.substr(1,2)})
+.filter(cntrycode.in({'13', '31', '23', '29', '30', '18', '17'}) && c_acctbal > avg_for_selected)
+.join(orders, o_custkey=c_custkey, type:=leftanti)
+.groupby({cntrycode}, {numcust:=count(), totacctbal:=sum(c_acctbal)})
+.orderby({cntrycode})
diff --git a/src/silo/api/query_handler.cpp b/src/silo/api/query_handler.cpp
index e26d5a3f2..97051f1e0 100644
--- a/src/silo/api/query_handler.cpp
+++ b/src/silo/api/query_handler.cpp
@@ -15,12 +15,11 @@
 #include "silo/api/active_database.h"
 #include "silo/api/bad_request.h"
 #include "silo/api/error_request_handler.h"
-#include "silo/query_engine/action_query.h"
-#include "silo/query_engine/binder.h"
 #include "silo/query_engine/exec_node/arrow_ipc_sink.h"
 #include "silo/query_engine/exec_node/ndjson_sink.h"
 #include "silo/query_engine/illegal_query_exception.h"
 #include "silo/query_engine/planner.h"
+#include "silo/query_engine/saneql/parse_exception.h"
 namespace silo::api {
@@ -55,10 +54,8 @@ void QueryHandler::post(
 SPDLOG_INFO("Request Id [{}] - received query: {}", request_id, query_string);
 try {
- auto action_query = query_engine::ActionQuery::parseQuery(query_string);
- auto bound_query = query_engine::Binder::bindQuery(std::move(action_query), database->tables);
- auto query_plan = query_engine::Planner::planQuery(
- std::move(bound_query), database->tables, query_options, request_id
+ auto query_plan = query_engine::Planner::planSaneqlQuery(
+ query_string, database->tables, query_options, request_id
 );
 response.set("data-version", database->getDataVersionTimestamp().value);
@@ -87,6 +84,8 @@ void QueryHandler::post(
 EVOBENCH_SCOPE("QueryPlan", "executeAndWrite");
 query_plan.executeAndWrite(output_sink, DEFAULT_TIMEOUT_TWO_MINUTES);
 }
+ } catch (const silo::query_engine::saneql::ParseException& ex) {
+ throw BadRequest(ex.what());
 } catch (const silo::query_engine::IllegalQueryException& ex) {
 throw BadRequest(ex.what());
 }
diff --git a/src/silo/database.cpp b/src/silo/database.cpp
index 31b8c3102..4eb6e153a 100644
--- a/src/silo/database.cpp
+++ b/src/silo/database.cpp
@@ -25,17 +25,6 @@
 #include "silo/common/version.h"
 #include "silo/database_info.h"
 #include "silo/persistence/exception.h"
-#include "silo/query_engine/action_query.h"
-#include "silo/query_engine/actions/action.h"
-#include "silo/query_engine/actions/aggregated.h"
-#include "silo/query_engine/actions/details.h"
-#include "silo/query_engine/actions/fasta.h"
-#include "silo/query_engine/actions/fasta_aligned.h"
-#include "silo/query_engine/actions/insertions.h"
-#include "silo/query_engine/actions/most_recent_common_ancestor.h"
-#include "silo/query_engine/actions/mutations.h"
-#include "silo/query_engine/actions/phylo_subtree.h"
-#include "silo/query_engine/binder.h"
 #include "silo/query_engine/exec_node/arrow_ipc_sink.h"
 #include "silo/query_engine/exec_node/ndjson_sink.h"
 #include "silo/query_engine/filter/expressions/true.h"
@@ -43,6 +32,8 @@
 #include "silo/query_engine/operators/query_node.h"
 #include "silo/query_engine/operators/table_scan_node.h"
 #include "silo/query_engine/planner.h"
+#include "silo/query_engine/saneql/ast_to_query.h"
+#include "silo/query_engine/saneql/parser.h"
 #include "silo/schema/database_schema.h"
 #include "silo/storage/column/sequence_column.h"
@@ -185,8 +176,6 @@ void Database::appendDataFromString(const std::string& table_name, std::string j
 silo::append::appendDataToTable(tables.at(schema::TableName{table_name}), input_data);
 }
-using silo::query_engine::ActionQuery;
-using silo::query_engine::actions::Action;
 using silo::query_engine::filter::expressions::Expression;
 using silo::query_engine::filter::expressions::True;
@@ -266,14 +255,17 @@ roaring::Roaring Database::getFilteredBitmap(
 const std::string& table_name,
 const std::string& filter
 ) {
- const nlohmann::json filter_json = nlohmann::json::parse(filter);
- std::unique_ptr filter_expression = filter_json;
 auto maybe_table = tables.find(schema::TableName{table_name});
 if (maybe_table == tables.end()) {
 SPDLOG_ERROR("The database does not contain the table {}", table_name);
 return {};
 }
 auto table = maybe_table->second;
+
+ query_engine::saneql::Parser parser(filter);
+ auto ast = parser.parse();
+ auto filter_expression = query_engine::saneql::convertToFilter(*ast);
+
 auto rewritten_filter_expression =
 filter_expression->rewrite(*table, Expression::AmbiguityMode::NONE);
 auto filter_operator = rewritten_filter_expression->compile(*table);
@@ -288,24 +280,19 @@ std::vector> Database::getPrevalentMutations(
 double prevalence_threshold,
 const std::string& filter
 ) const {
- using SymbolMutations = silo::query_engine::actions::Mutations;
-
- const nlohmann::json filter_json = nlohmann::json::parse(filter);
- std::unique_ptr filter_expression = filter_json;
-
- std::unique_ptr action = std::make_unique(
- std::vector{sequence_name},
+ constexpr std::string_view MUTATIONS_FUNCTION =
+ std::is_same_v ? "aminoAcidMutations" : "mutations";
+
+ auto query_string = fmt::format(
+ "{}.filter({}).{}(minProportion:={}, sequenceNames:={{{}}}, fields:={{count, mutation}})",
+ table_name,
+ filter,
+ MUTATIONS_FUNCTION,
 prevalence_threshold,
- std::vector{
- SymbolMutations::COUNT_FIELD_NAME, SymbolMutations::MUTATION_FIELD_NAME
- }
+ sequence_name
 );
-
- auto action_query = ActionQuery(std::move(filter_expression), std::move(action));
- action_query.table_name = schema::TableName{table_name};
- auto query_node = query_engine::Binder::bindQuery(std::move(action_query), tables);
- auto query_plan = query_engine::Planner::planQuery(
- std::move(query_node), tables, config::QueryOptions{}, "getPrevalentMutations"
+ auto query_plan = query_engine::Planner::planSaneqlQuery(
+ query_string, tables, config::QueryOptions{}, "getPrevalentMutations"
 );
 std::stringstream result_stream;
 query_engine::exec_node::NdjsonSink output_sink{&result_stream, query_plan.results_schema};
@@ -315,11 +302,10 @@ std::vector> Database::getPrevalentMutations(
 std::string json_line;
 while (result_stream >> json_line) {
 auto line = nlohmann::json::parse(json_line);
- SILO_ASSERT(line.contains(SymbolMutations::COUNT_FIELD_NAME));
- const uint64_t count = line[SymbolMutations::COUNT_FIELD_NAME].template get();
- SILO_ASSERT(line.contains(SymbolMutations::MUTATION_FIELD_NAME));
- const std::string mutation =
- line[SymbolMutations::MUTATION_FIELD_NAME].template get();
+ SILO_ASSERT(line.contains("count"));
+ const uint64_t count = line["count"].template get();
+ SILO_ASSERT(line.contains("mutation"));
+ const std::string mutation = line["mutation"].template get();
 result.emplace_back(count, mutation);
 }
 return result;
@@ -491,15 +477,9 @@ void Database::updateDataVersion() {
 SPDLOG_DEBUG("Data version was set to {}", data_version_.toString());
 }
-std::string Database::executeQueryAsArrowIpc(
- const std::string& table_name,
- const std::string& query_json
-) const {
- auto action_query = ActionQuery::parseQuery(query_json);
- action_query.table_name = schema::TableName{table_name};
- auto query_node = query_engine::Binder::bindQuery(std::move(action_query), tables);
- auto query_plan = query_engine::Planner::planQuery(
- std::move(query_node), tables, config::QueryOptions{}, "executeQueryAsArrowIpc"
+std::string Database::executeQueryAsArrowIpc(const std::string& query_string) const {
+ auto query_plan = query_engine::Planner::planSaneqlQuery(
+ query_string, tables, config::QueryOptions{}, "executeQueryAsArrowIpc"
 );
 constexpr uint64_t DEFAULT_TIMEOUT_SECONDS = 120;
diff --git a/src/silo/database.h b/src/silo/database.h
index 13e32617d..c2a4a1996 100644
--- a/src/silo/database.h
+++ b/src/silo/database.h
@@ -6,7 +6,6 @@
 #include "silo/common/silo_directory.h"
 #include "silo/config/runtime_config.h"
 #include "silo/database_info.h"
-#include "silo/query_engine/action_query.h"
 #include "silo/query_engine/query_plan.h"
 #include "silo/schema/database_schema.h"
 #include "silo/storage/table.h"
@@ -105,10 +104,7 @@ class Database {
 [[nodiscard]] virtual DataVersion::Timestamp getDataVersionTimestamp() const;
- [[nodiscard]] std::string executeQueryAsArrowIpc(
- const std::string& table_name,
- const std::string& query_json
- ) const;
+ [[nodiscard]] std::string executeQueryAsArrowIpc(const std::string& query_string) const;
 [[nodiscard]] std::string getTablesAsArrowIpc() const;
diff --git a/src/silo/database.test.cpp b/src/silo/database.test.cpp
index 5f8ba87f4..c62c7f780 100644
--- a/src/silo/database.test.cpp
+++ b/src/silo/database.test.cpp
@@ -12,14 +12,9 @@
 #include "silo/config/preprocessing_config.h"
 #include "silo/database_info.h"
 #include "silo/initialize/initializer.h"
-#include "silo/query_engine/exec_node/ndjson_sink.h"
-#include "silo/query_engine/filter/expressions/true.h"
-#include "silo/query_engine/operators/aggregate_node.h"
-#include "silo/query_engine/operators/count_filter_node.h"
-#include "silo/query_engine/operators/query_node.h"
-#include "silo/query_engine/operators/table_scan_node.h"
 #include "silo/query_engine/planner.h"
 #include "silo/storage/reference_genomes.h"
+#include "silo/test/query_fixture.test.h"
 using silo::config::PreprocessingConfig;
@@ -129,9 +124,6 @@ TEST(DatabaseTest, shouldReturnCorrectDatabaseInfoAfterAppendingNewSequences) {
 }
 using silo::Nucleotide;
-using silo::query_engine::exec_node::NdjsonSink;
-using silo::query_engine::filter::expressions::True;
-using silo::query_engine::operators::CountFilterNode;
 using silo::schema::ColumnIdentifier;
 using silo::schema::ColumnType;
 using silo::schema::TableSchema;
@@ -163,41 +155,31 @@ TEST(DatabaseTest, canCreateMultipleTablesAndAddData) {
 second_table_name, std::make_shared(column_metadata, primary_key)
 );
- auto first_table = database.tables.at(first_table_name);
-
 std::ifstream first_table_data{"testBaseData/example.ndjson"};
 database.appendData(first_table_name, first_table_data);
- auto aggregated_all_query_1 =
- std::make_unique(first_table, std::make_unique());
- auto query_plan_1 = silo::query_engine::Planner::planQuery(
- std::move(aggregated_all_query_1),
+ auto query_plan_1 = silo::query_engine::Planner::planSaneqlQuery(
+ "first.groupBy({count:=count()})",
 database.tables,
 silo::config::QueryOptions{},
 "test_query_1"
 );
- std::stringstream result;
- NdjsonSink output_sink_1{&result, query_plan_1.results_schema};
- query_plan_1.executeAndWrite(output_sink_1, 100);
- ASSERT_EQ(result.str(), "{\"count\":20}\n");
+ ASSERT_EQ(
+ silo::test::executeQueryToJsonArray(query_plan_1), nlohmann::json::array({{{"count", 20}}})
+ );
 std::stringstream second_table_data;
 second_table_data << R"({"key":"id_1","sequence":{"sequence":"AAAA","insertions":[],"offset":0}})";
 database.appendData(second_table_name, second_table_data);
- auto second_table = database.tables.at(second_table_name);
-
- auto aggregated_all_query_2 =
- std::make_unique(second_table, std::make_unique());
- auto query_plan_2 = silo::query_engine::Planner::planQuery(
- std::move(aggregated_all_query_2),
+ auto query_plan_2 = silo::query_engine::Planner::planSaneqlQuery(
+ "second.groupBy({count:=count()})",
 database.tables,
 silo::config::QueryOptions{},
 "test_query_2"
 );
- std::stringstream result_2;
- NdjsonSink output_sink_2{&result_2, query_plan_2.results_schema};
- query_plan_2.executeAndWrite(output_sink_2, 100);
- ASSERT_EQ(result_2.str(), "{\"count\":1}\n");
+ ASSERT_EQ(
+ silo::test::executeQueryToJsonArray(query_plan_2), nlohmann::json::array({{{"count", 1}}})
+ );
 }
diff --git a/src/silo/preprocessing/preprocessing.test.cpp b/src/silo/preprocessing/preprocessing.test.cpp
index b32f86467..9580bc54d 100644
--- a/src/silo/preprocessing/preprocessing.test.cpp
+++ b/src/silo/preprocessing/preprocessing.test.cpp
@@ -12,11 +12,9 @@
 #include "silo/database.h"
 #include "silo/database_info.h"
 #include "silo/preprocessing/preprocessing_exception.h"
-#include "silo/query_engine/action_query.h"
-#include "silo/query_engine/binder.h"
-#include "silo/query_engine/exec_node/ndjson_sink.h"
 #include "silo/query_engine/planner.h"
 #include "silo/query_engine/query_plan.h"
+#include "silo/test/query_fixture.test.h"
 namespace {
 using silo::config::PreprocessingConfig;
@@ -164,19 +162,8 @@ const Scenario NDJSON_FILE_WITH_MISSING_SEGMENTS_AND_GENES = {
 })",
 .assertion{
 .expected_sequence_count = 2,
- .query = R"(
- {
- "action": {
- "type": "FastaAligned",
- "sequenceNames": ["someShortGene", "secondSegment"],
- "orderByFields": ["accessionVersion"],
- "additionalFields": ["country"]
- },
- "filterExpression": {
- "type": "True"
- }
- }
- )",
+ .query = "default.project({accessionVersion, someShortGene, secondSegment, "
+ "country}).orderBy({accessionVersion})",
 .expected_query_result = nlohmann::json::parse(R"(
 [{
 "accessionVersion": "1.1",
@@ -241,18 +228,7 @@ const Scenario NDJSON_WITH_SQL_KEYWORD_AS_FIELD = {
 })",
 .assertion{
 .expected_sequence_count = 2,
- .query = R"(
-{
- "action": {
- "type": "Aggregated",
- "groupByFields": ["group"],
- "orderByFields": ["group"]
- },
- "filterExpression": {
- "type": "True"
- }
-}
-)",
+ .query = "default.groupBy({count:=count()},{group}).orderBy({group})",
 .expected_query_result = nlohmann::json::parse(
 R"([
 {"count": 1, "group": null},
@@ -325,18 +301,7 @@ const Scenario NDJSON_WITH_NUMERIC_NAMES = {
 })",
 .assertion{
 .expected_sequence_count = 2,
- .query = R"(
- {
- "action": {
- "type": "Aggregated",
- "groupByFields": ["2"],
- "orderByFields": ["2"]
- },
- "filterExpression": {
- "type": "True"
- }
- }
- )",
+ .query = R"(default.groupBy({count:=count()},{"2"}).orderBy({"2"}))",
 .expected_query_result = nlohmann::json::parse(
 R"([
 {"count": 1, "2": null},
@@ -388,16 +353,7 @@ const Scenario EMPTY_INPUT_NDJSON = {
 .lineage_trees = {{"test_lineage_definition.yaml", "main: ~\n"}},
 .assertion{
 .expected_sequence_count = 0,
- .query = R"(
- {
- "action": {
- "type": "Details"
- },
- "filterExpression": {
- "type": "True"
- }
- }
- )",
+ .query = "default",
 .expected_query_result = nlohmann::json::parse(R"(
 [])")
 }
@@ -442,16 +398,7 @@ const Scenario EMPTY_INPUT_NDJSON_UNPARTITIONED = {
 })",
 .assertion{
 .expected_sequence_count = 0,
- .query = R"(
- {
- "action": {
- "type": "Details"
- },
- "filterExpression": {
- "type": "True"
- }
- }
- )",
+ .query = "default",
 .expected_query_result = nlohmann::json::parse(R"(
 [])")
 }
@@ -496,16 +443,7 @@ const Scenario NO_GENES = {
 })",
 .assertion{
 .expected_sequence_count = 100,
- .query = R"(
- {
- "action": {
- "type": "Aggregated"
- },
- "filterExpression": {
- "type": "True"
- }
- }
- )",
+ .query = "default.groupBy({count:=count()})",
 .expected_query_result = nlohmann::json::parse(R"(
 [{"count":100}])")
 }
@@ -549,16 +487,7 @@ const Scenario NO_NUCLEOTIDE_SEQUENCES = {
 })",
 .assertion{
 .expected_sequence_count = 100,
- .query = R"(
- {
- "action": {
- "type": "Aggregated"
- },
- "filterExpression": {
- "type": "True"
- }
- }
- )",
+ .query = "default.groupBy({count:=count()})",
 .expected_query_result = nlohmann::json::parse(R"(
 [{"count":100}])")
 }
@@ -597,16 +526,7 @@ const Scenario NO_SEQUENCES = {
 )",
 .assertion{
 .expected_sequence_count = 100,
- .query = R"(
- {
- "action": {
- "type": "Aggregated"
- },
- "filterExpression": {
- "type": "True"
- }
- }
- )",
+ .query = "default.groupBy({count:=count()})",
 .expected_query_result = nlohmann::json::parse(R"(
 [{"count":100}])")
 }
@@ -731,16 +651,7 @@ const Scenario DIVERSE_SEQUENCE_NAMES_NDJSON = {
 })",
 .assertion{
 .expected_sequence_count = 2,
- .query = R"(
- {
- "action": {
- "type": "Aggregated"
- },
- "filterExpression": {
- "type": "True"
- }
- }
- )",
+ .query = "default.groupBy({count:=count()})",
 .expected_query_result = nlohmann::json::parse(R"(
 [{"count":2}])")
 }
@@ -778,17 +689,7 @@ const Scenario PREVENT_LATE_AUTO_CASTING = {
 })",
 .assertion{
 .expected_sequence_count = 3,
- .query = R"(
- {
- "action": {
- "type": "Details",
- "orderByFields": ["accessionVersion"]
- },
- "filterExpression": {
- "type": "True"
- }
- }
- )",
+ .query = "default.orderBy({accessionVersion})",
 .expected_query_result = nlohmann::json::parse(R"(
 [{"accessionVersion":"0"},{"accessionVersion":"0.12"},{"accessionVersion":"text_without_quotes"}])")
 }
@@ -818,11 +719,7 @@ const Scenario DATE_COLUMN_VALID_DATES = {
 .reference_genomes = R"({"nucleotideSequences": [], "genes": []})",
 .assertion{
 .expected_sequence_count = 3,
- .query = R"(
- {
- "action": {"type": "Details", "orderByFields": ["accessionVersion"]},
- "filterExpression": {"type": "True"}
- })",
+ .query = "default.orderBy({accessionVersion})",
 .expected_query_result = nlohmann::json::parse(R"([
 {"accessionVersion": "1", "theDate": "1969-12-31"},
 {"accessionVersion": "2", "theDate": "2021-03-15"},
@@ -883,20 +780,8 @@ root_2: ~
 - root_2)"}},
 .assertion{
 .expected_sequence_count = 3,
- .query = R"(
- {
- "action": {
- "type": "Details",
- "orderByFields": ["accessionVersion"]
- },
- "filterExpression": {
- "type": "Lineage",
- "column": "lineage_1",
- "value": "root_1",
- "includeSublineages": true
- }
- }
- )",
+ .query = "default.filter(lineage_1.lineage('root_1', includeSublineages:=true))"
+ ".orderBy({accessionVersion})",
 .expected_query_result = nlohmann::json::parse(R"([
 {"accessionVersion":"0","lineage_1":"root_1","lineage_2":"root_2"},
 {"accessionVersion":"1","lineage_1":"child_1","lineage_2":null}
@@ -923,8 +808,6 @@ const auto TEST_CASES = ::testing::Values(
 INSTANTIATE_TEST_SUITE_P(PreprocessorTest, PreprocessorTestFixture, TEST_CASES, printTestName);
-using silo::query_engine::exec_node::NdjsonSink;
-
 TEST_P(PreprocessorTestFixture, shouldProcessData) {
 const auto& scenario = GetParam();
@@ -937,25 +820,13 @@ TEST_P(PreprocessorTestFixture, shouldProcessData) {
 EXPECT_EQ(database_info.sequence_count, scenario.assertion.expected_sequence_count);
- auto query = silo::query_engine::ActionQuery::parseQuery(scenario.assertion.query);
-
- auto bound_query = silo::query_engine::Binder::bindQuery(std::move(query), database->tables);
- auto query_plan = silo::query_engine::Planner::planQuery(
- std::move(bound_query),
+ auto query_plan = silo::query_engine::Planner::planSaneqlQuery(
+ scenario.assertion.query,
 database->tables,
 silo::config::RuntimeConfig::withDefaults().query_options,
 "some_id"
 );
- std::stringstream actual_result_stream;
- NdjsonSink output_sink{&actual_result_stream, query_plan.results_schema};
- query_plan.executeAndWrite(output_sink, /*timeout_in_seconds=*/3);
- nlohmann::json actual_ndjson_result_as_array = nlohmann::json::array();
- std::string line;
- while (std::getline(actual_result_stream, line)) {
- auto line_object = nlohmann::json::parse(line);
- std::cout << line_object.dump() << '\n';
- actual_ndjson_result_as_array.push_back(line_object);
- }
+ auto actual_ndjson_result_as_array = silo::test::executeQueryToJsonArray(query_plan);
 ASSERT_EQ(actual_ndjson_result_as_array, scenario.assertion.expected_query_result);
diff --git a/src/silo/query_engine/action_query.cpp b/src/silo/query_engine/action_query.cpp
deleted file mode 100644
index b60166ffd..000000000
--- a/src/silo/query_engine/action_query.cpp
+++ /dev/null
@@ -1,37 +0,0 @@
-#include "silo/query_engine/action_query.h"
-
-#include
-#include
-
-#include
-#include
-
-#include "silo/query_engine/actions/action.h"
-#include "silo/query_engine/filter/expressions/expression.h"
-#include "silo/query_engine/illegal_query_exception.h"
-
-using silo::query_engine::actions::Action;
-using silo::query_engine::filter::expressions::Expression;
-
-namespace silo::query_engine {
-
-ActionQuery ActionQuery::parseQuery(const std::string& query_string) {
- try {
- nlohmann::json json = nlohmann::json::parse(query_string);
- if (!json.contains("filterExpression") || !json["filterExpression"].is_object() ||
- !json.contains("action") || !json["action"].is_object()) {
- throw IllegalQueryException("Query json must contain filterExpression and action.");
- }
-
- auto filter = json["filterExpression"].get>();
- auto action = json["action"].get>();
-
- return ActionQuery{std::move(filter), std::move(action)};
- } catch (const nlohmann::json::parse_error& ex) {
- throw IllegalQueryException("The query was not a valid JSON: " + std::string(ex.what()));
- } catch (const nlohmann::json::exception& ex) {
- throw IllegalQueryException("The query was not a valid JSON: " + std::string(ex.what()));
- }
-}
-
-} // namespace silo::query_engine
diff --git a/src/silo/query_engine/action_query.h b/src/silo/query_engine/action_query.h
deleted file mode 100644
index e604056c2..000000000
--- a/src/silo/query_engine/action_query.h
+++ /dev/null
@@ -1,35 +0,0 @@
-#pragma once
-
-#include
-#include
-
-#include "silo/config/runtime_config.h"
-#include "silo/query_engine/actions/action.h"
-#include "silo/query_engine/filter/expressions/expression.h"
-
-namespace silo::query_engine {
-
-struct ActionQuery {
- schema::TableName table_name;
- std::unique_ptr filter;
- std::unique_ptr action;
-
- explicit ActionQuery(
- schema::TableName table_name,
- std::unique_ptr filter,
- std::unique_ptr action
- )
- : table_name(std::move(table_name)),
- filter(std::move(filter)),
- action(std::move(action)) {}
-
- explicit ActionQuery(
- std::unique_ptr filter,
- std::unique_ptr action
- )
- : ActionQuery(schema::TableName::getDefault(), std::move(filter), std::move(action)) {}
-
- static ActionQuery parseQuery(const std::string& query_string);
-};
-
-} // namespace silo::query_engine
diff --git a/src/silo/query_engine/actions/action.cpp b/src/silo/query_engine/actions/action.cpp
deleted file mode 100644
index 3c1cd8de0..000000000
--- a/src/silo/query_engine/actions/action.cpp
+++ /dev/null
@@ -1,131 +0,0 @@
-#include "silo/query_engine/actions/action.h"
-
-#include
-#include
-#include
-#include
-#include
-
-#include
-#include
-#include
-
-#include "silo/common/aa_symbols.h"
-#include "silo/common/nucleotide_symbols.h"
-#include "silo/query_engine/actions/aggregated.h"
-#include "silo/query_engine/actions/details.h"
-#include "silo/query_engine/actions/fasta.h"
-#include "silo/query_engine/actions/fasta_aligned.h"
-#include "silo/query_engine/actions/insertions.h"
-#include "silo/query_engine/actions/most_recent_common_ancestor.h"
-#include "silo/query_engine/actions/mutations.h"
-#include "silo/query_engine/actions/phylo_subtree.h"
-#include "silo/query_engine/copy_on_write_bitmap.h"
-#include "silo/query_engine/illegal_query_exception.h"
-
-namespace silo::query_engine::actions {
-
-Action::Action() = default;
-
-void Action::setOrdering(
- const std::vector& order_by_fields_,
- std::optional limit_,
- std::optional offset_,
- std::optional randomize_seed_
-) {
- order_by_fields = order_by_fields_;
- limit = limit_;
- offset = offset_;
- randomize_seed = randomize_seed_;
-}
-
-std::optional parseLimit(const nlohmann::json& json) {
- CHECK_SILO_QUERY(
- !json.contains("limit") ||
- (json["limit"].is_number_unsigned() && json["limit"].get() > 0),
- "If the action contains a limit, it must be a positive number"
- );
- return json.contains("limit") ? std::optional(json["limit"].get())
- : std::nullopt;
-}
-
-std::optional parseOffset(const nlohmann::json& json) {
- CHECK_SILO_QUERY(
- !json.contains("offset") || json["offset"].is_number_unsigned(),
- "If the action contains an offset, it must be a non-negative number"
- );
- return json.contains("offset") ? std::optional(json["offset"].get())
- : std::nullopt;
-}
-
-std::optional parseRandomizeSeed(const nlohmann::json& json) {
- if (!json.contains("randomize")) {
- return std::nullopt;
- }
- if (json["randomize"].is_boolean()) {
- if (json["randomize"].get()) {
- const uint32_t time_based_seed =
- std::chrono::system_clock::now().time_since_epoch().count();
- return time_based_seed;
- }
- return std::nullopt;
- }
- CHECK_SILO_QUERY(
- json["randomize"].is_object() && json["randomize"].contains("seed") &&
- json["randomize"]["seed"].is_number_unsigned(),
- "If the action contains 'randomize', it must be either a boolean or an object "
- "containing an unsigned 'seed'"
- );
- return json["randomize"]["seed"].get();
-}
-
-// NOLINTNEXTLINE(readability-identifier-naming,readability-function-cognitive-complexity)
-void from_json(const nlohmann::json& json, std::unique_ptr& action) {
- CHECK_SILO_QUERY(json.contains("type"), "The field 'type' is required in any action");
- CHECK_SILO_QUERY(
- json["type"].is_string(),
- "The field 'type' in all actions needs to be a string, but is: {}",
- json["type"].dump()
- );
- const std::string expression_type = json["type"];
- if (expression_type == "Aggregated") {
- action = json.get>();
- } else if (expression_type == "MostRecentCommonAncestor") {
- action = json.get>();
- } else if (expression_type == "PhyloSubtree") {
- action = json.get>();
- } else if (expression_type == "Mutations") {
- action = json.get>>();
- } else if (expression_type == "Details") {
- action = json.get>();
- } else if (expression_type == "AminoAcidMutations") {
- action = json.get>>();
- } else if (expression_type == "Fasta") {
- action = json.get>();
- } else if (expression_type == "FastaAligned") {
- action = json.get>();
- } else if (expression_type == "Insertions") {
- action = json.get>>();
- } else if (expression_type == "AminoAcidInsertions") {
- action = json.get>>();
- } else {
- throw query_engine::IllegalQueryException("{} is not a valid action", expression_type);
- }
-
- std::vector order_by_fields;
- if (json.contains("orderByFields")) {
- CHECK_SILO_QUERY(json["orderByFields"].is_array(), "orderByFields must be an array");
- order_by_fields = json["orderByFields"].get>();
- }
-
- CHECK_SILO_QUERY(
- !json.contains("offset") || json["offset"].is_number_unsigned(),
- "If the action contains an offset, it must be a non-negative number"
- );
- auto limit = parseLimit(json);
- auto offset = parseOffset(json);
- auto randomize_seed = parseRandomizeSeed(json);
- action->setOrdering(order_by_fields, limit, offset, randomize_seed);
-}
-
-} // namespace silo::query_engine::actions
diff --git a/src/silo/query_engine/actions/action.h b/src/silo/query_engine/actions/action.h
deleted file mode 100644
index f96e2e06d..000000000
--- a/src/silo/query_engine/actions/action.h
+++ /dev/null
@@ -1,64 +0,0 @@
-#pragma once
-
-#include
-#include
-#include
-#include
-#include
-
-#include
-#include
-
-#include "silo/config/runtime_config.h"
-#include "silo/query_engine/actions/order_by_field.h"
-#include "silo/query_engine/copy_on_write_bitmap.h"
-#include "silo/query_engine/query_plan.h"
-#include "silo/schema/database_schema.h"
-#include "silo/storage/table.h"
-
-namespace silo::query_engine::actions {
-
-class Action {
- protected:
- std::vector order_by_fields;
- std::optional limit;
- std::optional offset;
- std::optional randomize_seed;
-
- public:
- Action();
- virtual ~Action() = default;
-
- void setOrdering(
- const std::vector& order_by_fields,
- std::optional limit,
- std::optional offset,
- std::optional randomize_seed
- );
-
- [[nodiscard]] const std::vector& getOrderByFields() const {
- return order_by_fields;
- }
-
- [[nodiscard]] std::optional getLimit() const { return limit; }
-
- [[nodiscard]] std::optional getOffset() const { return offset; }
-
- [[nodiscard]] std::optional getRandomizeSeed() const { return randomize_seed; }
-};
-
-std::optional parseLimit(const nlohmann::json& json);
-
-std::optional parseOffset(const nlohmann::json& json);
-
-std::optional parseRandomizeSeed(const nlohmann::json& json);
-
-// NOLINTNEXTLINE(readability-identifier-naming)
-void from_json(const nlohmann::json& json, std::unique_ptr& action);
-
-std::vector columnNamesToFields(
- const std::vector& column_names,
- const silo::schema::TableSchema& table_schema
-);
-
-} // namespace silo::query_engine::actions
diff --git a/src/silo/query_engine/actions/aggregated.cpp b/src/silo/query_engine/actions/aggregated.cpp
deleted file mode 100644
index f463d1d4f..000000000
--- a/src/silo/query_engine/actions/aggregated.cpp
+++ /dev/null
@@ -1,58 +0,0 @@
-#include "silo/query_engine/actions/aggregated.h"
-
-#include
-#include
-#include
-
-#include
-#include
-#include
-#include
-
-#include "evobench/evobench.hpp"
-#include "silo/query_engine/actions/action.h"
-#include "silo/query_engine/copy_on_write_bitmap.h"
-#include "silo/query_engine/exec_node/table_scan.h"
-#include "silo/query_engine/illegal_query_exception.h"
-#include "silo/query_engine/operators/query_node.h"
-#include "silo/storage/table.h"
-
-namespace {
-
-const std::string GROUP_BY_FIELDS_FIELD_NAME = "groupByFields";
-
-} // namespace
-
-namespace silo::query_engine::actions {
-
-Aggregated::Aggregated(std::vector group_by_fields_) {
- group_by_fields.reserve(group_by_fields_.size());
- for (auto& field : group_by_fields_) {
- group_by_fields.emplace_back(std::move(field));
- }
-}
-
-// NOLINTNEXTLINE(readability-identifier-naming)
-void from_json(const nlohmann::json& json, std::unique_ptr& action) {
- std::vector group_by_fields;
- if (json.contains(GROUP_BY_FIELDS_FIELD_NAME)) {
- CHECK_SILO_QUERY(
- json[GROUP_BY_FIELDS_FIELD_NAME].is_array(),
- "{} must be an array",
- GROUP_BY_FIELDS_FIELD_NAME
- );
- for (const auto& element : json[GROUP_BY_FIELDS_FIELD_NAME]) {
- CHECK_SILO_QUERY(
- element.is_string(),
- "{} is not a valid entry in {}. Expected type string, got {}",
- element.dump(),
- GROUP_BY_FIELDS_FIELD_NAME,
- element.type_name()
- );
- group_by_fields.emplace_back(element.get());
- }
- }
- action = std::make_unique(std::move(group_by_fields));
-}
-
-} // namespace silo::query_engine::actions
diff --git a/src/silo/query_engine/actions/aggregated.h b/src/silo/query_engine/actions/aggregated.h
deleted file mode 100644
index ad1cfdfba..000000000
--- a/src/silo/query_engine/actions/aggregated.h
+++ /dev/null
@@ -1,29 +0,0 @@
-#pragma once
-
-#include
-#include
-#include
-
-#include
-
-#include "silo/query_engine/actions/action.h"
-#include "silo/query_engine/copy_on_write_bitmap.h"
-#include "silo/storage/table.h"
-
-namespace silo::query_engine::actions {
-
-struct GroupByField {
- std::string name;
-};
-
-class Aggregated : public Action {
- public:
- std::vector group_by_fields;
-
- explicit Aggregated(std::vector group_by_fields);
-};
-
-// NOLINTNEXTLINE(readability-identifier-naming)
-void from_json(const nlohmann::json& json, std::unique_ptr& action);
-
-} // namespace silo::query_engine::actions
diff --git a/src/silo/query_engine/actions/aggregated.test.cpp b/src/silo/query_engine/actions/aggregated.test.cpp
deleted file mode 100644
index 4b9c57c89..000000000
--- a/src/silo/query_engine/actions/aggregated.test.cpp
+++ /dev/null
@@ -1,458 +0,0 @@
-#include
-#include
-#include
-
-#include "silo/test/query_fixture.test.h"
-
-namespace {
-using silo::ReferenceGenomes;
-using silo::test::QueryTestData;
-using silo::test::QueryTestScenario;
-
-using boost::uuids::random_generator;
-
-nlohmann::json createData(const
std::string& country, const std::string& date) { - static std::atomic_int row_id = 0; - const auto primary_key = row_id++; - std::string age = row_id % 2 == 0 ? "null" : fmt::format("{}", (3 * row_id) + 4); - float coverage = 0.9; - - return nlohmann::json::parse(fmt::format( - R"( -{{ - "primaryKey": "id_{}", - "country": "{}", - "age": {}, - "coverage": {}, - "date": "{}", - "segment1": {{ - "sequence": "ACGT", - "insertions": ["2:A"] - }}, - "gene1": {{ - "sequence": "V", - "insertions": [] - }} -}} -)", - primary_key, - country, - age, - coverage, - date - )); -} - -const auto DATABASE_CONFIG = - R"( -defaultNucleotideSequence: "segment1" -schema: - instanceName: "dummy name" - metadata: - - name: "primaryKey" - type: "string" - - name: "country" - type: "string" - generateIndex: true - - name: "age" - type: "int" - - name: "coverage" - type: "float" - - name: "date" - type: "date" - primaryKey: "primaryKey" -)"; - -const auto REFERENCE_GENOMES = ReferenceGenomes{ - {{"segment1", "ATGCN"}}, - {{"gene1", "M*"}}, -}; - -const QueryTestData TEST_DATA{ - .ndjson_input_data = - {createData("Switzerland", "2020-01-01"), - createData("Germany", "2000-03-07"), - createData("Germany", "2009-06-07"), - createData("Switzerland", "2003-07-02"), - createData("Switzerland", "2002-01-04"), - createData("Switzerland", "2001-12-07")}, - .database_config = DATABASE_CONFIG, - .reference_genomes = REFERENCE_GENOMES, - .without_unaligned_sequences = true -}; - -const QueryTestScenario COUNT_ALL = { - .name = "COUNT_ALL", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Aggregated" - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"count": 6}])" - ) -}; - -const QueryTestScenario AGGREGATE_ALL = { - .name = "AGGREGATE_ALL", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Aggregated", - "orderByFields": [ - "primaryKey" - ], - "groupByFields": [ - 
"age","country","coverage","date","primaryKey" - ] - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"age":7,"count":1,"country":"Switzerland","coverage":0.9,"date":"2020-01-01","primaryKey":"id_0"}, -{"age":null,"count":1,"country":"Germany","coverage":0.9,"date":"2000-03-07","primaryKey":"id_1"}, -{"age":13,"count":1,"country":"Germany","coverage":0.9,"date":"2009-06-07","primaryKey":"id_2"}, -{"age":null,"count":1,"country":"Switzerland","coverage":0.9,"date":"2003-07-02","primaryKey":"id_3"}, -{"age":19,"count": 1,"country":"Switzerland","coverage":0.9,"date":"2002-01-04","primaryKey":"id_4"}, -{"age":null,"count":1,"country":"Switzerland","coverage":0.9,"date":"2001-12-07","primaryKey":"id_5"}])" - ) -}; - -const QueryTestScenario AGGREGATE_ALMOST_ALL = { - .name = "AGGREGATE_ALMOST_ALL", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Aggregated", - "orderByFields": [ - "age","date" - ], - "groupByFields": [ - "age","country","coverage","date" - ] - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[ -{"age":null,"count":1,"country":"Germany","coverage":0.9,"date":"2000-03-07"}, -{"age":null,"count":1,"country":"Switzerland","coverage":0.9,"date":"2001-12-07"}, -{"age":null,"count":1,"country":"Switzerland","coverage":0.9,"date":"2003-07-02"}, -{"age":7,"count":1,"country":"Switzerland","coverage":0.9,"date":"2020-01-01"}, -{"age":13,"count":1,"country":"Germany","coverage":0.9,"date":"2009-06-07"}, -{"age":19,"count": 1,"country":"Switzerland","coverage":0.9,"date":"2002-01-04"} -])" - ) -}; - -const QueryTestScenario AGGREGATE_SOME = { - .name = "AGGREGATE_SOME", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Aggregated", - "orderByFields": [ - "age", {"field": "count", "order": "descending"} - ], - "groupByFields": [ - "age","country","coverage" - ] - }, - "filterExpression": { - 
"type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"age":null,"count":2,"country":"Switzerland","coverage":0.9}, -{"age":null,"count":1,"country":"Germany","coverage":0.9}, -{"age":7,"count":1,"country":"Switzerland","coverage":0.9}, -{"age":13,"count":1,"country":"Germany","coverage":0.9}, -{"age":19,"count": 1,"country":"Switzerland","coverage":0.9}])" - ) -}; - -const QueryTestScenario AGGREGATED_LIMIT_OFFSET = { - .name = "LIMIT_OFFSET", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Aggregated", - "groupByFields": [ - "age", - "country", - "coverage", - "date", - "primaryKey" - ], - "orderByFields": [ - "primaryKey" - ], - "limit": 3, - "offset": 1 - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"age":null,"count":1,"country":"Germany","coverage":0.9,"date":"2000-03-07","primaryKey":"id_1"}, -{"age":13,"count":1,"country":"Germany","coverage":0.9,"date":"2009-06-07","primaryKey":"id_2"}, -{"age":null,"count":1,"country":"Switzerland","coverage":0.9,"date":"2003-07-02","primaryKey":"id_3"}])" - ) -}; - -const QueryTestScenario AGGREGATED_LIMIT_WITHOUT_ORDER = { - .name = "AGGREGATED_LIMIT_WITHOUT_ORDER", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Aggregated", - "groupByFields": ["primaryKey"], - "limit": 1 - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_error_message = - "Offset and limit can only be applied if the output of the operation has some ordering. " - "Implicit ordering such as in the case of Details/Fasta is also allowed, Aggregated " - "however produces unordered results." 
-}; - -const QueryTestScenario AGGREGATE_UNIQUE = { - .name = "AGGREGATE_UNIQUE", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Aggregated", - "groupByFields": [ - "date" - ], - "orderByFields": [ - "date" - ] - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"date":"2000-03-07","count":1}, -{"date":"2001-12-07","count":1}, -{"date":"2002-01-04","count":1}, -{"date":"2003-07-02","count":1}, -{"date":"2009-06-07","count":1}, -{"date":"2020-01-01","count":1}])" - ) -}; - -const QueryTestScenario AGGREGATE_ONE = { - .name = "AGGREGATE_ONE", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Aggregated", - "groupByFields": [ - "country" - ], - "orderByFields": [ - {"field": "count", "order": "descending"}, "country" - ] - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"count":4,"country":"Switzerland"}, -{"count":2,"country":"Germany"}])" - ) -}; - -const QueryTestScenario AGGREGATE_NULLABLE = { - .name = "AGGREGATE_NULLABLE", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Aggregated", - "groupByFields": [ - "age" - ], - "orderByFields": [ - "count", {"field": "age", "order": "descending"} - ] - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"age":19,"count":1}, -{"age":13,"count":1}, -{"age":7,"count":1}, -{"age":null,"count":3}])" - ) -}; - -const QueryTestScenario DUPLICATE_AGGREGATE = { - .name = "DUPLICATE_AGGREGATE", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Aggregated", - "groupByFields": [ - "age", "age" - ], - "orderByFields": [ - "count", {"field": "age", "order": "descending"} - ] - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"age":19,"count":1}, -{"age":13,"count":1}, -{"age":7,"count":1}, 
-{"age":null,"count":3}])" - ) -}; - -const QueryTestScenario INVALID_GROUP_BY_FIELD_OBJECT = { - .name = "INVALID_GROUP_BY_FIELD_OBJECT", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "groupByFields": [ - { - "field": "test_boolean_column", - "order": "ascending" - } - ], - "type": "Aggregated" - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_error_message = - "{\"field\":\"test_boolean_column\",\"order\":\"ascending\"} is not a valid entry in " - "groupByFields. Expected type string, got object" -}; - -const QueryTestScenario INVALID_GROUP_BY_FIELDS = { - .name = "INVALID_GROUP_BY_FIELDS", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "groupByFields": "test_boolean_column", - "type": "Aggregated" - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_error_message = "groupByFields must be an array" -}; - -const QueryTestScenario INVALID_ORDER_BY_FIELD_OBJECT = { - .name = "INVALID_ORDER_BY_FIELD_OBJECT", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "orderByFields": [1], - "type": "Aggregated" - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_error_message = - "The orderByField '1' must be either a string or an object containing the fields " - "'field':string and 'order':string, where the value of order is 'ascending' or 'descending'" -}; - -const QueryTestScenario INVALID_ORDER_BY_FIELDS = { - .name = "INVALID_ORDER_BY_FIELDS", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "orderByFields": "test_boolean_column", - "type": "Aggregated" - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_error_message = "orderByFields must be an array" -}; - -} // namespace - -QUERY_TEST( - Aggregated, - TEST_DATA, - ::testing::Values( - COUNT_ALL, - AGGREGATE_ALL, - AGGREGATE_ALMOST_ALL, - AGGREGATE_SOME, - AGGREGATED_LIMIT_OFFSET, - AGGREGATED_LIMIT_WITHOUT_ORDER, - AGGREGATE_UNIQUE, - AGGREGATE_ONE, - AGGREGATE_NULLABLE, - DUPLICATE_AGGREGATE, - 
INVALID_GROUP_BY_FIELD_OBJECT, - INVALID_GROUP_BY_FIELDS, - INVALID_ORDER_BY_FIELD_OBJECT, - INVALID_ORDER_BY_FIELDS - ) -); diff --git a/src/silo/query_engine/actions/details.cpp b/src/silo/query_engine/actions/details.cpp deleted file mode 100644 index 1c0e3365b..000000000 --- a/src/silo/query_engine/actions/details.cpp +++ /dev/null @@ -1,19 +0,0 @@ -#include "silo/query_engine/actions/details.h" - -#include - -#include "silo/query_engine/actions/action.h" -#include "silo/query_engine/illegal_query_exception.h" -#include "silo/query_engine/operators/query_node.h" - -namespace silo::query_engine::actions { -Details::Details(std::vector fields) - : fields(std::move(fields)) {} - -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr
<Details>& action) { - std::vector fields = json.value("fields", std::vector()); - action = std::make_unique<Details>(std::move(fields)); -} - -} // namespace silo::query_engine::actions diff --git a/src/silo/query_engine/actions/details.h b/src/silo/query_engine/actions/details.h deleted file mode 100644 index c6f78d9e8..000000000 --- a/src/silo/query_engine/actions/details.h +++ /dev/null @@ -1,24 +0,0 @@ -#pragma once - -#include -#include -#include - -#include "silo/query_engine/actions/action.h" - -#include -#include - -namespace silo::query_engine::actions { - -class Details : public Action { - public: - std::vector fields; - - explicit Details(std::vector fields); -}; - -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr<Details>
& action); - -} // namespace silo::query_engine::actions diff --git a/src/silo/query_engine/actions/details.test.cpp b/src/silo/query_engine/actions/details.test.cpp deleted file mode 100644 index 30a34db05..000000000 --- a/src/silo/query_engine/actions/details.test.cpp +++ /dev/null @@ -1,450 +0,0 @@ -#include -#include -#include - -#include "silo/test/query_fixture.test.h" - -namespace { -using silo::ReferenceGenomes; -using silo::test::QueryTestScenario; - -using boost::uuids::random_generator; - -nlohmann::json createData(const std::string& country, const std::string& date) { - static std::atomic_int row_id = 0; - const auto primary_key = row_id++; - std::string age = row_id % 2 == 0 ? "null" : fmt::format("{}", (3 * row_id) + 4); - float coverage = 0.9; - - return nlohmann::json::parse(fmt::format( - R"( -{{ - "primaryKey": "id_{}", - "country": "{}", - "age": {}, - "coverage": {}, - "date": "{}", - "unaligned_segment1": "ACGT", - "segment1": {{ - "sequence": "ACGT", - "insertions": ["2:A"] - }}, - "gene1": {{ - "sequence": "V", - "insertions": [] - }} -}} -)", - primary_key, - country, - age, - coverage, - date - )); -} - -const auto DATABASE_CONFIG = - R"( -defaultNucleotideSequence: "segment1" -schema: - instanceName: "dummy name" - metadata: - - name: "primaryKey" - type: "string" - - name: "country" - type: "string" - generateIndex: true - - name: "age" - type: "int" - - name: "coverage" - type: "float" - - name: "date" - type: "date" - primaryKey: "primaryKey" -)"; - -const auto REFERENCE_GENOMES = ReferenceGenomes{ - {{"segment1", "ATGCN"}}, - {{"gene1", "M*"}}, -}; - -const silo::test::QueryTestData TEST_DATA{ - .ndjson_input_data = - {createData("Switzerland", "2020-01-01"), - createData("Germany", "2000-03-07"), - createData("Germany", "2009-06-07"), - createData("Switzerland", "2003-07-02"), - createData("Switzerland", "2002-01-04"), - createData("Switzerland", "2001-12-07")}, - .database_config = DATABASE_CONFIG, - .reference_genomes = 
REFERENCE_GENOMES -}; - -const QueryTestScenario ALL_DATA = { - .name = "ALL_DATA", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "orderByFields": [ - "primaryKey" - ] - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"age":7,"country":"Switzerland","coverage":0.9,"date":"2020-01-01","primaryKey":"id_0"}, -{"age":null,"country":"Germany","coverage":0.9,"date":"2000-03-07","primaryKey":"id_1"}, -{"age":13,"country":"Germany","coverage":0.9,"date":"2009-06-07","primaryKey":"id_2"}, -{"age":null,"country":"Switzerland","coverage":0.9,"date":"2003-07-02","primaryKey":"id_3"}, -{"age":19,"country":"Switzerland","coverage":0.9,"date":"2002-01-04","primaryKey":"id_4"}, -{"age":null,"country":"Switzerland","coverage":0.9,"date":"2001-12-07","primaryKey":"id_5"}])" - ) -}; - -const QueryTestScenario LIMIT_OFFSET = { - .name = "LIMIT_OFFSET", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "orderByFields": [ - "primaryKey" - ], - "limit": 3, - "offset": 1 - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"age":null,"country":"Germany","coverage":0.9,"date":"2000-03-07","primaryKey":"id_1"}, -{"age":13,"country":"Germany","coverage":0.9,"date":"2009-06-07","primaryKey":"id_2"}, -{"age":null,"country":"Switzerland","coverage":0.9,"date":"2003-07-02","primaryKey":"id_3"}])" - ) -}; - -const QueryTestScenario ALL_DATES = { - .name = "ALL_DATES", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": [ - "date", "primaryKey" - ], - "orderByFields": [ - "primaryKey" - ] - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"date":"2020-01-01","primaryKey":"id_0"}, -{"date":"2000-03-07","primaryKey":"id_1"}, -{"date":"2009-06-07","primaryKey":"id_2"}, 
-{"date":"2003-07-02","primaryKey":"id_3"}, -{"date":"2002-01-04","primaryKey":"id_4"}, -{"date":"2001-12-07","primaryKey":"id_5"}])" - ) -}; - -const QueryTestScenario ALL_DATES_AND_COUNTRIES = { - .name = "ALL_DATES_AND_COUNTRIES", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": [ - "date", "primaryKey", "country" - ], - "orderByFields": [ - {"field": "country", "order": "descending"}, "date" - ] - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"country":"Switzerland","date":"2001-12-07","primaryKey":"id_5"}, -{"country":"Switzerland","date":"2002-01-04","primaryKey":"id_4"}, -{"country":"Switzerland","date":"2003-07-02","primaryKey":"id_3"}, -{"country":"Switzerland","date":"2020-01-01","primaryKey":"id_0"}, -{"country":"Germany","date":"2000-03-07","primaryKey":"id_1"}, -{"country":"Germany","date":"2009-06-07","primaryKey":"id_2"}])" - ) -}; - -const QueryTestScenario DUPLICATE_COUNTRY = { - .name = "DUPLICATE_COUNTRY", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": [ - "country", "country" - ], - "orderByFields": [ - {"field": "country", "order": "descending"} - ] - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"country":"Switzerland"}, -{"country":"Switzerland"}, -{"country":"Switzerland"}, -{"country":"Switzerland"}, -{"country":"Germany"}, -{"country":"Germany"}])" - ) -}; - -const QueryTestScenario LIMIT = { - .name = "LIMIT", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "orderByFields": [ - "primaryKey" - ], - "limit": 3 - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"age":7,"country":"Switzerland","coverage":0.9,"date":"2020-01-01","primaryKey":"id_0"}, 
-{"age":null,"country":"Germany","coverage":0.9,"date":"2000-03-07","primaryKey":"id_1"}, -{"age":13,"country":"Germany","coverage":0.9,"date":"2009-06-07","primaryKey":"id_2"}])" - ) -}; - -const QueryTestScenario LIMIT_0 = { - .name = "LIMIT_0", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "orderByFields": [ - "primaryKey" - ], - "limit": 0 - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_error_message = "If the action contains a limit, it must be a positive number" -}; - -const QueryTestScenario LIMIT_LARGE = { - .name = "LIMIT_LARGE", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "orderByFields": [ - "age", "primaryKey" - ], - "limit": 1000 - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"age":null,"country":"Germany","coverage":0.9,"date":"2000-03-07","primaryKey":"id_1"}, -{"age":null,"country":"Switzerland","coverage":0.9,"date":"2003-07-02","primaryKey":"id_3"}, -{"age":null,"country":"Switzerland","coverage":0.9,"date":"2001-12-07","primaryKey":"id_5"}, -{"age":7,"country":"Switzerland","coverage":0.9,"date":"2020-01-01","primaryKey":"id_0"}, -{"age":13,"country":"Germany","coverage":0.9,"date":"2009-06-07","primaryKey":"id_2"}, -{"age":19,"country":"Switzerland","coverage":0.9,"date":"2002-01-04","primaryKey":"id_4"}])" - ) -}; - -const QueryTestScenario SINGLE_FIELD_DESCENDING = { - .name = "SINGLE_FIELD_DESCENDING", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "orderByFields": [ - {"field": "age", "order": "descending"} - ] - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"age":19,"country":"Switzerland","coverage":0.9,"date":"2002-01-04","primaryKey":"id_4"}, -{"age":13,"country":"Germany","coverage":0.9,"date":"2009-06-07","primaryKey":"id_2"}, 
-{"age":7,"country":"Switzerland","coverage":0.9,"date":"2020-01-01","primaryKey":"id_0"}, -{"age":null,"country":"Germany","coverage":0.9,"date":"2000-03-07","primaryKey":"id_1"}, -{"age":null,"country":"Switzerland","coverage":0.9,"date":"2003-07-02","primaryKey":"id_3"}, -{"age":null,"country":"Switzerland","coverage":0.9,"date":"2001-12-07","primaryKey":"id_5"}])" - ) -}; - -const QueryTestScenario MULTI_FIELD_SORT = { - .name = "MULTI_FIELD_SORT", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "orderByFields": [ - {"field": "country", "order": "descending"}, "age" - ], - "limit": 1000 - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"age":7,"country":"Switzerland","coverage":0.9,"date":"2020-01-01","primaryKey":"id_0"}, -{"age":19,"country":"Switzerland","coverage":0.9,"date":"2002-01-04","primaryKey":"id_4"}, -{"age":null,"country":"Switzerland","coverage":0.9,"date":"2003-07-02","primaryKey":"id_3"}, -{"age":null,"country":"Switzerland","coverage":0.9,"date":"2001-12-07","primaryKey":"id_5"}, -{"age":13,"country":"Germany","coverage":0.9,"date":"2009-06-07","primaryKey":"id_2"}, -{"age":null,"country":"Germany","coverage":0.9,"date":"2000-03-07","primaryKey":"id_1"}])" - ) -}; - -const QueryTestScenario OFFSET = { - .name = "OFFSET", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "orderByFields": [ - "primaryKey" - ], - "offset": 3 - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"age":null,"country":"Switzerland","coverage":0.9,"date":"2003-07-02","primaryKey":"id_3"}, -{"age":19,"country":"Switzerland","coverage":0.9,"date":"2002-01-04","primaryKey":"id_4"}, -{"age":null,"country":"Switzerland","coverage":0.9,"date":"2001-12-07","primaryKey":"id_5"}])" - ) -}; - -const QueryTestScenario OFFSET_0 = { - .name = "OFFSET_0", - .query = nlohmann::json::parse( 
- R"( -{ - "action": { - "type": "Details", - "orderByFields": [ - "primaryKey" - ], - "offset": 0 - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"age":7,"country":"Switzerland","coverage":0.9,"date":"2020-01-01","primaryKey":"id_0"}, -{"age":null,"country":"Germany","coverage":0.9,"date":"2000-03-07","primaryKey":"id_1"}, -{"age":13,"country":"Germany","coverage":0.9,"date":"2009-06-07","primaryKey":"id_2"}, -{"age":null,"country":"Switzerland","coverage":0.9,"date":"2003-07-02","primaryKey":"id_3"}, -{"age":19,"country":"Switzerland","coverage":0.9,"date":"2002-01-04","primaryKey":"id_4"}, -{"age":null,"country":"Switzerland","coverage":0.9,"date":"2001-12-07","primaryKey":"id_5"}])" - ) -}; - -const QueryTestScenario OFFSET_LARGE = { - .name = "OFFSET_LARGE", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "orderByFields": [ - "primaryKey" - ], - "offset": 123123 - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[])" - ) -}; - -} // namespace - -QUERY_TEST( - Details, - TEST_DATA, - ::testing::Values( - ALL_DATA, - LIMIT_OFFSET, - ALL_DATES, - ALL_DATES_AND_COUNTRIES, - DUPLICATE_COUNTRY, - LIMIT, - LIMIT_0, - LIMIT_LARGE, - SINGLE_FIELD_DESCENDING, - MULTI_FIELD_SORT, - OFFSET, - OFFSET_0, - OFFSET_LARGE - ) -); diff --git a/src/silo/query_engine/actions/fasta.cpp b/src/silo/query_engine/actions/fasta.cpp deleted file mode 100644 index 67a1d2736..000000000 --- a/src/silo/query_engine/actions/fasta.cpp +++ /dev/null @@ -1,61 +0,0 @@ -#include "silo/query_engine/actions/fasta.h" - -#include - -#include -#include -#include -#include - -#include "silo/query_engine/illegal_query_exception.h" -#include "silo/query_engine/operators/query_node.h" - -namespace silo::query_engine::actions { - -namespace { - -const std::string SEQUENCE_NAMES_FIELD_NAME = "sequenceNames"; -const std::string 
ADDITIONAL_FIELDS_FIELD_NAME = "additionalFields"; - -} // namespace - -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& action) { - CHECK_SILO_QUERY( - json.contains(SEQUENCE_NAMES_FIELD_NAME) && json[SEQUENCE_NAMES_FIELD_NAME].is_array(), - "The Fasta action requires a {} field, which must be an array of strings", - SEQUENCE_NAMES_FIELD_NAME - ); - std::vector sequence_names; - for (const auto& child : json[SEQUENCE_NAMES_FIELD_NAME]) { - CHECK_SILO_QUERY( - child.is_string(), - "The Fasta action requires a {} field, which must be an array of " - "strings; while parsing array encountered the element {} which is not of type string", - SEQUENCE_NAMES_FIELD_NAME, - child.dump() - ); - sequence_names.emplace_back(child.get()); - } - std::vector additional_fields; - if (json.contains(ADDITIONAL_FIELDS_FIELD_NAME)) { - CHECK_SILO_QUERY( - json[ADDITIONAL_FIELDS_FIELD_NAME].is_array(), - "The field `{}` in a Fasta action must be an array of strings.", - ADDITIONAL_FIELDS_FIELD_NAME - ); - for (const auto& child : json[ADDITIONAL_FIELDS_FIELD_NAME]) { - CHECK_SILO_QUERY( - child.is_string(), - "The field `{}` in a Fasta action must be an array of strings. 
" - "Encountered non-string element: {}", - ADDITIONAL_FIELDS_FIELD_NAME, - child.dump() - ); - additional_fields.emplace_back(child.get()); - } - } - action = std::make_unique(std::move(sequence_names), std::move(additional_fields)); -} - -} // namespace silo::query_engine::actions diff --git a/src/silo/query_engine/actions/fasta.h b/src/silo/query_engine/actions/fasta.h deleted file mode 100644 index a208b10c8..000000000 --- a/src/silo/query_engine/actions/fasta.h +++ /dev/null @@ -1,30 +0,0 @@ -#pragma once - -#include -#include - -#include - -#include "silo/query_engine/actions/action.h" - -#include "silo/schema/database_schema.h" - -namespace silo::query_engine::actions { - -class Fasta : public Action { - public: - std::vector sequence_names; - std::vector additional_fields; - - explicit Fasta( - std::vector&& sequence_names, - std::vector&& additional_fields - ) - : sequence_names(std::move(sequence_names)), - additional_fields(std::move(additional_fields)) {} -}; - -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& action); - -} // namespace silo::query_engine::actions diff --git a/src/silo/query_engine/actions/fasta_aligned.cpp b/src/silo/query_engine/actions/fasta_aligned.cpp deleted file mode 100644 index 6accb289d..000000000 --- a/src/silo/query_engine/actions/fasta_aligned.cpp +++ /dev/null @@ -1,69 +0,0 @@ -#include "silo/query_engine/actions/fasta_aligned.h" - -#include - -#include -#include -#include -#include - -#include "silo/query_engine/actions/action.h" -#include "silo/query_engine/illegal_query_exception.h" -#include "silo/query_engine/operators/query_node.h" - -namespace silo::query_engine::actions { - -FastaAligned::FastaAligned( - std::vector&& sequence_names, - std::vector&& additional_fields -) - : sequence_names(sequence_names), - additional_fields(additional_fields) {} - -namespace { - -const std::string SEQUENCE_NAMES_FIELD_NAME = "sequenceNames"; -const std::string 
ADDITIONAL_FIELDS_FIELD_NAME = "additionalFields"; - -} // namespace - -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& action) { - CHECK_SILO_QUERY( - json.contains(SEQUENCE_NAMES_FIELD_NAME) && json[SEQUENCE_NAMES_FIELD_NAME].is_array(), - "The FastaAligned action requires a {} field, which must be an array of strings", - SEQUENCE_NAMES_FIELD_NAME - ); - std::vector sequence_names; - for (const auto& child : json[SEQUENCE_NAMES_FIELD_NAME]) { - CHECK_SILO_QUERY( - child.is_string(), - "The FastaAligned action requires a {} field, which must be an array of " - "strings; while parsing array encountered the element {} which is not of type string", - SEQUENCE_NAMES_FIELD_NAME, - child.dump() - ); - sequence_names.emplace_back(child.get()); - } - std::vector additional_fields; - if (json.contains(ADDITIONAL_FIELDS_FIELD_NAME)) { - CHECK_SILO_QUERY( - json[ADDITIONAL_FIELDS_FIELD_NAME].is_array(), - "The field `{}` in a FastaAligned action must be an array of strings.", - ADDITIONAL_FIELDS_FIELD_NAME - ); - for (const auto& child : json[ADDITIONAL_FIELDS_FIELD_NAME]) { - CHECK_SILO_QUERY( - child.is_string(), - "The field `{}` in a FastaAligned action must be an array of strings. 
" - "Encountered non-string element: {}", - ADDITIONAL_FIELDS_FIELD_NAME, - child.dump() - ); - additional_fields.emplace_back(child.get()); - } - } - action = std::make_unique(std::move(sequence_names), std::move(additional_fields)); -} - -} // namespace silo::query_engine::actions diff --git a/src/silo/query_engine/actions/fasta_aligned.h b/src/silo/query_engine/actions/fasta_aligned.h deleted file mode 100644 index af9e1ba08..000000000 --- a/src/silo/query_engine/actions/fasta_aligned.h +++ /dev/null @@ -1,27 +0,0 @@ -#pragma once - -#include -#include -#include - -#include - -#include "silo/query_engine/actions/action.h" - -namespace silo::query_engine::actions { - -class FastaAligned : public Action { - public: - std::vector sequence_names; - std::vector additional_fields; - - explicit FastaAligned( - std::vector&& sequence_names, - std::vector&& additional_fields - ); -}; - -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& action); - -} // namespace silo::query_engine::actions diff --git a/src/silo/query_engine/actions/fasta_aligned.test.cpp b/src/silo/query_engine/actions/fasta_aligned.test.cpp deleted file mode 100644 index 57cb9b115..000000000 --- a/src/silo/query_engine/actions/fasta_aligned.test.cpp +++ /dev/null @@ -1,468 +0,0 @@ -#include -#include -#include - -#include "silo/test/query_fixture.test.h" - -namespace { -using silo::ReferenceGenomes; -using silo::test::QueryTestData; -using silo::test::QueryTestScenario; - -using boost::uuids::random_generator; - -nlohmann::json createDataWithNucleotideSequence(const std::string& nucleotideSequence) { - static std::atomic_int row_id = 0; - const auto primary_key = row_id++; - - return nlohmann::json::parse(fmt::format( - R"( -{{ - - "primaryKey": "id_{}", - "country": "Switzerland", - "segment1": {{ - "sequence": "{}", - "insertions": [] - }}, - "unaligned_segment1": null, - "gene1": null -}} -)", - primary_key, - nucleotideSequence - )); -} - 
-const nlohmann::json DATA_SAME_AS_REFERENCE = createDataWithNucleotideSequence("ATGCN"); -const nlohmann::json DATA_SAME_AS_REFERENCE2 = createDataWithNucleotideSequence("ATGCN"); -const nlohmann::json DATA_WITH_ALL_N = createDataWithNucleotideSequence("NNNNN"); -const nlohmann::json DATA_WITH_ALL_MUTATED = createDataWithNucleotideSequence("CATTT"); - -const auto DATABASE_CONFIG = - R"( -defaultNucleotideSequence: "segment1" -schema: - instanceName: "dummy name" - metadata: - - name: "primaryKey" - type: "string" - - name: "country" - type: "string" - primaryKey: "primaryKey" -)"; - -const auto REFERENCE_GENOMES = ReferenceGenomes{ - {{"segment1", "ATGCN"}}, - {{"gene1", "M*"}}, -}; - -const QueryTestData TEST_DATA{ - .ndjson_input_data = - {DATA_SAME_AS_REFERENCE, DATA_SAME_AS_REFERENCE2, DATA_WITH_ALL_N, DATA_WITH_ALL_MUTATED}, - .database_config = DATABASE_CONFIG, - .reference_genomes = REFERENCE_GENOMES -}; - -const QueryTestScenario FASTA_ALIGNED = { - .name = "FASTA_ALIGNED", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "FastaAligned", - "sequenceNames": [ - "segment1" - ], - "orderByFields": [ - "primaryKey" - ] - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"primaryKey":"id_0","segment1":"ATGCN"}, -{"primaryKey":"id_1","segment1":"ATGCN"}, -{"primaryKey":"id_2","segment1":"NNNNN"}, -{"primaryKey":"id_3","segment1":"CATTT"}])" - ) -}; - -const QueryTestScenario FASTA_ALIGNED_ADDITIONAL_HEADER = { - .name = "FASTA_ALIGNED_ADDITIONAL_HEADER", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "FastaAligned", - "sequenceNames": [ - "segment1" - ], - "orderByFields": [ - "primaryKey" - ], - "additionalFields": [ - "country" - ] - }, - "filterExpression": { - "type": "True" - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"country":"Switzerland","primaryKey":"id_0","segment1":"ATGCN"}, 
-{"country":"Switzerland","primaryKey":"id_1","segment1":"ATGCN"}, -{"country":"Switzerland","primaryKey":"id_2","segment1":"NNNNN"}, -{"country":"Switzerland","primaryKey":"id_3","segment1":"CATTT"}])" - ) -}; - -const QueryTestScenario DUPLICATE_FIELDS = { - .name = "DUPLICATE_FIELDS", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "FastaAligned", - "sequenceNames": [ - "segment1", - "segment1" - ], - "orderByFields": [ - "primaryKey" - ], - "additionalFields": [ - "country", - "country" - ] - }, - "filterExpression": { - "type": "True" - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"country":"Switzerland","primaryKey":"id_0","segment1":"ATGCN"}, -{"country":"Switzerland","primaryKey":"id_1","segment1":"ATGCN"}, -{"country":"Switzerland","primaryKey":"id_2","segment1":"NNNNN"}, -{"country":"Switzerland","primaryKey":"id_3","segment1":"CATTT"}])" - ) -}; - -const QueryTestScenario FASTA_ALIGNED_EXPLICIT_PRIMARY_KEY = { - .name = "FASTA_ALIGNED_EXPLICIT_PRIMARY_KEY", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "FastaAligned", - "sequenceNames": [ - "segment1" - ], - "additionalFields": [ - "primaryKey" - ] - }, - "filterExpression": { - "type": "True" - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"primaryKey":"id_0","segment1":"ATGCN"}, -{"primaryKey":"id_1","segment1":"ATGCN"}, -{"primaryKey":"id_2","segment1":"NNNNN"}, -{"primaryKey":"id_3","segment1":"CATTT"}])" - ) -}; - -const QueryTestScenario FASTA_ALIGNED_DUPLICATE_HEADER = { - .name = "FASTA_ALIGNED_DUPLICATE_HEADER", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "FastaAligned", - "sequenceNames": [ - "segment1" - ], - "additionalFields": [ - "country", - "primaryKey", - "country" - ] - }, - "filterExpression": { - "type": "True" - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"country":"Switzerland","primaryKey":"id_0","segment1":"ATGCN"}, 
-{"country":"Switzerland","primaryKey":"id_1","segment1":"ATGCN"}, -{"country":"Switzerland","primaryKey":"id_2","segment1":"NNNNN"}, -{"country":"Switzerland","primaryKey":"id_3","segment1":"CATTT"}])" - ) -}; - -const QueryTestScenario FASTA_ALIGNED_DESCENDING = { - .name = "FASTA_ALIGNED_DESCENDING", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "FastaAligned", - "sequenceNames": [ - "segment1" - ], - "orderByFields": [ - { - "field": "primaryKey", - "order": "descending" - } - ], - "additionalFields": [ - "country" - ] - }, - "filterExpression": { - "type": "True" - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"country":"Switzerland","primaryKey":"id_3","segment1":"CATTT"}, -{"country":"Switzerland","primaryKey":"id_2","segment1":"NNNNN"}, -{"country":"Switzerland","primaryKey":"id_1","segment1":"ATGCN"}, -{"country":"Switzerland","primaryKey":"id_0","segment1":"ATGCN"}])" - ) -}; - -const QueryTestScenario FASTA_ALIGNED_SUBSET = { - .name = "FASTA_ALIGNED_SUBSET", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "FastaAligned", - "sequenceNames": [ - "segment1" - ] - }, - "filterExpression": { - "type": "Or", - "children": [ - { - "type": "StringEquals", - "column": "primaryKey", - "value": "id_0" - }, - { - "type": "StringEquals", - "column": "primaryKey", - "value": "id_2" - }, - { - "type": "StringEquals", - "column": "primaryKey", - "value": "id_3" - } - ] - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"primaryKey":"id_0","segment1":"ATGCN"}, -{"primaryKey":"id_2","segment1":"NNNNN"}, -{"primaryKey":"id_3","segment1":"CATTT"}])" - ) -}; - -const QueryTestScenario FASTA_ALIGNED_SMALL_BATCHES = { - .name = "FASTA_ALIGNED_SMALL_BATCHES", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "FastaAligned", - "sequenceNames": [ - "segment1" - ], - "additionalFields": [ - "country" - ], - "orderByFields": [ - "country","primaryKey" - ] - }, - 
"filterExpression": { - "type": "True" - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"country":"Switzerland","primaryKey":"id_0","segment1":"ATGCN"}, -{"country":"Switzerland","primaryKey":"id_1","segment1":"ATGCN"}, -{"country":"Switzerland","primaryKey":"id_2","segment1":"NNNNN"}, -{"country":"Switzerland","primaryKey":"id_3","segment1":"CATTT"}])" - ), - .query_options = silo::config::QueryOptions{.materialization_cutoff = 0} -}; - -const QueryTestScenario FASTA_ALIGNED_WITH_OFFSET = { - .name = "FASTA_ALIGNED_WITH_OFFSET", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "FastaAligned", - "sequenceNames": [ - "segment1" - ], - "orderByFields": [ - "primaryKey" - ], - "offset": 2 - }, - "filterExpression": { - "type": "True" - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"primaryKey":"id_2","segment1":"NNNNN"}, -{"primaryKey":"id_3","segment1":"CATTT"}])" - ), - .query_options = silo::config::QueryOptions{.materialization_cutoff = 1} -}; - -const QueryTestScenario FASTA_ALIGNED_WITH_LIMIT = { - .name = "FASTA_ALIGNED_WITH_LIMIT", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "FastaAligned", - "sequenceNames": [ - "segment1" - ], - "orderByFields": [ - { - "field": "primaryKey", - "order": "descending" - } - ], - "limit": 3 - }, - "filterExpression": { - "type": "True" - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"primaryKey":"id_3","segment1":"CATTT"}, -{"primaryKey":"id_2","segment1":"NNNNN"}, -{"primaryKey":"id_1","segment1":"ATGCN"}])" - ), - .query_options = silo::config::QueryOptions{.materialization_cutoff = 1} -}; - -const QueryTestScenario FASTA_ALIGNED_WITH_LIMIT_UNSORTED = { - .name = "FASTA_ALIGNED_WITH_LIMIT_UNSORTED", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "FastaAligned", - "sequenceNames": [ - "segment1" - ], - "limit": 3 - }, - "filterExpression": { - "type": "True" - } -} -)" - ), - 
.expected_query_result = nlohmann::json::parse( - R"( -[{"primaryKey":"id_0","segment1":"ATGCN"}, -{"primaryKey":"id_1","segment1":"ATGCN"}, -{"primaryKey":"id_2","segment1":"NNNNN"}])" - ), - .query_options = silo::config::QueryOptions{.materialization_cutoff = 3} -}; - -const QueryTestScenario FASTA_ALIGNED_WITH_OFFSET_AND_LIMIT = { - .name = "FASTA_ALIGNED_WITH_OFFSET_AND_LIMIT", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "FastaAligned", - "sequenceNames": [ - "segment1" - ], - "orderByFields": [ - "primaryKey" - ], - "offset": 2, - "limit": 1 - }, - "filterExpression": { - "type": "True" - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"primaryKey":"id_2","segment1":"NNNNN"}])" - ), - .query_options = silo::config::QueryOptions{.materialization_cutoff = 1} -}; - -} // namespace - -QUERY_TEST( - FastaAligned, - TEST_DATA, - ::testing::Values( - FASTA_ALIGNED, - FASTA_ALIGNED_ADDITIONAL_HEADER, - FASTA_ALIGNED_DUPLICATE_HEADER, - DUPLICATE_FIELDS, - FASTA_ALIGNED_EXPLICIT_PRIMARY_KEY, - FASTA_ALIGNED_DESCENDING, - FASTA_ALIGNED_SUBSET, - FASTA_ALIGNED_SMALL_BATCHES, - FASTA_ALIGNED_WITH_LIMIT, - FASTA_ALIGNED_WITH_LIMIT_UNSORTED, - FASTA_ALIGNED_WITH_OFFSET, - FASTA_ALIGNED_WITH_OFFSET_AND_LIMIT - ) -); diff --git a/src/silo/query_engine/actions/insertions.cpp b/src/silo/query_engine/actions/insertions.cpp deleted file mode 100644 index 38bf399f6..000000000 --- a/src/silo/query_engine/actions/insertions.cpp +++ /dev/null @@ -1,82 +0,0 @@ -#include "silo/query_engine/actions/insertions.h" - -#include -#include -#include -#include -#include - -#include -#include -#include -#include -#include -#include - -#include "evobench/evobench.hpp" -#include "silo/common/aa_symbols.h" -#include "silo/common/nucleotide_symbols.h" -#include "silo/query_engine/actions/action.h" -#include "silo/query_engine/copy_on_write_bitmap.h" -#include "silo/query_engine/exec_node/arrow_util.h" -#include 
"silo/query_engine/illegal_query_exception.h" -#include "silo/query_engine/operators/query_node.h" -#include "silo/storage/column/insertion_index.h" - -namespace silo::query_engine::actions { - -template -InsertionAggregation::InsertionAggregation(std::vector&& sequence_names) - : sequence_names(std::move(sequence_names)) {} - -namespace { - -const std::string SEQUENCE_NAMES_FIELD_NAME = "sequenceNames"; - -} // namespace - -template -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json( - const nlohmann::json& json, - std::unique_ptr>& action -) { - std::vector sequence_names; - if (json.contains(SEQUENCE_NAMES_FIELD_NAME)) { - CHECK_SILO_QUERY( - json[SEQUENCE_NAMES_FIELD_NAME].is_array(), - "The field '{}' of the insertions action must be of type string or array, was {}", - SEQUENCE_NAMES_FIELD_NAME, - std::string(json[SEQUENCE_NAMES_FIELD_NAME].type_name()) - ); - for (const auto& child : json[SEQUENCE_NAMES_FIELD_NAME]) { - CHECK_SILO_QUERY( - child.is_string(), - "The field {} of the Insertions action must have type string or an " - "array, if present. 
Found: {}", - SEQUENCE_NAMES_FIELD_NAME, - child.dump() - ); - sequence_names.emplace_back(child.get()); - } - } - - action = std::make_unique>(std::move(sequence_names)); -} - -// NOLINTNEXTLINE(readability-identifier-naming) -template void from_json( - const nlohmann::json& json, - std::unique_ptr>& action -); - -// NOLINTNEXTLINE(readability-identifier-naming) -template void from_json( - const nlohmann::json& json, - std::unique_ptr>& action -); - -template class InsertionAggregation; -template class InsertionAggregation; - -} // namespace silo::query_engine::actions diff --git a/src/silo/query_engine/actions/insertions.h b/src/silo/query_engine/actions/insertions.h deleted file mode 100644 index 65fffd48e..000000000 --- a/src/silo/query_engine/actions/insertions.h +++ /dev/null @@ -1,42 +0,0 @@ -#pragma once - -#include -#include -#include -#include -#include -#include - -#include -#include -#include - -#include "silo/query_engine/actions/action.h" -#include "silo/query_engine/copy_on_write_bitmap.h" -#include "silo/query_engine/exec_node/json_value_type_array_builder.h" -#include "silo/storage/table.h" - -namespace silo::query_engine::actions { - -template -class InsertionAggregation : public Action { - static constexpr std::string_view POSITION_FIELD_NAME = "position"; - static constexpr std::string_view INSERTED_SYMBOLS_FIELD_NAME = "insertedSymbols"; - static constexpr std::string_view INSERTION_FIELD_NAME = "insertion"; - static constexpr std::string_view SEQUENCE_FIELD_NAME = "sequenceName"; - static constexpr std::string_view COUNT_FIELD_NAME = "count"; - - public: - std::vector sequence_names; - - explicit InsertionAggregation(std::vector&& sequence_names); -}; - -template -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json( - const nlohmann::json& json, - std::unique_ptr>& action -); - -} // namespace silo::query_engine::actions diff --git a/src/silo/query_engine/actions/insertions.test.cpp 
b/src/silo/query_engine/actions/insertions.test.cpp deleted file mode 100644 index 9a1d54e62..000000000 --- a/src/silo/query_engine/actions/insertions.test.cpp +++ /dev/null @@ -1,215 +0,0 @@ -#include -#include -#include -#include -#include - -#include "silo/test/query_fixture.test.h" - -namespace { -using silo::ReferenceGenomes; -using silo::test::QueryTestData; -using silo::test::QueryTestScenario; - -using boost::uuids::random_generator; - -nlohmann::json createData( - std::vector insertions, - std::vector aa_insertions -) { - static std::atomic_int row_id = 0; - const auto primary_key = row_id++; - - std::string country = row_id % 3 == 0 ? "Germany" : "Switzerland"; - - for (auto& insertion : insertions) { - insertion = fmt::format("\"{}\"", insertion); - } - for (auto& insertion : aa_insertions) { - insertion = fmt::format("\"{}\"", insertion); - } - - return nlohmann::json::parse(fmt::format( - R"( -{{ - "primaryKey": "id_{}", - "country": "{}", - "unaligned_segment1": null, - "segment1": {{ - "sequence": "", - "insertions": [{}] - }}, - "gene1": {{ - "sequence": "", - "insertions": [{}] - }} -}} -)", - primary_key, - country, - fmt::join(insertions, ", "), - fmt::join(aa_insertions, ", ") - )); -} - -const nlohmann::json DATA_1 = createData({"4:ATGCN"}, {"1:AY"}); -const nlohmann::json DATA_2 = createData({"4:ATGCN"}, {"1:AY"}); -const nlohmann::json DATA_3 = createData({"4:NNNNNNNN"}, {"1:XXX"}); -const nlohmann::json DATA_4 = createData({"1:CCC"}, {"1:A"}); -const nlohmann::json DATA_5 = createData({"4:ATGCN"}, {"1:AY"}); - -const auto DATABASE_CONFIG = - R"( -defaultNucleotideSequence: "segment1" -schema: - instanceName: "dummy name" - metadata: - - name: "primaryKey" - type: "string" - - name: "country" - type: "string" - primaryKey: "primaryKey" -)"; - -const auto REFERENCE_GENOMES = ReferenceGenomes{ - {{"segment1", "ATGCN"}}, - {{"gene1", "M*"}}, -}; - -const QueryTestData TEST_DATA{ - .ndjson_input_data = {DATA_1, DATA_2, DATA_3, DATA_4, DATA_5}, - 
.database_config = DATABASE_CONFIG, - .reference_genomes = REFERENCE_GENOMES -}; - -const QueryTestScenario INSERTIONS = { - .name = "INSERTIONS", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Insertions", - "sequenceNames": [ - "segment1" - ], - "orderByFields": [ - "insertion" - ] - }, - "filterExpression": { - "type": "True" - } -})" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"count":1,"insertedSymbols":"CCC","insertion":"ins_1:CCC","position":1,"sequenceName":"segment1"}, -{"count":3,"insertedSymbols":"ATGCN","insertion":"ins_4:ATGCN","position":4,"sequenceName":"segment1"}, -{"count":1,"insertedSymbols":"NNNNNNNN","insertion":"ins_4:NNNNNNNN","position":4,"sequenceName":"segment1"}])" - ) -}; - -const QueryTestScenario INSERTIONS_NO_SEQUENCE_NAMES = { - .name = "INSERTIONS_NO_SEQUENCE_NAMES", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Insertions", - "orderByFields": ["insertion"] - }, - "filterExpression": { - "type": "True" - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"count":1,"insertedSymbols":"CCC","insertion":"ins_1:CCC","position":1,"sequenceName":"segment1"}, -{"count":3,"insertedSymbols":"ATGCN","insertion":"ins_4:ATGCN","position":4,"sequenceName":"segment1"}, -{"count":1,"insertedSymbols":"NNNNNNNN","insertion":"ins_4:NNNNNNNN","position":4,"sequenceName":"segment1"}])" - ) -}; - -const QueryTestScenario INSERTIONS_SEQUENCE_NAME_NOT_IN_DATABASE = { - .name = "INSERTIONS_SEQUENCE_NAME_NOT_IN_DATABASE", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Insertions", - "sequenceNames": [ - "not_in_database" - ] - }, - "filterExpression": { - "type": "True" - } -} -)" - ), - .expected_error_message = - "The database does not contain the Nucleotide sequence 'not_in_database'" -}; - -const QueryTestScenario AA_INSERTIONS_ALL = { - .name = "AA_INSERTIONS_ALL", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": 
"AminoAcidInsertions", - "orderByFields": ["insertion"] - }, - "filterExpression": { - "type": "True" - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"count":1,"insertedSymbols":"A","insertion":"ins_1:A","position":1,"sequenceName":"gene1"}, -{"count":3,"insertedSymbols":"AY","insertion":"ins_1:AY","position":1,"sequenceName":"gene1"}, -{"count":1,"insertedSymbols":"XXX","insertion":"ins_1:XXX","position":1,"sequenceName":"gene1"}])" - ) -}; - -const QueryTestScenario AA_INSERTIONS_SUBSET = { - .name = "AA_INSERTIONS_SUBSET", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "AminoAcidInsertions", - "orderByFields": ["insertion"] - }, - "filterExpression": { - "type": "StringEquals", - "column": "country", - "value": "Switzerland" - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"count":1,"insertedSymbols":"A","insertion":"ins_1:A","position":1,"sequenceName":"gene1"}, -{"count":3,"insertedSymbols":"AY","insertion":"ins_1:AY","position":1,"sequenceName":"gene1"}])" - ) -}; - -} // namespace - -QUERY_TEST( - Insertions, - TEST_DATA, - ::testing::Values( - INSERTIONS, - INSERTIONS_NO_SEQUENCE_NAMES, - INSERTIONS_SEQUENCE_NAME_NOT_IN_DATABASE, - AA_INSERTIONS_ALL, - AA_INSERTIONS_SUBSET - ) -); diff --git a/src/silo/query_engine/actions/most_recent_common_ancestor.cpp b/src/silo/query_engine/actions/most_recent_common_ancestor.cpp deleted file mode 100644 index e87af2304..000000000 --- a/src/silo/query_engine/actions/most_recent_common_ancestor.cpp +++ /dev/null @@ -1,50 +0,0 @@ -#include "silo/query_engine/actions/most_recent_common_ancestor.h" - -#include -#include -#include -#include - -#include -#include -#include -#include - -#include "silo/common/phylo_tree.h" -#include "silo/query_engine/exec_node/json_value_type_array_builder.h" -#include "silo/query_engine/illegal_query_exception.h" -#include "silo/query_engine/operators/query_node.h" -#include "silo/schema/database_schema.h" - -namespace 
silo::query_engine::actions { - -MostRecentCommonAncestor::MostRecentCommonAncestor( - std::string column_name, - bool print_nodes_not_in_tree -) - : column_name(std::move(column_name)), - print_nodes_not_in_tree(print_nodes_not_in_tree) {} - -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& action) { - CHECK_SILO_QUERY( - json.contains("columnName"), - "error: 'columnName' field is required in MostRecentCommonAncestor action" - ); - CHECK_SILO_QUERY( - json["columnName"].is_string(), - "error: 'columnName' field in MostRecentCommonAncestor action must be a string" - ); - if (json.contains("printNodesNotInTree")) { - CHECK_SILO_QUERY( - json["printNodesNotInTree"].is_boolean(), - "error: 'printNodesNotInTree' field in MostRecentCommonAncestor action must be a boolean" - ); - } - const bool print_nodes_not_in_tree = json.value("printNodesNotInTree", false); - const std::string column_name = json["columnName"].get(); - - action = std::make_unique(column_name, print_nodes_not_in_tree); -} - -} // namespace silo::query_engine::actions diff --git a/src/silo/query_engine/actions/most_recent_common_ancestor.h b/src/silo/query_engine/actions/most_recent_common_ancestor.h deleted file mode 100644 index 525734f66..000000000 --- a/src/silo/query_engine/actions/most_recent_common_ancestor.h +++ /dev/null @@ -1,26 +0,0 @@ -#pragma once - -#include -#include -#include - -#include - -#include - -#include "silo/query_engine/actions/action.h" - -namespace silo::query_engine::actions { - -class MostRecentCommonAncestor : public Action { - public: - std::string column_name; - bool print_nodes_not_in_tree; - - MostRecentCommonAncestor(std::string column_name, bool print_nodes_not_in_tree); -}; - -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& action); - -} // namespace silo::query_engine::actions \ No newline at end of file diff --git 
a/src/silo/query_engine/actions/mutations.cpp b/src/silo/query_engine/actions/mutations.cpp deleted file mode 100644 index 6edd555a5..000000000 --- a/src/silo/query_engine/actions/mutations.cpp +++ /dev/null @@ -1,142 +0,0 @@ -#include "silo/query_engine/actions/mutations.h" - -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include -#include -#include -#include - -#include "silo/common/aa_symbols.h" -#include "silo/common/nucleotide_symbols.h" -#include "silo/common/symbol_map.h" -#include "silo/query_engine/actions/action.h" -#include "silo/query_engine/copy_on_write_bitmap.h" -#include "silo/query_engine/exec_node/arrow_util.h" -#include "silo/query_engine/exec_node/json_value_type_array_builder.h" -#include "silo/query_engine/illegal_query_exception.h" -#include "silo/query_engine/operators/query_node.h" -#include "silo/storage/column/sequence_column.h" - -namespace silo::query_engine::actions { - -template -Mutations::Mutations( - std::vector&& sequence_names, - double min_proportion, - std::vector&& fields -) - : sequence_names(std::move(sequence_names)), - min_proportion(min_proportion), - fields(std::move(fields)) { - if (this->fields.empty()) { - this->fields = std::vector{VALID_FIELDS.begin(), VALID_FIELDS.end()}; - } -} - -namespace { - -const std::string SEQUENCE_NAMES_FIELD_NAME = "sequenceNames"; -const std::string MIN_PROPORTION_FIELD_NAME = "minProportion"; - -} // namespace - -template -// NOLINTNEXTLINE(readability-identifier-naming,readability-function-cognitive-complexity) -void from_json(const nlohmann::json& json, std::unique_ptr>& action) { - std::vector sequence_names; - if (json.contains(SEQUENCE_NAMES_FIELD_NAME)) { - CHECK_SILO_QUERY( - json[SEQUENCE_NAMES_FIELD_NAME].is_array(), - "Mutations action can have the field {} of type array of " - "strings, but no other type", - SEQUENCE_NAMES_FIELD_NAME - ); - for (const auto& child : json[SEQUENCE_NAMES_FIELD_NAME]) { - CHECK_SILO_QUERY( 
- child.is_string(), - "The field {}" - " of Mutations action must have type " - "array, if present. Found: {}", - SEQUENCE_NAMES_FIELD_NAME, - child.dump() - ); - sequence_names.emplace_back(child.get()); - } - } - - CHECK_SILO_QUERY( - json.contains(MIN_PROPORTION_FIELD_NAME) && json[MIN_PROPORTION_FIELD_NAME].is_number(), - "Mutations action must contain the field {0}" - " of type number with limits [0.0, " - "1.0]. Only mutations are returned if the proportion of sequences having this mutation, " - "is at least {0}", - MIN_PROPORTION_FIELD_NAME - ); - const double min_proportion = json[MIN_PROPORTION_FIELD_NAME].get(); - if (min_proportion < 0 || min_proportion > 1) { - throw IllegalQueryException( - "Invalid proportion: " + MIN_PROPORTION_FIELD_NAME + " must be in interval [0.0, 1.0]" - ); - } - - std::vector fields; - if (json.contains("fields")) { - CHECK_SILO_QUERY( - json["fields"].is_array(), - "The field 'fields' for a Mutations action must be an array of strings" - ); - for (const auto& field_json : json["fields"]) { - CHECK_SILO_QUERY( - field_json.is_string(), - "The field 'fields' for a Mutations action must be an array of strings" - ); - const std::string field = field_json; - auto iter = - std::ranges::find_if(Mutations::VALID_FIELDS, [&](const auto& valid_field) { - return valid_field == field; - }); - CHECK_SILO_QUERY( - iter != Mutations::VALID_FIELDS.end(), - "The attribute 'fields' contains an invalid field '{}'. 
Valid fields are {}.", - field, - boost::join( - std::vector{ - Mutations::VALID_FIELDS.begin(), - Mutations::VALID_FIELDS.end() - }, - ", " - ) - ); - fields.push_back(*iter); - } - } - - action = std::make_unique>( - std::move(sequence_names), min_proportion, std::move(fields) - ); -} - -template class Mutations; -template class Mutations; -// NOLINTNEXTLINE(readability-identifier-naming) -template void from_json( - const nlohmann::json& json, - std::unique_ptr>& action -); -// NOLINTNEXTLINE(readability-identifier-naming) -template void from_json( - const nlohmann::json& json, - std::unique_ptr>& action -); - -} // namespace silo::query_engine::actions diff --git a/src/silo/query_engine/actions/mutations.h b/src/silo/query_engine/actions/mutations.h deleted file mode 100644 index a2ebdb7d2..000000000 --- a/src/silo/query_engine/actions/mutations.h +++ /dev/null @@ -1,63 +0,0 @@ -#pragma once - -#include -#include -#include -#include -#include -#include -#include - -#include -#include -#include -#include - -#include "silo/common/symbol_map.h" -#include "silo/query_engine/actions/action.h" -#include "silo/query_engine/exec_node/json_value_type_array_builder.h" -#include "silo/storage/column/sequence_column.h" -#include "silo/storage/table.h" - -namespace silo::query_engine::actions { - -template -class Mutations : public Action { - public: - static constexpr std::string_view MUTATION_FIELD_NAME = "mutation"; - static constexpr std::string_view MUTATION_FROM_FIELD_NAME = "mutationFrom"; - static constexpr std::string_view MUTATION_TO_FIELD_NAME = "mutationTo"; - static constexpr std::string_view POSITION_FIELD_NAME = "position"; - static constexpr std::string_view SEQUENCE_FIELD_NAME = "sequenceName"; - static constexpr std::string_view PROPORTION_FIELD_NAME = "proportion"; - static constexpr std::string_view COVERAGE_FIELD_NAME = "coverage"; - static constexpr std::string_view COUNT_FIELD_NAME = "count"; - static constexpr std::array VALID_FIELDS{ - 
MUTATION_FIELD_NAME, - MUTATION_FROM_FIELD_NAME, - MUTATION_TO_FIELD_NAME, - POSITION_FIELD_NAME, - SEQUENCE_FIELD_NAME, - PROPORTION_FIELD_NAME, - COVERAGE_FIELD_NAME, - COUNT_FIELD_NAME - }; - - std::vector sequence_names; - double min_proportion; - std::vector fields; - - private: - public: - explicit Mutations( - std::vector&& sequence_names, - double min_proportion, - std::vector&& fields - ); -}; - -template -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr>& action); - -} // namespace silo::query_engine::actions diff --git a/src/silo/query_engine/actions/mutations.test.cpp b/src/silo/query_engine/actions/mutations.test.cpp deleted file mode 100644 index 30416df1b..000000000 --- a/src/silo/query_engine/actions/mutations.test.cpp +++ /dev/null @@ -1,336 +0,0 @@ -#include -#include -#include - -#include "silo/test/query_fixture.test.h" - -namespace { -using silo::ReferenceGenomes; -using silo::test::QueryTestData; -using silo::test::QueryTestScenario; - -using boost::uuids::random_generator; - -nlohmann::json createDataWithNucleotideSequence(const std::string& nucleotideSequence) { - random_generator generator; - const auto primary_key = generator(); - - return { - {"primaryKey", "id_" + to_string(primary_key)}, - {"segment1", {{"sequence", nucleotideSequence}, {"insertions", nlohmann::json::array()}}}, - {"unaligned_segment1", nullptr}, - {"gene1", nullptr} - }; -} - -const nlohmann::json DATA_SAME_AS_REFERENCE = createDataWithNucleotideSequence("ATGCN"); -const nlohmann::json DATA_SAME_AS_REFERENCE2 = createDataWithNucleotideSequence("ATGCN"); -const nlohmann::json DATA_WITH_ALL_N = createDataWithNucleotideSequence("NNNNN"); -const nlohmann::json DATA_WITH_ALL_MUTATED = createDataWithNucleotideSequence("CATTT"); - -const auto DATABASE_CONFIG = - R"( -defaultNucleotideSequence: "segment1" -schema: - instanceName: "dummy name" - metadata: - - name: "primaryKey" - type: "string" - primaryKey: 
"primaryKey" -)"; - -const auto REFERENCE_GENOMES = ReferenceGenomes{ - {{"segment1", "ATGCN"}}, - {{"gene1", "M*"}}, -}; - -const QueryTestData TEST_DATA{ - .ndjson_input_data = - {DATA_SAME_AS_REFERENCE, DATA_SAME_AS_REFERENCE2, DATA_WITH_ALL_N, DATA_WITH_ALL_MUTATED}, - .database_config = DATABASE_CONFIG, - .reference_genomes = REFERENCE_GENOMES -}; - -const QueryTestScenario MUTATIONS = { - .name = "MUTATIONS", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Mutations", - "minProportion": 0.05 - }, - "filterExpression": { - "type": "True" - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"count":1,"coverage":3,"mutation":"A1C","mutationFrom":"A","mutationTo":"C","position":1,"proportion":0.3333333333333333,"sequenceName":"segment1"}, -{"count":1,"coverage":3,"mutation":"T2A","mutationFrom":"T","mutationTo":"A","position":2,"proportion":0.3333333333333333,"sequenceName":"segment1"}, -{"count":1,"coverage":3,"mutation":"G3T","mutationFrom":"G","mutationTo":"T","position":3,"proportion":0.3333333333333333,"sequenceName":"segment1"}, -{"count":1,"coverage":3,"mutation":"C4T","mutationFrom":"C","mutationTo":"T","position":4,"proportion":0.3333333333333333,"sequenceName":"segment1"}, -{"count":1,"coverage":1,"mutation":"N5T","mutationFrom":"N","mutationTo":"T","position":5,"proportion":1.0,"sequenceName":"segment1"}])" - ) -}; - -const QueryTestScenario MUTATIONS_SUBFIELDS = { - .name = "MUTATIONS_SUBFIELDS", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Mutations", - "fields": ["count","coverage","mutation"], - "minProportion": 0.05 - }, - "filterExpression": { - "type": "True" - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"count":1,"coverage":3,"mutation":"A1C"}, -{"count":1,"coverage":3,"mutation":"T2A"}, -{"count":1,"coverage":3,"mutation":"G3T"}, -{"count":1,"coverage":3,"mutation":"C4T"}, -{"count":1,"coverage":1,"mutation":"N5T"}])" - ) -}; - -const 
QueryTestScenario MUTATIONS_SUBFIELDS_HIGH_MIN = { - .name = "MUTATIONS_SUBFIELDS_HIGH_MIN", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Mutations", - "fields": ["count","coverage","mutation"], - "minProportion": 0.5 - }, - "filterExpression": { - "type": "True" - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"( -[{"count":1,"coverage":1,"mutation":"N5T"}])" - ) -}; - -const QueryTestScenario MUTATIONS_INVALID_FIELDS = { - .name = "MUTATIONS_INVALID_FIELDS", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Mutations", - "fields": ["count","foo"], - "minProportion": 0.5 - }, - "filterExpression": { - "type": "True" - } -} -)" - ), - .expected_error_message = - "The attribute 'fields' contains an invalid field 'foo'. Valid fields are mutation, " - "mutationFrom, mutationTo, position, sequenceName, proportion, coverage, count." -}; - -const QueryTestScenario MUTATIONS_INVALID_FIELD_TYPE = { - .name = "MUTATIONS_INVALID_FIELD_TYPE", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Mutations", - "minProportion": 0.5, - "fields": "count" - }, - "filterExpression": { - "type": "True" - } -} -)" - ), - .expected_error_message = "The field 'fields' for a Mutations action must be an array of strings" -}; - -} // namespace - -QUERY_TEST( - Mutations, - TEST_DATA, - ::testing::Values( - MUTATIONS, - MUTATIONS_SUBFIELDS, - MUTATIONS_SUBFIELDS_HIGH_MIN, - MUTATIONS_INVALID_FIELDS, - MUTATIONS_INVALID_FIELD_TYPE - ) -); - -namespace { - -const QueryTestData TEST_DATA2{ - .ndjson_input_data = - []() { - std::vector data; - data.reserve(100000); - for (int i = 0; i < 20000; ++i) { - data.push_back(createDataWithNucleotideSequence("CATTT")); - } - for (int i = 0; i < 20000; ++i) { - data.push_back(createDataWithNucleotideSequence("ATGCN")); - } - for (int i = 0; i < 20000; ++i) { - data.push_back(createDataWithNucleotideSequence("CATTT")); - } - for (int i = 0; i < 20000; ++i) { - 
data.push_back(createDataWithNucleotideSequence("NNCNN")); - } - for (int i = 0; i < 20000; ++i) { - data.push_back(createDataWithNucleotideSequence("ANCNN")); - } - return data; - }(), - .database_config = DATABASE_CONFIG, - .reference_genomes = REFERENCE_GENOMES -}; - -const QueryTestScenario MUTATIONS_BIG = { - .name = "MUTATIONS_BIG", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Mutations", - "minProportion": 0.05 - }, - "filterExpression": { - "type": "True" - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"([ -{"count":40000,"coverage":80000,"mutation":"A1C","mutationFrom":"A","mutationTo":"C","position":1,"proportion":0.5,"sequenceName":"segment1"}, -{"count":40000,"coverage":60000,"mutation":"T2A","mutationFrom":"T","mutationTo":"A","position":2,"proportion":0.6666666666666666,"sequenceName":"segment1"}, -{"count":40000,"coverage":100000,"mutation":"G3C","mutationFrom":"G","mutationTo":"C","position":3,"proportion":0.4,"sequenceName":"segment1"}, -{"count":40000,"coverage":100000,"mutation":"G3T","mutationFrom":"G","mutationTo":"T","position":3,"proportion":0.4,"sequenceName":"segment1"}, -{"count":40000,"coverage":60000,"mutation":"C4T","mutationFrom":"C","mutationTo":"T","position":4,"proportion":0.6666666666666666,"sequenceName":"segment1"}, -{"count":40000,"coverage":40000,"mutation":"N5T","mutationFrom":"N","mutationTo":"T","position":5,"proportion":1.0,"sequenceName":"segment1"}])" - ) -}; - -const QueryTestScenario MUTATIONS_BIG_SELECTIVE = { - .name = "MUTATIONS_BIG_SELECTIVE", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Mutations", - "minProportion": 0.05 - }, - "filterExpression": { - "type": "NucleotideEquals", - "position": 3, - "symbol": "C", - "sequenceName": "segment1" - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"([ 
-{"count":40000,"coverage":40000,"mutation":"G3C","mutationFrom":"G","mutationTo":"C","position":3,"proportion":1.0,"sequenceName":"segment1"} -])" - ) -}; - -const QueryTestScenario MUTATIONS_BIG_SELECTIVE2 = { - .name = "MUTATIONS_BIG_SELECTIVE2", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Mutations", - "minProportion": 0.05 - }, - "filterExpression": { - "type": "NucleotideEquals", - "position": 1, - "symbol": "C", - "sequenceName": "segment1" - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"([ -{"count":40000,"coverage":40000,"mutation":"A1C","mutationFrom":"A","mutationTo":"C","position":1,"proportion":1.0,"sequenceName":"segment1"}, -{"count":40000,"coverage":40000,"mutation":"T2A","mutationFrom":"T","mutationTo":"A","position":2,"proportion":1.0,"sequenceName":"segment1"}, -{"count":40000,"coverage":40000,"mutation":"G3T","mutationFrom":"G","mutationTo":"T","position":3,"proportion":1.0,"sequenceName":"segment1"}, -{"count":40000,"coverage":40000,"mutation":"C4T","mutationFrom":"C","mutationTo":"T","position":4,"proportion":1.0,"sequenceName":"segment1"}, -{"count":40000,"coverage":40000,"mutation":"N5T","mutationFrom":"N","mutationTo":"T","position":5,"proportion":1.0,"sequenceName":"segment1"} -])" - ) -}; - -const QueryTestScenario MUTATIONS_BIG_SELECTIVE_END = { - .name = "MUTATIONS_BIG_SELECTIVE_END", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Mutations", - "minProportion": 0.05 - }, - "filterExpression": { - "type": "And", - "children": [ - { - "type": "NucleotideEquals", - "position": 1, - "symbol": "A", - "sequenceName": "segment1" - }, - { - "type": "NucleotideEquals", - "position": 3, - "symbol": "C", - "sequenceName": "segment1" - } - ] - } -} -)" - ), - .expected_query_result = nlohmann::json::parse( - R"([ -{"count":20000,"coverage":20000,"mutation":"G3C","mutationFrom":"G","mutationTo":"C","position":3,"proportion":1.0,"sequenceName":"segment1"} -])" - ) -}; - -} // 
namespace - -QUERY_TEST( - MutationsBig, - TEST_DATA2, - ::testing::Values( - MUTATIONS_BIG, - MUTATIONS_BIG_SELECTIVE, - MUTATIONS_BIG_SELECTIVE2, - MUTATIONS_BIG_SELECTIVE_END - ) -); diff --git a/src/silo/query_engine/actions/phylo_subtree.cpp b/src/silo/query_engine/actions/phylo_subtree.cpp deleted file mode 100644 index 5c64b773c..000000000 --- a/src/silo/query_engine/actions/phylo_subtree.cpp +++ /dev/null @@ -1,59 +0,0 @@ -#include "silo/query_engine/actions/phylo_subtree.h" - -#include -#include -#include -#include - -#include -#include -#include -#include - -#include "silo/common/phylo_tree.h" -#include "silo/query_engine/exec_node/json_value_type_array_builder.h" -#include "silo/query_engine/illegal_query_exception.h" -#include "silo/query_engine/operators/query_node.h" -#include "silo/schema/database_schema.h" - -namespace silo::query_engine::actions { - -PhyloSubtree::PhyloSubtree( - std::string column_name, - bool print_nodes_not_in_tree, - bool contract_unary_nodes -) - : column_name(std::move(column_name)), - print_nodes_not_in_tree(print_nodes_not_in_tree), - contract_unary_nodes(contract_unary_nodes) {} - -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& action) { - CHECK_SILO_QUERY( - json.contains("columnName"), "error: 'columnName' field is required in PhyloSubtree action" - ); - CHECK_SILO_QUERY( - json["columnName"].is_string(), - "error: 'columnName' field in PhyloSubtree action must be a string" - ); - if (json.contains("printNodesNotInTree")) { - CHECK_SILO_QUERY( - json["printNodesNotInTree"].is_boolean(), - "error: 'printNodesNotInTree' field in PhyloSubtree action must be a boolean" - ); - } - if (json.contains("contractUnaryNodes")) { - CHECK_SILO_QUERY( - json["contractUnaryNodes"].is_boolean(), - "error: 'contractUnaryNodes' field in PhyloSubtree action must be a boolean" - ); - } - const bool print_nodes_not_in_tree = json.value("printNodesNotInTree", false); - const bool 
contract_unary_nodes = json.value("contractUnaryNodes", true); - const std::string column_name = json["columnName"].get(); - - action = - std::make_unique(column_name, print_nodes_not_in_tree, contract_unary_nodes); -} - -} // namespace silo::query_engine::actions diff --git a/src/silo/query_engine/actions/phylo_subtree.h b/src/silo/query_engine/actions/phylo_subtree.h deleted file mode 100644 index 446c4b7e1..000000000 --- a/src/silo/query_engine/actions/phylo_subtree.h +++ /dev/null @@ -1,27 +0,0 @@ -#pragma once - -#include -#include -#include - -#include - -#include - -#include "silo/query_engine/actions/action.h" - -namespace silo::query_engine::actions { - -class PhyloSubtree : public Action { - public: - std::string column_name; - bool print_nodes_not_in_tree; - bool contract_unary_nodes = false; - - PhyloSubtree(std::string column_name, bool print_nodes_not_in_tree, bool contract_unary_nodes); -}; - -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& action); - -} // namespace silo::query_engine::actions \ No newline at end of file diff --git a/src/silo/query_engine/binder.cpp b/src/silo/query_engine/binder.cpp deleted file mode 100644 index fe218743e..000000000 --- a/src/silo/query_engine/binder.cpp +++ /dev/null @@ -1,416 +0,0 @@ -#include "silo/query_engine/binder.h" - -#include -#include - -#include - -#include "silo/query_engine/actions/action.h" -#include "silo/query_engine/actions/aggregated.h" -#include "silo/query_engine/actions/details.h" -#include "silo/query_engine/actions/fasta.h" -#include "silo/query_engine/actions/fasta_aligned.h" -#include "silo/query_engine/actions/insertions.h" -#include "silo/query_engine/actions/most_recent_common_ancestor.h" -#include "silo/query_engine/actions/mutations.h" -#include "silo/query_engine/actions/phylo_subtree.h" -#include "silo/query_engine/filter/expressions/expression.h" -#include "silo/query_engine/illegal_query_exception.h" -#include 
"silo/query_engine/operators/aggregate_node.h" -#include "silo/query_engine/operators/fetch_node.h" -#include "silo/query_engine/operators/insertions_node.h" -#include "silo/query_engine/operators/most_recent_common_ancestor_node.h" -#include "silo/query_engine/operators/mutations_node.h" -#include "silo/query_engine/operators/order_by_node.h" -#include "silo/query_engine/operators/phylo_subtree_node.h" -#include "silo/query_engine/operators/table_scan_node.h" -#include "silo/query_engine/operators/zstd_decompress_node.h" - -namespace silo::query_engine { - -namespace { - -std::vector deduplicateOrderPreserving(const std::vector& fields) { - std::vector unique_fields; - std::unordered_set seen; - - for (const auto& field : fields) { - if (seen.insert(field).second) { - unique_fields.push_back(field); - } - } - return unique_fields; -} - -std::shared_ptr bindTableNameToTable( - const std::map>& tables, - const schema::TableName& table_name -) { - auto iter = tables.find(table_name); - CHECK_SILO_QUERY( - iter != tables.end(), "The table {} is not contained in the database", table_name.getName() - ); - return iter->second; -} - -std::vector bindFieldsToColumnsByName( - const std::vector& field_names, - const storage::Table& table -) { - std::vector field_identifiers; - for (const auto& field_name : field_names) { - auto col = table.schema->getColumn(field_name); - CHECK_SILO_QUERY(col.has_value(), "The table does not contain the field {}", field_name); - field_identifiers.emplace_back(field_name, col.value().type); - } - return field_identifiers; -} - -operators::QueryNodePtr bindAction( - actions::FastaAligned* action, - std::unique_ptr filter_expression, - const schema::TableName& table_name, - const std::map>& tables -) { - auto table = bindTableNameToTable(tables, table_name); - - std::vector all_fields; - all_fields.push_back(table->schema->primary_key.name); - std::ranges::copy(action->sequence_names, std::back_inserter(all_fields)); - 
std::ranges::copy(action->additional_fields, std::back_inserter(all_fields)); - - all_fields = deduplicateOrderPreserving(all_fields); - - auto bound_fields = bindFieldsToColumnsByName(all_fields, *table); - return std::make_unique( - std::move(table), std::move(filter_expression), std::move(bound_fields) - ); -} - -operators::QueryNodePtr bindAction( - actions::Fasta* action, - std::unique_ptr filter_expression, - const schema::TableName& table_name, - const std::map>& tables -) { - auto table = bindTableNameToTable(tables, table_name); - - std::vector all_fields; - all_fields.push_back(table->schema->primary_key.name); - std::ranges::copy(action->sequence_names, std::back_inserter(all_fields)); - std::ranges::copy(action->additional_fields, std::back_inserter(all_fields)); - - all_fields = deduplicateOrderPreserving(all_fields); - - auto bound_fields = bindFieldsToColumnsByName(all_fields, *table); - return std::make_unique( - std::move(table), std::move(filter_expression), std::move(bound_fields) - ); -} - -operators::QueryNodePtr bindAction( - actions::Details* action, - std::unique_ptr filter_expression, - const schema::TableName& table_name, - const std::map>& tables -) { - auto table = bindTableNameToTable(tables, table_name); - std::vector bound_fields; - if (action->fields.empty()) { - auto all_non_sequence_fields = std::ranges::filter_view( - table->schema->getColumnIdentifiers(), - [&](const auto& identifier) { return !schema::isSequenceColumn(identifier.type); } - ); - bound_fields = {all_non_sequence_fields.begin(), all_non_sequence_fields.end()}; - } else { - bound_fields = bindFieldsToColumnsByName(deduplicateOrderPreserving(action->fields), *table); - } - return std::make_unique( - std::move(table), std::move(filter_expression), std::move(bound_fields) - ); -} - -operators::QueryNodePtr bindAction( - actions::Aggregated* action, - std::unique_ptr filter_expression, - const schema::TableName& table_name, - const std::map>& tables -) { - std::vector 
group_by_field_names; - std::ranges::transform( - action->group_by_fields, - std::back_inserter(group_by_field_names), - [](auto& field) { return field.name; } - ); - - group_by_field_names = deduplicateOrderPreserving(group_by_field_names); - - auto table = bindTableNameToTable(tables, table_name); - auto bound_fields = bindFieldsToColumnsByName(group_by_field_names, *table); - // fields for table_scan cannot be empty, otherwise, we cannot aggregate without fields - std::vector fields_for_table_scan; - if (bound_fields.empty()) { - fields_for_table_scan = {table->schema->primary_key}; - } else { - fields_for_table_scan = bound_fields; - } - auto scan = std::make_unique( - std::move(table), std::move(filter_expression), fields_for_table_scan - ); - return std::make_unique(std::move(scan), std::move(bound_fields)); -} - -template -operators::QueryNodePtr bindAction( - actions::Mutations* action, - std::unique_ptr filter_expression, - const schema::TableName& table_name, - const std::map>& tables -) { - auto table = bindTableNameToTable(tables, table_name); - - std::vector bound_sequence_columns; - for (const auto& sequence_name : action->sequence_names) { - auto column_identifier = table->schema->getColumn(sequence_name); - CHECK_SILO_QUERY( - column_identifier.has_value() && column_identifier.value().type == SymbolType::COLUMN_TYPE, - "The database does not contain the {} sequence '{}'", - SymbolType::SYMBOL_NAME, - sequence_name - ); - bound_sequence_columns.emplace_back(column_identifier.value()); - } - if (action->sequence_names.empty()) { - for (const auto& column_identifier : - table->schema->getColumnByType()) { - bound_sequence_columns.emplace_back(column_identifier); - } - } - - std::vector fields_to_use = action->fields; - if (fields_to_use.empty()) { - fields_to_use = { - actions::Mutations::MUTATION_FIELD_NAME, - actions::Mutations::MUTATION_FROM_FIELD_NAME, - actions::Mutations::MUTATION_TO_FIELD_NAME, - actions::Mutations::POSITION_FIELD_NAME, - 
actions::Mutations::SEQUENCE_FIELD_NAME, - actions::Mutations::PROPORTION_FIELD_NAME, - actions::Mutations::COVERAGE_FIELD_NAME, - actions::Mutations::COUNT_FIELD_NAME - }; - } - - return std::make_unique>( - std::move(table), - std::move(filter_expression), - bound_sequence_columns, - action->min_proportion, - fields_to_use - ); -} - -template -operators::QueryNodePtr bindAction( - actions::InsertionAggregation* action, - std::unique_ptr filter_expression, - const schema::TableName& table_name, - const std::map>& tables -) { - auto table = bindTableNameToTable(tables, table_name); - - std::vector bound_sequence_columns; - for (const auto& sequence_name : action->sequence_names) { - auto column_identifier = table->schema->getColumn(sequence_name); - CHECK_SILO_QUERY( - column_identifier.has_value() && column_identifier.value().type == SymbolType::COLUMN_TYPE, - "The database does not contain the {} sequence '{}'", - SymbolType::SYMBOL_NAME, - sequence_name - ); - bound_sequence_columns.emplace_back(column_identifier.value()); - } - if (action->sequence_names.empty()) { - for (const auto& column_identifier : - table->schema->getColumnByType()) { - bound_sequence_columns.emplace_back(column_identifier); - } - } - - return std::make_unique>( - std::move(table), std::move(filter_expression), bound_sequence_columns - ); -} - -operators::QueryNodePtr bindAction( - actions::PhyloSubtree* action, - std::unique_ptr filter_expression, - const schema::TableName& table_name, - const std::map>& tables -) { - auto table = bindTableNameToTable(tables, table_name); - return std::make_unique( - std::move(table), - std::move(filter_expression), - action->column_name, - action->print_nodes_not_in_tree, - action->contract_unary_nodes - ); -} - -operators::QueryNodePtr bindAction( - actions::MostRecentCommonAncestor* action, - std::unique_ptr filter_expression, - const schema::TableName& table_name, - const std::map>& tables -) { - auto table = bindTableNameToTable(tables, table_name); 
- return std::make_unique( - std::move(table), - std::move(filter_expression), - action->column_name, - action->print_nodes_not_in_tree - ); -} - -operators::QueryNodePtr bindBaseAction( - ActionQuery action_query, - const std::map>& tables -) { - if (dynamic_cast(action_query.action.get()) != nullptr) { - auto* specialized_action = dynamic_cast(action_query.action.get()); - return bindAction( - specialized_action, std::move(action_query.filter), action_query.table_name, tables - ); - } - if (dynamic_cast(action_query.action.get()) != nullptr) { - auto* specialized_action = dynamic_cast(action_query.action.get()); - return bindAction( - specialized_action, std::move(action_query.filter), action_query.table_name, tables - ); - } - if (dynamic_cast(action_query.action.get()) != nullptr) { - auto* specialized_action = dynamic_cast(action_query.action.get()); - return bindAction( - specialized_action, std::move(action_query.filter), action_query.table_name, tables - ); - } - if (dynamic_cast(action_query.action.get()) != nullptr) { - auto* specialized_action = dynamic_cast(action_query.action.get()); - return bindAction( - specialized_action, std::move(action_query.filter), action_query.table_name, tables - ); - } - if (dynamic_cast*>(action_query.action.get()) != nullptr) { - auto* specialized_action = - dynamic_cast*>(action_query.action.get()); - return bindAction( - specialized_action, std::move(action_query.filter), action_query.table_name, tables - ); - } - if (dynamic_cast*>(action_query.action.get()) != nullptr) { - auto* specialized_action = - dynamic_cast*>(action_query.action.get()); - return bindAction( - specialized_action, std::move(action_query.filter), action_query.table_name, tables - ); - } - if (dynamic_cast*>(action_query.action.get()) != - nullptr) { - auto* specialized_action = - dynamic_cast*>(action_query.action.get()); - return bindAction( - specialized_action, std::move(action_query.filter), action_query.table_name, tables - ); - } - if 
(dynamic_cast*>(action_query.action.get()) != - nullptr) { - auto* specialized_action = - dynamic_cast*>(action_query.action.get()); - return bindAction( - specialized_action, std::move(action_query.filter), action_query.table_name, tables - ); - } - if (dynamic_cast(action_query.action.get()) != nullptr) { - auto* specialized_action = dynamic_cast(action_query.action.get()); - return bindAction( - specialized_action, std::move(action_query.filter), action_query.table_name, tables - ); - } - if (dynamic_cast(action_query.action.get()) != nullptr) { - auto* specialized_action = - dynamic_cast(action_query.action.get()); - return bindAction( - specialized_action, std::move(action_query.filter), action_query.table_name, tables - ); - } - SILO_UNREACHABLE(); -} - -std::optional>> -getDecompressInfo( - const std::vector& columns, - const std::shared_ptr& column_schema -) { - std::map> - table_schemas_for_decompression; - for (const auto& column_identifier : columns) { - if (silo::schema::isSequenceColumn(column_identifier.type)) { - table_schemas_for_decompression.emplace(column_identifier, column_schema); - } - } - if (table_schemas_for_decompression.empty()) { - return std::nullopt; - } - return table_schemas_for_decompression; -} - -} // namespace - -operators::QueryNodePtr Binder::bindQuery( - ActionQuery action_query, - const std::map>& tables -) { - auto table_name = action_query.table_name; - auto order_by_fields = action_query.action->getOrderByFields(); - auto randomize = action_query.action->getRandomizeSeed(); - auto limit = action_query.action->getLimit(); - auto offset = action_query.action->getOffset(); - auto node = bindBaseAction(std::move(action_query), tables); - - if (!order_by_fields.empty() || randomize) { - auto field_identifiers = node->getOutputSchema(); - std::vector field_names; - std::ranges::transform( - field_identifiers, - std::back_inserter(field_names), - [](const auto& identifier) { return identifier.name; } - ); - - for (const 
OrderByField& order_by_field : order_by_fields) { - CHECK_SILO_QUERY( - std::ranges::find(field_names, order_by_field.name) != field_names.end(), - "OrderByField {} is not contained in the result of this operation. " - "Allowed values are {}.", - order_by_field.name, - fmt::join(field_names, ", ") - ); - } - // TODO(#800) add optimized sorting when limit is supplied - node = std::make_unique(std::move(node), order_by_fields, randomize); - } - - if (limit.has_value() || offset.has_value()) { - node = std::make_unique(std::move(node), limit, offset); - } - - auto decompress_info = getDecompressInfo(node->getOutputSchema(), tables.at(table_name)->schema); - if (decompress_info) { - node = - std::make_unique(std::move(node), decompress_info.value()); - } - - return node; -} - -} // namespace silo::query_engine diff --git a/src/silo/query_engine/binder.h b/src/silo/query_engine/binder.h deleted file mode 100644 index 4bff6b28a..000000000 --- a/src/silo/query_engine/binder.h +++ /dev/null @@ -1,16 +0,0 @@ -#pragma once - -#include "silo/query_engine/action_query.h" -#include "silo/query_engine/operators/query_node.h" - -namespace silo::query_engine { - -class Binder { - public: - static operators::QueryNodePtr bindQuery( - ActionQuery action_query, - const std::map>& tables - ); -}; - -} // namespace silo::query_engine diff --git a/src/silo/query_engine/filter/expressions/and.cpp b/src/silo/query_engine/filter/expressions/and.cpp index e13abfa92..8aa0164e6 100644 --- a/src/silo/query_engine/filter/expressions/and.cpp +++ b/src/silo/query_engine/filter/expressions/and.cpp @@ -9,7 +9,6 @@ #include #include #include -#include #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/complement.h" @@ -200,16 +199,4 @@ std::unique_ptr And::compile(const storage::Table& table) const { return result; } -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - 
CHECK_SILO_QUERY( - json.contains("children"), "The field 'children' is required in an And expression" - ); - CHECK_SILO_QUERY( - json["children"].is_array(), "The field 'children' in an And expression needs to be an array" - ); - auto children = json.at("children").get(); - filter = std::make_unique(std::move(children)); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/and.h b/src/silo/query_engine/filter/expressions/and.h index 1470871ca..fc8c580ba 100644 --- a/src/silo/query_engine/filter/expressions/and.h +++ b/src/silo/query_engine/filter/expressions/and.h @@ -4,8 +4,6 @@ #include #include -#include - #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" #include "silo/query_engine/filter/operators/selection.h" @@ -34,7 +32,4 @@ class And : public Expression { ) const override; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/and.test.cpp b/src/silo/query_engine/filter/expressions/and.test.cpp index 8e0c0369b..8ac6f81b1 100644 --- a/src/silo/query_engine/filter/expressions/and.test.cpp +++ b/src/silo/query_engine/filter/expressions/and.test.cpp @@ -1,5 +1,3 @@ -#include -#include #include #include "silo/test/query_fixture.test.h" @@ -9,8 +7,6 @@ using silo::ReferenceGenomes; using silo::test::QueryTestData; using silo::test::QueryTestScenario; -using boost::uuids::random_generator; - nlohmann::json createData(const std::string& country, const std::string& date) { static std::atomic_int row_id = 0; const auto primary_key = row_id++; @@ -83,41 +79,9 @@ const QueryTestData TEST_DATA{ const QueryTestScenario NESTED_AND = { .name = "NESTED_AND", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details" - }, - "filterExpression": { - 
"children": [ - { - "column": "date", - "from": "2009-01-01", - "to": null, - "type": "DateBetween" - }, - { - "children": [ - { - "column": "date", - "from": "2000-01-01", - "to": null, - "type": "DateBetween" - }, - { - "column": "country", - "value": "Germany", - "type": "StringEquals" - } - ], - "type": "And" - } - ], - "type": "And" - } -})" - ), + .query = + "default.filter(date >= '2009-01-01'::date && date >= '2000-01-01'::date && country = " + "'Germany').project({age,country,coverage,date,primaryKey})", .expected_query_result = nlohmann::json::parse( R"( [{"age":13,"country":"Germany","coverage":0.9,"date":"2009-06-07","primaryKey":"id_2"}])" diff --git a/src/silo/query_engine/filter/expressions/bool_equals.cpp b/src/silo/query_engine/filter/expressions/bool_equals.cpp index 1c29ebc00..d2ca64dd6 100644 --- a/src/silo/query_engine/filter/expressions/bool_equals.cpp +++ b/src/silo/query_engine/filter/expressions/bool_equals.cpp @@ -3,7 +3,6 @@ #include #include -#include #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/index_scan.h" @@ -56,27 +55,4 @@ std::unique_ptr BoolEquals::compile(const storage::Table& t SILO_UNREACHABLE(); } -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY( - json.contains("column"), "The field 'column' is required in an BoolEquals expression" - ); - CHECK_SILO_QUERY( - json["column"].is_string(), "The field 'column' in an BoolEquals expression must be a string" - ); - CHECK_SILO_QUERY( - json.contains("value"), "The field 'value' is required in an BoolEquals expression" - ); - CHECK_SILO_QUERY( - json["value"].is_boolean() || json["value"].is_null(), - "The field 'value' in an BoolEquals expression must be a boolean or null" - ); - const std::string& column_name = json["column"]; - std::optional value; - if (!json["value"].is_null()) { - value = json["value"].get(); - } - filter = 
std::make_unique(column_name, value); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/bool_equals.h b/src/silo/query_engine/filter/expressions/bool_equals.h index 0e253f85f..5b970d4ea 100644 --- a/src/silo/query_engine/filter/expressions/bool_equals.h +++ b/src/silo/query_engine/filter/expressions/bool_equals.h @@ -3,8 +3,6 @@ #include #include -#include - #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" @@ -29,7 +27,4 @@ struct BoolEquals : public Expression { ) const override; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/date_between.cpp b/src/silo/query_engine/filter/expressions/date_between.cpp index 73979bba4..37bbcc93e 100644 --- a/src/silo/query_engine/filter/expressions/date_between.cpp +++ b/src/silo/query_engine/filter/expressions/date_between.cpp @@ -6,7 +6,6 @@ #include #include -#include #include "silo/common/date32.h" #include "silo/query_engine/filter/operators/range_selection.h" @@ -102,45 +101,4 @@ std::vector DateBe return ranges; } -// NOLINTNEXTLINE(readability-identifier-naming,readability-function-cognitive-complexity) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY( - json.contains("column"), "The field 'column' is required in a DateBetween expression" - ); - CHECK_SILO_QUERY( - json["column"].is_string(), - "The field 'column' in a DateBetween expression needs to be a string" - ); - CHECK_SILO_QUERY( - json.contains("from"), "The field 'from' is required in DateBetween expression" - ); - CHECK_SILO_QUERY( - json["from"].is_null() || (json["from"].is_string() && !json["from"].empty()), - "The field 'from' in a DateBetween expression needs to be a string or null" - ); - 
CHECK_SILO_QUERY(json.contains("to"), "The field 'to' is required in a DateBetween expression"); - CHECK_SILO_QUERY( - json["to"].is_null() || (json["to"].is_string() && !json["to"].empty()), - "The field 'to' in a DateBetween expression needs to be a non-empty string or null" - ); - const std::string& column_name = json["column"]; - std::optional date_from; - if (json["from"].is_string()) { - const auto from_string = json["from"].get(); - auto from_result = common::stringToDate32(from_string); - CHECK_SILO_QUERY( - from_result.has_value(), "Invalid date in 'from' field: {}", from_result.error() - ); - date_from = from_result.value(); - } - std::optional date_to; - if (json["to"].is_string()) { - const auto to_string = json["to"].get(); - auto to_result = common::stringToDate32(to_string); - CHECK_SILO_QUERY(to_result.has_value(), "Invalid date in 'to' field: {}", to_result.error()); - date_to = to_result.value(); - } - filter = std::make_unique(column_name, date_from, date_to); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/date_between.h b/src/silo/query_engine/filter/expressions/date_between.h index 255256138..dc1fd567c 100644 --- a/src/silo/query_engine/filter/expressions/date_between.h +++ b/src/silo/query_engine/filter/expressions/date_between.h @@ -5,8 +5,6 @@ #include #include -#include - #include "silo/common/date32.h" #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" @@ -45,7 +43,4 @@ class DateBetween : public Expression { ) const override; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/date_equals.cpp b/src/silo/query_engine/filter/expressions/date_equals.cpp index 6513da703..c479e39e0 100644 --- 
a/src/silo/query_engine/filter/expressions/date_equals.cpp +++ b/src/silo/query_engine/filter/expressions/date_equals.cpp @@ -3,7 +3,6 @@ #include #include -#include #include "silo/common/date32.h" #include "silo/query_engine/filter/expressions/expression.h" @@ -63,33 +62,4 @@ std::unique_ptr DateEquals::compile(const storage::Table& t ); } -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY( - json.contains("column"), "The field 'column' is required in a DateEquals expression" - ); - CHECK_SILO_QUERY( - json["column"].is_string(), "The field 'column' in a DateEquals expression must be a string" - ); - CHECK_SILO_QUERY( - json.contains("value"), "The field 'value' is required in a DateEquals expression" - ); - CHECK_SILO_QUERY( - json["value"].is_null() || (json["value"].is_string() && !json["value"].empty()), - "The field 'value' in a DateEquals expression must be a non-empty date string or null" - ); - const std::string& column_name = json["column"]; - if (json["value"].is_string()) { - auto value = common::stringToDate32(json["value"].get()); - CHECK_SILO_QUERY( - value.has_value(), - "The value for the DateEquals expression is not a valid date: {}", - value.error() - ); - filter = std::make_unique(column_name, value.value()); - } else { - filter = std::make_unique(column_name, std::nullopt); - } -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/date_equals.h b/src/silo/query_engine/filter/expressions/date_equals.h index 11b3cf992..5f7fbf51d 100644 --- a/src/silo/query_engine/filter/expressions/date_equals.h +++ b/src/silo/query_engine/filter/expressions/date_equals.h @@ -4,8 +4,6 @@ #include #include -#include - #include "silo/common/date32.h" #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" @@ -31,7 +29,4 @@ class DateEquals : public 
Expression { ) const override; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/exact.cpp b/src/silo/query_engine/filter/expressions/exact.cpp index 87ad5b9d7..2c3ce3882 100644 --- a/src/silo/query_engine/filter/expressions/exact.cpp +++ b/src/silo/query_engine/filter/expressions/exact.cpp @@ -5,7 +5,6 @@ #include #include -#include #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" @@ -33,11 +32,4 @@ std::unique_ptr Exact::compile(const storage::Table& /*tabl throw QueryCompilationException{"Exact expression must be elimitated in query rewrite phase"}; } -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY(json.contains("child"), "The field 'child' is required in a Exact expression"); - auto child = json["child"].get>(); - filter = std::make_unique(std::move(child)); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/exact.h b/src/silo/query_engine/filter/expressions/exact.h index 804a09243..f8f58a194 100644 --- a/src/silo/query_engine/filter/expressions/exact.h +++ b/src/silo/query_engine/filter/expressions/exact.h @@ -3,8 +3,6 @@ #include #include -#include - #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" @@ -27,7 +25,4 @@ class Exact : public Expression { ) const override; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/expression.cpp b/src/silo/query_engine/filter/expressions/expression.cpp index 
266f1aba5..a4b0b937e 100644 --- a/src/silo/query_engine/filter/expressions/expression.cpp +++ b/src/silo/query_engine/filter/expressions/expression.cpp @@ -2,8 +2,6 @@ #include -#include - #include "silo/common/aa_symbols.h" #include "silo/common/nucleotide_symbols.h" #include "silo/query_engine/filter/expressions/and.h" @@ -46,76 +44,4 @@ Expression::AmbiguityMode invertMode(Expression::AmbiguityMode mode) { return mode; } -// NOLINTNEXTLINE(readability-identifier-naming,readability-function-cognitive-complexity) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY(json.contains("type"), "The field 'type' is required in any filter expression"); - CHECK_SILO_QUERY( - json["type"].is_string(), - "The field 'type' in all filter expressions needs to be a string, but is: {}", - json["type"].dump() - ); - const std::string expression_type = json["type"]; - if (expression_type == "True") { - filter = json.get>(); - } else if (expression_type == "False") { - filter = json.get>(); - } else if (expression_type == "And") { - filter = json.get>(); - } else if (expression_type == "Or") { - filter = json.get>(); - } else if (expression_type == "N-Of") { - filter = json.get>(); - } else if (expression_type == "Not") { - filter = json.get>(); - } else if (expression_type == "DateBetween") { - filter = json.get>(); - } else if (expression_type == "DateEquals") { - filter = json.get>(); - } else if (expression_type == "NucleotideEquals") { - filter = json.get>>(); - } else if (expression_type == "HasNucleotideMutation") { - filter = json.get>>(); - } else if (expression_type == "AminoAcidEquals") { - filter = json.get>>(); - } else if (expression_type == "HasAminoAcidMutation") { - filter = json.get>>(); - } else if (expression_type == "Lineage") { - filter = json.get>(); - } else if (expression_type == "PhyloDescendantOf") { - filter = json.get>(); - } else if (expression_type == "StringEquals") { - filter = json.get>(); - } else if 
(expression_type == "StringInSet") { - filter = json.get>(); - } else if (expression_type == "StringSearch") { - filter = json.get>(); - } else if (expression_type == "BooleanEquals") { - filter = json.get>(); - } else if (expression_type == "IntEquals") { - filter = json.get>(); - } else if (expression_type == "IntBetween") { - filter = json.get>(); - } else if (expression_type == "FloatEquals") { - filter = json.get>(); - } else if (expression_type == "FloatBetween") { - filter = json.get>(); - } else if (expression_type == "Maybe") { - filter = json.get>(); - } else if (expression_type == "Exact") { - filter = json.get>(); - } else if (expression_type == "InsertionContains") { - filter = json.get>>(); - } else if (expression_type == "AminoAcidInsertionContains") { - filter = json.get>>(); - } else if (expression_type == "IsNull") { - filter = json.get>(); - } else if (expression_type == "IsNotNull") { - filter = std::make_unique(json.get>()); - } else { - throw query_engine::IllegalQueryException( - "Unknown object filter type '" + expression_type + "'" - ); - } -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/expression.h b/src/silo/query_engine/filter/expressions/expression.h index c84d72ce8..1380e9ec0 100644 --- a/src/silo/query_engine/filter/expressions/expression.h +++ b/src/silo/query_engine/filter/expressions/expression.h @@ -3,8 +3,6 @@ #include #include -#include - #include "silo/query_engine/filter/operators/operator.h" #include "silo/storage/table.h" @@ -34,9 +32,6 @@ class Expression { ) const = 0; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - Expression::AmbiguityMode invertMode(Expression::AmbiguityMode mode); template diff --git a/src/silo/query_engine/filter/expressions/false.cpp b/src/silo/query_engine/filter/expressions/false.cpp index 441a2d116..06b298cde 100644 --- 
a/src/silo/query_engine/filter/expressions/false.cpp +++ b/src/silo/query_engine/filter/expressions/false.cpp @@ -25,9 +25,4 @@ std::unique_ptr False::compile(const storage::Table& table) return std::make_unique(table.sequence_count); } -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& /*json*/, std::unique_ptr& filter) { - filter = std::make_unique(); -} - } // namespace silo::query_engine::filter::expressions \ No newline at end of file diff --git a/src/silo/query_engine/filter/expressions/false.h b/src/silo/query_engine/filter/expressions/false.h index 1dcac288c..fe701cfac 100644 --- a/src/silo/query_engine/filter/expressions/false.h +++ b/src/silo/query_engine/filter/expressions/false.h @@ -3,8 +3,6 @@ #include #include -#include - #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" @@ -25,7 +23,4 @@ class False : public Expression { ) const override; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/float_between.cpp b/src/silo/query_engine/filter/expressions/float_between.cpp index 8af82d6cb..aaca40730 100644 --- a/src/silo/query_engine/filter/expressions/float_between.cpp +++ b/src/silo/query_engine/filter/expressions/float_between.cpp @@ -4,8 +4,6 @@ #include #include -#include - #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/complement.h" #include "silo/query_engine/filter/operators/index_scan.h" @@ -74,36 +72,4 @@ std::unique_ptr FloatBetween::compile(const storage::Table& return std::make_unique(std::move(predicates), table.sequence_count); } -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY( - json.contains("column"), "The 
field 'column' is required in a FloatBetween expression" - ); - CHECK_SILO_QUERY( - json["column"].is_string(), "The field 'column' in a FloatBetween expression must be a string" - ); - CHECK_SILO_QUERY( - json.contains("from"), "The field 'from' is required in FloatBetween expression" - ); - CHECK_SILO_QUERY( - json["from"].is_null() || json["from"].is_number_float(), - "The field 'from' in a FloatBetween expression must be a float or null" - ); - CHECK_SILO_QUERY(json.contains("to"), "The field 'to' is required in a FloatBetween expression"); - CHECK_SILO_QUERY( - json["to"].is_null() || json["to"].is_number_float(), - "The field 'to' in a FloatBetween expression must be a float or null" - ); - const std::string& column_name = json["column"]; - std::optional value_from; - if (json["from"].is_number_float()) { - value_from = json["from"].get(); - } - std::optional value_to; - if (json["to"].is_number_float()) { - value_to = json["to"].get(); - } - filter = std::make_unique(column_name, value_from, value_to); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/float_between.h b/src/silo/query_engine/filter/expressions/float_between.h index 462aa7d50..6a103e4da 100644 --- a/src/silo/query_engine/filter/expressions/float_between.h +++ b/src/silo/query_engine/filter/expressions/float_between.h @@ -4,8 +4,6 @@ #include #include -#include - #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" @@ -35,7 +33,4 @@ class FloatBetween : public Expression { ) const override; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/float_equals.cpp b/src/silo/query_engine/filter/expressions/float_equals.cpp index 5cfdd1e44..208b545fd 100644 --- 
a/src/silo/query_engine/filter/expressions/float_equals.cpp +++ b/src/silo/query_engine/filter/expressions/float_equals.cpp @@ -5,7 +5,6 @@ #include #include -#include #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/index_scan.h" @@ -57,27 +56,4 @@ std::unique_ptr FloatEquals::compile(const storage::Table& ); } -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY( - json.contains("column"), "The field 'column' is required in a FloatEquals expression" - ); - CHECK_SILO_QUERY( - json["column"].is_string(), "The field 'column' in a FloatEquals expression must be a string" - ); - CHECK_SILO_QUERY( - json.contains("value"), "The field 'value' is required in a FloatEquals expression" - ); - CHECK_SILO_QUERY( - json["value"].is_number_float() || json["value"].is_null(), - "The field 'value' in a FloatEquals expression must be a float or null" - ); - const std::string& column_name = json["column"]; - std::optional value; - if (!json["value"].is_null()) { - value = json["value"].get(); - } - filter = std::make_unique(column_name, value); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/float_equals.h b/src/silo/query_engine/filter/expressions/float_equals.h index 6deb1ba45..7854e11b3 100644 --- a/src/silo/query_engine/filter/expressions/float_equals.h +++ b/src/silo/query_engine/filter/expressions/float_equals.h @@ -3,8 +3,6 @@ #include #include -#include - #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" @@ -29,7 +27,4 @@ class FloatEquals : public Expression { ) const override; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git 
a/src/silo/query_engine/filter/expressions/has_mutation.cpp b/src/silo/query_engine/filter/expressions/has_mutation.cpp index 3376afa43..4891b87fc 100644 --- a/src/silo/query_engine/filter/expressions/has_mutation.cpp +++ b/src/silo/query_engine/filter/expressions/has_mutation.cpp @@ -4,8 +4,6 @@ #include #include -#include - #include "silo/common/nucleotide_symbols.h" #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/expressions/symbol_in_set.h" @@ -87,41 +85,6 @@ std::unique_ptr HasMutation::compile( }; } -template -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr>& filter) { - CHECK_SILO_QUERY( - json.contains("position"), - "The field 'position' is required in a Has{}Mutation expression", - SymbolType::SYMBOL_NAME - ); - CHECK_SILO_QUERY( - json["position"].is_number_unsigned(), - "The field 'position' in a Has{}Mutation expression needs to be an unsigned integer", - SymbolType::SYMBOL_NAME - ); - std::optional nuc_sequence_name; - if (json.contains("sequenceName")) { - nuc_sequence_name = json["sequenceName"].get(); - } - const uint32_t position_idx_1_indexed = json["position"].get(); - CHECK_SILO_QUERY( - position_idx_1_indexed > 0, "The field 'position' is 1-indexed. Value of 0 not allowed." 
- ); - const uint32_t position_idx = position_idx_1_indexed - 1; - filter = std::make_unique>(nuc_sequence_name, position_idx); -} - -template void from_json( - const nlohmann::json& json, - std::unique_ptr>& filter -); - -template void from_json( - const nlohmann::json& json, - std::unique_ptr>& filter -); - template class HasMutation; template class HasMutation; diff --git a/src/silo/query_engine/filter/expressions/has_mutation.h b/src/silo/query_engine/filter/expressions/has_mutation.h index f0dade5fc..9598b7148 100644 --- a/src/silo/query_engine/filter/expressions/has_mutation.h +++ b/src/silo/query_engine/filter/expressions/has_mutation.h @@ -5,8 +5,6 @@ #include #include -#include - #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" @@ -32,8 +30,4 @@ class HasMutation : public Expression { ) const override; }; -template -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr>& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/insertion_contains.cpp b/src/silo/query_engine/filter/expressions/insertion_contains.cpp index ef7948877..fa0569ef5 100644 --- a/src/silo/query_engine/filter/expressions/insertion_contains.cpp +++ b/src/silo/query_engine/filter/expressions/insertion_contains.cpp @@ -4,7 +4,6 @@ #include #include -#include #include "silo/common/aa_symbols.h" #include "silo/common/nucleotide_symbols.h" @@ -90,49 +89,6 @@ std::unique_ptr InsertionContains::compile( ); } -template -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr>& filter) { - CHECK_SILO_QUERY( - json.contains("position"), - "The field 'position' is required in an InsertionContains expression" - ); - CHECK_SILO_QUERY( - json["position"].is_number_unsigned(), - "The field 'position' in an InsertionContains expression needs to be an unsigned 
integer" - ); - CHECK_SILO_QUERY( - json.contains("value"), "The field 'value' is required in an InsertionContains expression" - ); - CHECK_SILO_QUERY( - json["value"].is_string() && !json["value"].is_null(), - "The field 'value' in an InsertionContains expression needs to be a string" - ); - std::optional sequence_name = std::nullopt; - if (json.contains("sequenceName")) { - sequence_name = json["sequenceName"].get(); - } - const uint32_t position_idx = json["position"].get(); - const std::string& value = json["value"].get(); - CHECK_SILO_QUERY( - !value.empty(), - "The field 'value' in an InsertionContains expression must not be an empty string" - ); - filter = std::make_unique>(sequence_name, position_idx, value); -} - -// NOLINTNEXTLINE(readability-identifier-naming) -template void from_json( - const nlohmann::json& json, - std::unique_ptr>& filter -); - -// NOLINTNEXTLINE(readability-identifier-naming) -template void from_json( - const nlohmann::json& json, - std::unique_ptr>& filter -); - template class InsertionContains; template class InsertionContains; diff --git a/src/silo/query_engine/filter/expressions/insertion_contains.h b/src/silo/query_engine/filter/expressions/insertion_contains.h index 8e2f02cf1..bc20e74bc 100644 --- a/src/silo/query_engine/filter/expressions/insertion_contains.h +++ b/src/silo/query_engine/filter/expressions/insertion_contains.h @@ -5,8 +5,6 @@ #include #include -#include - #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" @@ -37,8 +35,4 @@ class InsertionContains : public Expression { ) const override; }; -template -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr>& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/int_between.cpp b/src/silo/query_engine/filter/expressions/int_between.cpp index 0a71d82bd..b27afd2b8 100644 --- 
a/src/silo/query_engine/filter/expressions/int_between.cpp +++ b/src/silo/query_engine/filter/expressions/int_between.cpp @@ -5,7 +5,6 @@ #include #include -#include #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/complement.h" @@ -81,36 +80,4 @@ std::unique_ptr IntBetween::compile(const storage::Table& t return std::move(result); } -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY( - json.contains("column"), "The field 'column' is required in a IntBetween expression" - ); - CHECK_SILO_QUERY( - json["column"].is_string(), "The field 'column' in a IntBetween expression must be a string" - ); - CHECK_SILO_QUERY(json.contains("from"), "The field 'from' is required in IntBetween expression"); - CHECK_SILO_QUERY( - json["from"].is_number_integer() || json["from"].is_null(), - "The field 'from' in an IntBetween expression must be an integer in [-2147483648; " - "2147483647] or null" - ); - CHECK_SILO_QUERY(json.contains("to"), "The field 'to' is required in a IntBetween expression"); - CHECK_SILO_QUERY( - (json["to"].is_number_integer() && json["to"].is_number_integer()) || json["to"].is_null(), - "The field 'to' in an IntBetween expression must be an integer in [-2147483648; 2147483647] " - "or null" - ); - const std::string& column_name = json["column"]; - std::optional value_from; - if (json["from"].is_number_integer()) { - value_from = json["from"].get(); - } - std::optional value_to; - if (json["to"].is_number_integer()) { - value_to = json["to"].get(); - } - filter = std::make_unique(column_name, value_from, value_to); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/int_between.h b/src/silo/query_engine/filter/expressions/int_between.h index 1051b812c..29c36fc73 100644 --- a/src/silo/query_engine/filter/expressions/int_between.h +++ 
b/src/silo/query_engine/filter/expressions/int_between.h @@ -36,7 +36,4 @@ class IntBetween : public Expression { ) const override; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/int_equals.cpp b/src/silo/query_engine/filter/expressions/int_equals.cpp index 4badb538d..263fb2d16 100644 --- a/src/silo/query_engine/filter/expressions/int_equals.cpp +++ b/src/silo/query_engine/filter/expressions/int_equals.cpp @@ -55,28 +55,4 @@ std::unique_ptr IntEquals::compile(const storage::Table& ta ); } -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY( - json.contains("column"), "The field 'column' is required in an IntEquals expression" - ); - CHECK_SILO_QUERY( - json["column"].is_string(), "The field 'column' in an IntEquals expression must be a string" - ); - CHECK_SILO_QUERY( - json.contains("value"), "The field 'value' is required in an IntEquals expression" - ); - CHECK_SILO_QUERY( - json["value"].is_number_integer() || json["value"].is_null(), - "The field 'value' in an IntEquals expression must be an integer in [-2147483648; " - "2147483647] or null" - ); - const std::string& column = json["column"]; - std::optional value; - if (!json["value"].is_null()) { - value = json["value"].get(); - } - filter = std::make_unique(column, value); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/int_equals.h b/src/silo/query_engine/filter/expressions/int_equals.h index 319daa6ff..4a97092cb 100644 --- a/src/silo/query_engine/filter/expressions/int_equals.h +++ b/src/silo/query_engine/filter/expressions/int_equals.h @@ -30,7 +30,4 @@ class IntEquals : public Expression { ) const override; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void 
from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/is_null.cpp b/src/silo/query_engine/filter/expressions/is_null.cpp index 993d913bf..37ee28355 100644 --- a/src/silo/query_engine/filter/expressions/is_null.cpp +++ b/src/silo/query_engine/filter/expressions/is_null.cpp @@ -3,7 +3,6 @@ #include #include -#include #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/empty.h" @@ -45,16 +44,4 @@ std::unique_ptr IsNull::compile(const storage::Table& table }); } -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY( - json.contains("column"), "The field 'column' is required in an IsNull expression" - ); - CHECK_SILO_QUERY( - json["column"].is_string(), "The field 'column' in an IsNull expression must be a string" - ); - const std::string& column_name = json["column"]; - filter = std::make_unique(column_name); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/is_null.h b/src/silo/query_engine/filter/expressions/is_null.h index dbfc8180c..f89a18b47 100644 --- a/src/silo/query_engine/filter/expressions/is_null.h +++ b/src/silo/query_engine/filter/expressions/is_null.h @@ -3,8 +3,6 @@ #include #include -#include - #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" @@ -28,7 +26,4 @@ class IsNull : public Expression { ) const override; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/is_null.test.cpp b/src/silo/query_engine/filter/expressions/is_null.test.cpp index 5b4ff0e84..f61a0f402 100644 --- 
a/src/silo/query_engine/filter/expressions/is_null.test.cpp +++ b/src/silo/query_engine/filter/expressions/is_null.test.cpp @@ -85,136 +85,49 @@ const QueryTestData TEST_DATA{ const QueryTestScenario IS_NULL_STRING_COLUMN = { .name = "IS_NULL_STRING_COLUMN", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey"] - }, - "filterExpression": { - "type": "IsNull", - "column": "stringField" - } -})" - ), + .query = "default.filter(stringField.isNull()).project(primaryKey)", .expected_query_result = nlohmann::json::parse(R"([{"primaryKey":"id_1"},{"primaryKey":"id_7"}])") }; const QueryTestScenario IS_NULL_INDEXED_STRING_COLUMN = { .name = "IS_NULL_INDEXED_STRING_COLUMN", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey"] - }, - "filterExpression": { - "type": "IsNull", - "column": "indexedStringField" - } -})" - ), + .query = "default.filter(indexedStringField.isNull()).project(primaryKey)", .expected_query_result = nlohmann::json::parse(R"([{"primaryKey":"id_2"},{"primaryKey":"id_7"}])") }; const QueryTestScenario IS_NULL_INT_COLUMN = { .name = "IS_NULL_INT_COLUMN", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey"] - }, - "filterExpression": { - "type": "IsNull", - "column": "intField" - } -})" - ), + .query = "default.filter(intField.isNull()).project(primaryKey)", .expected_query_result = nlohmann::json::parse(R"([{"primaryKey":"id_3"},{"primaryKey":"id_7"}])") }; const QueryTestScenario IS_NULL_FLOAT_COLUMN = { .name = "IS_NULL_FLOAT_COLUMN", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey"] - }, - "filterExpression": { - "type": "IsNull", - "column": "floatField" - } -})" - ), + .query = "default.filter(floatField.isNull()).project(primaryKey)", .expected_query_result = nlohmann::json::parse(R"([{"primaryKey":"id_4"},{"primaryKey":"id_7"}])") }; const 
QueryTestScenario IS_NULL_BOOL_COLUMN = { .name = "IS_NULL_BOOL_COLUMN", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey"] - }, - "filterExpression": { - "type": "IsNull", - "column": "boolField" - } -})" - ), + .query = "default.filter(boolField.isNull()).project(primaryKey)", .expected_query_result = nlohmann::json::parse(R"([{"primaryKey":"id_5"},{"primaryKey":"id_7"}])") }; const QueryTestScenario IS_NULL_DATE_COLUMN = { .name = "IS_NULL_DATE_COLUMN", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey"] - }, - "filterExpression": { - "type": "IsNull", - "column": "dateField" - } -})" - ), + .query = "default.filter(dateField.isNull()).project(primaryKey)", .expected_query_result = nlohmann::json::parse(R"([{"primaryKey":"id_6"},{"primaryKey":"id_7"}])") }; const QueryTestScenario IS_NULL_NEGATED = { .name = "IS_NULL_NEGATED", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey"] - }, - "filterExpression": { - "type": "Not", - "child": { - "type": "IsNull", - "column": "stringField" - } - } -})" - ), + .query = "default.filter(!(stringField.isNull())).project(primaryKey)", .expected_query_result = nlohmann::json::parse( R"([{"primaryKey":"id_0"},{"primaryKey":"id_2"},{"primaryKey":"id_3"},{"primaryKey":"id_4"},{"primaryKey":"id_5"},{"primaryKey":"id_6"}])" ) @@ -222,19 +135,7 @@ const QueryTestScenario IS_NULL_NEGATED = { const QueryTestScenario IS_NOT_NULL = { .name = "IS_NOT_NULL", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey"] - }, - "filterExpression": { - "type": "IsNotNull", - "column": "stringField" - } -})" - ), + .query = "default.filter(stringField.isNotNull()).project(primaryKey)", .expected_query_result = nlohmann::json::parse( 
R"([{"primaryKey":"id_0"},{"primaryKey":"id_2"},{"primaryKey":"id_3"},{"primaryKey":"id_4"},{"primaryKey":"id_5"},{"primaryKey":"id_6"}])" ) @@ -242,84 +143,10 @@ const QueryTestScenario IS_NOT_NULL = { const QueryTestScenario IS_NULL_WITH_AND = { .name = "IS_NULL_WITH_AND", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey"] - }, - "filterExpression": { - "type": "And", - "children": [ - { - "type": "IsNull", - "column": "stringField" - }, - { - "type": "IsNull", - "column": "intField" - } - ] - } -})" - ), + .query = "default.filter(stringField.isNull() && intField.isNull()).project(primaryKey)", .expected_query_result = nlohmann::json::parse(R"([{"primaryKey":"id_7"}])") }; -const QueryTestScenario IS_NULL_MISSING_COLUMN = { - .name = "IS_NULL_MISSING_COLUMN", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details" - }, - "filterExpression": { - "type": "IsNull" - } -})" - ), - .expected_query_result = {}, - .expected_error_message = "The field 'column' is required in an IsNull expression" -}; - -const QueryTestScenario IS_NULL_INVALID_COLUMN_TYPE = { - .name = "IS_NULL_INVALID_COLUMN_TYPE", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details" - }, - "filterExpression": { - "type": "IsNull", - "column": 123 - } -})" - ), - .expected_query_result = {}, - .expected_error_message = "The field 'column' in an IsNull expression must be a string" -}; - -const QueryTestScenario IS_NULL_NONEXISTENT_COLUMN = { - .name = "IS_NULL_NONEXISTENT_COLUMN", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details" - }, - "filterExpression": { - "type": "IsNull", - "column": "nonexistent" - } -})" - ), - .expected_query_result = {}, - .expected_error_message = "The column 'nonexistent' is not contained in the database" -}; - } // namespace QUERY_TEST( @@ -334,9 +161,6 @@ QUERY_TEST( IS_NULL_DATE_COLUMN, IS_NULL_NEGATED, IS_NOT_NULL, - IS_NULL_WITH_AND, - 
IS_NULL_MISSING_COLUMN, - IS_NULL_INVALID_COLUMN_TYPE, - IS_NULL_NONEXISTENT_COLUMN + IS_NULL_WITH_AND ) ); diff --git a/src/silo/query_engine/filter/expressions/lineage_filter.cpp b/src/silo/query_engine/filter/expressions/lineage_filter.cpp index 3c45db9fc..854412504 100644 --- a/src/silo/query_engine/filter/expressions/lineage_filter.cpp +++ b/src/silo/query_engine/filter/expressions/lineage_filter.cpp @@ -7,7 +7,6 @@ #include #include -#include #include "silo/query_engine/filter/operators/empty.h" #include "silo/query_engine/filter/operators/index_scan.h" @@ -94,81 +93,4 @@ std::unique_ptr LineageFilter::compile(const storage::Table ); } -namespace { - -const std::string COLUMN_FIELD_NAME = "column"; -const std::string VALUE_FIELD_NAME = "value"; -const std::string INCLUDE_SUBLINEAGES_FIELD_NAME = "includeSublineages"; -const std::string RECOMBINANT_FOLLOWING_MODE_FIELD_NAME = "recombinantFollowingMode"; -} // namespace - -// NOLINTNEXTLINE(readability-identifier-naming, readability-function-cognitive-complexity) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY( - json.contains(COLUMN_FIELD_NAME), - "The field '{}' is required in a Lineage expression", - COLUMN_FIELD_NAME - ); - CHECK_SILO_QUERY( - json[COLUMN_FIELD_NAME].is_string(), - "The field '{}' in a Lineage expression needs to be a string", - COLUMN_FIELD_NAME - ); - const std::string& column_name = json[COLUMN_FIELD_NAME]; - - std::optional lineage; - CHECK_SILO_QUERY( - json.contains(VALUE_FIELD_NAME), - "The field '{}' is required in a Lineage expression", - VALUE_FIELD_NAME - ); - CHECK_SILO_QUERY( - json[VALUE_FIELD_NAME].is_string() || json[VALUE_FIELD_NAME].is_null(), - "The field '{}' in a Lineage expression needs to be a string or null", - VALUE_FIELD_NAME - ); - if (json[VALUE_FIELD_NAME].is_string()) { - lineage = json[VALUE_FIELD_NAME].get(); - } - - CHECK_SILO_QUERY( - json.contains(INCLUDE_SUBLINEAGES_FIELD_NAME), - "The field '{}' is required in a 
Lineage expression", - INCLUDE_SUBLINEAGES_FIELD_NAME - ); - CHECK_SILO_QUERY( - json[INCLUDE_SUBLINEAGES_FIELD_NAME].is_boolean(), - "The field '{}' in a Lineage expression needs to be a boolean", - INCLUDE_SUBLINEAGES_FIELD_NAME - ); - const bool include_sublineages = json[INCLUDE_SUBLINEAGES_FIELD_NAME]; - std::optional sublineage_mode = std::nullopt; - if (include_sublineages) { - sublineage_mode = RecombinantEdgeFollowingMode::DO_NOT_FOLLOW; - if (json.contains(RECOMBINANT_FOLLOWING_MODE_FIELD_NAME)) { - static std::unordered_map - recombinant_following_mode_options{ - {"doNotFollow", RecombinantEdgeFollowingMode::DO_NOT_FOLLOW}, - {"followIfFullyContainedInClade", - RecombinantEdgeFollowingMode::FOLLOW_IF_FULLY_CONTAINED_IN_CLADE}, - {"alwaysFollow", RecombinantEdgeFollowingMode::ALWAYS_FOLLOW} - }; - CHECK_SILO_QUERY( - json.at(RECOMBINANT_FOLLOWING_MODE_FIELD_NAME).is_string() && - recombinant_following_mode_options.contains( - json.at(RECOMBINANT_FOLLOWING_MODE_FIELD_NAME).get() - ), - "The field '{}' in a Lineage expression needs to be one of: {}", - RECOMBINANT_FOLLOWING_MODE_FIELD_NAME, - fmt::join(recombinant_following_mode_options | std::views::keys, ",") - ); - sublineage_mode = recombinant_following_mode_options.at( - json.at(RECOMBINANT_FOLLOWING_MODE_FIELD_NAME).get() - ); - } - } - - filter = std::make_unique(column_name, lineage, sublineage_mode); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/lineage_filter.h b/src/silo/query_engine/filter/expressions/lineage_filter.h index b6c2519a6..4be2d8a81 100644 --- a/src/silo/query_engine/filter/expressions/lineage_filter.h +++ b/src/silo/query_engine/filter/expressions/lineage_filter.h @@ -38,7 +38,4 @@ class LineageFilter : public Expression { ) const; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff 
--git a/src/silo/query_engine/filter/expressions/lineage_filter.test.cpp b/src/silo/query_engine/filter/expressions/lineage_filter.test.cpp index 30dd37483..07d1c6e6e 100644 --- a/src/silo/query_engine/filter/expressions/lineage_filter.test.cpp +++ b/src/silo/query_engine/filter/expressions/lineage_filter.test.cpp @@ -86,20 +86,9 @@ const QueryTestData TEST_DATA{ .lineage_trees = {{"test_lineage_index", LINEAGE_TREE}} }; -nlohmann::json createLineageQuery(const nlohmann::json value, bool include_sublineages) { - return { - {"action", {{"type", "Details"}}}, - {"filterExpression", - {{"type", "Lineage"}, - {"column", "pango_lineage"}, - {"value", value}, - {"includeSublineages", include_sublineages}}} - }; -} - const QueryTestScenario LINEAGE_FILTER_SCENARIO = { .name = "lineageFilter", - .query = createLineageQuery(SOME_BASE_LINEAGE, false), + .query = "default.filter(pango_lineage.lineage('BASE.1')).project({pango_lineage, primaryKey})", .expected_query_result = nlohmann::json( {{{"primaryKey", "id_0"}, {"pango_lineage", SOME_BASE_LINEAGE}}, {{"primaryKey", "id_1"}, {"pango_lineage", SOME_BASE_LINEAGE}}} @@ -108,7 +97,9 @@ const QueryTestScenario LINEAGE_FILTER_SCENARIO = { const QueryTestScenario LINEAGE_FILTER_INCLUDING_SUBLINEAGES_SCENARIO = { .name = "lineageFilterIncludingSublineages", - .query = createLineageQuery(SOME_BASE_LINEAGE, true), + .query = + "default.filter(pango_lineage.lineage('BASE.1', " + "includeSublineages:=true)).project({pango_lineage, primaryKey})", .expected_query_result = nlohmann::json( {{{"primaryKey", "id_0"}, {"pango_lineage", SOME_BASE_LINEAGE}}, {{"primaryKey", "id_1"}, {"pango_lineage", SOME_BASE_LINEAGE}}, @@ -118,29 +109,23 @@ const QueryTestScenario LINEAGE_FILTER_INCLUDING_SUBLINEAGES_SCENARIO = { const QueryTestScenario LINEAGE_FILTER_NULL_SCENARIO = { .name = "lineageFilterNull", - .query = createLineageQuery(nullptr, false), + .query = "default.filter(pango_lineage.lineage(null)).project({pango_lineage, primaryKey})", 
.expected_query_result = nlohmann::json({{{"primaryKey", "id_3"}, {"pango_lineage", nullptr}}}) }; const QueryTestScenario LINEAGE_FILTER_NULL_INCLUDING_SUBLINEAGES_SCENARIO = { .name = "lineageFilterNullIncludingSublineages", - .query = createLineageQuery(nullptr, true), + .query = + "default.filter(pango_lineage.lineage(null, " + "includeSublineages:=true)).project({pango_lineage, primaryKey})", .expected_query_result = nlohmann::json({{{"primaryKey", "id_3"}, {"pango_lineage", nullptr}}}) }; const QueryTestScenario FILTER_INCLUDING_RECOMBINANTS = { .name = "FILTER_INCLUDING_RECOMBINANTS", - .query = nlohmann::json::parse(R"( -{ - "action": {"type": "Details"}, - "filterExpression": { - "type": "Lineage", - "column": "pango_lineage", - "value": "CHILD", - "includeSublineages": true, - "recombinantFollowingMode": "alwaysFollow" - } -})"), + .query = + "default.filter(pango_lineage.lineage('CHILD', includeSublineages:=true, " + "recombinantFollowingMode:='alwaysFollow')).project({pango_lineage, primaryKey})", .expected_query_result = nlohmann::json::parse(R"( [{"pango_lineage":"CHILD","primaryKey":"id_2"}, {"pango_lineage":"RECOMBINANT","primaryKey":"id_4"}] @@ -149,17 +134,10 @@ const QueryTestScenario FILTER_INCLUDING_RECOMBINANTS = { const QueryTestScenario FILTER_INCLUDING_CONTAINED_RECOMBINANTS = { .name = "FILTER_INCLUDING_CONTAINED_RECOMBINANTS", - .query = nlohmann::json::parse(R"( -{ - "action": {"type": "Details"}, - "filterExpression": { - "type": "Lineage", - "column": "pango_lineage", - "value": "BASE.1", - "includeSublineages": true, - "recombinantFollowingMode": "followIfFullyContainedInClade" - } -})"), + .query = + "default.filter(pango_lineage.lineage('BASE.1', includeSublineages:=true, " + "recombinantFollowingMode:='followIfFullyContainedInClade')).project({pango_lineage, " + "primaryKey})", .expected_query_result = nlohmann::json::parse(R"( [{"pango_lineage":"BASE.1","primaryKey":"id_0"}, {"pango_lineage":"BASE.1","primaryKey":"id_1"}, @@ 
-170,17 +148,10 @@ const QueryTestScenario FILTER_INCLUDING_CONTAINED_RECOMBINANTS = { const QueryTestScenario DOES_NOT_FILTER_NON_INCLUDED_RECOMBINANTS = { .name = "DOES_NOT_FILTER_NON_INCLUDED_RECOMBINANTS", - .query = nlohmann::json::parse(R"( -{ - "action": {"type": "Details"}, - "filterExpression": { - "type": "Lineage", - "column": "pango_lineage", - "value": "CHILD", - "includeSublineages": true, - "recombinantFollowingMode": "followIfFullyContainedInClade" - } -})"), + .query = + "default.filter(pango_lineage.lineage('CHILD', includeSublineages:=true, " + "recombinantFollowingMode:='followIfFullyContainedInClade')).project({pango_lineage, " + "primaryKey})", .expected_query_result = nlohmann::json::parse(R"( [{"pango_lineage":"CHILD","primaryKey":"id_2"}] )") @@ -188,17 +159,9 @@ const QueryTestScenario DOES_NOT_FILTER_NON_INCLUDED_RECOMBINANTS = { const QueryTestScenario EXPLICIT_DO_NOT_FOLLOW = { .name = "EXPLICIT_DO_NOT_FOLLOW", - .query = nlohmann::json::parse(R"( -{ - "action": {"type": "Details"}, - "filterExpression": { - "type": "Lineage", - "column": "pango_lineage", - "value": "BASE.1", - "includeSublineages": true, - "recombinantFollowingMode": "doNotFollow" - } -})"), + .query = + "default.filter(pango_lineage.lineage('BASE.1', includeSublineages:=true, " + "recombinantFollowingMode:='doNotFollow')).project({pango_lineage, primaryKey})", .expected_query_result = nlohmann::json::parse(R"( [{"pango_lineage":"BASE.1","primaryKey":"id_0"}, {"pango_lineage":"BASE.1","primaryKey":"id_1"}, diff --git a/src/silo/query_engine/filter/expressions/maybe.cpp b/src/silo/query_engine/filter/expressions/maybe.cpp index 27103e8fa..7fdcd46b2 100644 --- a/src/silo/query_engine/filter/expressions/maybe.cpp +++ b/src/silo/query_engine/filter/expressions/maybe.cpp @@ -32,11 +32,4 @@ std::unique_ptr Maybe::compile(const storage::Table& /*tabl throw QueryCompilationException{"Maybe expression must be elimitated in query rewrite phase"}; } -// 
NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY(json.contains("child"), "The field 'child' is required in a Maybe expression"); - auto child = json["child"].get>(); - filter = std::make_unique(std::move(child)); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/maybe.h b/src/silo/query_engine/filter/expressions/maybe.h index 93f8ed3e4..c8f63b102 100644 --- a/src/silo/query_engine/filter/expressions/maybe.h +++ b/src/silo/query_engine/filter/expressions/maybe.h @@ -27,7 +27,4 @@ class Maybe : public Expression { ) const override; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/negation.cpp b/src/silo/query_engine/filter/expressions/negation.cpp index 01585a276..e8f2bbfa5 100644 --- a/src/silo/query_engine/filter/expressions/negation.cpp +++ b/src/silo/query_engine/filter/expressions/negation.cpp @@ -28,11 +28,4 @@ std::unique_ptr Negation::compile(const storage::Table& tab return operators::Operator::negate(child->compile(table)); } -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY(json.contains("child"), "The field 'child' is required in a Not expression"); - auto child = json["child"].get>(); - filter = std::make_unique(std::move(child)); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/negation.h b/src/silo/query_engine/filter/expressions/negation.h index 50d2816ba..5ff4bcd2a 100644 --- a/src/silo/query_engine/filter/expressions/negation.h +++ b/src/silo/query_engine/filter/expressions/negation.h @@ -30,7 +30,4 @@ class Negation : public Expression { ) const override; }; -// 
NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/nof.cpp b/src/silo/query_engine/filter/expressions/nof.cpp index 0a0abe1c1..90b65186d 100644 --- a/src/silo/query_engine/filter/expressions/nof.cpp +++ b/src/silo/query_engine/filter/expressions/nof.cpp @@ -4,8 +4,6 @@ #include #include -#include - #include "silo/query_engine/filter/expressions/and.h" #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/expressions/negation.h" @@ -262,34 +260,4 @@ std::unique_ptr NOf::compile(const storage::Table& table) c ); } -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY( - json.contains("children"), "The field 'children' is required in an N-Of expression" - ); - CHECK_SILO_QUERY( - json["children"].is_array(), "The field 'children' in an N-Of expression needs to be an array" - ); - CHECK_SILO_QUERY( - json.contains("numberOfMatchers"), - "The field 'numberOfMatchers' is required in an N-Of expression" - ); - CHECK_SILO_QUERY( - json["numberOfMatchers"].is_number_unsigned(), - "The field 'numberOfMatchers' in an N-Of expression needs to be an unsigned integer" - ); - CHECK_SILO_QUERY( - json.contains("matchExactly"), "The field 'matchExactly' is required in an N-Of expression" - ); - CHECK_SILO_QUERY( - json["matchExactly"].is_boolean(), - "The field 'matchExactly' in an N-Of expression needs to be a boolean" - ); - - const uint32_t number_of_matchers = json["numberOfMatchers"]; - const bool match_exactly = json["matchExactly"]; - auto children = json["children"].get(); - filter = std::make_unique(std::move(children), number_of_matchers, match_exactly); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/nof.h 
b/src/silo/query_engine/filter/expressions/nof.h index 84734ff0f..1635ba68d 100644 --- a/src/silo/query_engine/filter/expressions/nof.h +++ b/src/silo/query_engine/filter/expressions/nof.h @@ -4,8 +4,6 @@ #include #include -#include - #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" @@ -44,7 +42,4 @@ class NOf : public Expression { ) const override; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/or.cpp b/src/silo/query_engine/filter/expressions/or.cpp index e79cbfb0e..efaa8e6d8 100644 --- a/src/silo/query_engine/filter/expressions/or.cpp +++ b/src/silo/query_engine/filter/expressions/or.cpp @@ -5,7 +5,6 @@ #include #include -#include #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/expressions/false.h" @@ -227,16 +226,4 @@ std::unique_ptr Or::compile(const storage::Table& table) co ); } -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY( - json.contains("children"), "The field 'children' is required in an Or expression" - ); - CHECK_SILO_QUERY( - json["children"].is_array(), "The field 'children' in an Or expression needs to be an array" - ); - auto children = json["children"].get(); - filter = std::make_unique(std::move(children)); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/or.h b/src/silo/query_engine/filter/expressions/or.h index ac9440e8f..a6e21f97c 100644 --- a/src/silo/query_engine/filter/expressions/or.h +++ b/src/silo/query_engine/filter/expressions/or.h @@ -3,8 +3,6 @@ #include #include -#include - #include "silo/query_engine/filter/expressions/expression.h" #include 
"silo/query_engine/filter/operators/operator.h" @@ -40,7 +38,4 @@ class Or : public Expression { ) const override; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/or.test.cpp b/src/silo/query_engine/filter/expressions/or.test.cpp index 411c366dc..be2e1f9c9 100644 --- a/src/silo/query_engine/filter/expressions/or.test.cpp +++ b/src/silo/query_engine/filter/expressions/or.test.cpp @@ -438,31 +438,11 @@ const QueryTestData TEST_DATA{ .without_unaligned_sequences = true }; -// Test nested Or expressions - inner Or should be flattened during compilation const QueryTestScenario NESTED_OR_SAME_COLUMN = { .name = "NESTED_OR_SAME_COLUMN", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey", "country"] - }, - "filterExpression": { - "type": "Or", - "children": [ - { - "type": "Or", - "children": [ - {"type": "StringEquals", "column": "country", "value": "Switzerland"}, - {"type": "StringEquals", "column": "country", "value": "Germany"} - ] - }, - {"type": "StringEquals", "column": "country", "value": "France"} - ] - } -})" - ), + .query = + "default.filter((country = 'Switzerland' || country = 'Germany') || country = " + "'France').project({primaryKey, country})", .expected_query_result = nlohmann::json::parse( R"([ {"country":"Switzerland","primaryKey":"id_0"}, @@ -474,35 +454,11 @@ const QueryTestScenario NESTED_OR_SAME_COLUMN = { ) }; -// Test deeply nested Or expressions const QueryTestScenario DEEPLY_NESTED_OR = { .name = "DEEPLY_NESTED_OR", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey", "country"] - }, - "filterExpression": { - "type": "Or", - "children": [ - { - "type": "Or", - "children": [ - { - "type": "Or", - "children": [ - {"type": "StringEquals", "column": "country", 
"value": "Switzerland"} - ] - }, - {"type": "StringEquals", "column": "country", "value": "Germany"} - ] - } - ] - } -})" - ), + .query = + "default.filter(country = 'Switzerland' || country = 'Germany').project({primaryKey, " + "country})", .expected_query_result = nlohmann::json::parse( R"([ {"country":"Switzerland","primaryKey":"id_0"}, @@ -513,49 +469,19 @@ const QueryTestScenario DEEPLY_NESTED_OR = { ) }; -// Test Or with single child gets unwrapped during rewrite const QueryTestScenario OR_SINGLE_CHILD_UNWRAPPED = { .name = "OR_SINGLE_CHILD_UNWRAPPED", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey", "country"] - }, - "filterExpression": { - "type": "Or", - "children": [ - {"type": "StringEquals", "column": "country", "value": "Switzerland"} - ] - } -})" - ), + .query = "default.filter(country = 'Switzerland').project({primaryKey, country})", .expected_query_result = nlohmann::json::parse( R"([{"country":"Switzerland","primaryKey":"id_0"},{"country":"Switzerland","primaryKey":"id_3"}])" ) }; -// Test Or with multiple StringEquals on same indexed column get merged const QueryTestScenario OR_STRING_EQUALS_MERGED = { .name = "OR_STRING_EQUALS_MERGED", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey", "country"] - }, - "filterExpression": { - "type": "Or", - "children": [ - {"type": "StringEquals", "column": "country", "value": "Switzerland"}, - {"type": "StringEquals", "column": "country", "value": "Germany"}, - {"type": "StringEquals", "column": "country", "value": "France"} - ] - } -})" - ), + .query = + "default.filter(country = 'Switzerland' || country = 'Germany' || country = " + "'France').project({primaryKey, country})", .expected_query_result = nlohmann::json::parse( R"([ {"country":"Switzerland","primaryKey":"id_0"}, @@ -567,25 +493,10 @@ const QueryTestScenario OR_STRING_EQUALS_MERGED = { ) }; -// Test Or with mixed columns - should not merge 
different columns const QueryTestScenario OR_MIXED_COLUMNS = { .name = "OR_MIXED_COLUMNS", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey", "country", "region"] - }, - "filterExpression": { - "type": "Or", - "children": [ - {"type": "StringEquals", "column": "country", "value": "USA"}, - {"type": "StringEquals", "column": "region", "value": "Europe"} - ] - } -})" - ), + .query = + "default.filter(country = 'USA' || region = 'Europe').project({primaryKey, country, region})", .expected_query_result = nlohmann::json::parse( R"([ {"country":"Switzerland","primaryKey":"id_0","region":"Europe"}, @@ -598,31 +509,11 @@ const QueryTestScenario OR_MIXED_COLUMNS = { ) }; -// Test nested Or with And - should not flatten Or across And const QueryTestScenario OR_WITH_AND = { .name = "OR_WITH_AND", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey", "country", "region"] - }, - "filterExpression": { - "type": "Or", - "children": [ - { - "type": "And", - "children": [ - {"type": "StringEquals", "column": "country", "value": "Switzerland"}, - {"type": "StringEquals", "column": "region", "value": "Europe"} - ] - }, - {"type": "StringEquals", "column": "country", "value": "USA"} - ] - } -})" - ), + .query = + "default.filter((country = 'Switzerland' && region = 'Europe') || country = " + "'USA').project({primaryKey, country, region})", .expected_query_result = nlohmann::json::parse( R"([ {"country":"Switzerland","primaryKey":"id_0","region":"Europe"}, @@ -632,22 +523,9 @@ const QueryTestScenario OR_WITH_AND = { ) }; -// Test empty Or returns empty result const QueryTestScenario OR_EMPTY_CHILDREN = { .name = "OR_EMPTY_CHILDREN", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey"] - }, - "filterExpression": { - "type": "Or", - "children": [] - } -})" - ), + .query = "default.filter(false).project(primaryKey)", 
.expected_query_result = nlohmann::json::parse(R"([])") }; diff --git a/src/silo/query_engine/filter/expressions/phylo_child_filter.cpp b/src/silo/query_engine/filter/expressions/phylo_child_filter.cpp index 5900e01ce..3cb956fc8 100644 --- a/src/silo/query_engine/filter/expressions/phylo_child_filter.cpp +++ b/src/silo/query_engine/filter/expressions/phylo_child_filter.cpp @@ -4,7 +4,6 @@ #include #include -#include #include "silo/common/panic.h" #include "silo/query_engine/filter/expressions/expression.h" @@ -71,26 +70,4 @@ std::unique_ptr PhyloChildFilter::compile(const storage::Ta return createMatchingBitmap(string_column, internal_node, table.sequence_count); } -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY( - json.contains("column"), "The field 'column' is required in an PhyloChildFilter expression" - ) - CHECK_SILO_QUERY( - json["column"].is_string(), - "The field 'column' in an PhyloChildFilter expression needs to be a string" - ) - CHECK_SILO_QUERY( - json.contains("internalNode"), - "The field 'internalNode' is required in an PhyloChildFilter expression" - ) - CHECK_SILO_QUERY( - json["internalNode"].is_string(), - "The field 'internalNode' in an PhyloChildFilter expression needs to be a string" - ) - filter = std::make_unique( - json["column"].get(), json["internalNode"].get() - ); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/phylo_child_filter.h b/src/silo/query_engine/filter/expressions/phylo_child_filter.h index c4cb4371b..71ebac21b 100644 --- a/src/silo/query_engine/filter/expressions/phylo_child_filter.h +++ b/src/silo/query_engine/filter/expressions/phylo_child_filter.h @@ -3,8 +3,6 @@ #include #include -#include - #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" @@ -33,7 +31,4 @@ class PhyloChildFilter : public Expression 
{ ) const; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/string_equals.cpp b/src/silo/query_engine/filter/expressions/string_equals.cpp index 7ffa593a6..e36d7627f 100644 --- a/src/silo/query_engine/filter/expressions/string_equals.cpp +++ b/src/silo/query_engine/filter/expressions/string_equals.cpp @@ -4,8 +4,6 @@ #include #include -#include - #include "silo/common/panic.h" #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/expressions/is_null.h" @@ -70,28 +68,4 @@ std::unique_ptr StringEquals::compile(const storage::Table& ); } -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY( - json.contains("column"), "The field 'column' is required in an StringEquals expression" - ); - CHECK_SILO_QUERY( - json["column"].is_string(), - "The field 'column' in an StringEquals expression needs to be a string" - ); - CHECK_SILO_QUERY( - json.contains("value"), "The field 'value' is required in an StringEquals expression" - ); - CHECK_SILO_QUERY( - json["value"].is_string() || json["value"].is_null(), - "The field 'value' in an StringEquals expression needs to be a string or null" - ); - const std::string& column_name = json["column"]; - std::optional value; - if (!json["value"].is_null()) { - value = json["value"].get(); - } - filter = std::make_unique(column_name, value); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/string_equals.h b/src/silo/query_engine/filter/expressions/string_equals.h index 4a35233ab..914c1cba0 100644 --- a/src/silo/query_engine/filter/expressions/string_equals.h +++ b/src/silo/query_engine/filter/expressions/string_equals.h @@ -3,8 +3,6 @@ #include #include -#include - 
#include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" @@ -29,7 +27,4 @@ class StringEquals : public Expression { ) const override; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/string_equals.test.cpp b/src/silo/query_engine/filter/expressions/string_equals.test.cpp index 8508dc72f..99bf792bb 100644 --- a/src/silo/query_engine/filter/expressions/string_equals.test.cpp +++ b/src/silo/query_engine/filter/expressions/string_equals.test.cpp @@ -52,66 +52,23 @@ const QueryTestData TEST_DATA{ .reference_genomes = REFERENCE_GENOMES }; -// Tests for StringEquals with value: null (should rewrite to IsNull) const QueryTestScenario STRING_EQUALS_NULL_STRING_COLUMN = { .name = "STRING_EQUALS_NULL_STRING_COLUMN", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey"] - }, - "filterExpression": { - "type": "StringEquals", - "column": "stringField", - "value": null - } -})" - ), + .query = "default.filter(stringField = null).project(primaryKey)", .expected_query_result = nlohmann::json::parse(R"([{"primaryKey":"id_1"},{"primaryKey":"id_4"}])") }; const QueryTestScenario STRING_EQUALS_NULL_INDEXED_STRING_COLUMN = { .name = "STRING_EQUALS_NULL_INDEXED_STRING_COLUMN", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey"] - }, - "filterExpression": { - "type": "StringEquals", - "column": "indexedStringField", - "value": null - } -})" - ), + .query = "default.filter(indexedStringField = null).project(primaryKey)", .expected_query_result = nlohmann::json::parse(R"([{"primaryKey":"id_2"},{"primaryKey":"id_4"}])") }; const QueryTestScenario STRING_EQUALS_NULL_NEGATED = { .name = "STRING_EQUALS_NULL_NEGATED", - .query = nlohmann::json::parse( 
- R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey"] - }, - "filterExpression": { - "type": "Not", - "child": { - "type": "StringEquals", - "column": "stringField", - "value": null - } - } -})" - ), + .query = "default.filter(!(stringField = null)).project(primaryKey)", .expected_query_result = nlohmann::json::parse(R"([{"primaryKey":"id_0"},{"primaryKey":"id_2"},{"primaryKey":"id_3"}])" ) @@ -119,58 +76,19 @@ const QueryTestScenario STRING_EQUALS_NULL_NEGATED = { const QueryTestScenario STRING_EQUALS_VALUE = { .name = "STRING_EQUALS_VALUE", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey"] - }, - "filterExpression": { - "type": "StringEquals", - "column": "stringField", - "value": "value1" - } -})" - ), + .query = "default.filter(stringField = 'value1').project(primaryKey)", .expected_query_result = nlohmann::json::parse(R"([{"primaryKey":"id_0"}])") }; const QueryTestScenario STRING_EQUALS_INDEXED_VALUE = { .name = "STRING_EQUALS_INDEXED_VALUE", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey"] - }, - "filterExpression": { - "type": "StringEquals", - "column": "indexedStringField", - "value": "indexed1" - } -})" - ), + .query = "default.filter(indexedStringField = 'indexed1').project(primaryKey)", .expected_query_result = nlohmann::json::parse(R"([{"primaryKey":"id_0"}])") }; const QueryTestScenario STRING_EQUALS_NO_MATCH = { .name = "STRING_EQUALS_NO_MATCH", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey"] - }, - "filterExpression": { - "type": "StringEquals", - "column": "stringField", - "value": "nonexistent" - } -})" - ), + .query = "default.filter(stringField = 'nonexistent').project(primaryKey)", .expected_query_result = nlohmann::json::parse(R"([])") }; diff --git a/src/silo/query_engine/filter/expressions/string_in_set.cpp 
b/src/silo/query_engine/filter/expressions/string_in_set.cpp index 6afff2edd..ed7237e02 100644 --- a/src/silo/query_engine/filter/expressions/string_in_set.cpp +++ b/src/silo/query_engine/filter/expressions/string_in_set.cpp @@ -4,7 +4,6 @@ #include #include -#include #include "silo/common/panic.h" #include "silo/query_engine/filter/expressions/expression.h" @@ -68,31 +67,4 @@ std::unique_ptr StringInSet::compile(const storage::Table& ); } -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY( - json.contains("column"), "The field 'column' is required in a StringInSet expression" - ); - CHECK_SILO_QUERY( - json["column"].is_string(), - "The field 'column' in an StringInSet expression needs to be a string" - ); - CHECK_SILO_QUERY( - json.contains("values"), "The field 'values' is required in a StringInSet expression" - ); - CHECK_SILO_QUERY( - json["values"].is_array(), - "The field 'values' in an StringInSet expression needs to be an array" - ); - const std::string& column_name = json["column"]; - std::unordered_set values; - for (const auto& value : json["values"]) { - CHECK_SILO_QUERY( - value.is_string(), "The field 'values' in a StringInSet may only contain strings" - ); - values.insert(value.get()); - } - filter = std::make_unique(column_name, values); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/string_in_set.h b/src/silo/query_engine/filter/expressions/string_in_set.h index 60642b6ed..2d329644c 100644 --- a/src/silo/query_engine/filter/expressions/string_in_set.h +++ b/src/silo/query_engine/filter/expressions/string_in_set.h @@ -4,8 +4,6 @@ #include #include -#include - #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" @@ -34,7 +32,4 @@ class StringInSet : public Expression { ) const override; }; -// 
NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/string_in_set.test.cpp b/src/silo/query_engine/filter/expressions/string_in_set.test.cpp index be75f8c89..0e81167f9 100644 --- a/src/silo/query_engine/filter/expressions/string_in_set.test.cpp +++ b/src/silo/query_engine/filter/expressions/string_in_set.test.cpp @@ -22,7 +22,7 @@ nlohmann::json createData(const std::string& primary_key, const std::string& cou )", primary_key, country, - country == "Switzerland" ? "Europe" : (country == "USA" ? "Americas" : "Europe") + country == "USA" ? "Americas" : "Europe" )); } @@ -62,20 +62,7 @@ const QueryTestData TEST_DATA{ const QueryTestScenario STRING_IN_SET_SINGLE_VALUE = { .name = "STRING_IN_SET_SINGLE_VALUE", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey", "country"] - }, - "filterExpression": { - "type": "StringInSet", - "column": "country", - "values": ["Switzerland"] - } -})" - ), + .query = "default.filter(country.in({'Switzerland'})).project({primaryKey, country})", .expected_query_result = nlohmann::json::parse( R"([{"country":"Switzerland","primaryKey":"id_0"},{"country":"Switzerland","primaryKey":"id_3"}])" ) @@ -83,20 +70,7 @@ const QueryTestScenario STRING_IN_SET_SINGLE_VALUE = { const QueryTestScenario STRING_IN_SET_MULTIPLE_VALUES = { .name = "STRING_IN_SET_MULTIPLE_VALUES", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey", "country"] - }, - "filterExpression": { - "type": "StringInSet", - "column": "country", - "values": ["Switzerland", "Germany"] - } -})" - ), + .query = "default.filter(country.in({'Switzerland', 'Germany'})).project({primaryKey, country})", .expected_query_result = nlohmann::json::parse( 
R"([{"country":"Switzerland","primaryKey":"id_0"},{"country":"Germany","primaryKey":"id_1"},{"country":"Switzerland","primaryKey":"id_3"},{"country":"Germany","primaryKey":"id_5"}])" ) @@ -104,58 +78,19 @@ const QueryTestScenario STRING_IN_SET_MULTIPLE_VALUES = { const QueryTestScenario STRING_IN_SET_NO_MATCH = { .name = "STRING_IN_SET_NO_MATCH", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey", "country"] - }, - "filterExpression": { - "type": "StringInSet", - "column": "country", - "values": ["Japan", "China"] - } -})" - ), + .query = "default.filter(country.in({'Japan', 'China'})).project({primaryKey, country})", .expected_query_result = nlohmann::json::parse(R"([])") }; const QueryTestScenario STRING_IN_SET_EMPTY_VALUES = { .name = "STRING_IN_SET_EMPTY_VALUES", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey", "country"] - }, - "filterExpression": { - "type": "StringInSet", - "column": "country", - "values": [] - } -})" - ), + .query = "default.filter(country.in({})).project({primaryKey, country})", .expected_query_result = nlohmann::json::parse(R"([])") }; const QueryTestScenario STRING_IN_SET_INDEXED_COLUMN = { .name = "STRING_IN_SET_INDEXED_COLUMN", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey", "region"] - }, - "filterExpression": { - "type": "StringInSet", - "column": "region", - "values": ["Europe"] - } -})" - ), + .query = "default.filter(region.in({'Europe'})).project({primaryKey, region})", .expected_query_result = nlohmann::json::parse( R"([{"primaryKey":"id_0","region":"Europe"},{"primaryKey":"id_1","region":"Europe"},{"primaryKey":"id_3","region":"Europe"},{"primaryKey":"id_4","region":"Europe"},{"primaryKey":"id_5","region":"Europe"}])" ) @@ -163,30 +98,9 @@ const QueryTestScenario STRING_IN_SET_INDEXED_COLUMN = { const QueryTestScenario STRING_IN_SET_WITH_AND = { .name = 
"STRING_IN_SET_WITH_AND", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey", "country", "region"] - }, - "filterExpression": { - "type": "And", - "children": [ - { - "type": "StringInSet", - "column": "country", - "values": ["Switzerland", "Germany", "France"] - }, - { - "type": "StringEquals", - "column": "region", - "value": "Europe" - } - ] - } -})" - ), + .query = + "default.filter(country.in({'Switzerland', 'Germany', 'France'}) && region = " + "'Europe').project({primaryKey, country, region})", .expected_query_result = nlohmann::json::parse( R"([{"country":"Switzerland","primaryKey":"id_0","region":"Europe"},{"country":"Germany","primaryKey":"id_1","region":"Europe"},{"country":"Switzerland","primaryKey":"id_3","region":"Europe"},{"country":"France","primaryKey":"id_4","region":"Europe"},{"country":"Germany","primaryKey":"id_5","region":"Europe"}])" ) @@ -194,121 +108,13 @@ const QueryTestScenario STRING_IN_SET_WITH_AND = { const QueryTestScenario STRING_IN_SET_NEGATED = { .name = "STRING_IN_SET_NEGATED", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details", - "fields": ["primaryKey", "country"] - }, - "filterExpression": { - "type": "Not", - "child": { - "type": "StringInSet", - "column": "country", - "values": ["Switzerland", "Germany"] - } - } -})" - ), + .query = + "default.filter(!(country.in({'Switzerland', 'Germany'}))).project({primaryKey, country})", .expected_query_result = nlohmann::json::parse( R"([{"country":"USA","primaryKey":"id_2"},{"country":"France","primaryKey":"id_4"}])" ) }; -const QueryTestScenario STRING_IN_SET_MISSING_COLUMN = { - .name = "STRING_IN_SET_MISSING_COLUMN", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details" - }, - "filterExpression": { - "type": "StringInSet", - "values": ["Switzerland"] - } -})" - ), - .expected_query_result = {}, - .expected_error_message = "The field 'column' is required in a StringInSet expression" 
-}; - -const QueryTestScenario STRING_IN_SET_MISSING_VALUES = { - .name = "STRING_IN_SET_MISSING_VALUES", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details" - }, - "filterExpression": { - "type": "StringInSet", - "column": "country" - } -})" - ), - .expected_query_result = {}, - .expected_error_message = "The field 'values' is required in a StringInSet expression" -}; - -const QueryTestScenario STRING_IN_SET_INVALID_COLUMN_TYPE = { - .name = "STRING_IN_SET_INVALID_COLUMN_TYPE", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details" - }, - "filterExpression": { - "type": "StringInSet", - "column": 123, - "values": ["Switzerland"] - } -})" - ), - .expected_query_result = {}, - .expected_error_message = "The field 'column' in an StringInSet expression needs to be a string" -}; - -const QueryTestScenario STRING_IN_SET_INVALID_VALUES_TYPE = { - .name = "STRING_IN_SET_INVALID_VALUES_TYPE", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details" - }, - "filterExpression": { - "type": "StringInSet", - "column": "country", - "values": "Switzerland" - } -})" - ), - .expected_query_result = {}, - .expected_error_message = "The field 'values' in an StringInSet expression needs to be an array" -}; - -const QueryTestScenario STRING_IN_SET_NONEXISTENT_COLUMN = { - .name = "STRING_IN_SET_NONEXISTENT_COLUMN", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details" - }, - "filterExpression": { - "type": "StringInSet", - "column": "nonexistent", - "values": ["Switzerland"] - } -})" - ), - .expected_query_result = {}, - .expected_error_message = "The database does not contain the string column 'nonexistent'" -}; - } // namespace QUERY_TEST( @@ -321,11 +127,6 @@ QUERY_TEST( STRING_IN_SET_EMPTY_VALUES, STRING_IN_SET_INDEXED_COLUMN, STRING_IN_SET_WITH_AND, - STRING_IN_SET_NEGATED, - STRING_IN_SET_MISSING_COLUMN, - STRING_IN_SET_MISSING_VALUES, - STRING_IN_SET_INVALID_COLUMN_TYPE, - 
STRING_IN_SET_INVALID_VALUES_TYPE, - STRING_IN_SET_NONEXISTENT_COLUMN + STRING_IN_SET_NEGATED ) ); diff --git a/src/silo/query_engine/filter/expressions/string_search.cpp b/src/silo/query_engine/filter/expressions/string_search.cpp index d9b07a9f8..979f5e082 100644 --- a/src/silo/query_engine/filter/expressions/string_search.cpp +++ b/src/silo/query_engine/filter/expressions/string_search.cpp @@ -3,7 +3,6 @@ #include #include -#include #include "silo/common/panic.h" #include "silo/query_engine/filter/expressions/expression.h" @@ -71,33 +70,4 @@ std::unique_ptr StringSearch::compile(const storage::Table& return createMatchingBitmap(string_column, *search_expression, table.sequence_count); } -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter) { - CHECK_SILO_QUERY( - json.contains("column"), "The field 'column' is required in an StringSearch expression" - ) - CHECK_SILO_QUERY( - json["column"].is_string(), - "The field 'column' in an StringSearch expression needs to be a string" - ) - CHECK_SILO_QUERY( - json.contains("searchExpression"), - "The field 'searchExpression' is required in an StringSearch expression" - ) - CHECK_SILO_QUERY( - json["searchExpression"].is_string(), - "The field 'searchExpression' in an StringSearch expression needs to be a string" - ) - const std::string& column = json["column"]; - const std::string& search_expression_string = json["searchExpression"].get(); - auto search_expression = std::make_unique(search_expression_string); - CHECK_SILO_QUERY( - search_expression->ok(), - "Invalid Regular Expression. The parsing of the regular expression failed with the error " - "'{}'. 
See https://github.com/google/re2/wiki/Syntax for a Syntax specification.", - search_expression->error() - ) - filter = std::make_unique(column, std::move(search_expression)); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/string_search.h b/src/silo/query_engine/filter/expressions/string_search.h index 8c218f2ee..1fe410137 100644 --- a/src/silo/query_engine/filter/expressions/string_search.h +++ b/src/silo/query_engine/filter/expressions/string_search.h @@ -30,7 +30,4 @@ class StringSearch : public Expression { ) const override; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/symbol_equals.cpp b/src/silo/query_engine/filter/expressions/symbol_equals.cpp index c9328ac3d..9fb94c6fd 100644 --- a/src/silo/query_engine/filter/expressions/symbol_equals.cpp +++ b/src/silo/query_engine/filter/expressions/symbol_equals.cpp @@ -4,7 +4,6 @@ #include #include -#include #include "silo/common/aa_symbols.h" #include "silo/common/nucleotide_symbols.h" @@ -108,67 +107,6 @@ std::unique_ptr SymbolEquals::compile( throw QueryCompilationException("SymbolEquals should have been rewritten before compilation"); } -template -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr>& filter) { - CHECK_SILO_QUERY( - json.is_object() && json.contains("position"), - "The field 'position' is required in a SymbolEquals expression" - ); - CHECK_SILO_QUERY( - json["position"].is_number_unsigned(), - "The field 'position' in a SymbolEquals expression needs to be an unsigned integer" - ); - CHECK_SILO_QUERY( - json.contains("symbol"), "The field 'symbol' is required in a SymbolEquals expression" - ); - CHECK_SILO_QUERY( - json["symbol"].is_string(), - "The field 'symbol' in a SymbolEquals expression 
needs to be a string" - ); - std::optional sequence_name; - if (json.contains("sequenceName")) { - sequence_name = json["sequenceName"].get(); - } - const uint32_t position_idx_1_indexed = json["position"].get(); - CHECK_SILO_QUERY( - position_idx_1_indexed > 0, "The field 'position' is 1-indexed. Value of 0 not allowed." - ); - const uint32_t position_idx = position_idx_1_indexed - 1; - const std::string& symbol = json["symbol"]; - - CHECK_SILO_QUERY( - symbol.size() == 1, "The string field 'symbol' must be exactly one character long" - ); - - if (symbol.at(0) == '.') { - filter = std::make_unique>( - sequence_name, position_idx, SymbolOrDot::dot() - ); - return; - } - const std::optional symbol_char = - SymbolType::charToSymbol(symbol.at(0)); - CHECK_SILO_QUERY( - symbol_char.has_value(), - "The string field 'symbol' must be either a valid {} symbol or the '.' symbol.", - SymbolType::SYMBOL_NAME - ); - filter = std::make_unique>( - sequence_name, position_idx, SymbolOrDot{symbol_char.value()} - ); -} - -template void from_json( - const nlohmann::json& json, - std::unique_ptr>& filter -); - -template void from_json( - const nlohmann::json& json, - std::unique_ptr>& filter -); - template class SymbolEquals; template class SymbolEquals; diff --git a/src/silo/query_engine/filter/expressions/symbol_equals.h b/src/silo/query_engine/filter/expressions/symbol_equals.h index 82fbaa054..fa5bbaa63 100644 --- a/src/silo/query_engine/filter/expressions/symbol_equals.h +++ b/src/silo/query_engine/filter/expressions/symbol_equals.h @@ -5,8 +5,6 @@ #include #include -#include - #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" @@ -59,8 +57,4 @@ class SymbolEquals : public Expression { } }; -template -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr>& filter); - } // namespace silo::query_engine::filter::expressions diff --git 
a/src/silo/query_engine/filter/expressions/symbol_in_set.h b/src/silo/query_engine/filter/expressions/symbol_in_set.h index a9ad311ca..e1b441292 100644 --- a/src/silo/query_engine/filter/expressions/symbol_in_set.h +++ b/src/silo/query_engine/filter/expressions/symbol_in_set.h @@ -5,8 +5,6 @@ #include #include -#include - #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" diff --git a/src/silo/query_engine/filter/expressions/true.cpp b/src/silo/query_engine/filter/expressions/true.cpp index 69d55d754..d55fb2f03 100644 --- a/src/silo/query_engine/filter/expressions/true.cpp +++ b/src/silo/query_engine/filter/expressions/true.cpp @@ -25,9 +25,4 @@ std::unique_ptr True::compile(const storage::Table& table) return std::make_unique(table.sequence_count); } -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& /*json*/, std::unique_ptr& filter) { - filter = std::make_unique(); -} - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/filter/expressions/true.h b/src/silo/query_engine/filter/expressions/true.h index 0cd525387..925c12341 100644 --- a/src/silo/query_engine/filter/expressions/true.h +++ b/src/silo/query_engine/filter/expressions/true.h @@ -3,8 +3,6 @@ #include #include -#include - #include "silo/query_engine/filter/expressions/expression.h" #include "silo/query_engine/filter/operators/operator.h" @@ -25,7 +23,4 @@ class True : public Expression { ) const override; }; -// NOLINTNEXTLINE(readability-identifier-naming) -void from_json(const nlohmann::json& json, std::unique_ptr& filter); - } // namespace silo::query_engine::filter::expressions diff --git a/src/silo/query_engine/operators/aggregate_node.cpp b/src/silo/query_engine/operators/aggregate_node.cpp index ac5a9478a..c260e302b 100644 --- a/src/silo/query_engine/operators/aggregate_node.cpp +++ b/src/silo/query_engine/operators/aggregate_node.cpp @@ -15,18 +15,50 @@ 
namespace { -arrow::acero::AggregateNodeOptions getAggregateOptionsForGroupByFields( +using silo::query_engine::operators::AggregateDefinition; +using silo::query_engine::operators::AggregateFunction; + +std::string arrowFunctionName(AggregateFunction func, bool has_groups) { + switch (func) { + case AggregateFunction::COUNT: + return has_groups ? "hash_count_all" : "count_all"; + } + SILO_UNREACHABLE(); +} + +arrow::acero::AggregateNodeOptions buildAggregateOptions( const std::vector& group_by_fields, + const std::vector& aggregates, const arrow::Schema& input_schema ) { - if (group_by_fields.empty()) { - auto count_options = - std::make_shared(arrow::compute::CountOptions::CountMode::ALL - ); - const arrow::compute::Aggregate aggregate{ - "count_all", count_options, std::vector{}, std::string("count") - }; - return arrow::acero::AggregateNodeOptions({aggregate}); + const bool has_groups = !group_by_fields.empty(); + + std::vector arrow_aggregates; + arrow_aggregates.reserve(aggregates.size()); + + for (const auto& agg : aggregates) { + std::vector source_refs; + std::shared_ptr options; + + switch (agg.function) { + case AggregateFunction::COUNT: { + options = std::make_shared( + arrow::compute::CountOptions::CountMode::ALL + ); + break; + } + } + + arrow_aggregates.emplace_back( + arrowFunctionName(agg.function, has_groups), + options, + std::move(source_refs), + agg.output_name + ); + } + + if (!has_groups) { + return arrow::acero::AggregateNodeOptions(std::move(arrow_aggregates)); } std::vector field_refs; @@ -36,12 +68,7 @@ arrow::acero::AggregateNodeOptions getAggregateOptionsForGroupByFields( field_refs.emplace_back(field.name); } - auto count_options = - std::make_shared(arrow::compute::CountOptions::CountMode::ALL); - const arrow::compute::Aggregate aggregate{ - "hash_count_all", count_options, std::vector{}, std::string("count") - }; - return arrow::acero::AggregateNodeOptions({aggregate}, field_refs); + return 
arrow::acero::AggregateNodeOptions(std::move(arrow_aggregates), std::move(field_refs)); } } // namespace @@ -50,14 +77,24 @@ namespace silo::query_engine::operators { AggregateNode::AggregateNode( QueryNodePtr child, - std::vector group_by_fields + std::vector group_by_fields, + std::vector aggregates ) : child(std::move(child)), - group_by_fields(std::move(group_by_fields)) {} + group_by_fields(std::move(group_by_fields)), + aggregates(std::move(aggregates)) {} std::vector AggregateNode::getOutputSchema() const { auto output_fields = group_by_fields; - output_fields.emplace_back("count", schema::ColumnType::INT64); + for (const auto& agg : aggregates) { + schema::ColumnType type; + switch (agg.function) { + case AggregateFunction::COUNT: + type = schema::ColumnType::INT64; + break; + } + output_fields.emplace_back(agg.output_name, type); + } return output_fields; } @@ -70,7 +107,7 @@ arrow::Result AggregateNode::toQueryPlan( auto input_schema = plan.top_node->output_schema(); const arrow::acero::AggregateNodeOptions aggregate_node_options = - getAggregateOptionsForGroupByFields(group_by_fields, *input_schema); + buildAggregateOptions(group_by_fields, aggregates, *input_schema); ARROW_ASSIGN_OR_RAISE( plan.top_node, diff --git a/src/silo/query_engine/operators/aggregate_node.h b/src/silo/query_engine/operators/aggregate_node.h index f73cb9054..e393ec022 100644 --- a/src/silo/query_engine/operators/aggregate_node.h +++ b/src/silo/query_engine/operators/aggregate_node.h @@ -1,6 +1,8 @@ #pragma once #include +#include +#include #include #include @@ -12,12 +14,25 @@ namespace silo::query_engine::operators { +enum class AggregateFunction : uint8_t { COUNT }; + +struct AggregateDefinition { + std::string output_name; + AggregateFunction function; + std::optional source_column; +}; + class AggregateNode final : public QueryNode { public: QueryNodePtr child; std::vector group_by_fields; + std::vector aggregates; - AggregateNode(QueryNodePtr child, std::vector 
group_by_fields); + AggregateNode( + QueryNodePtr child, + std::vector group_by_fields, + std::vector aggregates + ); [[nodiscard]] std::vector getOutputSchema() const override; diff --git a/src/silo/query_engine/operators/filter_node.cpp b/src/silo/query_engine/operators/filter_node.cpp new file mode 100644 index 000000000..2b8952470 --- /dev/null +++ b/src/silo/query_engine/operators/filter_node.cpp @@ -0,0 +1,24 @@ +#include "silo/query_engine/operators/filter_node.h" + +#include + +namespace silo::query_engine::operators { + +FilterNode::FilterNode(QueryNodePtr child, std::unique_ptr filter) + : child(std::move(child)), + filter(std::move(filter)) {} + +std::vector FilterNode::getOutputSchema() const { + return child->getOutputSchema(); +} + +arrow::Result FilterNode::toQueryPlan( + const std::map>& /*tables*/, + const config::QueryOptions& /*query_options*/ +) const { + throw std::runtime_error( + "FilterNode must be eliminated during pushdown before query plan generation" + ); +} + +} // namespace silo::query_engine::operators diff --git a/src/silo/query_engine/operators/filter_node.h b/src/silo/query_engine/operators/filter_node.h new file mode 100644 index 000000000..ff787ac2b --- /dev/null +++ b/src/silo/query_engine/operators/filter_node.h @@ -0,0 +1,33 @@ +#pragma once + +#include +#include +#include + +#include + +#include "silo/query_engine/filter/expressions/expression.h" +#include "silo/query_engine/operators/query_node.h" +#include "silo/schema/database_schema.h" +#include "silo/storage/table.h" + +namespace silo::query_engine::operators { + +/// Applies a filter expression to its child's output. +/// Must be eliminated during pushdown before query plan generation. 
+class FilterNode final : public QueryNode { + public: + QueryNodePtr child; + std::unique_ptr filter; + + FilterNode(QueryNodePtr child, std::unique_ptr filter); + + [[nodiscard]] std::vector getOutputSchema() const override; + + [[nodiscard]] arrow::Result toQueryPlan( + const std::map>& tables, + const config::QueryOptions& query_options + ) const override; +}; + +} // namespace silo::query_engine::operators diff --git a/src/silo/query_engine/operators/most_recent_common_ancestor_node.cpp b/src/silo/query_engine/operators/most_recent_common_ancestor_node.cpp index bf8a6f5f8..528710aa1 100644 --- a/src/silo/query_engine/operators/most_recent_common_ancestor_node.cpp +++ b/src/silo/query_engine/operators/most_recent_common_ancestor_node.cpp @@ -70,6 +70,17 @@ MostRecentCommonAncestorNode::MostRecentCommonAncestorNode( column_name(std::move(column_name)), print_nodes_not_in_tree(print_nodes_not_in_tree) {} +MostRecentCommonAncestorNode::MostRecentCommonAncestorNode( + schema::TableName table_name, + std::unique_ptr filter, + std::string column_name, + bool print_nodes_not_in_tree +) + : filter(std::move(filter)), + column_name(std::move(column_name)), + print_nodes_not_in_tree(print_nodes_not_in_tree), + table_name(std::move(table_name)) {} + std::vector MostRecentCommonAncestorNode::getOutputSchema() const { std::vector output_fields; output_fields.emplace_back("missingNodeCount", schema::ColumnType::INT32); @@ -84,24 +95,30 @@ std::vector MostRecentCommonAncestorNode::getOutputSch // NOLINTNEXTLINE(readability-function-cognitive-complexity) arrow::Result MostRecentCommonAncestorNode::toQueryPlan( - const std::map>& /*tables*/, + const std::map>& tables, const config::QueryOptions& /*query_options*/ ) const { - auto bitmap_filter = computeFilter(filter, *table); + auto resolved_table = table; + if (!resolved_table && table_name.has_value()) { + auto it = tables.find(table_name.value()); + CHECK_SILO_QUERY(it != tables.end(), "table '{}' not found", 
table_name.value().getName()); + resolved_table = it->second; + } + auto bitmap_filter = computeFilter(filter, *resolved_table); CHECK_SILO_QUERY( - table->schema->getColumn(column_name).has_value(), + resolved_table->schema->getColumn(column_name).has_value(), "Column '{}' not found in table schema", column_name ); CHECK_SILO_QUERY( - table->schema->getColumn(column_name).value().type == schema::ColumnType::STRING, + resolved_table->schema->getColumn(column_name).value().type == schema::ColumnType::STRING, "MostRecentCommonAncestor action cannot be called on column '{}' as it is not a column " "of type STRING", column_name ); const auto& optional_table_metadata = - table->schema->getColumnMetadata(column_name); + resolved_table->schema->getColumnMetadata(column_name); CHECK_SILO_QUERY( optional_table_metadata.has_value() && optional_table_metadata.value()->phylo_tree.has_value(), @@ -113,7 +130,7 @@ arrow::Result MostRecentCommonAncestorNode::toQueryPlan( const std::vector output_fields = getOutputSchema(); - auto table_handle = table; + auto table_handle = resolved_table; const auto column_name_copy = column_name; std::function>()> producer = diff --git a/src/silo/query_engine/operators/most_recent_common_ancestor_node.h b/src/silo/query_engine/operators/most_recent_common_ancestor_node.h index f80817635..ce8e8482c 100644 --- a/src/silo/query_engine/operators/most_recent_common_ancestor_node.h +++ b/src/silo/query_engine/operators/most_recent_common_ancestor_node.h @@ -2,6 +2,7 @@ #include #include +#include #include #include #include @@ -22,6 +23,7 @@ class MostRecentCommonAncestorNode final : public QueryNode { std::unique_ptr filter; std::string column_name; bool print_nodes_not_in_tree; + std::optional table_name; MostRecentCommonAncestorNode( std::shared_ptr table, @@ -30,6 +32,13 @@ class MostRecentCommonAncestorNode final : public QueryNode { bool print_nodes_not_in_tree ); + MostRecentCommonAncestorNode( + schema::TableName table_name, + std::unique_ptr 
filter, + std::string column_name, + bool print_nodes_not_in_tree + ); + [[nodiscard]] std::vector getOutputSchema() const override; [[nodiscard]] arrow::Result toQueryPlan( diff --git a/src/silo/query_engine/operators/mutations_node.h b/src/silo/query_engine/operators/mutations_node.h index 00f677cbe..a73f366d9 100644 --- a/src/silo/query_engine/operators/mutations_node.h +++ b/src/silo/query_engine/operators/mutations_node.h @@ -26,6 +26,16 @@ class MutationsNode final : public QueryNode { constexpr static std::string_view PROPORTION_FIELD_NAME = "proportion"; constexpr static std::string_view COVERAGE_FIELD_NAME = "coverage"; constexpr static std::string_view COUNT_FIELD_NAME = "count"; + static constexpr std::array VALID_FIELDS{ + MUTATION_FIELD_NAME, + MUTATION_FROM_FIELD_NAME, + MUTATION_TO_FIELD_NAME, + POSITION_FIELD_NAME, + SEQUENCE_FIELD_NAME, + PROPORTION_FIELD_NAME, + COVERAGE_FIELD_NAME, + COUNT_FIELD_NAME + }; std::shared_ptr table; std::unique_ptr filter; diff --git a/src/silo/query_engine/operators/order_by_node.cpp b/src/silo/query_engine/operators/order_by_node.cpp index b97c72743..cb2ec0944 100644 --- a/src/silo/query_engine/operators/order_by_node.cpp +++ b/src/silo/query_engine/operators/order_by_node.cpp @@ -1,5 +1,6 @@ #include "silo/query_engine/operators/order_by_node.h" +#include #include #include #include @@ -15,10 +16,13 @@ #include #include #include +#include +#include #include #include "silo/common/panic.h" #include "silo/query_engine/exec_node/arrow_util.h" +#include "silo/query_engine/illegal_query_exception.h" #include "silo/schema/database_schema.h" #include "silo/storage/table.h" @@ -162,10 +166,28 @@ std::vector OrderByNode::getOutputSchema() const { return child->getOutputSchema(); } +// NOLINTNEXTLINE(readability-function-cognitive-complexity) arrow::Result OrderByNode::toQueryPlan( const std::map>& tables, const config::QueryOptions& query_options ) const { + // Validate order-by fields exist in child output schema + auto 
child_schema = child->getOutputSchema(); + std::vector field_names; + field_names.reserve(child_schema.size()); + for (const auto& identifier : child_schema) { + field_names.push_back(identifier.name); + } + for (const auto& order_by_field : fields) { + CHECK_SILO_QUERY( + std::ranges::find(field_names, order_by_field.name) != field_names.end(), + "OrderByField {} is not contained in the result of this operation. " + "Allowed values are {}.", + order_by_field.name, + fmt::join(field_names, ", ") + ); + } + ARROW_ASSIGN_OR_RAISE(auto plan, child->toQueryPlan(tables, query_options)); using arrow::compute::NullPlacement; diff --git a/src/silo/query_engine/operators/phylo_subtree_node.cpp b/src/silo/query_engine/operators/phylo_subtree_node.cpp index 712c5e5c7..112f9261c 100644 --- a/src/silo/query_engine/operators/phylo_subtree_node.cpp +++ b/src/silo/query_engine/operators/phylo_subtree_node.cpp @@ -72,6 +72,19 @@ PhyloSubtreeNode::PhyloSubtreeNode( print_nodes_not_in_tree(print_nodes_not_in_tree), contract_unary_nodes(contract_unary_nodes) {} +PhyloSubtreeNode::PhyloSubtreeNode( + schema::TableName table_name, + std::unique_ptr filter, + std::string column_name, + bool print_nodes_not_in_tree, + bool contract_unary_nodes +) + : filter(std::move(filter)), + column_name(std::move(column_name)), + print_nodes_not_in_tree(print_nodes_not_in_tree), + contract_unary_nodes(contract_unary_nodes), + table_name(std::move(table_name)) {} + std::vector PhyloSubtreeNode::getOutputSchema() const { std::vector output_fields; output_fields.emplace_back("missingNodeCount", schema::ColumnType::INT32); @@ -84,23 +97,29 @@ std::vector PhyloSubtreeNode::getOutputSchema() const // NOLINTNEXTLINE(readability-function-cognitive-complexity) arrow::Result PhyloSubtreeNode::toQueryPlan( - const std::map>& /*tables*/, + const std::map>& tables, const config::QueryOptions& /*query_options*/ ) const { - auto bitmap_filter = computeFilter(filter, *table); + auto resolved_table = table; + if 
(!resolved_table && table_name.has_value()) { + auto it = tables.find(table_name.value()); + CHECK_SILO_QUERY(it != tables.end(), "table '{}' not found", table_name.value().getName()); + resolved_table = it->second; + } + auto bitmap_filter = computeFilter(filter, *resolved_table); CHECK_SILO_QUERY( - table->schema->getColumn(column_name).has_value(), + resolved_table->schema->getColumn(column_name).has_value(), "Column '{}' not found in table schema", column_name ); CHECK_SILO_QUERY( - table->schema->getColumn(column_name).value().type == schema::ColumnType::STRING, + resolved_table->schema->getColumn(column_name).value().type == schema::ColumnType::STRING, "PhyloSubtree action cannot be called on column '{}' as it is not a column of type STRING", column_name ); const auto& optional_table_metadata = - table->schema->getColumnMetadata(column_name); + resolved_table->schema->getColumnMetadata(column_name); CHECK_SILO_QUERY( optional_table_metadata.has_value() && optional_table_metadata.value()->phylo_tree.has_value(), @@ -112,7 +131,7 @@ arrow::Result PhyloSubtreeNode::toQueryPlan( const std::vector output_fields = getOutputSchema(); - auto table_handle = table; + auto table_handle = resolved_table; const auto column_name_copy = column_name; const bool contract = contract_unary_nodes; diff --git a/src/silo/query_engine/operators/phylo_subtree_node.h b/src/silo/query_engine/operators/phylo_subtree_node.h index 384be276c..140c51dce 100644 --- a/src/silo/query_engine/operators/phylo_subtree_node.h +++ b/src/silo/query_engine/operators/phylo_subtree_node.h @@ -2,6 +2,7 @@ #include #include +#include #include #include #include @@ -23,6 +24,7 @@ class PhyloSubtreeNode final : public QueryNode { std::string column_name; bool print_nodes_not_in_tree; bool contract_unary_nodes; + std::optional table_name; PhyloSubtreeNode( std::shared_ptr table, @@ -32,6 +34,14 @@ class PhyloSubtreeNode final : public QueryNode { bool contract_unary_nodes ); + PhyloSubtreeNode( + 
schema::TableName table_name, + std::unique_ptr filter, + std::string column_name, + bool print_nodes_not_in_tree, + bool contract_unary_nodes + ); + [[nodiscard]] std::vector getOutputSchema() const override; [[nodiscard]] arrow::Result toQueryPlan( diff --git a/src/silo/query_engine/operators/project_node.cpp b/src/silo/query_engine/operators/project_node.cpp new file mode 100644 index 000000000..fa99917c1 --- /dev/null +++ b/src/silo/query_engine/operators/project_node.cpp @@ -0,0 +1,40 @@ +#include "silo/query_engine/operators/project_node.h" + +#include +#include +#include + +namespace silo::query_engine::operators { + +ProjectNode::ProjectNode(QueryNodePtr child, std::vector fields) + : child(std::move(child)), + fields(std::move(fields)) {} + +std::vector ProjectNode::getOutputSchema() const { + return fields; +} + +arrow::Result ProjectNode::toQueryPlan( + const std::map>& tables, + const config::QueryOptions& query_options +) const { + ARROW_ASSIGN_OR_RAISE(auto plan, child->toQueryPlan(tables, query_options)); + + std::vector expressions; + std::vector names; + expressions.reserve(fields.size()); + names.reserve(fields.size()); + for (const auto& field : fields) { + expressions.push_back(arrow::compute::field_ref(field.name)); + names.push_back(field.name); + } + + const arrow::acero::ProjectNodeOptions options{std::move(expressions), std::move(names)}; + ARROW_ASSIGN_OR_RAISE( + plan.top_node, + arrow::acero::MakeExecNode("project", plan.plan.get(), {plan.top_node}, options) + ); + return plan; +} + +} // namespace silo::query_engine::operators diff --git a/src/silo/query_engine/operators/project_node.h b/src/silo/query_engine/operators/project_node.h new file mode 100644 index 000000000..e217bb7e1 --- /dev/null +++ b/src/silo/query_engine/operators/project_node.h @@ -0,0 +1,32 @@ +#pragma once + +#include +#include +#include + +#include + +#include "silo/query_engine/operators/query_node.h" +#include "silo/schema/database_schema.h" +#include 
"silo/storage/table.h" + +namespace silo::query_engine::operators { + +/// Selects specific columns from its child's output. +/// Must be eliminated during pushdown before query plan generation. +class ProjectNode final : public QueryNode { + public: + QueryNodePtr child; + std::vector fields; + + ProjectNode(QueryNodePtr child, std::vector fields); + + [[nodiscard]] std::vector getOutputSchema() const override; + + [[nodiscard]] arrow::Result toQueryPlan( + const std::map>& tables, + const config::QueryOptions& query_options + ) const override; +}; + +} // namespace silo::query_engine::operators diff --git a/src/silo/query_engine/operators/scan_node.cpp b/src/silo/query_engine/operators/scan_node.cpp new file mode 100644 index 000000000..b3f3cdb6f --- /dev/null +++ b/src/silo/query_engine/operators/scan_node.cpp @@ -0,0 +1,30 @@ +#include "silo/query_engine/operators/scan_node.h" + +#include + +#include + +namespace silo::query_engine::operators { + +ScanNode::ScanNode( + schema::TableName table_name, + std::vector output_schema +) + : table_name(std::move(table_name)), + output_schema(std::move(output_schema)) {} + +std::vector ScanNode::getOutputSchema() const { + return output_schema; +} + +arrow::Result ScanNode::toQueryPlan( + const std::map>& /*tables*/, + const config::QueryOptions& /*query_options*/ +) const { + throw std::runtime_error(fmt::format( + "ScanNode('{}') must be eliminated during pushdown before query plan generation", + table_name.getName() + )); +} + +} // namespace silo::query_engine::operators diff --git a/src/silo/query_engine/operators/scan_node.h b/src/silo/query_engine/operators/scan_node.h new file mode 100644 index 000000000..66ffe8ac3 --- /dev/null +++ b/src/silo/query_engine/operators/scan_node.h @@ -0,0 +1,33 @@ +#pragma once + +#include +#include +#include +#include + +#include + +#include "silo/query_engine/operators/query_node.h" +#include "silo/schema/database_schema.h" +#include "silo/storage/table.h" + +namespace 
silo::query_engine::operators { + +/// Leaf node referencing a table by name. Outputs all columns. +/// Must be eliminated during pushdown before query plan generation. +class ScanNode final : public QueryNode { + public: + schema::TableName table_name; + std::vector output_schema; + + ScanNode(schema::TableName table_name, std::vector output_schema); + + [[nodiscard]] std::vector getOutputSchema() const override; + + [[nodiscard]] arrow::Result toQueryPlan( + const std::map>& tables, + const config::QueryOptions& query_options + ) const override; +}; + +} // namespace silo::query_engine::operators diff --git a/src/silo/query_engine/operators/unresolved_insertions_node.h b/src/silo/query_engine/operators/unresolved_insertions_node.h new file mode 100644 index 000000000..5697ba894 --- /dev/null +++ b/src/silo/query_engine/operators/unresolved_insertions_node.h @@ -0,0 +1,33 @@ +#pragma once + +#include +#include + +#include "silo/query_engine/operators/query_node.h" + +namespace silo::query_engine::operators { + +/// Placeholder for insertions action, resolved during pushdown. 
+template +class UnresolvedInsertionsNode final : public QueryNode { + public: + QueryNodePtr child; + std::vector sequence_names; + + UnresolvedInsertionsNode(QueryNodePtr child, std::vector sequence_names) + : child(std::move(child)), + sequence_names(std::move(sequence_names)) {} + + [[nodiscard]] std::vector getOutputSchema() const override { + return {}; + } + + [[nodiscard]] arrow::Result toQueryPlan( + const std::map>& /*tables*/, + const config::QueryOptions& /*query_options*/ + ) const override { + throw std::runtime_error("UnresolvedInsertionsNode must be eliminated during pushdown"); + } +}; + +} // namespace silo::query_engine::operators diff --git a/src/silo/query_engine/operators/unresolved_most_recent_common_ancestor_node.h b/src/silo/query_engine/operators/unresolved_most_recent_common_ancestor_node.h new file mode 100644 index 000000000..439e3de2b --- /dev/null +++ b/src/silo/query_engine/operators/unresolved_most_recent_common_ancestor_node.h @@ -0,0 +1,39 @@ +#pragma once + +#include + +#include "silo/query_engine/operators/query_node.h" + +namespace silo::query_engine::operators { + +/// Placeholder for mostRecentCommonAncestor action, resolved during pushdown. 
+class UnresolvedMostRecentCommonAncestorNode final : public QueryNode { + public: + QueryNodePtr child; + std::string column_name; + bool print_nodes_not_in_tree; + + UnresolvedMostRecentCommonAncestorNode( + QueryNodePtr child, + std::string column_name, + bool print_nodes_not_in_tree + ) + : child(std::move(child)), + column_name(std::move(column_name)), + print_nodes_not_in_tree(print_nodes_not_in_tree) {} + + [[nodiscard]] std::vector getOutputSchema() const override { + return {}; + } + + [[nodiscard]] arrow::Result toQueryPlan( + const std::map>& /*tables*/, + const config::QueryOptions& /*query_options*/ + ) const override { + throw std::runtime_error( + "UnresolvedMostRecentCommonAncestorNode must be eliminated during pushdown" + ); + } +}; + +} // namespace silo::query_engine::operators diff --git a/src/silo/query_engine/operators/unresolved_mutations_node.h b/src/silo/query_engine/operators/unresolved_mutations_node.h new file mode 100644 index 000000000..f58c188b7 --- /dev/null +++ b/src/silo/query_engine/operators/unresolved_mutations_node.h @@ -0,0 +1,42 @@ +#pragma once + +#include +#include + +#include "silo/query_engine/operators/query_node.h" + +namespace silo::query_engine::operators { + +/// Placeholder for mutations action, resolved during pushdown. 
+template +class UnresolvedMutationsNode final : public QueryNode { + public: + QueryNodePtr child; + std::vector sequence_names; + double min_proportion; + std::vector fields; + + UnresolvedMutationsNode( + QueryNodePtr child, + std::vector sequence_names, + double min_proportion, + std::vector fields + ) + : child(std::move(child)), + sequence_names(std::move(sequence_names)), + min_proportion(min_proportion), + fields(std::move(fields)) {} + + [[nodiscard]] std::vector getOutputSchema() const override { + return {}; + } + + [[nodiscard]] arrow::Result toQueryPlan( + const std::map>& /*tables*/, + const config::QueryOptions& /*query_options*/ + ) const override { + throw std::runtime_error("UnresolvedMutationsNode must be eliminated during pushdown"); + } +}; + +} // namespace silo::query_engine::operators diff --git a/src/silo/query_engine/operators/unresolved_phylo_subtree_node.h b/src/silo/query_engine/operators/unresolved_phylo_subtree_node.h new file mode 100644 index 000000000..580aab7c2 --- /dev/null +++ b/src/silo/query_engine/operators/unresolved_phylo_subtree_node.h @@ -0,0 +1,40 @@ +#pragma once + +#include + +#include "silo/query_engine/operators/query_node.h" + +namespace silo::query_engine::operators { + +/// Placeholder for phyloSubtree action, resolved during pushdown. 
+class UnresolvedPhyloSubtreeNode final : public QueryNode { + public: + QueryNodePtr child; + std::string column_name; + bool print_nodes_not_in_tree; + bool contract_unary_nodes; + + UnresolvedPhyloSubtreeNode( + QueryNodePtr child, + std::string column_name, + bool print_nodes_not_in_tree, + bool contract_unary_nodes + ) + : child(std::move(child)), + column_name(std::move(column_name)), + print_nodes_not_in_tree(print_nodes_not_in_tree), + contract_unary_nodes(contract_unary_nodes) {} + + [[nodiscard]] std::vector getOutputSchema() const override { + return {}; + } + + [[nodiscard]] arrow::Result toQueryPlan( + const std::map>& /*tables*/, + const config::QueryOptions& /*query_options*/ + ) const override { + throw std::runtime_error("UnresolvedPhyloSubtreeNode must be eliminated during pushdown"); + } +}; + +} // namespace silo::query_engine::operators diff --git a/src/silo/query_engine/planner.cpp b/src/silo/query_engine/planner.cpp index b4e49c990..23df3ffa3 100644 --- a/src/silo/query_engine/planner.cpp +++ b/src/silo/query_engine/planner.cpp @@ -1,13 +1,30 @@ #include "silo/query_engine/planner.h" +#include +#include + +#include "silo/common/aa_symbols.h" +#include "silo/common/nucleotide_symbols.h" +#include "silo/query_engine/filter/expressions/true.h" +#include "silo/query_engine/illegal_query_exception.h" #include "silo/query_engine/operators/aggregate_node.h" #include "silo/query_engine/operators/count_filter_node.h" #include "silo/query_engine/operators/fetch_node.h" +#include "silo/query_engine/operators/filter_node.h" #include "silo/query_engine/operators/insertions_node.h" #include "silo/query_engine/operators/most_recent_common_ancestor_node.h" #include "silo/query_engine/operators/mutations_node.h" #include "silo/query_engine/operators/order_by_node.h" +#include "silo/query_engine/operators/phylo_subtree_node.h" +#include "silo/query_engine/operators/project_node.h" +#include "silo/query_engine/operators/scan_node.h" #include 
"silo/query_engine/operators/table_scan_node.h" +#include "silo/query_engine/operators/unresolved_insertions_node.h" +#include "silo/query_engine/operators/unresolved_most_recent_common_ancestor_node.h" +#include "silo/query_engine/operators/unresolved_mutations_node.h" +#include "silo/query_engine/operators/unresolved_phylo_subtree_node.h" +#include "silo/query_engine/operators/zstd_decompress_node.h" +#include "silo/query_engine/saneql/ast_to_query.h" namespace silo::query_engine { @@ -25,10 +42,243 @@ arrow::Result planQueryOrError( ); } +operators::QueryNodePtr wrapWithDecompressIfNeeded( + operators::QueryNodePtr node, + const std::shared_ptr& table_schema +) { + std::map> + table_schemas_for_decompression; + for (const auto& column_identifier : node->getOutputSchema()) { + if (schema::isSequenceColumn(column_identifier.type)) { + table_schemas_for_decompression.emplace(column_identifier, table_schema); + } + } + if (table_schemas_for_decompression.empty()) { + return node; + } + return std::make_unique( + std::move(node), std::move(table_schemas_for_decompression) + ); +} + +std::shared_ptr resolveTable( + const schema::TableName& table_name, + const std::map>& tables +) { + auto iter = tables.find(table_name); + CHECK_SILO_QUERY(iter != tables.end(), "table '{}' not found in database", table_name.getName()); + return iter->second; +} + +struct ExtractedScanInfo { + schema::TableName table_name; + std::unique_ptr filter; +}; + +/// Extracts table name and filter from a ScanNode or FilterNode(ScanNode) chain. 
+std::optional extractScanInfo(operators::QueryNodePtr& node) { + auto* scan = dynamic_cast(node.get()); + if (scan != nullptr) { + return ExtractedScanInfo{.table_name=scan->table_name, .filter=std::make_unique()}; + } + auto* filter = dynamic_cast(node.get()); + if (filter != nullptr) { + auto* inner_scan = dynamic_cast(filter->child.get()); + if (inner_scan != nullptr) { + return ExtractedScanInfo{.table_name=inner_scan->table_name, .filter=std::move(filter->filter)}; + } + } + return std::nullopt; +} + +template +operators::QueryNodePtr pushdownUnresolvedMutations( + operators::UnresolvedMutationsNode* unresolved, + const std::map>& tables +) { + auto scan_info = extractScanInfo(unresolved->child); + CHECK_SILO_QUERY(scan_info.has_value(), "mutations() must be applied to a table scan"); + + auto table = resolveTable(scan_info->table_name, tables); + + std::vector bound_sequence_columns; + for (const auto& sequence_name : unresolved->sequence_names) { + auto column_identifier = table->schema->getColumn(sequence_name); + CHECK_SILO_QUERY( + column_identifier.has_value() && column_identifier.value().type == SymbolType::COLUMN_TYPE, + "The database does not contain the {} sequence '{}'", + SymbolType::SYMBOL_NAME, + sequence_name + ); + bound_sequence_columns.emplace_back(column_identifier.value()); + } + if (unresolved->sequence_names.empty()) { + for (const auto& column_identifier : + table->schema->template getColumnByType()) { + bound_sequence_columns.emplace_back(column_identifier); + } + } + + std::vector fields_to_use; + if (unresolved->fields.empty()) { + fields_to_use = { + operators::MutationsNode::MUTATION_FIELD_NAME, + operators::MutationsNode::MUTATION_FROM_FIELD_NAME, + operators::MutationsNode::MUTATION_TO_FIELD_NAME, + operators::MutationsNode::POSITION_FIELD_NAME, + operators::MutationsNode::SEQUENCE_FIELD_NAME, + operators::MutationsNode::PROPORTION_FIELD_NAME, + operators::MutationsNode::COVERAGE_FIELD_NAME, + 
operators::MutationsNode::COUNT_FIELD_NAME + }; + } else { + for (const auto& field_str : unresolved->fields) { + auto it = std::ranges::find(operators::MutationsNode::VALID_FIELDS, field_str); + CHECK_SILO_QUERY( + it != operators::MutationsNode::VALID_FIELDS.end(), + "The attribute 'fields' contains an invalid field '{}'. Valid fields are mutation, " + "mutationFrom, mutationTo, position, sequenceName, proportion, coverage, count.", + field_str + ); + fields_to_use.push_back(*it); + } + } + + return std::make_unique>( + std::move(table), + std::move(scan_info->filter), + std::move(bound_sequence_columns), + unresolved->min_proportion, + std::move(fields_to_use) + ); +} + +template +operators::QueryNodePtr pushdownUnresolvedInsertions( + operators::UnresolvedInsertionsNode* unresolved, + const std::map>& tables +) { + auto scan_info = extractScanInfo(unresolved->child); + CHECK_SILO_QUERY(scan_info.has_value(), "insertions() must be applied to a table scan"); + + auto table = resolveTable(scan_info->table_name, tables); + + std::vector bound_sequence_columns; + for (const auto& sequence_name : unresolved->sequence_names) { + auto column_identifier = table->schema->getColumn(sequence_name); + CHECK_SILO_QUERY( + column_identifier.has_value() && column_identifier.value().type == SymbolType::COLUMN_TYPE, + "The database does not contain the {} sequence '{}'", + SymbolType::SYMBOL_NAME, + sequence_name + ); + bound_sequence_columns.emplace_back(column_identifier.value()); + } + if (unresolved->sequence_names.empty()) { + for (const auto& column_identifier : + table->schema->template getColumnByType()) { + bound_sequence_columns.emplace_back(column_identifier); + } + } + + return std::make_unique>( + std::move(table), std::move(scan_info->filter), std::move(bound_sequence_columns) + ); +} + +operators::QueryNodePtr pushdownUnresolvedPhyloSubtree( + operators::UnresolvedPhyloSubtreeNode* unresolved, + const std::map>& tables +) { + auto scan_info = 
extractScanInfo(unresolved->child); + CHECK_SILO_QUERY(scan_info.has_value(), "phyloSubtree() must be applied to a table scan"); + auto table = resolveTable(scan_info->table_name, tables); + return std::make_unique( + std::move(table), + std::move(scan_info->filter), + std::move(unresolved->column_name), + unresolved->print_nodes_not_in_tree, + unresolved->contract_unary_nodes + ); +} + +operators::QueryNodePtr pushdownUnresolvedMostRecentCommonAncestor( + operators::UnresolvedMostRecentCommonAncestorNode* unresolved, + const std::map>& tables +) { + auto scan_info = extractScanInfo(unresolved->child); + CHECK_SILO_QUERY( + scan_info.has_value(), "mostRecentCommonAncestor() must be applied to a table scan" + ); + auto table = resolveTable(scan_info->table_name, tables); + return std::make_unique( + std::move(table), + std::move(scan_info->filter), + std::move(unresolved->column_name), + unresolved->print_nodes_not_in_tree + ); +} + +/// Collapses Scan/Filter/Project combinations into a single TableScanNode. +/// Handles: ScanNode, FilterNode(ScanNode), ProjectNode(ScanNode), +/// ProjectNode(FilterNode(ScanNode)), FilterNode(ProjectNode(ScanNode)). 
+// NOLINTNEXTLINE(misc-no-recursion) +operators::QueryNodePtr pushdownScanFilterProject( + operators::QueryNodePtr node, + const std::map>& tables +) { + operators::ProjectNode* project_node = nullptr; + operators::FilterNode* filter_node = nullptr; + operators::ScanNode* scan_node = nullptr; + operators::QueryNode* current = node.get(); + + for (int i = 0; i < 3; ++i) { + if (auto* proj = dynamic_cast(current); + proj != nullptr && project_node == nullptr) { + project_node = proj; + current = proj->child.get(); + continue; + } + if (auto* flt = dynamic_cast(current); + flt != nullptr && filter_node == nullptr) { + filter_node = flt; + current = flt->child.get(); + continue; + } + break; + } + scan_node = dynamic_cast(current); + if (scan_node == nullptr) { + return node; + } + + auto table = resolveTable(scan_node->table_name, tables); + std::unique_ptr filter = + filter_node != nullptr ? std::move(filter_node->filter) + : std::make_unique(); + + std::vector fields; + std::unordered_set seen_names; + const auto& source_fields = + project_node != nullptr ? 
project_node->fields : scan_node->output_schema; + for (const auto& field : source_fields) { + if (seen_names.insert(field.name).second) { + fields.push_back(field); + } + } + + auto table_schema = table->schema; + operators::QueryNodePtr result = std::make_unique( + std::move(table), std::move(filter), std::move(fields) + ); + return wrapWithDecompressIfNeeded(std::move(result), table_schema); +} + // NOLINTNEXTLINE(misc-no-recursion) operators::QueryNodePtr optimizeInstance(operators::AggregateNode* node) { // Full aggregations (COUNT(*) and only a filter below can be optimized) - if (node->group_by_fields.empty() && + if (node->group_by_fields.empty() && node->aggregates.size() == 1 && + node->aggregates[0].function == operators::AggregateFunction::COUNT && dynamic_cast(node->child.get()) != nullptr) { auto* table_scan_child = dynamic_cast(node->child.get()); return std::make_unique( @@ -36,7 +286,9 @@ operators::QueryNodePtr optimizeInstance(operators::AggregateNode* node) { ); } return std::make_unique( - Planner::optimize(std::move(node->child)), std::move(node->group_by_fields) + Planner::optimize(std::move(node->child)), + std::move(node->group_by_fields), + std::move(node->aggregates) ); } @@ -54,8 +306,129 @@ operators::QueryNodePtr optimizeInstance(operators::FetchNode* node) { ); } +/// Pushes a ProjectNode below a child OrderByNode/FetchNode when this is safe: +/// - Project(Fetch(X)) -> Fetch(Project(X)): always safe +/// - Project(OrderBy(X)) -> OrderBy(Project(X)): safe iff all sort keys are projected +/// Returns the reordered node, or the original if no reorder was possible. 
+operators::QueryNodePtr tryReorderProject(operators::QueryNodePtr node) { + auto* project = dynamic_cast(node.get()); + if (project == nullptr) { + return node; + } + + if (auto* fetch = dynamic_cast(project->child.get())) { + auto new_project = std::make_unique( + std::move(fetch->child), std::move(project->fields) + ); + return std::make_unique( + std::move(new_project), fetch->count, fetch->offset + ); + } + + if (auto* order_by = dynamic_cast(project->child.get())) { + std::unordered_set projected_names; + for (const auto& field : project->fields) { + projected_names.insert(field.name); + } + for (const auto& order_field : order_by->fields) { + if (!projected_names.contains(order_field.name)) { + return node; + } + } + auto new_project = std::make_unique( + std::move(order_by->child), std::move(project->fields) + ); + return std::make_unique( + std::move(new_project), std::move(order_by->fields), order_by->randomize_seed + ); + } + + return node; +} + } // namespace +// NOLINTNEXTLINE(misc-no-recursion) +operators::QueryNodePtr Planner::pushdown( + operators::QueryNodePtr node, + const std::map>& tables +) { + // Push Project below OrderBy/Fetch when safe, so it can collapse into TableScan. 
+ if (dynamic_cast(node.get()) != nullptr) { + operators::QueryNode* before = node.get(); + node = tryReorderProject(std::move(node)); + if (node.get() != before) { + return pushdown(std::move(node), tables); + } + } + + // Try to collapse Scan/Filter/Project combinations into TableScanNode + if (dynamic_cast(node.get()) != nullptr) { + return pushdownScanFilterProject(std::move(node), tables); + } + if (auto* filter_node = dynamic_cast(node.get())) { + if (dynamic_cast(filter_node->child.get()) != nullptr || + dynamic_cast(filter_node->child.get()) != nullptr) { + return pushdownScanFilterProject(std::move(node), tables); + } + } + if (auto* project_node = dynamic_cast(node.get())) { + if (dynamic_cast(project_node->child.get()) != nullptr || + dynamic_cast(project_node->child.get()) != nullptr) { + return pushdownScanFilterProject(std::move(node), tables); + } + } + + // Resolve unresolved mutations/insertions nodes + if (auto* unresolved = + dynamic_cast*>(node.get())) { + return pushdownUnresolvedMutations(unresolved, tables); + } + if (auto* unresolved = + dynamic_cast*>(node.get())) { + return pushdownUnresolvedMutations(unresolved, tables); + } + if (auto* unresolved = + dynamic_cast*>(node.get())) { + return pushdownUnresolvedInsertions(unresolved, tables); + } + if (auto* unresolved = + dynamic_cast*>(node.get())) { + return pushdownUnresolvedInsertions(unresolved, tables); + } + if (auto* unresolved = dynamic_cast(node.get())) { + return pushdownUnresolvedPhyloSubtree(unresolved, tables); + } + if (auto* unresolved = + dynamic_cast(node.get())) { + return pushdownUnresolvedMostRecentCommonAncestor(unresolved, tables); + } + + // Recurse into nodes with children + if (auto* aggregate = dynamic_cast(node.get())) { + aggregate->child = pushdown(std::move(aggregate->child), tables); + return node; + } + if (auto* order_by = dynamic_cast(node.get())) { + order_by->child = pushdown(std::move(order_by->child), tables); + return node; + } + if (auto* fetch = 
dynamic_cast(node.get())) { + fetch->child = pushdown(std::move(fetch->child), tables); + return node; + } + if (auto* project = dynamic_cast(node.get())) { + project->child = pushdown(std::move(project->child), tables); + return node; + } + if (auto* filter_node = dynamic_cast(node.get())) { + filter_node->child = pushdown(std::move(filter_node->child), tables); + return node; + } + + return node; +} + // NOLINTNEXTLINE(misc-no-recursion) operators::QueryNodePtr Planner::optimize(operators::QueryNodePtr node) { if (dynamic_cast(node.get()) != nullptr) { @@ -79,7 +452,8 @@ QueryPlan Planner::planQuery( const config::QueryOptions& query_options, std::string_view request_id ) { - auto optimized_tree = optimize(std::move(node)); + auto pushed_down_tree = pushdown(std::move(node), tables); + auto optimized_tree = optimize(std::move(pushed_down_tree)); auto result = planQueryOrError(*optimized_tree, tables, query_options, request_id); if (!result.ok()) { throw std::runtime_error( @@ -89,4 +463,14 @@ QueryPlan Planner::planQuery( return std::move(result.ValueUnsafe()); } +QueryPlan Planner::planSaneqlQuery( + std::string_view query_string, + const std::map>& tables, + const config::QueryOptions& query_options, + std::string_view request_id +) { + auto query_node = saneql::parseAndConvertToQueryTree(query_string, tables); + return planQuery(std::move(query_node), tables, query_options, request_id); +} + } // namespace silo::query_engine diff --git a/src/silo/query_engine/planner.h b/src/silo/query_engine/planner.h index fd4b0b7f8..4f85c7970 100644 --- a/src/silo/query_engine/planner.h +++ b/src/silo/query_engine/planner.h @@ -1,5 +1,7 @@ #pragma once +#include + #include "silo/query_engine/operators/query_node.h" #include "silo/query_engine/query_plan.h" @@ -7,6 +9,11 @@ namespace silo::query_engine { class Planner { public: + static operators::QueryNodePtr pushdown( + operators::QueryNodePtr node, + const std::map>& tables + ); + static operators::QueryNodePtr 
optimize(operators::QueryNodePtr node); static QueryPlan planQuery( @@ -15,6 +22,13 @@ class Planner { const config::QueryOptions& query_options, std::string_view request_id ); + + static QueryPlan planSaneqlQuery( + std::string_view query_string, + const std::map>& tables, + const config::QueryOptions& query_options, + std::string_view request_id + ); }; } // namespace silo::query_engine diff --git a/src/silo/query_engine/query_plan.h b/src/silo/query_engine/query_plan.h index f78ffcdae..9fd6e6d6e 100644 --- a/src/silo/query_engine/query_plan.h +++ b/src/silo/query_engine/query_plan.h @@ -20,8 +20,6 @@ class QueryPlan { arrow::AsyncGenerator> results_generator; arrow::acero::BackpressureMonitor* backpressure_monitor; std::string_view request_id; - // Pinned resource kept alive for the lifetime of the query plan (e.g. Action used by lambdas) - std::shared_ptr pinned_resource; static arrow::Result makeQueryPlan( std::shared_ptr arrow_plan, diff --git a/src/silo/query_engine/saneql/ast.cpp b/src/silo/query_engine/saneql/ast.cpp new file mode 100644 index 000000000..32dce37a1 --- /dev/null +++ b/src/silo/query_engine/saneql/ast.cpp @@ -0,0 +1,253 @@ +#include "silo/query_engine/saneql/ast.h" + +#include + +#include "silo/query_engine/illegal_query_exception.h" + +namespace silo::query_engine::saneql::ast { + +std::string binaryOpToString(BinaryOp op) { + switch (op) { + case BinaryOp::AND: + return "&&"; + case BinaryOp::OR: + return "||"; + case BinaryOp::EQUALS: + return "="; + case BinaryOp::NOT_EQUALS: + return "<>"; + case BinaryOp::LESS_THAN: + return "<"; + case BinaryOp::LESS_EQUAL: + return "<="; + case BinaryOp::GREATER_THAN: + return ">"; + case BinaryOp::GREATER_EQUAL: + return ">="; + } + return "?"; +} + +namespace { + +struct ExprToString { + std::string operator()(const IntLiteral& lit) const { return std::to_string(lit.value); } + + std::string operator()(const FloatLiteral& lit) const { return fmt::format("{}", lit.value); } + + std::string 
operator()(const StringLiteral& lit) const { return fmt::format("'{}'", lit.value); } + + std::string operator()(const BoolLiteral& lit) const { return lit.value ? "true" : "false"; } + + std::string operator()(const NullLiteral& /*unused*/) const { return "null"; } + + std::string operator()(const Identifier& identifier) const { return identifier.name; } + + std::string operator()(const BinaryExpr& expr) const { + return fmt::format( + "({} {} {})", expr.left->toString(), binaryOpToString(expr.op), expr.right->toString() + ); + } + + std::string operator()(const UnaryNotExpr& expr) const { + return fmt::format("(!{})", expr.operand->toString()); + } + + std::string operator()(const FunctionCall& call) const { + std::string args; + for (size_t i = 0; i < call.positional_arguments.size(); i++) { + if (i > 0) { + args += ", "; + } + args += call.positional_arguments[i].value->toString(); + } + for (const auto& named_argument : call.named_arguments) { + if (!args.empty()) { + args += ", "; + } + args += named_argument.name + ":=" + named_argument.value->toString(); + } + return fmt::format("{}({})", call.function_name, args); + } + + std::string operator()(const TypeCast& cast) const { + return fmt::format("{}::{}", cast.operand->toString(), cast.target_type); + } + + std::string operator()(const SetLiteral& set) const { + std::string elements; + for (size_t i = 0; i < set.elements.size(); i++) { + if (i > 0) { + elements += ", "; + } + elements += set.elements[i]->toString(); + } + return fmt::format("{{{}}}", elements); + } + + std::string operator()(const RecordLiteral& record) const { + std::string fields; + for (size_t i = 0; i < record.fields.size(); i++) { + if (i > 0) { + fields += ", "; + } + fields += record.fields[i].name + ":=" + record.fields[i].value->toString(); + } + return fmt::format("{{{}}}", fields); + } +}; + +} // namespace + +std::string Expression::toString() const { + return std::visit(ExprToString{}, value); +} + +ExpressionPtr 
makeExpr(ExpressionVariant value, SourceLocation location) { + auto expr = std::make_unique(); + expr->value = std::move(value); + expr->location = location; + return expr; +} + +std::string extractIdentifierName(const Expression& expression) { + CHECK_SILO_QUERY( + std::holds_alternative(expression.value), + "expected identifier at {}:{}", + expression.location.line, + expression.location.column + ); + return std::get(expression.value).name; +} + +std::string extractStringLiteral(const Expression& expression) { + CHECK_SILO_QUERY( + std::holds_alternative(expression.value), + "expected string literal at {}:{}", + expression.location.line, + expression.location.column + ); + return std::get(expression.value).value; +} + +int64_t extractIntLiteral(const Expression& expression) { + CHECK_SILO_QUERY( + std::holds_alternative(expression.value), + "expected integer literal at {}:{}", + expression.location.line, + expression.location.column + ); + return std::get(expression.value).value; +} + +double extractFloatLiteral(const Expression& expression) { + if (std::holds_alternative(expression.value)) { + return std::get(expression.value).value; + } + if (std::holds_alternative(expression.value)) { + return static_cast(std::get(expression.value).value); + } + throw query_engine::IllegalQueryException( + "expected numeric literal at {}:{}", expression.location.line, expression.location.column + ); +} + +bool extractBoolLiteral(const Expression& expression) { + CHECK_SILO_QUERY( + std::holds_alternative(expression.value), + "expected boolean literal at {}:{}", + expression.location.line, + expression.location.column + ); + return std::get(expression.value).value; +} + +common::Date32 extractDateValue(const Expression& expression) { + CHECK_SILO_QUERY( + std::holds_alternative(expression.value), + "expected date type cast at {}:{}", + expression.location.line, + expression.location.column + ); + const auto& cast = std::get(expression.value); + CHECK_SILO_QUERY( + 
cast.target_type == "date", + "expected cast to 'date', got '{}' at {}:{}", + cast.target_type, + expression.location.line, + expression.location.column + ); + auto date_string = extractStringLiteral(*cast.operand); + auto result = common::stringToDate32(date_string); + CHECK_SILO_QUERY( + result.has_value(), + "invalid date '{}' at {}:{}: {}", + date_string, + expression.location.line, + expression.location.column, + result.error() + ); + return result.value(); +} + +std::optional extractOptionalDateValue(const Expression& expression) { + if (std::holds_alternative(expression.value)) { + return std::nullopt; + } + return extractDateValue(expression); +} + +std::vector extractSetOfIdentifiers(const Expression& expression) { + CHECK_SILO_QUERY( + std::holds_alternative(expression.value), + "expected set literal at {}:{}", + expression.location.line, + expression.location.column + ); + const auto& set = std::get(expression.value); + std::vector result; + result.reserve(set.elements.size()); + for (const auto& elem : set.elements) { + result.push_back(extractIdentifierName(*elem)); + } + return result; +} + +const SetLiteral& extractSetLiteral(const Expression& expression) { + CHECK_SILO_QUERY( + std::holds_alternative(expression.value), + "expected set literal at {}:{}", + expression.location.line, + expression.location.column + ); + return std::get(expression.value); +} + +bool isDateExpression(const Expression& expression) { + if (!std::holds_alternative(expression.value)) { + return false; + } + return std::get(expression.value).target_type == "date"; +} + +bool isNullLiteral(const Expression& expression) { + return std::holds_alternative(expression.value); +} + +bool isIntLiteral(const Expression& expression) { + return std::holds_alternative(expression.value); +} + +bool isFloatLiteral(const Expression& expression) { + return std::holds_alternative(expression.value); +} + +bool isStringLiteral(const Expression& expression) { + return 
std::holds_alternative(expression.value); +} + +bool isBoolLiteral(const Expression& expression) { + return std::holds_alternative(expression.value); +} + +} // namespace silo::query_engine::saneql::ast diff --git a/src/silo/query_engine/saneql/ast.h b/src/silo/query_engine/saneql/ast.h new file mode 100644 index 000000000..608fc7a28 --- /dev/null +++ b/src/silo/query_engine/saneql/ast.h @@ -0,0 +1,139 @@ +#pragma once + +#include +#include +#include +#include +#include +#include + +#include "silo/common/date32.h" +#include "silo/query_engine/saneql/source_location.h" + +namespace silo::query_engine::saneql::ast { + +struct Expression; + +using ExpressionPtr = std::unique_ptr; + +struct PositionalArgument { + ExpressionPtr value; + SourceLocation location; +}; + +struct NamedArgument { + std::string name; + ExpressionPtr value; + SourceLocation location; +}; + +enum class BinaryOp : uint8_t { + AND, + OR, + EQUALS, + NOT_EQUALS, + LESS_THAN, + LESS_EQUAL, + GREATER_THAN, + GREATER_EQUAL +}; + +[[nodiscard]] std::string binaryOpToString(BinaryOp op); + +struct IntLiteral { + int64_t value; +}; + +struct FloatLiteral { + double value; +}; + +struct StringLiteral { + std::string value; +}; + +struct BoolLiteral { + bool value; +}; + +struct NullLiteral {}; + +struct Identifier { + std::string name; +}; + +struct BinaryExpr { + BinaryOp op; + ExpressionPtr left; + ExpressionPtr right; +}; + +struct UnaryNotExpr { + ExpressionPtr operand; +}; + +struct FunctionCall { + std::string function_name; + std::vector positional_arguments; + std::vector named_arguments; +}; + +struct TypeCast { + ExpressionPtr operand; + std::string target_type; +}; + +struct SetLiteral { + std::vector elements; +}; + +struct RecordField { + std::string name; + ExpressionPtr value; +}; + +struct RecordLiteral { + std::vector fields; +}; + +using ExpressionVariant = std::variant< + IntLiteral, + FloatLiteral, + StringLiteral, + BoolLiteral, + NullLiteral, + Identifier, + BinaryExpr, + 
UnaryNotExpr, + FunctionCall, + TypeCast, + SetLiteral, + RecordLiteral>; + +struct Expression { + ExpressionVariant value; + SourceLocation location; + + [[nodiscard]] std::string toString() const; +}; + +ExpressionPtr makeExpr(ExpressionVariant value, SourceLocation location); + +[[nodiscard]] std::string extractIdentifierName(const Expression& expression); +[[nodiscard]] std::string extractStringLiteral(const Expression& expression); +[[nodiscard]] int64_t extractIntLiteral(const Expression& expression); +[[nodiscard]] double extractFloatLiteral(const Expression& expression); +[[nodiscard]] bool extractBoolLiteral(const Expression& expression); +[[nodiscard]] common::Date32 extractDateValue(const Expression& expression); +[[nodiscard]] std::optional extractOptionalDateValue(const Expression& expression); +[[nodiscard]] std::vector extractSetOfIdentifiers(const Expression& expression); +[[nodiscard]] const SetLiteral& extractSetLiteral(const Expression& expression); + +[[nodiscard]] bool isDateExpression(const Expression& expression); +[[nodiscard]] bool isNullLiteral(const Expression& expression); +[[nodiscard]] bool isIntLiteral(const Expression& expression); +[[nodiscard]] bool isFloatLiteral(const Expression& expression); +[[nodiscard]] bool isStringLiteral(const Expression& expression); +[[nodiscard]] bool isBoolLiteral(const Expression& expression); + +} // namespace silo::query_engine::saneql::ast diff --git a/src/silo/query_engine/saneql/ast_to_query.cpp b/src/silo/query_engine/saneql/ast_to_query.cpp new file mode 100644 index 000000000..0099c5afd --- /dev/null +++ b/src/silo/query_engine/saneql/ast_to_query.cpp @@ -0,0 +1,973 @@ +#include "silo/query_engine/saneql/ast_to_query.h" + +#include +#include +#include +#include +#include +#include +#include + +#include "silo/query_engine/saneql/parser.h" + +#include +#include + +#include "silo/common/aa_symbols.h" +#include "silo/common/lineage_tree.h" +#include "silo/common/nucleotide_symbols.h" +#include 
"silo/query_engine/actions/order_by_field.h" +#include "silo/query_engine/filter/expressions/and.h" +#include "silo/query_engine/filter/expressions/bool_equals.h" +#include "silo/query_engine/filter/expressions/date_between.h" +#include "silo/query_engine/filter/expressions/date_equals.h" +#include "silo/query_engine/filter/expressions/exact.h" +#include "silo/query_engine/filter/expressions/expression.h" +#include "silo/query_engine/filter/expressions/false.h" +#include "silo/query_engine/filter/expressions/float_between.h" +#include "silo/query_engine/filter/expressions/float_equals.h" +#include "silo/query_engine/filter/expressions/has_mutation.h" +#include "silo/query_engine/filter/expressions/insertion_contains.h" +#include "silo/query_engine/filter/expressions/int_between.h" +#include "silo/query_engine/filter/expressions/int_equals.h" +#include "silo/query_engine/filter/expressions/is_null.h" +#include "silo/query_engine/filter/expressions/lineage_filter.h" +#include "silo/query_engine/filter/expressions/maybe.h" +#include "silo/query_engine/filter/expressions/negation.h" +#include "silo/query_engine/filter/expressions/nof.h" +#include "silo/query_engine/filter/expressions/or.h" +#include "silo/query_engine/filter/expressions/phylo_child_filter.h" +#include "silo/query_engine/filter/expressions/string_equals.h" +#include "silo/query_engine/filter/expressions/string_in_set.h" +#include "silo/query_engine/filter/expressions/string_search.h" +#include "silo/query_engine/filter/expressions/symbol_equals.h" +#include "silo/query_engine/filter/expressions/true.h" +#include "silo/query_engine/illegal_query_exception.h" +#include "silo/query_engine/operators/aggregate_node.h" +#include "silo/query_engine/operators/fetch_node.h" +#include "silo/query_engine/operators/filter_node.h" +#include "silo/query_engine/operators/order_by_node.h" +#include "silo/query_engine/operators/project_node.h" +#include "silo/query_engine/operators/scan_node.h" +#include 
"silo/query_engine/operators/unresolved_insertions_node.h" +#include "silo/query_engine/operators/unresolved_most_recent_common_ancestor_node.h" +#include "silo/query_engine/operators/unresolved_mutations_node.h" +#include "silo/query_engine/operators/unresolved_phylo_subtree_node.h" +#include "silo/query_engine/saneql/ast.h" +#include "silo/query_engine/saneql/function_registry.h" + +namespace silo::query_engine::saneql { + +namespace { + +FilterPtr convertEqualsToFilter(const std::string& column_name, const ast::Expression& value_expr) { + if (isNullLiteral(value_expr)) { + return std::make_unique(column_name); + } + if (isStringLiteral(value_expr)) { + return std::make_unique( + column_name, extractStringLiteral(value_expr) + ); + } + if (isIntLiteral(value_expr)) { + return std::make_unique( + column_name, static_cast(extractIntLiteral(value_expr)) + ); + } + if (isFloatLiteral(value_expr)) { + return std::make_unique( + column_name, extractFloatLiteral(value_expr) + ); + } + if (isBoolLiteral(value_expr)) { + return std::make_unique( + column_name, std::get(value_expr.value).value + ); + } + if (isDateExpression(value_expr)) { + return std::make_unique( + column_name, extractDateValue(value_expr) + ); + } + throw IllegalQueryException( + "unsupported value type in equality at {}:{}", + value_expr.location.line, + value_expr.location.column + ); +} + +FilterPtr convertIntComparison( + const std::string& column_name, + ast::BinaryOp binary_op, + uint32_t value +) { + switch (binary_op) { + case ast::BinaryOp::LESS_THAN: + return std::make_unique( + column_name, std::nullopt, value > 0 ? 
std::optional(value - 1) : 0 + ); + case ast::BinaryOp::LESS_EQUAL: + return std::make_unique(column_name, std::nullopt, value); + case ast::BinaryOp::GREATER_THAN: + return std::make_unique( + column_name, value + 1, std::nullopt + ); + case ast::BinaryOp::GREATER_EQUAL: + return std::make_unique(column_name, value, std::nullopt); + default: + throw IllegalQueryException("unexpected operator for integer comparison"); + } +} + +FilterPtr convertFloatComparison( + const std::string& column_name, + ast::BinaryOp binary_op, + double value +) { + switch (binary_op) { + case ast::BinaryOp::LESS_THAN: + case ast::BinaryOp::LESS_EQUAL: + return std::make_unique( + column_name, std::nullopt, value + ); + case ast::BinaryOp::GREATER_THAN: + case ast::BinaryOp::GREATER_EQUAL: + return std::make_unique( + column_name, value, std::nullopt + ); + default: + throw IllegalQueryException("unexpected operator for float comparison"); + } +} + +FilterPtr convertDateComparison( + const std::string& column_name, + ast::BinaryOp binary_op, + const ast::Expression& value_expr +) { + auto date_val = extractOptionalDateValue(value_expr); + switch (binary_op) { + case ast::BinaryOp::LESS_THAN: + return std::make_unique( + column_name, std::nullopt, date_val.has_value() ? date_val.value() - 1 : date_val + ); + case ast::BinaryOp::LESS_EQUAL: + return std::make_unique( + column_name, std::nullopt, date_val + ); + case ast::BinaryOp::GREATER_THAN: + return std::make_unique( + column_name, date_val.has_value() ? 
date_val.value() + 1 : date_val, std::nullopt + ); + case ast::BinaryOp::GREATER_EQUAL: + return std::make_unique( + column_name, date_val, std::nullopt + ); + default: + throw IllegalQueryException("unexpected operator for date comparison"); + } +} + +FilterPtr convertComparisonToFilter( + const std::string& column_name, + ast::BinaryOp binary_op, + const ast::Expression& value_expr +) { + if (isDateExpression(value_expr)) { + return convertDateComparison(column_name, binary_op, value_expr); + } + if (isFloatLiteral(value_expr)) { + return convertFloatComparison(column_name, binary_op, extractFloatLiteral(value_expr)); + } + if (isIntLiteral(value_expr)) { + return convertIntComparison( + column_name, binary_op, static_cast(extractIntLiteral(value_expr)) + ); + } + throw IllegalQueryException( + "unsupported value type in comparison at {}:{}", + value_expr.location.line, + value_expr.location.column + ); +} + +FilterPtr convertBinaryExprToFilter(const ast::BinaryExpr& bin_expr) { + switch (bin_expr.op) { + case ast::BinaryOp::AND: { + filter::expressions::ExpressionVector children; + children.push_back(convertToFilter(*bin_expr.left)); + children.push_back(convertToFilter(*bin_expr.right)); + return std::make_unique(std::move(children)); + } + case ast::BinaryOp::OR: { + filter::expressions::ExpressionVector children; + children.push_back(convertToFilter(*bin_expr.left)); + children.push_back(convertToFilter(*bin_expr.right)); + return std::make_unique(std::move(children)); + } + case ast::BinaryOp::EQUALS: { + if (std::holds_alternative(bin_expr.left->value)) { + return convertEqualsToFilter(extractIdentifierName(*bin_expr.left), *bin_expr.right); + } + if (std::holds_alternative(bin_expr.right->value)) { + return convertEqualsToFilter(extractIdentifierName(*bin_expr.right), *bin_expr.left); + } + throw IllegalQueryException( + "equality comparison requires an identifier on one side at {}:{}", + bin_expr.left->location.line, + bin_expr.left->location.column + ); 
+ } + case ast::BinaryOp::NOT_EQUALS: { + if (std::holds_alternative(bin_expr.left->value)) { + return std::make_unique( + convertEqualsToFilter(extractIdentifierName(*bin_expr.left), *bin_expr.right) + ); + } + if (std::holds_alternative(bin_expr.right->value)) { + return std::make_unique( + convertEqualsToFilter(extractIdentifierName(*bin_expr.right), *bin_expr.left) + ); + } + throw IllegalQueryException( + "not-equals comparison requires an identifier on one side at {}:{}", + bin_expr.left->location.line, + bin_expr.left->location.column + ); + } + case ast::BinaryOp::LESS_THAN: + case ast::BinaryOp::LESS_EQUAL: + case ast::BinaryOp::GREATER_THAN: + case ast::BinaryOp::GREATER_EQUAL: { + CHECK_SILO_QUERY( + std::holds_alternative(bin_expr.left->value), + "comparison requires an identifier on the left side at {}:{}", + bin_expr.left->location.line, + bin_expr.left->location.column + ); + return convertComparisonToFilter( + extractIdentifierName(*bin_expr.left), bin_expr.op, *bin_expr.right + ); + } + } + throw IllegalQueryException("unhandled binary operator"); +} + +// ======================================================================== +// Filter function handlers (registered in FilterFunctionRegistry) +// ======================================================================== + +FilterPtr handleBetween(const BoundArguments& args) { + auto column_name = extractIdentifierName(args.at("column")); + const auto& from_expr = args.at("from"); + const auto& to_expr = args.at("to"); + + if (isDateExpression(from_expr) || isDateExpression(to_expr)) { + return std::make_unique( + column_name, extractOptionalDateValue(from_expr), extractOptionalDateValue(to_expr) + ); + } + if (isFloatLiteral(from_expr) || isFloatLiteral(to_expr)) { + std::optional from_val; + std::optional to_val; + if (!isNullLiteral(from_expr)) { + from_val = extractFloatLiteral(from_expr); + } + if (!isNullLiteral(to_expr)) { + to_val = extractFloatLiteral(to_expr); + } + return 
std::make_unique(column_name, from_val, to_val); + } + std::optional from_val; + std::optional to_val; + if (!isNullLiteral(from_expr)) { + from_val = static_cast(extractIntLiteral(from_expr)); + } + if (!isNullLiteral(to_expr)) { + to_val = static_cast(extractIntLiteral(to_expr)); + } + return std::make_unique(column_name, from_val, to_val); +} + +FilterPtr handleIn(const BoundArguments& args) { + auto column_name = extractIdentifierName(args.at("column")); + const auto& set_expr = args.at("values"); + CHECK_SILO_QUERY( + std::holds_alternative(set_expr.value), + "in() expects a set literal argument at {}:{}", + set_expr.location.line, + set_expr.location.column + ); + const auto& set = std::get(set_expr.value); + std::unordered_set values; + for (const auto& elem : set.elements) { + values.insert(extractStringLiteral(*elem)); + } + return std::make_unique(column_name, std::move(values)); +} + +FilterPtr handleIsNull(const BoundArguments& args) { + return std::make_unique(extractIdentifierName(args.at("column"))); +} + +FilterPtr handleIsNotNull(const BoundArguments& args) { + return std::make_unique( + std::make_unique(extractIdentifierName(args.at("column"))) + ); +} + +FilterPtr handleLineage(const BoundArguments& args) { + auto column_name = extractIdentifierName(args.at("column")); + const auto& value_expr = args.at("value"); + std::optional lineage_value; + if (!isNullLiteral(value_expr)) { + lineage_value = extractStringLiteral(value_expr); + } + bool include_sublineages = false; + if (const auto* expr = args.get("includeSublineages")) { + include_sublineages = extractBoolLiteral(*expr); + } + std::optional sublineage_mode; + if (include_sublineages) { + sublineage_mode = common::RecombinantEdgeFollowingMode::DO_NOT_FOLLOW; + } + auto recombinant_mode = args.getOptionalString("recombinantFollowingMode"); + if (recombinant_mode.has_value()) { + if (recombinant_mode.value() == "alwaysFollow") { + sublineage_mode = 
common::RecombinantEdgeFollowingMode::ALWAYS_FOLLOW; + } else if (recombinant_mode.value() == "followIfFullyContainedInClade") { + sublineage_mode = common::RecombinantEdgeFollowingMode::FOLLOW_IF_FULLY_CONTAINED_IN_CLADE; + } else if (recombinant_mode.value() == "doNotFollow") { + sublineage_mode = common::RecombinantEdgeFollowingMode::DO_NOT_FOLLOW; + } else { + throw IllegalQueryException( + "invalid recombinantFollowingMode: '{}'. Valid values are: alwaysFollow, " + "followIfFullyContainedInClade, doNotFollow", + recombinant_mode.value() + ); + } + } + return std::make_unique( + column_name, lineage_value, sublineage_mode + ); +} + +FilterPtr handlePhyloDescendantOf(const BoundArguments& args) { + return std::make_unique( + extractIdentifierName(args.at("column")), extractStringLiteral(args.at("node")) + ); +} + +FilterPtr handleLike(const BoundArguments& args) { + auto column_name = extractIdentifierName(args.at("column")); + auto pattern = extractStringLiteral(args.at("pattern")); + auto regex = std::make_unique(pattern); + CHECK_SILO_QUERY( + regex->ok(), + "Invalid Regular Expression. The parsing of the regular expression failed with the " + "error '{}'. See https://github.com/google/re2/wiki/Syntax for a Syntax specification.", + regex->error() + ); + return std::make_unique(column_name, std::move(regex)); +} + +template +FilterPtr handleSymbolEquals(const BoundArguments& args) { + auto position = static_cast(extractIntLiteral(args.at("position"))); + CHECK_SILO_QUERY(position > 0, "The field 'position' is 1-indexed. 
Value of 0 not allowed."); + const uint32_t position_idx = position - 1; + auto symbol_str = extractStringLiteral(args.at("symbol")); + CHECK_SILO_QUERY( + symbol_str.size() == 1, "{}() symbol must be a single character", args.functionName() + ); + auto sequence_name = args.getOptionalString("sequenceName"); + char symbol_char = symbol_str[0]; + if (symbol_char == '.') { + return std::make_unique>( + sequence_name, position_idx, filter::expressions::SymbolOrDot::dot() + ); + } + auto symbol = SymbolType::charToSymbol(symbol_char); + CHECK_SILO_QUERY( + symbol.has_value(), "{}() invalid symbol '{}'", args.functionName(), symbol_char + ); + return std::make_unique>( + sequence_name, position_idx, filter::expressions::SymbolOrDot(symbol.value()) + ); +} + +template +FilterPtr handleHasMutation(const BoundArguments& args) { + auto position = static_cast(extractIntLiteral(args.at("position"))); + CHECK_SILO_QUERY(position > 0, "The field 'position' is 1-indexed. Value of 0 not allowed."); + auto sequence_name = args.getOptionalString("sequenceName"); + return std::make_unique>( + sequence_name, position - 1 + ); +} + +template +FilterPtr handleInsertionContains(const BoundArguments& args) { + auto position = static_cast(extractIntLiteral(args.at("position"))); + auto value = extractStringLiteral(args.at("value")); + CHECK_SILO_QUERY( + !value.empty(), + "The field 'value' in an InsertionContains expression must not be an empty string" + ); + auto sequence_name = args.getOptionalString("sequenceName"); + return std::make_unique>( + sequence_name, position, std::move(value) + ); +} + +FilterPtr handleExact(const BoundArguments& args) { + return std::make_unique(convertToFilter(args.at("child"))); +} + +FilterPtr handleMaybe(const BoundArguments& args) { + return std::make_unique(convertToFilter(args.at("child"))); +} + +// NOLINTNEXTLINE(misc-no-recursion) +FilterPtr handleNOf(const BoundArguments& args) { + auto number_of_matchers = 
static_cast(extractIntLiteral(args.at("count"))); + bool match_exactly = false; + if (const auto* expr = args.get("matchExactly")) { + match_exactly = extractBoolLiteral(*expr); + } + const auto& children_set = extractSetLiteral(args.at("children")); + filter::expressions::ExpressionVector children; + for (const auto& child_expr : children_set.elements) { + children.push_back(convertToFilter(*child_expr)); + } + return std::make_unique( + std::move(children), number_of_matchers, match_exactly + ); +} + +} // namespace + +std::unique_ptr convertToFilter(const ast::Expression& ast) { + return std::visit( + [&](const auto& node) -> FilterPtr { + using T = std::decay_t; + + if constexpr (std::is_same_v) { + return convertBinaryExprToFilter(node); + } else if constexpr (std::is_same_v) { + return std::make_unique(convertToFilter(*node.operand)); + } else if constexpr (std::is_same_v) { + if (node.value) { + return std::make_unique(); + } + return std::make_unique(); + } else if constexpr (std::is_same_v) { + const auto* entry = FilterFunctionRegistry::instance().findFunction(node.function_name); + CHECK_SILO_QUERY(entry != nullptr, "unknown scalar function '{}'", node.function_name); + auto bound = bindArguments( + node.function_name, entry->signature, node.positional_arguments, node.named_arguments + ); + return entry->handler(bound); + } else if constexpr (std::is_same_v) { + return std::make_unique(node.name, true); + } else { + throw IllegalQueryException( + "unsupported expression type in filter context at {}:{}", + ast.location.line, + ast.location.column + ); + } + }, + ast.value + ); +} + +// ======================================================================== +// Pipeline function handlers (registered in FunctionRegistry) +// ======================================================================== + +namespace { + +operators::AggregateFunction parseAggregateFunctionName(const std::string& function_name) { + if (function_name == "count") { + return 
operators::AggregateFunction::COUNT; + } + throw IllegalQueryException( + "unknown aggregate function '{}'. Valid functions: count", function_name + ); +} + +struct GroupByArgs { + std::vector group_by_names; + std::vector aggregates; +}; + +GroupByArgs parseGroupBySpecs(const BoundArguments& args) { + GroupByArgs result; + + // Parse aggregates (required): a RecordLiteral like {count:=count()} + const auto& agg_expr = args.at("aggregates"); + CHECK_SILO_QUERY( + std::holds_alternative(agg_expr.value), + "groupBy aggregates must be a record literal like {{count:=count()}}" + ); + const auto& record = std::get(agg_expr.value); + for (const auto& field : record.fields) { + CHECK_SILO_QUERY( + std::holds_alternative(field.value->value), + "aggregate definition '{}' must be a function call (e.g. count(), sum(col))", + field.name + ); + const auto& func = std::get(field.value->value); + auto agg_func = parseAggregateFunctionName(func.function_name); + std::optional source_column; + if (!func.positional_arguments.empty()) { + source_column = extractIdentifierName(*func.positional_arguments[0].value); + } + result.aggregates.push_back({field.name, agg_func, std::move(source_column)}); + } + + // Parse columns (optional): a SetLiteral like {pango_lineage, division} + if (const auto* columns_expr = args.get("columns")) { + const auto& set = extractSetLiteral(*columns_expr); + for (const auto& elem : set.elements) { + result.group_by_names.push_back(extractIdentifierName(*elem)); + } + } + + return result; +} + +std::vector parseOrderByFields(const ast::Expression& expression) { + const auto& set = extractSetLiteral(expression); + std::vector fields; + for (const auto& elem : set.elements) { + if (std::holds_alternative(elem->value)) { + fields.push_back({.name = extractIdentifierName(*elem), .ascending = true}); + } else if (std::holds_alternative(elem->value)) { + const auto& call = std::get(elem->value); + CHECK_SILO_QUERY( + call.function_name == "asc" || 
call.function_name == "desc", + "orderBy field must be an identifier or asc()/desc() call, got '{}' at {}:{}", + call.function_name, + elem->location.line, + elem->location.column + ); + CHECK_SILO_QUERY( + call.positional_arguments.size() == 1 && call.named_arguments.empty(), + "{}() expects exactly one argument", + call.function_name + ); + fields.push_back( + {.name = extractIdentifierName(*call.positional_arguments[0].value), + .ascending = call.function_name == "asc"} + ); + } else { + throw IllegalQueryException( + "orderBy field must be an identifier or asc()/desc() call at {}:{}", + elem->location.line, + elem->location.column + ); + } + } + return fields; +} + +operators::QueryNodePtr buildScanNode( + const ast::Expression& ast, + const std::map>& tables +) { + const auto& name = std::get(ast.value).name; + auto table_name = schema::TableName(name); + auto iter = tables.find(table_name); + CHECK_SILO_QUERY(iter != tables.end(), "table '{}' not found in database", table_name.getName()); + std::vector output_schema; + for (const auto& identifier : iter->second->schema->getColumnIdentifiers()) { + output_schema.push_back(identifier); + } + return std::make_unique(std::move(table_name), std::move(output_schema)); +} + +// NOLINTNEXTLINE(misc-no-recursion) +operators::QueryNodePtr handleFilter( + const BoundArguments& args, + const Tables& tables, + const ChildConverter& convert_child +) { + auto child = convert_child(args.at("input"), tables); + auto filter_expr = convertToFilter(args.at("predicate")); + return std::make_unique(std::move(child), std::move(filter_expr)); +} + +// NOLINTNEXTLINE(misc-no-recursion) +operators::QueryNodePtr handleGroupBy( + const BoundArguments& args, + const Tables& tables, + const ChildConverter& convert_child +) { + auto [group_by_names, aggregates] = parseGroupBySpecs(args); + auto child = convert_child(args.at("input"), tables); + auto child_schema = child->getOutputSchema(); + + std::vector group_by_fields; + 
std::unordered_set seen_names; + for (auto& name : group_by_names) { + if (!seen_names.insert(name).second) { + continue; + } + auto found = + std::ranges::find_if(child_schema, [&](const auto& col) { return col.name == name; }); + CHECK_SILO_QUERY( + found != child_schema.end(), + "groupBy field '{}' is not present in the input's output schema", + name + ); + group_by_fields.emplace_back(std::move(name), found->type); + } + + return std::make_unique( + std::move(child), std::move(group_by_fields), std::move(aggregates) + ); +} + +// NOLINTNEXTLINE(misc-no-recursion) +operators::QueryNodePtr handleProject( + const BoundArguments& args, + const Tables& tables, + const ChildConverter& convert_child +) { + const auto& field_argument = args.at("fields"); + const std::vector field_names = holds_alternative(field_argument.value) + ? std::vector{extractIdentifierName(field_argument)} + : extractSetOfIdentifiers(field_argument); + auto child = convert_child(args.at("input"), tables); + auto child_schema = child->getOutputSchema(); + std::vector fields; + for (const auto& name : field_names) { + auto found = + std::ranges::find_if(child_schema, [&](const auto& col) { return col.name == name; }); + CHECK_SILO_QUERY( + found != child_schema.end(), + "project field '{}' is not present in the input's output schema", + name + ); + fields.emplace_back(name, found->type); + } + return std::make_unique(std::move(child), std::move(fields)); +} + +// NOLINTNEXTLINE(misc-no-recursion) +operators::QueryNodePtr handleMutations( + const BoundArguments& args, + const Tables& tables, + const ChildConverter& convert_child +) { + auto child = convert_child(args.at("input"), tables); + std::vector sequence_names; + if (const auto* seq_expr = args.get("sequenceNames")) { + sequence_names = extractSetOfIdentifiers(*seq_expr); + } + auto min_proportion = extractFloatLiteral(args.at("minProportion")); + CHECK_SILO_QUERY( + min_proportion >= 0 && min_proportion <= 1, + "Invalid proportion: 
minProportion must be in interval [0.0, 1.0]" + ); + std::vector field_strings; + if (const auto* expr = args.get("fields")) { + field_strings = extractSetOfIdentifiers(*expr); + } + if (args.functionName() == "mutations") { + return std::make_unique>( + std::move(child), std::move(sequence_names), min_proportion, std::move(field_strings) + ); + } + return std::make_unique>( + std::move(child), std::move(sequence_names), min_proportion, std::move(field_strings) + ); +} + +// NOLINTNEXTLINE(misc-no-recursion) +operators::QueryNodePtr handleInsertions( + const BoundArguments& args, + const Tables& tables, + const ChildConverter& convert_child +) { + auto child = convert_child(args.at("input"), tables); + std::vector sequence_names; + if (const auto* seq_expr = args.get("sequenceNames")) { + sequence_names = extractSetOfIdentifiers(*seq_expr); + } + if (args.functionName() == "insertions") { + return std::make_unique>( + std::move(child), std::move(sequence_names) + ); + } + return std::make_unique>( + std::move(child), std::move(sequence_names) + ); +} + +// NOLINTNEXTLINE(misc-no-recursion) +operators::QueryNodePtr handleRandomize( + const BoundArguments& args, + const Tables& tables, + const ChildConverter& convert_child +) { + auto child = convert_child(args.at("input"), tables); + uint32_t seed; + auto seed_arg = args.getOptionalUint32("seed"); + if (seed_arg.has_value()) { + seed = seed_arg.value(); + } else { + seed = static_cast(std::chrono::system_clock::now().time_since_epoch().count()); + } + return std::make_unique( + std::move(child), std::vector{}, seed + ); +} + +// NOLINTNEXTLINE(misc-no-recursion) +operators::QueryNodePtr handleLimit( + const BoundArguments& args, + const Tables& tables, + const ChildConverter& convert_child +) { + auto child = convert_child(args.at("input"), tables); + auto limit_val = static_cast(extractIntLiteral(args.at("count"))); + CHECK_SILO_QUERY(limit_val > 0, "limit must be a positive number"); + auto offset = 
args.getOptionalUint32("offset"); + return std::make_unique(std::move(child), limit_val, offset); +} + +// NOLINTNEXTLINE(misc-no-recursion) +operators::QueryNodePtr handleOffset( + const BoundArguments& args, + const Tables& tables, + const ChildConverter& convert_child +) { + auto child = convert_child(args.at("input"), tables); + int64_t offset_val = extractIntLiteral(args.at("count")); + CHECK_SILO_QUERY( + offset_val >= 0, "If the action contains an offset, it must be a non-negative number" + ); + return std::make_unique( + std::move(child), std::nullopt, static_cast(offset_val) + ); +} + +// NOLINTNEXTLINE(misc-no-recursion) +operators::QueryNodePtr handleOrderBy( + const BoundArguments& args, + const Tables& tables, + const ChildConverter& convert_child +) { + auto order_fields = parseOrderByFields(args.at("fields")); + auto child = convert_child(args.at("input"), tables); + return std::make_unique( + std::move(child), std::move(order_fields), std::nullopt + ); +} + +// NOLINTNEXTLINE(misc-no-recursion) +operators::QueryNodePtr handleMostRecentCommonAncestor( + const BoundArguments& args, + const Tables& tables, + const ChildConverter& convert_child +) { + auto column_name = extractStringLiteral(args.at("column")); + bool print_nodes_not_in_tree = false; + if (const auto* expr = args.get("printNodesNotInTree")) { + print_nodes_not_in_tree = extractBoolLiteral(*expr); + } + auto child = convert_child(args.at("input"), tables); + return std::make_unique( + std::move(child), std::move(column_name), print_nodes_not_in_tree + ); +} + +// NOLINTNEXTLINE(misc-no-recursion) +operators::QueryNodePtr handlePhyloSubtree( + const BoundArguments& args, + const Tables& tables, + const ChildConverter& convert_child +) { + auto column_name = extractStringLiteral(args.at("column")); + bool print_nodes_not_in_tree = false; + bool contract_unary_nodes = false; + if (const auto* expr = args.get("printNodesNotInTree")) { + print_nodes_not_in_tree = extractBoolLiteral(*expr); + } 
+ if (const auto* expr = args.get("contractUnaryNodes")) { + contract_unary_nodes = extractBoolLiteral(*expr); + } + auto child = convert_child(args.at("input"), tables); + return std::make_unique( + std::move(child), std::move(column_name), print_nodes_not_in_tree, contract_unary_nodes + ); +} + +} // namespace + +// NOLINTNEXTLINE(misc-no-recursion) +operators::QueryNodePtr convertExpression( + const ast::Expression& ast, + const std::map>& tables +) { + if (std::holds_alternative(ast.value)) { + return buildScanNode(ast, tables); + } + + CHECK_SILO_QUERY( + std::holds_alternative(ast.value), + "expected table reference or function call at {}:{}", + ast.location.line, + ast.location.column + ); + const auto& call = std::get(ast.value); + + const auto* entry = FunctionRegistry::instance().findFunction(call.function_name); + CHECK_SILO_QUERY( + entry != nullptr, + "unknown function '{}' at {}:{}", + call.function_name, + ast.location.line, + ast.location.column + ); + + auto bound = bindArguments( + call.function_name, entry->signature, call.positional_arguments, call.named_arguments + ); + return entry->handler(bound, tables, convertExpression); +} + +// ======================================================================== +// Public entry point +// ======================================================================== + +operators::QueryNodePtr convertToQueryTree( + const ast::Expression& ast, + const std::map>& tables +) { + return convertExpression(ast, tables); +} + +operators::QueryNodePtr parseAndConvertToQueryTree( + std::string_view query_string, + const std::map>& tables +) { + Parser parser(query_string); + auto ast = parser.parse(); + return convertToQueryTree(*ast, tables); +} + +// ======================================================================== +// Registry construction: signatures + handlers +// ======================================================================== + +// Shorthand helpers for building signatures +namespace { 
+ParameterDefinition pos(std::string name, bool required = true) { + return {.name = std::move(name), .required = required, .positional = true}; +} +ParameterDefinition named(std::string name, bool required = true) { + return {.name = std::move(name), .required = required, .positional = false}; +} +} // namespace + +FunctionRegistry::FunctionRegistry() { + registerFunction("filter", {{pos("input"), pos("predicate")}}, handleFilter); + + registerFunction( + "groupBy", {{pos("input"), pos("aggregates"), pos("columns", false)}}, handleGroupBy + ); + + registerFunction("project", {{pos("input"), pos("fields")}}, handleProject); + + auto mutations_sig = FunctionSignature{ + {pos("input"), named("minProportion"), named("sequenceNames", false), named("fields", false)} + }; + registerFunction("mutations", mutations_sig, handleMutations); + registerFunction("aminoAcidMutations", mutations_sig, handleMutations); + + auto insertions_sig = FunctionSignature{{pos("input"), named("sequenceNames", false)}}; + registerFunction("insertions", insertions_sig, handleInsertions); + registerFunction("aminoAcidInsertions", insertions_sig, handleInsertions); + + registerFunction("randomize", {{pos("input"), named("seed", false)}}, handleRandomize); + + registerFunction("limit", {{pos("input"), pos("count")}}, handleLimit); + + registerFunction("offset", {{pos("input"), pos("count")}}, handleOffset); + + registerFunction("orderBy", {{pos("input"), pos("fields")}}, handleOrderBy); + + registerFunction( + "mostRecentCommonAncestor", + {{pos("input"), pos("column"), named("printNodesNotInTree", false)}}, + handleMostRecentCommonAncestor + ); + + registerFunction( + "phyloSubtree", + {{pos("input"), + pos("column"), + named("printNodesNotInTree", false), + named("contractUnaryNodes", false)}}, + handlePhyloSubtree + ); +} + +FunctionRegistry& FunctionRegistry::instance() { + static FunctionRegistry registry; + return registry; +} + +FilterFunctionRegistry::FilterFunctionRegistry() { + 
registerFunction("between", {{pos("column"), pos("from"), pos("to")}}, handleBetween); + + registerFunction("in", {{pos("column"), pos("values")}}, handleIn); + + registerFunction("isNull", {{pos("column")}}, handleIsNull); + registerFunction("isNotNull", {{pos("column")}}, handleIsNotNull); + + registerFunction( + "lineage", + {{pos("column"), + pos("value"), + named("includeSublineages", false), + named("recombinantFollowingMode", false)}}, + handleLineage + ); + + registerFunction("phyloDescendantOf", {{pos("column"), pos("node")}}, handlePhyloDescendantOf); + + registerFunction("like", {{pos("column"), pos("pattern")}}, handleLike); + + auto symbol_equals_sig = + FunctionSignature{{named("position"), named("symbol"), named("sequenceName", false)}}; + registerFunction("nucleotideEquals", symbol_equals_sig, handleSymbolEquals); + registerFunction("aminoAcidEquals", symbol_equals_sig, handleSymbolEquals); + + auto has_mutation_sig = FunctionSignature{{named("position"), named("sequenceName", false)}}; + registerFunction("hasMutation", has_mutation_sig, handleHasMutation); + registerFunction("hasAAMutation", has_mutation_sig, handleHasMutation); + + auto insertion_contains_sig = + FunctionSignature{{named("position"), named("value"), named("sequenceName", false)}}; + registerFunction("insertionContains", insertion_contains_sig, handleInsertionContains); + registerFunction("aminoAcidInsertionContains", insertion_contains_sig, handleInsertionContains); + + registerFunction("exact", {{pos("child")}}, handleExact); + registerFunction("maybe", {{pos("child")}}, handleMaybe); + + registerFunction( + "nOf", {{pos("count"), pos("children"), named("matchExactly", false)}}, handleNOf + ); +} + +FilterFunctionRegistry& FilterFunctionRegistry::instance() { + static FilterFunctionRegistry registry; + return registry; +} + +} // namespace silo::query_engine::saneql diff --git a/src/silo/query_engine/saneql/ast_to_query.h b/src/silo/query_engine/saneql/ast_to_query.h new file 
mode 100644 index 000000000..8a4414200 --- /dev/null +++ b/src/silo/query_engine/saneql/ast_to_query.h @@ -0,0 +1,32 @@ +#pragma once + +#include +#include +#include + +#include "silo/query_engine/filter/expressions/expression.h" +#include "silo/query_engine/operators/query_node.h" +#include "silo/query_engine/saneql/ast.h" +#include "silo/schema/database_schema.h" +#include "silo/storage/table.h" + +namespace silo::query_engine::saneql { + +operators::QueryNodePtr convertToQueryTree( + const ast::Expression& ast, + const std::map>& tables +); + +operators::QueryNodePtr parseAndConvertToQueryTree( + std::string_view query_string, + const std::map>& tables +); + +operators::QueryNodePtr convertExpression( + const ast::Expression& ast, + const std::map>& tables +); + +std::unique_ptr convertToFilter(const ast::Expression& ast); + +} // namespace silo::query_engine::saneql diff --git a/src/silo/query_engine/saneql/function_registry.cpp b/src/silo/query_engine/saneql/function_registry.cpp new file mode 100644 index 000000000..ba8520919 --- /dev/null +++ b/src/silo/query_engine/saneql/function_registry.cpp @@ -0,0 +1,163 @@ +#include "silo/query_engine/saneql/function_registry.h" + +#include + +#include "silo/query_engine/illegal_query_exception.h" + +namespace silo::query_engine::saneql { + +// --- BoundArguments --- + +BoundArguments::BoundArguments( + std::string function_name, + std::map bound +) + : function_name_(std::move(function_name)), + bound_(std::move(bound)) {} + +const ast::Expression& BoundArguments::at(const std::string& name) const { + auto it = bound_.find(name); + CHECK_SILO_QUERY( + it != bound_.end(), "{}(): required argument '{}' is missing", function_name_, name + ); + return *it->second; +} + +const ast::Expression* BoundArguments::get(const std::string& name) const { + auto it = bound_.find(name); + if (it == bound_.end()) { + return nullptr; + } + return it->second; +} + +bool BoundArguments::has(const std::string& name) const { + return 
bound_.contains(name); +} + +const std::string& BoundArguments::functionName() const { + return function_name_; +} + +std::optional BoundArguments::getOptionalString(const std::string& name) const { + if (const auto* expr = get(name)) { + return extractStringLiteral(*expr); + } + return std::nullopt; +} + +std::optional BoundArguments::getOptionalUint32(const std::string& name) const { + if (const auto* expr = get(name)) { + int64_t value = extractIntLiteral(*expr); + CHECK_SILO_QUERY( + value >= 0, "If the action contains an {}, it must be a non-negative number", name + ); + return static_cast(value); + } + return std::nullopt; +} + +// NOLINTNEXTLINE(readability-function-cognitive-complexity) +BoundArguments bindArguments( + const std::string& function_name, + const FunctionSignature& signature, + const std::vector& positional, + const std::vector& named +) { + std::map bound; + + // Iterate through positional parameters in declaration order + size_t next_param = 0; + for (const auto& pos_arg : positional) { + // Find the next parameter that accepts positional binding + const ParameterDefinition* target = nullptr; + while (next_param < signature.parameters.size()) { + if (signature.parameters[next_param].positional) { + target = &signature.parameters[next_param]; + ++next_param; + break; + } + ++next_param; + } + CHECK_SILO_QUERY( + target != nullptr, "{}() received too many positional arguments", function_name + ); + bound[target->name] = pos_arg.value.get(); + } + + // Build set of valid parameter names for validation + std::set valid_names; + for (const auto& param : signature.parameters) { + valid_names.insert(param.name); + } + + // Bind named arguments + for (const auto& named_arg : named) { + CHECK_SILO_QUERY( + valid_names.contains(named_arg.name), + "{}() received unknown argument '{}'", + function_name, + named_arg.name + ); + CHECK_SILO_QUERY( + !bound.contains(named_arg.name), + "{}() received duplicate argument '{}' (already bound positionally)", + 
function_name, + named_arg.name + ); + bound[named_arg.name] = named_arg.value.get(); + } + + // Check that all required parameters are bound + for (const auto& param : signature.parameters) { + CHECK_SILO_QUERY( + !param.required || bound.contains(param.name), + "{}() requires argument '{}'", + function_name, + param.name + ); + } + + return {function_name, std::move(bound)}; +} + +// --- FunctionRegistry --- + +void FunctionRegistry::registerFunction( + std::string name, + FunctionSignature signature, + FunctionHandler handler +) { + entries_[std::move(name)] = + Entry{.signature = std::move(signature), .handler = std::move(handler)}; +} + +const FunctionRegistry::Entry* FunctionRegistry::findFunction(const std::string& name) const { + auto it = entries_.find(name); + if (it == entries_.end()) { + return nullptr; + } + return &it->second; +} + +// --- FilterFunctionRegistry --- + +void FilterFunctionRegistry::registerFunction( + std::string name, + FunctionSignature signature, + FilterHandler handler +) { + entries_[std::move(name)] = + Entry{.signature = std::move(signature), .handler = std::move(handler)}; +} + +const FilterFunctionRegistry::Entry* FilterFunctionRegistry::findFunction(const std::string& name +) const { + auto it = entries_.find(name); + if (it == entries_.end()) { + return nullptr; + } + return &it->second; +} + +} // namespace silo::query_engine::saneql diff --git a/src/silo/query_engine/saneql/function_registry.h b/src/silo/query_engine/saneql/function_registry.h new file mode 100644 index 000000000..d9af463ed --- /dev/null +++ b/src/silo/query_engine/saneql/function_registry.h @@ -0,0 +1,118 @@ +#pragma once + +#include +#include +#include +#include +#include + +#include "silo/query_engine/filter/expressions/expression.h" +#include "silo/query_engine/operators/query_node.h" +#include "silo/query_engine/saneql/ast.h" +#include "silo/storage/table.h" + +namespace silo::query_engine::saneql { + +using Tables = std::map>; + +struct 
ParameterDefinition { + std::string name; + bool required = true; + /// If false, the parameter can only be filled via named argument syntax. + bool positional = true; +}; + +struct FunctionSignature { + std::vector parameters; +}; + +/// Result of binding a FunctionCall's arguments against a FunctionSignature. +class BoundArguments { + public: + BoundArguments(std::string function_name, std::map bound); + + /// Returns the expression for a required parameter. Throws if absent. + [[nodiscard]] const ast::Expression& at(const std::string& name) const; + + /// Returns the expression for an optional parameter, or nullptr if absent. + [[nodiscard]] const ast::Expression* get(const std::string& name) const; + + [[nodiscard]] bool has(const std::string& name) const; + + [[nodiscard]] const std::string& functionName() const; + + [[nodiscard]] std::optional getOptionalString(const std::string& name) const; + + [[nodiscard]] std::optional getOptionalUint32(const std::string& name) const; + + private: + std::string function_name_; + std::map bound_; +}; + +/// Match positional then named arguments against a signature. +/// Errors on: too many positional args (non-variadic), unknown named args, +/// duplicate bindings, missing required parameters. 
+BoundArguments bindArguments( + const std::string& function_name, + const FunctionSignature& signature, + const std::vector& positional, + const std::vector& named +); + +// --- Pipeline function registry --- + +using ChildConverter = + std::function; + +using FunctionHandler = std::function; + +class FunctionRegistry { + public: + struct Entry { + FunctionSignature signature; + FunctionHandler handler; + }; + + FunctionRegistry(); + + void registerFunction(std::string name, FunctionSignature signature, FunctionHandler handler); + + [[nodiscard]] const Entry* findFunction(const std::string& name) const; + + [[nodiscard]] static FunctionRegistry& instance(); + + private: + std::map entries_; +}; + +// --- Filter function registry --- + +using FilterPtr = std::unique_ptr; + +using FilterHandler = std::function; + +class FilterFunctionRegistry { + public: + struct Entry { + FunctionSignature signature; + FilterHandler handler; + }; + + FilterFunctionRegistry(); + + void registerFunction(std::string name, FunctionSignature signature, FilterHandler handler); + + [[nodiscard]] const Entry* findFunction(const std::string& name) const; + + [[nodiscard]] static FilterFunctionRegistry& instance(); + + private: + std::map entries_; +}; + +} // namespace silo::query_engine::saneql diff --git a/src/silo/query_engine/saneql/lexer.cpp b/src/silo/query_engine/saneql/lexer.cpp new file mode 100644 index 000000000..675926de5 --- /dev/null +++ b/src/silo/query_engine/saneql/lexer.cpp @@ -0,0 +1,319 @@ +#include "silo/query_engine/saneql/lexer.h" + +#include +#include + +#include "silo/query_engine/saneql/parse_exception.h" + +namespace silo::query_engine::saneql { + +Lexer::Lexer(std::string_view input) + : input(input) {} + +bool Lexer::isAtEnd() const { + return position >= input.size(); +} + +char Lexer::peek() const { + if (isAtEnd()) { + return '\0'; + } + return input[position]; +} + +char Lexer::peekNext() const { + if (position + 1 >= input.size()) { + return '\0'; + } + 
return input[position + 1]; +} + +char Lexer::advance() { + const char current = input[position]; + position++; + if (current == '\n') { + current_location.line++; + current_location.column = 1; + } else { + current_location.column++; + } + return current; +} + +void Lexer::skipWhitespace() { + while (!isAtEnd()) { + const char current = peek(); + if (current == ' ' || current == '\t' || current == '\n' || current == '\r') { + advance(); + } else if (current == '-' && peekNext() == '-') { + while (!isAtEnd() && peek() != '\n') { + advance(); + } + } else { + break; + } + } +} + +Token Lexer::makeToken(TokenType type, SourceLocation loc) { + return Token{.type = type, .value = std::monostate{}, .location = loc}; +} + +Token Lexer::makeToken(TokenType type, TokenValue value, SourceLocation loc) { + return Token{.type = type, .value = std::move(value), .location = loc}; +} + +Token Lexer::readString() { + const SourceLocation start = current_location; + advance(); // consume opening quote + + std::string result; + while (!isAtEnd() && peek() != '\'') { + if (peek() == '\\') { + advance(); + if (isAtEnd()) { + throw ParseException("Unterminated string literal", start); + } + const char escaped = advance(); + switch (escaped) { + case '\'': + result += '\''; + break; + case '\\': + result += '\\'; + break; + case 'n': + result += '\n'; + break; + case 't': + result += '\t'; + break; + default: + result += '\\'; + result += escaped; + break; + } + } else { + result += advance(); + } + } + + if (isAtEnd()) { + throw ParseException("Unterminated string literal", start); + } + advance(); // consume closing quote + + return makeToken(TokenType::STRING_LITERAL, std::move(result), start); +} + +Token Lexer::readQuotedIdentifier() { + const SourceLocation start = current_location; + advance(); // consume opening double quote + + std::string result; + while (!isAtEnd()) { + if (peek() == '"') { + advance(); // consume the quote + if (!isAtEnd() && peek() == '"') { + // Escaped 
double quote: "" → " + result += advance(); + } else { + // End of quoted identifier + return makeToken(TokenType::IDENTIFIER, std::move(result), start); + } + } else { + result += advance(); + } + } + + throw ParseException("Unterminated quoted identifier", start); +} + +Token Lexer::readNumber() { + const SourceLocation start = current_location; + const size_t num_start = position; + + if (peek() == '-') { + advance(); + } + + while (!isAtEnd() && std::isdigit(static_cast(peek()))) { + advance(); + } + + bool is_float = false; + if (!isAtEnd() && peek() == '.' && std::isdigit(static_cast(peekNext()))) { + is_float = true; + advance(); // consume '.' + while (!isAtEnd() && std::isdigit(static_cast(peek()))) { + advance(); + } + } + + const std::string_view num_str = input.substr(num_start, position - num_start); + + if (is_float) { + double val = 0; + auto [ptr, ec] = std::from_chars(num_str.data(), num_str.data() + num_str.size(), val); + if (ec != std::errc()) { + throw ParseException("Invalid float literal", start); + } + return makeToken(TokenType::FLOAT_LITERAL, val, start); + } + + int64_t val = 0; + auto [ptr, ec] = std::from_chars(num_str.data(), num_str.data() + num_str.size(), val); + if (ec != std::errc()) { + throw ParseException("Invalid integer literal", start); + } + return makeToken(TokenType::INT_LITERAL, val, start); +} + +Token Lexer::readIdentifierOrKeyword() { + const SourceLocation start = current_location; + const size_t id_start = position; + + while (!isAtEnd() && + (std::isalnum(static_cast(peek())) || peek() == '_' || peek() == '.')) { + // A dot is never part of an identifier: segment1.A123T lexes as IDENTIFIER, DOT, IDENTIFIER + if (peek() == '.') { + // Stop here so that nextToken() emits the dot as a separate DOT token + break; + } + advance(); + } + + std::string identifier(input.substr(id_start, position - id_start)); + + if (identifier == "true") { + return makeToken(TokenType::BOOL_LITERAL, true, start); + } + if (identifier 
== "false") { + return makeToken(TokenType::BOOL_LITERAL, false, start); + } + if (identifier == "null") { + return makeToken(TokenType::NULL_LITERAL, std::monostate{}, start); + } + + return makeToken(TokenType::IDENTIFIER, std::move(identifier), start); +} + +// NOLINTNEXTLINE(readability-function-cognitive-complexity) +Token Lexer::nextToken() { + skipWhitespace(); + + if (isAtEnd()) { + return makeToken(TokenType::END_OF_FILE, current_location); + } + + const SourceLocation start = current_location; + char current = peek(); + + if (current == '"') { + return readQuotedIdentifier(); + } + + if (current == '\'') { + return readString(); + } + + if (std::isdigit(static_cast(current))) { + return readNumber(); + } + + if (current == '-' && position + 1 < input.size() && + std::isdigit(static_cast(input[position + 1]))) { + return readNumber(); + } + + if (std::isalpha(static_cast(current)) || current == '_') { + return readIdentifierOrKeyword(); + } + + switch (current) { + case '.': + advance(); + return makeToken(TokenType::DOT, start); + case ',': + advance(); + return makeToken(TokenType::COMMA, start); + case '(': + advance(); + return makeToken(TokenType::LEFT_PAREN, start); + case ')': + advance(); + return makeToken(TokenType::RIGHT_PAREN, start); + case '{': + advance(); + return makeToken(TokenType::LEFT_BRACE, start); + case '}': + advance(); + return makeToken(TokenType::RIGHT_BRACE, start); + case '!': + advance(); + return makeToken(TokenType::NOT, start); + case '=': + advance(); + return makeToken(TokenType::EQUALS, start); + case '<': + advance(); + if (!isAtEnd() && peek() == '>') { + advance(); + return makeToken(TokenType::NOT_EQUALS, start); + } + if (!isAtEnd() && peek() == '=') { + advance(); + return makeToken(TokenType::LESS_EQUAL, start); + } + return makeToken(TokenType::LESS_THAN, start); + case '>': + advance(); + if (!isAtEnd() && peek() == '=') { + advance(); + return makeToken(TokenType::GREATER_EQUAL, start); + } + return 
makeToken(TokenType::GREATER_THAN, start); + case '&': + advance(); + if (!isAtEnd() && peek() == '&') { + advance(); + return makeToken(TokenType::AND, start); + } + throw ParseException("Expected '&&'", start); + case '|': + advance(); + if (!isAtEnd() && peek() == '|') { + advance(); + return makeToken(TokenType::OR, start); + } + throw ParseException("Expected '||'", start); + case ':': + advance(); + if (!isAtEnd() && peek() == ':') { + advance(); + return makeToken(TokenType::DOUBLE_COLON, start); + } + if (!isAtEnd() && peek() == '=') { + advance(); + return makeToken(TokenType::COLON_EQUALS, start); + } + throw ParseException("Expected '::' or ':='", start); + default: + advance(); + throw ParseException(fmt::format("Unexpected character '{}'", current), start); + } +} + +std::vector Lexer::tokenizeAll() { + std::vector tokens; + while (true) { + const Token token = nextToken(); + tokens.push_back(token); + if (token.type == TokenType::END_OF_FILE) { + break; + } + } + return tokens; +} + +} // namespace silo::query_engine::saneql diff --git a/src/silo/query_engine/saneql/lexer.h b/src/silo/query_engine/saneql/lexer.h new file mode 100644 index 000000000..2d729a440 --- /dev/null +++ b/src/silo/query_engine/saneql/lexer.h @@ -0,0 +1,38 @@ +#pragma once + +#include +#include + +#include "silo/query_engine/saneql/source_location.h" +#include "silo/query_engine/saneql/token.h" + +namespace silo::query_engine::saneql { + +class Lexer { + std::string_view input; + size_t position = 0; + SourceLocation current_location; + + public: + explicit Lexer(std::string_view input); + + [[nodiscard]] Token nextToken(); + [[nodiscard]] std::vector tokenizeAll(); + + private: + [[nodiscard]] char peek() const; + [[nodiscard]] char peekNext() const; + char advance(); + void skipWhitespace(); + [[nodiscard]] bool isAtEnd() const; + + [[nodiscard]] static Token makeToken(TokenType type, SourceLocation loc); + [[nodiscard]] static Token makeToken(TokenType type, TokenValue value, 
SourceLocation loc); + + [[nodiscard]] Token readString(); + [[nodiscard]] Token readQuotedIdentifier(); + [[nodiscard]] Token readNumber(); + [[nodiscard]] Token readIdentifierOrKeyword(); +}; + +} // namespace silo::query_engine::saneql diff --git a/src/silo/query_engine/saneql/lexer.test.cpp b/src/silo/query_engine/saneql/lexer.test.cpp new file mode 100644 index 000000000..2e2145604 --- /dev/null +++ b/src/silo/query_engine/saneql/lexer.test.cpp @@ -0,0 +1,264 @@ +#include "silo/query_engine/saneql/lexer.h" + +#include +#include + +#include "silo/query_engine/saneql/parse_exception.h" + +using silo::query_engine::saneql::Lexer; +using silo::query_engine::saneql::ParseException; +using silo::query_engine::saneql::TokenType; + +TEST(SaneQLLexer, tokenizesEmptyInput) { + Lexer lexer(""); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 1); + EXPECT_EQ(tokens[0].type, TokenType::END_OF_FILE); +} + +TEST(SaneQLLexer, tokenizesIdentifier) { + Lexer lexer("country"); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 2); + EXPECT_EQ(tokens[0].type, TokenType::IDENTIFIER); + EXPECT_EQ(tokens[0].getStringValue(), "country"); + EXPECT_EQ(tokens[1].type, TokenType::END_OF_FILE); +} + +TEST(SaneQLLexer, tokenizesStringLiteral) { + Lexer lexer("'hello world'"); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 2); + EXPECT_EQ(tokens[0].type, TokenType::STRING_LITERAL); + EXPECT_EQ(tokens[0].getStringValue(), "hello world"); +} + +TEST(SaneQLLexer, tokenizesIntLiteral) { + Lexer lexer("42"); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 2); + EXPECT_EQ(tokens[0].type, TokenType::INT_LITERAL); + EXPECT_EQ(tokens[0].getIntValue(), 42); +} + +TEST(SaneQLLexer, tokenizesFloatLiteral) { + Lexer lexer("3.14"); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 2); + EXPECT_EQ(tokens[0].type, TokenType::FLOAT_LITERAL); + EXPECT_DOUBLE_EQ(tokens[0].getFloatValue(), 3.14); +} + +TEST(SaneQLLexer, 
tokenizesBoolLiterals) { + Lexer lexer("true false"); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 3); + EXPECT_EQ(tokens[0].type, TokenType::BOOL_LITERAL); + EXPECT_TRUE(tokens[0].getBoolValue()); + EXPECT_EQ(tokens[1].type, TokenType::BOOL_LITERAL); + EXPECT_FALSE(tokens[1].getBoolValue()); +} + +TEST(SaneQLLexer, tokenizesNullLiteral) { + Lexer lexer("null"); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 2); + EXPECT_EQ(tokens[0].type, TokenType::NULL_LITERAL); +} + +TEST(SaneQLLexer, tokenizesOperators) { + Lexer lexer("= <> < > <= >= && || !"); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 10); + EXPECT_EQ(tokens[0].type, TokenType::EQUALS); + EXPECT_EQ(tokens[1].type, TokenType::NOT_EQUALS); + EXPECT_EQ(tokens[2].type, TokenType::LESS_THAN); + EXPECT_EQ(tokens[3].type, TokenType::GREATER_THAN); + EXPECT_EQ(tokens[4].type, TokenType::LESS_EQUAL); + EXPECT_EQ(tokens[5].type, TokenType::GREATER_EQUAL); + EXPECT_EQ(tokens[6].type, TokenType::AND); + EXPECT_EQ(tokens[7].type, TokenType::OR); + EXPECT_EQ(tokens[8].type, TokenType::NOT); +} + +TEST(SaneQLLexer, tokenizesPunctuation) { + Lexer lexer(". 
, ( ) { } :: :="); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 9); + EXPECT_EQ(tokens[0].type, TokenType::DOT); + EXPECT_EQ(tokens[1].type, TokenType::COMMA); + EXPECT_EQ(tokens[2].type, TokenType::LEFT_PAREN); + EXPECT_EQ(tokens[3].type, TokenType::RIGHT_PAREN); + EXPECT_EQ(tokens[4].type, TokenType::LEFT_BRACE); + EXPECT_EQ(tokens[5].type, TokenType::RIGHT_BRACE); + EXPECT_EQ(tokens[6].type, TokenType::DOUBLE_COLON); + EXPECT_EQ(tokens[7].type, TokenType::COLON_EQUALS); +} + +TEST(SaneQLLexer, tokenizesMethodCallChain) { + Lexer lexer("default.filter(country = 'USA').groupBy({count:=count()})"); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 20); + EXPECT_EQ(tokens[0].type, TokenType::IDENTIFIER); + EXPECT_EQ(tokens[0].getStringValue(), "default"); + EXPECT_EQ(tokens[1].type, TokenType::DOT); + EXPECT_EQ(tokens[2].type, TokenType::IDENTIFIER); + EXPECT_EQ(tokens[2].getStringValue(), "filter"); + EXPECT_EQ(tokens[3].type, TokenType::LEFT_PAREN); + EXPECT_EQ(tokens[4].type, TokenType::IDENTIFIER); + EXPECT_EQ(tokens[4].getStringValue(), "country"); + EXPECT_EQ(tokens[5].type, TokenType::EQUALS); + EXPECT_EQ(tokens[6].type, TokenType::STRING_LITERAL); + EXPECT_EQ(tokens[6].getStringValue(), "USA"); + EXPECT_EQ(tokens[7].type, TokenType::RIGHT_PAREN); + EXPECT_EQ(tokens[8].type, TokenType::DOT); + EXPECT_EQ(tokens[9].type, TokenType::IDENTIFIER); + EXPECT_EQ(tokens[9].getStringValue(), "groupBy"); + EXPECT_EQ(tokens[10].type, TokenType::LEFT_PAREN); + EXPECT_EQ(tokens[11].type, TokenType::LEFT_BRACE); + EXPECT_EQ(tokens[12].type, TokenType::IDENTIFIER); + EXPECT_EQ(tokens[13].type, TokenType::COLON_EQUALS); + EXPECT_EQ(tokens[14].type, TokenType::IDENTIFIER); + EXPECT_EQ(tokens[15].type, TokenType::LEFT_PAREN); + EXPECT_EQ(tokens[16].type, TokenType::RIGHT_PAREN); + EXPECT_EQ(tokens[17].type, TokenType::RIGHT_BRACE); + EXPECT_EQ(tokens[18].type, TokenType::RIGHT_PAREN); + EXPECT_EQ(tokens[19].type, TokenType::END_OF_FILE); +} + 
+TEST(SaneQLLexer, tokenizesNamedParameters) { + Lexer lexer("hasMutation(position:=1000, sequenceName:='segment1')"); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 11); + EXPECT_EQ(tokens[0].type, TokenType::IDENTIFIER); + EXPECT_EQ(tokens[1].type, TokenType::LEFT_PAREN); + EXPECT_EQ(tokens[2].type, TokenType::IDENTIFIER); + EXPECT_EQ(tokens[2].getStringValue(), "position"); + EXPECT_EQ(tokens[3].type, TokenType::COLON_EQUALS); + EXPECT_EQ(tokens[4].type, TokenType::INT_LITERAL); + EXPECT_EQ(tokens[5].type, TokenType::COMMA); + EXPECT_EQ(tokens[6].type, TokenType::IDENTIFIER); + EXPECT_EQ(tokens[7].type, TokenType::COLON_EQUALS); + EXPECT_EQ(tokens[8].type, TokenType::STRING_LITERAL); + EXPECT_EQ(tokens[9].type, TokenType::RIGHT_PAREN); +} + +TEST(SaneQLLexer, tokenizesTypeCast) { + Lexer lexer("'2020-01-01'::date"); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 4); + EXPECT_EQ(tokens[0].type, TokenType::STRING_LITERAL); + EXPECT_EQ(tokens[1].type, TokenType::DOUBLE_COLON); + EXPECT_EQ(tokens[2].type, TokenType::IDENTIFIER); + EXPECT_EQ(tokens[2].getStringValue(), "date"); +} + +TEST(SaneQLLexer, skipsWhitespaceAndNewlines) { + Lexer lexer(" metadata\n .filter(\n country = 'USA'\n )"); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 9); + EXPECT_EQ(tokens[0].type, TokenType::IDENTIFIER); + EXPECT_EQ(tokens[1].type, TokenType::DOT); +} + +TEST(SaneQLLexer, tracksLineAndColumn) { + Lexer lexer("a\nb"); + auto tokens = lexer.tokenizeAll(); + EXPECT_EQ(tokens[0].location.line, 1); + EXPECT_EQ(tokens[0].location.column, 1); + EXPECT_EQ(tokens[1].location.line, 2); + EXPECT_EQ(tokens[1].location.column, 1); +} + +TEST(SaneQLLexer, throwsOnUnterminatedString) { + const Lexer lexer("'unterminated"); + EXPECT_THAT( + []() { + Lexer lexer("'unterminated"); + (void)lexer.nextToken(); + }, + ThrowsMessage( + ::testing::HasSubstr("Parse error at 1:1: Unterminated string literal") + ) + ); +} + +TEST(SaneQLLexer, 
throwsOnInvalidCharacter) { + EXPECT_THAT( + []() { + Lexer lexer("@"); + (void)lexer.nextToken(); + }, + ThrowsMessage( + ::testing::HasSubstr("Parse error at 1:1: Unexpected character '@'") + ) + ); +} + +TEST(SaneQLLexer, skipsLineComments) { + Lexer lexer("a -- this is a comment\nb"); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 3); + EXPECT_EQ(tokens[0].type, TokenType::IDENTIFIER); + EXPECT_EQ(tokens[0].getStringValue(), "a"); + EXPECT_EQ(tokens[1].type, TokenType::IDENTIFIER); + EXPECT_EQ(tokens[1].getStringValue(), "b"); +} + +TEST(SaneQLLexer, tokenizesSetLiteral) { + Lexer lexer("{'USA', 'Germany', 'France'}"); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 8); + EXPECT_EQ(tokens[0].type, TokenType::LEFT_BRACE); + EXPECT_EQ(tokens[1].type, TokenType::STRING_LITERAL); + EXPECT_EQ(tokens[1].getStringValue(), "USA"); + EXPECT_EQ(tokens[2].type, TokenType::COMMA); + EXPECT_EQ(tokens[3].type, TokenType::STRING_LITERAL); + EXPECT_EQ(tokens[4].type, TokenType::COMMA); + EXPECT_EQ(tokens[5].type, TokenType::STRING_LITERAL); + EXPECT_EQ(tokens[6].type, TokenType::RIGHT_BRACE); +} + +TEST(SaneQLLexer, handlesEscapedQuotesInStrings) { + Lexer lexer("'it\\'s'"); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 2); + EXPECT_EQ(tokens[0].type, TokenType::STRING_LITERAL); + EXPECT_EQ(tokens[0].getStringValue(), "it's"); +} + +TEST(SaneQLLexer, tokenizesQuotedIdentifier) { + Lexer lexer(R"("my column")"); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 2); + EXPECT_EQ(tokens[0].type, TokenType::IDENTIFIER); + EXPECT_EQ(tokens[0].getStringValue(), "my column"); +} + +TEST(SaneQLLexer, quotedIdentifierWithEscapedDoubleQuote) { + Lexer lexer(R"("say ""hello""")"); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 2); + EXPECT_EQ(tokens[0].type, TokenType::IDENTIFIER); + EXPECT_EQ(tokens[0].getStringValue(), R"(say "hello")"); +} + +TEST(SaneQLLexer, throwsOnUnterminatedQuotedIdentifier) { + 
EXPECT_THAT( + []() { + Lexer lexer(R"("unterminated)"); + (void)lexer.nextToken(); + }, + ThrowsMessage( + ::testing::HasSubstr("Parse error at 1:1: Unterminated quoted identifier") + ) + ); +} + +TEST(SaneQLLexer, quotedIdentifierWithNumericName) { + Lexer lexer(R"("2")"); + auto tokens = lexer.tokenizeAll(); + ASSERT_EQ(tokens.size(), 2); + EXPECT_EQ(tokens[0].type, TokenType::IDENTIFIER); + EXPECT_EQ(tokens[0].getStringValue(), "2"); +} diff --git a/src/silo/query_engine/saneql/parse_exception.cpp b/src/silo/query_engine/saneql/parse_exception.cpp new file mode 100644 index 000000000..5e14b0544 --- /dev/null +++ b/src/silo/query_engine/saneql/parse_exception.cpp @@ -0,0 +1,11 @@ +#include "silo/query_engine/saneql/parse_exception.h" + +#include + +namespace silo::query_engine::saneql { + +ParseException::ParseException(const std::string& message, SourceLocation location) + : std::runtime_error(fmt::format("Parse error at {}: {}", location.toString(), message)), + location(location) {} + +} // namespace silo::query_engine::saneql diff --git a/src/silo/query_engine/saneql/parse_exception.h b/src/silo/query_engine/saneql/parse_exception.h new file mode 100644 index 000000000..6e52c4065 --- /dev/null +++ b/src/silo/query_engine/saneql/parse_exception.h @@ -0,0 +1,34 @@ +#pragma once + +#include +#include + +#include + +#include "silo/query_engine/saneql/source_location.h" + +namespace silo::query_engine::saneql { + +class ParseException : public std::runtime_error { + SourceLocation location; + + public: + explicit ParseException(const std::string& message, SourceLocation location = {}); + + template + explicit ParseException( + SourceLocation location, + fmt::format_string fmt_str, + Args&&... args + ) + : std::runtime_error(fmt::format( + "Parse error at {}: {}", + location.toString(), + fmt::format(fmt_str, std::forward(args)...) 
+ )), + location(location) {} + + [[nodiscard]] SourceLocation getLocation() const { return location; } +}; + +} // namespace silo::query_engine::saneql diff --git a/src/silo/query_engine/saneql/parser.cpp b/src/silo/query_engine/saneql/parser.cpp new file mode 100644 index 000000000..01eaddfe4 --- /dev/null +++ b/src/silo/query_engine/saneql/parser.cpp @@ -0,0 +1,382 @@ +#include "silo/query_engine/saneql/parser.h" + +#include "silo/common/panic.h" +#include "silo/query_engine/saneql/ast.h" +#include "silo/query_engine/saneql/parse_exception.h" + +namespace silo::query_engine::saneql { + +Parser::Parser(std::string_view input) + : lexer(input), + current_token(lexer.nextToken()) {} + +void Parser::advance() { + current_token = lexer.nextToken(); +} + +const Token& Parser::current() const { + return current_token; +} + +Token Parser::expect(TokenType type) { + if (current_token.type != type) { + throw ParseException( + current_token.location, + "Expected {} but got {}", + tokenTypeToString(type), + tokenTypeToString(current_token.type) + ); + } + Token token = current_token; + advance(); + return token; +} + +bool Parser::check(TokenType type) const { + return current_token.type == type; +} + +bool Parser::match(TokenType type) { + if (check(type)) { + advance(); + return true; + } + return false; +} + +ast::ExpressionPtr Parser::parse() { + auto expr = parseExpression(); + expect(TokenType::END_OF_FILE); + return expr; +} + +// NOLINTNEXTLINE(misc-no-recursion) +ast::ExpressionPtr Parser::parseExpression() { + return parseOrExpr(); +} + +// NOLINTNEXTLINE(misc-no-recursion) +ast::ExpressionPtr Parser::parseOrExpr() { + auto left = parseAndExpr(); + + while (check(TokenType::OR)) { + const SourceLocation loc = current().location; + advance(); + auto right = parseAndExpr(); + left = ast::makeExpr( + ast::BinaryExpr{ + .op = ast::BinaryOp::OR, .left = std::move(left), .right = std::move(right) + }, + loc + ); + } + + return left; +} + +// 
NOLINTNEXTLINE(misc-no-recursion) +ast::ExpressionPtr Parser::parseAndExpr() { + auto left = parseNotExpr(); + + while (check(TokenType::AND)) { + const SourceLocation loc = current().location; + advance(); + auto right = parseNotExpr(); + left = ast::makeExpr( + ast::BinaryExpr{ + .op = ast::BinaryOp::AND, .left = std::move(left), .right = std::move(right) + }, + loc + ); + } + + return left; +} + +// NOLINTNEXTLINE(misc-no-recursion) +ast::ExpressionPtr Parser::parseNotExpr() { + if (check(TokenType::NOT)) { + const SourceLocation loc = current().location; + advance(); + auto operand = parseNotExpr(); + return ast::makeExpr(ast::UnaryNotExpr{std::move(operand)}, loc); + } + return parseComparisonExpr(); +} + +// NOLINTNEXTLINE(misc-no-recursion) +ast::ExpressionPtr Parser::parseComparisonExpr() { + auto left = parsePostfixExpr(); + + if (check(TokenType::EQUALS) || check(TokenType::NOT_EQUALS) || check(TokenType::LESS_THAN) || + check(TokenType::LESS_EQUAL) || check(TokenType::GREATER_THAN) || + check(TokenType::GREATER_EQUAL)) { + const SourceLocation loc = current().location; + ast::BinaryOp op; + switch (current().type) { + case TokenType::EQUALS: + op = ast::BinaryOp::EQUALS; + break; + case TokenType::NOT_EQUALS: + op = ast::BinaryOp::NOT_EQUALS; + break; + case TokenType::LESS_THAN: + op = ast::BinaryOp::LESS_THAN; + break; + case TokenType::LESS_EQUAL: + op = ast::BinaryOp::LESS_EQUAL; + break; + case TokenType::GREATER_THAN: + op = ast::BinaryOp::GREATER_THAN; + break; + case TokenType::GREATER_EQUAL: + op = ast::BinaryOp::GREATER_EQUAL; + break; + default: + SILO_UNREACHABLE(); + } + advance(); + auto right = parsePostfixExpr(); + left = ast::makeExpr( + ast::BinaryExpr{.op = op, .left = std::move(left), .right = std::move(right)}, loc + ); + } + + return left; +} + +// NOLINTNEXTLINE(misc-no-recursion) +ast::ExpressionPtr Parser::parsePostfixExpr() { + auto expr = parsePrimaryExpr(); + + while (true) { + if (check(TokenType::DOT)) { + advance(); + const 
Token method_name = expect(TokenType::IDENTIFIER); + if (check(TokenType::LEFT_PAREN)) { + advance(); + ParsedArgs parsed; + // Insert the receiver as the first positional argument + parsed.positional.push_back( + ast::PositionalArgument{.value = std::move(expr), .location = method_name.location} + ); + if (!check(TokenType::RIGHT_PAREN)) { + auto rest = parseArgList(); + parsed.positional.insert( + parsed.positional.end(), + std::make_move_iterator(rest.positional.begin()), + std::make_move_iterator(rest.positional.end()) + ); + parsed.named = std::move(rest.named); + } + expect(TokenType::RIGHT_PAREN); + expr = ast::makeExpr( + ast::FunctionCall{ + .function_name = method_name.getStringValue(), + .positional_arguments = std::move(parsed.positional), + .named_arguments = std::move(parsed.named) + }, + method_name.location + ); + } else { + // Property access: treat as function call with receiver as sole arg + std::vector<ast::PositionalArgument> pos_args; + pos_args.push_back( + ast::PositionalArgument{.value = std::move(expr), .location = method_name.location} + ); + expr = ast::makeExpr( + ast::FunctionCall{ + .function_name = method_name.getStringValue(), + .positional_arguments = std::move(pos_args), + .named_arguments = {} + }, + method_name.location + ); + } + } else if (check(TokenType::DOUBLE_COLON)) { + const SourceLocation loc = current().location; + advance(); + const Token type_name = expect(TokenType::IDENTIFIER); + expr = ast::makeExpr( + ast::TypeCast{.operand = std::move(expr), .target_type = type_name.getStringValue()}, + loc + ); + } else { + break; + } + } + + return expr; +} + +// NOLINTNEXTLINE(misc-no-recursion) +ast::ExpressionPtr Parser::parsePrimaryExpr() { + if (check(TokenType::LEFT_PAREN)) { + advance(); + auto expr = parseExpression(); + expect(TokenType::RIGHT_PAREN); + return expr; + } + + if (check(TokenType::LEFT_BRACE)) { + return parseSetOrRecordExpression(); + } + + if (check(TokenType::IDENTIFIER)) { + return parseIdentifierOrFunctionCall(); + } + + 
return parseLiteral(); +} + +// NOLINTNEXTLINE(misc-no-recursion) +ast::ExpressionPtr Parser::parseSetOrRecordExpression() { + const SourceLocation loc = current().location; + expect(TokenType::LEFT_BRACE); + // Empty braces: empty set + if (check(TokenType::RIGHT_BRACE)) { + advance(); + return ast::makeExpr(ast::SetLiteral{{}}, loc); + } + // Peek: if first element is `identifier :=`, parse as RecordLiteral + if (check(TokenType::IDENTIFIER)) { + auto first_expression = parseExpression(); + if (holds_alternative(first_expression->value) && + check(TokenType::COLON_EQUALS)) { + // RecordLiteral: {name := expr, ...} + advance(); + std::vector fields; + auto first_value = parseExpression(); + fields.push_back({first_expression->toString(), std::move(first_value)}); + while (match(TokenType::COMMA)) { + const Token field_name = expect(TokenType::IDENTIFIER); + expect(TokenType::COLON_EQUALS); + auto value = parseExpression(); + fields.push_back({field_name.getStringValue(), std::move(value)}); + } + expect(TokenType::RIGHT_BRACE); + return ast::makeExpr(ast::RecordLiteral{std::move(fields)}, loc); + } + std::vector elements; + elements.push_back(std::move(first_expression)); + while (match(TokenType::COMMA)) { + elements.push_back(parseExpression()); + } + expect(TokenType::RIGHT_BRACE); + return ast::makeExpr(ast::SetLiteral{std::move(elements)}, loc); + } + // Non-identifier first element: regular SetLiteral + std::vector elements; + elements.push_back(parseExpression()); + while (match(TokenType::COMMA)) { + elements.push_back(parseExpression()); + } + expect(TokenType::RIGHT_BRACE); + return ast::makeExpr(ast::SetLiteral{std::move(elements)}, loc); +} + +// NOLINTNEXTLINE(misc-no-recursion) +ast::ExpressionPtr Parser::parseIdentifierOrFunctionCall() { + const SourceLocation loc = current().location; + std::string name = current().getStringValue(); + advance(); + + if (check(TokenType::LEFT_PAREN)) { + advance(); + ParsedArgs parsed; + if 
(!check(TokenType::RIGHT_PAREN)) { + parsed = parseArgList(); + } + expect(TokenType::RIGHT_PAREN); + return ast::makeExpr( + ast::FunctionCall{ + .function_name = std::move(name), + .positional_arguments = std::move(parsed.positional), + .named_arguments = std::move(parsed.named) + }, + loc + ); + } + + return ast::makeExpr(ast::Identifier{std::move(name)}, loc); +} + +ast::ExpressionPtr Parser::parseLiteral() { + const SourceLocation loc = current().location; + + if (check(TokenType::STRING_LITERAL)) { + std::string val = current().getStringValue(); + advance(); + return ast::makeExpr(ast::StringLiteral{std::move(val)}, loc); + } + + if (check(TokenType::INT_LITERAL)) { + const int64_t val = current().getIntValue(); + advance(); + return ast::makeExpr(ast::IntLiteral{val}, loc); + } + + if (check(TokenType::FLOAT_LITERAL)) { + const double val = current().getFloatValue(); + advance(); + return ast::makeExpr(ast::FloatLiteral{val}, loc); + } + + if (check(TokenType::BOOL_LITERAL)) { + const bool val = current().getBoolValue(); + advance(); + return ast::makeExpr(ast::BoolLiteral{val}, loc); + } + + if (check(TokenType::NULL_LITERAL)) { + advance(); + return ast::makeExpr(ast::NullLiteral{}, loc); + } + + throw ParseException(loc, "Unexpected token {}", tokenTypeToString(current().type)); +} + +// NOLINTNEXTLINE(misc-no-recursion) +Parser::ParsedArgs Parser::parseArgList() { + ParsedArgs result; + bool seen_named = false; + + // NOLINTNEXTLINE(misc-no-recursion) + auto parse_one = [&]() { + const SourceLocation loc = current().location; + if (check(TokenType::IDENTIFIER)) { + auto expr = parseExpression(); + if (holds_alternative(expr->value) && check(TokenType::COLON_EQUALS)) { + advance(); + auto value = parseExpression(); + seen_named = true; + result.named.push_back(ast::NamedArgument{ + .name = expr->toString(), .value = std::move(value), .location = loc + }); + return; + } + if (seen_named) { + throw ParseException(loc, "positional argument after named 
argument is not allowed"); + } + result.positional.push_back( + ast::PositionalArgument{.value = std::move(expr), .location = loc} + ); + return; + } + if (seen_named) { + throw ParseException(loc, "positional argument after named argument is not allowed"); + } + auto value = parseExpression(); + result.positional.push_back( + ast::PositionalArgument{.value = std::move(value), .location = loc} + ); + }; + + parse_one(); + while (match(TokenType::COMMA)) { + parse_one(); + } + return result; +} + +} // namespace silo::query_engine::saneql diff --git a/src/silo/query_engine/saneql/parser.h b/src/silo/query_engine/saneql/parser.h new file mode 100644 index 000000000..d2668cf84 --- /dev/null +++ b/src/silo/query_engine/saneql/parser.h @@ -0,0 +1,47 @@ +#pragma once + +#include +#include +#include + +#include "silo/query_engine/saneql/ast.h" +#include "silo/query_engine/saneql/lexer.h" +#include "silo/query_engine/saneql/token.h" + +namespace silo::query_engine::saneql { + +class Parser { + Lexer lexer; + Token current_token; + + public: + explicit Parser(std::string_view input); + + [[nodiscard]] ast::ExpressionPtr parse(); + + private: + void advance(); + [[nodiscard]] const Token& current() const; + Token expect(TokenType type); + [[nodiscard]] bool check(TokenType type) const; + bool match(TokenType type); + + [[nodiscard]] ast::ExpressionPtr parseExpression(); + [[nodiscard]] ast::ExpressionPtr parseOrExpr(); + [[nodiscard]] ast::ExpressionPtr parseAndExpr(); + [[nodiscard]] ast::ExpressionPtr parseNotExpr(); + [[nodiscard]] ast::ExpressionPtr parseComparisonExpr(); + [[nodiscard]] ast::ExpressionPtr parsePostfixExpr(); + [[nodiscard]] ast::ExpressionPtr parsePrimaryExpr(); + [[nodiscard]] ast::ExpressionPtr parseSetOrRecordExpression(); + [[nodiscard]] ast::ExpressionPtr parseIdentifierOrFunctionCall(); + [[nodiscard]] ast::ExpressionPtr parseLiteral(); + + struct ParsedArgs { + std::vector positional; + std::vector named; + }; + [[nodiscard]] ParsedArgs 
parseArgList(); +}; + +} // namespace silo::query_engine::saneql diff --git a/src/silo/query_engine/saneql/parser.test.cpp b/src/silo/query_engine/saneql/parser.test.cpp new file mode 100644 index 000000000..878b4f65a --- /dev/null +++ b/src/silo/query_engine/saneql/parser.test.cpp @@ -0,0 +1,229 @@ +#include "silo/query_engine/saneql/parser.h" + +#include +#include + +#include "silo/query_engine/saneql/ast.h" +#include "silo/query_engine/saneql/parse_exception.h" + +using silo::query_engine::saneql::ParseException; +using silo::query_engine::saneql::Parser; +namespace ast = silo::query_engine::saneql::ast; + +TEST(SaneQLParser, parsesIdentifier) { + Parser parser("country"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "country"); +} + +TEST(SaneQLParser, parsesStringLiteral) { + Parser parser("'USA'"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "'USA'"); +} + +TEST(SaneQLParser, parsesIntLiteral) { + Parser parser("42"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "42"); +} + +TEST(SaneQLParser, parsesFloatLiteral) { + Parser parser("3.14"); + auto expr = parser.parse(); + ASSERT_TRUE(std::holds_alternative(expr->value)); + EXPECT_DOUBLE_EQ(std::get(expr->value).value, 3.14); +} + +TEST(SaneQLParser, parsesBoolLiteral) { + Parser parser("true"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "true"); +} + +TEST(SaneQLParser, parsesNullLiteral) { + Parser parser("null"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "null"); +} + +TEST(SaneQLParser, parsesEquality) { + Parser parser("country = 'USA'"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "(country = 'USA')"); +} + +TEST(SaneQLParser, parsesNotEquals) { + Parser parser("country <> 'USA'"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "(country <> 'USA')"); +} + +TEST(SaneQLParser, parsesAndExpression) { + Parser parser("a = 1 && b = 2"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "((a = 
1) && (b = 2))"); +} + +TEST(SaneQLParser, parsesOrExpression) { + Parser parser("a = 1 || b = 2"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "((a = 1) || (b = 2))"); +} + +TEST(SaneQLParser, parsesNotExpression) { + Parser parser("!active"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "(!active)"); +} + +TEST(SaneQLParser, parsesAndOrPrecedence) { + Parser parser("a = 1 && b = 2 || c = 3"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "(((a = 1) && (b = 2)) || (c = 3))"); +} + +TEST(SaneQLParser, parsesParenthesizedExpression) { + Parser parser("(a || b) && c"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "((a || b) && c)"); +} + +TEST(SaneQLParser, parsesFunctionCall) { + Parser parser("hasMutation('A123T')"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "hasMutation('A123T')"); +} + +TEST(SaneQLParser, parsesFunctionCallWithNamedArgs) { + Parser parser("hasMutation(position:=1000, sequenceName:='segment1')"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "hasMutation(position:=1000, sequenceName:='segment1')"); +} + +TEST(SaneQLParser, parsesMethodCall) { + Parser parser("default.filter(country = 'USA')"); + auto expr = parser.parse(); + // Method call syntax is desugared: receiver becomes first positional arg + EXPECT_EQ(expr->toString(), "filter(default, (country = 'USA'))"); +} + +TEST(SaneQLParser, parsesMethodCallChain) { + Parser parser("default.filter(country = 'USA').groupBy({count:=count()})"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "groupBy(filter(default, (country = 'USA')), {count:=count()})"); +} + +TEST(SaneQLParser, parsesTypeCast) { + Parser parser("'2020-01-01'::date"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "'2020-01-01'::date"); +} + +TEST(SaneQLParser, parsesSetLiteral) { + Parser parser("{'USA', 'Germany', 'France'}"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "{'USA', 'Germany', 
'France'}"); +} + +TEST(SaneQLParser, parsesEmptySetLiteral) { + Parser parser("{}"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "{}"); +} + +TEST(SaneQLParser, parsesMethodCallOnSetLiteral) { + Parser parser("country.in({'USA', 'Germany'})"); + auto expr = parser.parse(); + EXPECT_EQ(expr->toString(), "in(country, {'USA', 'Germany'})"); +} + +TEST(SaneQLParser, parsesComplexFilterQuery) { + Parser parser("default.filter(country = 'USA' && age > 30).groupBy({count:=count()})"); + auto expr = parser.parse(); + ASSERT_TRUE(std::holds_alternative(expr->value)); + auto& outer = std::get(expr->value); + EXPECT_EQ(outer.function_name, "groupBy"); + // First positional arg is the child pipeline (filter call) + ASSERT_TRUE(std::holds_alternative(outer.positional_arguments[0].value->value) + ); + auto& filter = std::get(outer.positional_arguments[0].value->value); + EXPECT_EQ(filter.function_name, "filter"); +} + +TEST(SaneQLParser, parsesDateBetweenWithTypeCast) { + Parser parser("date_submitted.between('2020-01-01'::date, '2023-12-31'::date)"); + auto expr = parser.parse(); + ASSERT_TRUE(std::holds_alternative(expr->value)); + auto& call = std::get(expr->value); + EXPECT_EQ(call.function_name, "between"); + // 3 positional arguments: receiver (date_submitted) + 2 date args + ASSERT_EQ(call.positional_arguments.size(), 3); +} + +TEST(SaneQLParser, parsesLimitMethod) { + Parser parser("default.filter(country = 'USA').limit(100)"); + auto expr = parser.parse(); + ASSERT_TRUE(std::holds_alternative(expr->value)); + auto& limit_call = std::get(expr->value); + EXPECT_EQ(limit_call.function_name, "limit"); + // 2 positional arguments: receiver (filter(...)) + 100 + ASSERT_EQ(limit_call.positional_arguments.size(), 2); +} + +TEST(SaneQLParser, parsesComparisonOperators) { + Parser parser("age < 30"); + auto expr = parser.parse(); + ASSERT_TRUE(std::holds_alternative(expr->value)); + auto& bin = std::get(expr->value); + EXPECT_EQ(bin.op, ast::BinaryOp::LESS_THAN); 
+} + +TEST(SaneQLParser, parsesFullExampleQuery) { + Parser parser( + "metadata\n" + " .filter(country = 'USA' && date_submitted.between('2020-01-01'::date, " + "'2023-12-31'::date))\n" + " .groupBy({count:=count()})" + ); + auto expr = parser.parse(); + ASSERT_TRUE(std::holds_alternative(expr->value)); + auto& agg = std::get(expr->value); + EXPECT_EQ(agg.function_name, "groupBy"); +} + +TEST(SaneQLParser, throwsOnUnexpectedToken) { + EXPECT_THAT( + []() { + Parser parser("= 'broken'"); + (void)parser.parse(); + }, + ThrowsMessage( + ::testing::HasSubstr("arse error at 1:1: Unexpected token Equals") + ) + ); +} + +TEST(SaneQLParser, throwsOnMissingClosingParen) { + EXPECT_THAT( + []() { + Parser parser("func(a, b"); + (void)parser.parse(); + }, + ThrowsMessage( + ::testing::HasSubstr("Parse error at 1:10: Expected RightParen but got Eof") + ) + ); +} + +TEST(SaneQLParser, throwsOnTrailingGarbage) { + EXPECT_THAT( + []() { + Parser parser("a b"); + (void)parser.parse(); + }, + ThrowsMessage( + ::testing::HasSubstr("Parse error at 1:3: Expected Eof but got Identifier") + ) + ); +} diff --git a/src/silo/query_engine/saneql/source_location.h b/src/silo/query_engine/saneql/source_location.h new file mode 100644 index 000000000..834923aeb --- /dev/null +++ b/src/silo/query_engine/saneql/source_location.h @@ -0,0 +1,22 @@ +#pragma once + +#include +#include + +#include + +namespace silo::query_engine::saneql { + +struct SourceLocation { + uint32_t line = 1; + uint32_t column = 1; + + [[nodiscard]] std::string toString() const { return fmt::format("{}:{}", line, column); } +}; + +struct SourceRange { + SourceLocation start; + SourceLocation end; +}; + +} // namespace silo::query_engine::saneql diff --git a/src/silo/query_engine/saneql/token.cpp b/src/silo/query_engine/saneql/token.cpp new file mode 100644 index 000000000..105901ac7 --- /dev/null +++ b/src/silo/query_engine/saneql/token.cpp @@ -0,0 +1,98 @@ +#include "silo/query_engine/saneql/token.h" + +#include + 
+namespace silo::query_engine::saneql { + +std::string tokenTypeToString(TokenType type) { + switch (type) { + case TokenType::INT_LITERAL: + return "IntLiteral"; + case TokenType::FLOAT_LITERAL: + return "FloatLiteral"; + case TokenType::STRING_LITERAL: + return "StringLiteral"; + case TokenType::BOOL_LITERAL: + return "BoolLiteral"; + case TokenType::NULL_LITERAL: + return "NullLiteral"; + case TokenType::IDENTIFIER: + return "Identifier"; + case TokenType::DOT: + return "Dot"; + case TokenType::DOUBLE_COLON: + return "DoubleColon"; + case TokenType::COLON_EQUALS: + return "ColonEquals"; + case TokenType::EQUALS: + return "Equals"; + case TokenType::NOT_EQUALS: + return "NotEquals"; + case TokenType::LESS_THAN: + return "LessThan"; + case TokenType::LESS_EQUAL: + return "LessEqual"; + case TokenType::GREATER_THAN: + return "GreaterThan"; + case TokenType::GREATER_EQUAL: + return "GreaterEqual"; + case TokenType::AND: + return "And"; + case TokenType::OR: + return "Or"; + case TokenType::NOT: + return "Not"; + case TokenType::LEFT_PAREN: + return "LeftParen"; + case TokenType::RIGHT_PAREN: + return "RightParen"; + case TokenType::LEFT_BRACE: + return "LeftBrace"; + case TokenType::RIGHT_BRACE: + return "RightBrace"; + case TokenType::COMMA: + return "Comma"; + case TokenType::END_OF_FILE: + return "Eof"; + } + return "Unknown"; +} + +std::string Token::toString() const { + if (type == TokenType::STRING_LITERAL) { + return fmt::format("Token({}, '{}')", tokenTypeToString(type), getStringValue()); + } + if (type == TokenType::INT_LITERAL) { + return fmt::format("Token({}, {})", tokenTypeToString(type), getIntValue()); + } + if (type == TokenType::FLOAT_LITERAL) { + return fmt::format("Token({}, {})", tokenTypeToString(type), getFloatValue()); + } + if (type == TokenType::BOOL_LITERAL) { + return fmt::format( + "Token({}, {})", tokenTypeToString(type), getBoolValue() ? 
"true" : "false" + ); + } + if (type == TokenType::IDENTIFIER) { + return fmt::format("Token({}, {})", tokenTypeToString(type), getStringValue()); + } + return fmt::format("Token({})", tokenTypeToString(type)); +} + +std::string Token::getStringValue() const { + return std::get(value); +} + +int64_t Token::getIntValue() const { + return std::get(value); +} + +double Token::getFloatValue() const { + return std::get(value); +} + +bool Token::getBoolValue() const { + return std::get(value); +} + +} // namespace silo::query_engine::saneql diff --git a/src/silo/query_engine/saneql/token.h b/src/silo/query_engine/saneql/token.h new file mode 100644 index 000000000..eb6ca9770 --- /dev/null +++ b/src/silo/query_engine/saneql/token.h @@ -0,0 +1,55 @@ +#pragma once + +#include +#include +#include + +#include "silo/query_engine/saneql/source_location.h" + +namespace silo::query_engine::saneql { + +enum class TokenType : uint8_t { + INT_LITERAL, + FLOAT_LITERAL, + STRING_LITERAL, + BOOL_LITERAL, + NULL_LITERAL, + IDENTIFIER, + DOT, + DOUBLE_COLON, + COLON_EQUALS, + EQUALS, + NOT_EQUALS, + LESS_THAN, + LESS_EQUAL, + GREATER_THAN, + GREATER_EQUAL, + AND, + OR, + NOT, + LEFT_PAREN, + RIGHT_PAREN, + LEFT_BRACE, + RIGHT_BRACE, + COMMA, + END_OF_FILE +}; + +[[nodiscard]] std::string tokenTypeToString(TokenType type); + +using TokenValue = std::variant; + +struct Token { + TokenType type; + TokenValue value; + SourceLocation location; + + [[nodiscard]] std::string toString() const; + + [[nodiscard]] std::string getStringValue() const; + [[nodiscard]] int64_t getIntValue() const; + [[nodiscard]] double getFloatValue() const; + [[nodiscard]] bool getBoolValue() const; +}; + +} // namespace silo::query_engine::saneql diff --git a/src/silo/test/amino_acid_insertion_contains.test.cpp b/src/silo/test/amino_acid_insertion_contains.test.cpp index 77d8c3163..893a2a994 100644 --- a/src/silo/test/amino_acid_insertion_contains.test.cpp +++ b/src/silo/test/amino_acid_insertion_contains.test.cpp 
@@ -53,45 +53,18 @@ const QueryTestData TEST_DATA{ .reference_genomes = REFERENCE_GENOMES }; -nlohmann::json createAminoAcidInsertionContainsQuery( - const nlohmann::json& sequenceName, - int position, - const std::string& insertedSymbols -) { - return { - {"action", {{"type", "Details"}}}, - {"filterExpression", - {{"type", "AminoAcidInsertionContains"}, - {"position", position}, - {"value", insertedSymbols}, - {"sequenceName", sequenceName}}} - }; -} - -nlohmann::json createAminoAcidInsertionContainsQueryWithEmptySequenceName( - int position, - const std::string& insertedSymbols -) { - return { - {"action", {{"type", "Details"}}}, - {"filterExpression", - { - {"type", "AminoAcidInsertionContains"}, - {"position", position}, - {"value", insertedSymbols}, - }} - }; -} - const QueryTestScenario AMINO_ACID_INSERTION_CONTAINS_SCENARIO = { .name = "aminoAcidInsertionContains", - .query = createAminoAcidInsertionContainsQuery("gene1", 12, "A"), + .query = + "default.filter(aminoAcidInsertionContains(position:=12, value:='A', " + "sequenceName:='gene1')).project(primaryKey)" + "", .expected_query_result = nlohmann::json({{{"primaryKey", "id_0"}}, {{"primaryKey", "id_1"}}}) }; const QueryTestScenario AMINO_ACID_INSERTION_CONTAINS_WITH_NULL_SEGMENT_SCENARIO = { .name = "aminoAcidInsertionWithNullSegment", - .query = createAminoAcidInsertionContainsQueryWithEmptySequenceName(12, "A"), + .query = "default.filter(aminoAcidInsertionContains(position:=12, value:='A'))", .expected_error_message = "The database has no default amino acid sequence name", }; diff --git a/src/silo/test/amino_acid_symbol_equals.test.cpp b/src/silo/test/amino_acid_symbol_equals.test.cpp index 06a7e1ac9..5c95b640f 100644 --- a/src/silo/test/amino_acid_symbol_equals.test.cpp +++ b/src/silo/test/amino_acid_symbol_equals.test.cpp @@ -47,30 +47,19 @@ const QueryTestData TEST_DATA{ .reference_genomes = REFERENCE_GENOMES }; -nlohmann::json createAminoAcidSymbolEqualsQuery( - const std::string& symbol, - int 
position, - const std::string& gene -) { - return { - {"action", {{"type", "Aggregated"}}}, - {"filterExpression", - {{"type", "AminoAcidEquals"}, - {"position", position}, - {"symbol", symbol}, - {"sequenceName", gene}}} - }; -} - const QueryTestScenario AMINO_ACID_EQUALS_D = { .name = "AMINO_ACID_EQUALS_D", - .query = createAminoAcidSymbolEqualsQuery("D", 1, GENE), + .query = + "default.filter(aminoAcidEquals(position:=1, symbol:='D', sequenceName:='gene1'))" + ".groupBy({count:=count()})", .expected_query_result = nlohmann::json::parse(R"([{"count": 1}])") }; const QueryTestScenario AMINO_ACID_EQUALS_WITH_DOT_RETURNS_AS_IF_REFERENCE = { .name = "AMINO_ACID_EQUALS_WITH_DOT_RETURNS_AS_IF_REFERENCE", - .query = createAminoAcidSymbolEqualsQuery(".", 1, GENE), + .query = + "default.filter(aminoAcidEquals(position:=1, symbol:='.', sequenceName:='gene1'))" + ".groupBy({count:=count()})", .expected_query_result = nlohmann::json::parse(R"([{"count": 2}])") }; diff --git a/src/silo/test/date_between.test.cpp b/src/silo/test/date_between.test.cpp index 3308d3bf9..3773307be 100644 --- a/src/silo/test/date_between.test.cpp +++ b/src/silo/test/date_between.test.cpp @@ -13,15 +13,11 @@ const std::string UNSORTED_DATE_VALUE = "2023-01-20"; const nlohmann::json DATA = { {"primaryKey", "id"}, {"sorted_date", SORTED_DATE_VALUE}, - {"unsorted_date", UNSORTED_DATE_VALUE}, - {"segment1", nullptr}, - {"unaligned_segment1", nullptr}, - {"gene1", nullptr} + {"unsorted_date", UNSORTED_DATE_VALUE} }; const auto DATABASE_CONFIG = R"( -defaultNucleotideSequence: "segment1" schema: instanceName: "dummy name" metadata: @@ -35,8 +31,8 @@ defaultNucleotideSequence: "segment1" )"; const auto REFERENCE_GENOMES = ReferenceGenomes{ - {{"segment1", "A"}}, - {{"gene1", "*"}}, + {}, + {}, }; const QueryTestData TEST_DATA{ @@ -45,18 +41,6 @@ const QueryTestData TEST_DATA{ .reference_genomes = REFERENCE_GENOMES }; -nlohmann::json createDateBetweenQuery( - const std::string& column, - const 
nlohmann::json from_date, - const nlohmann::json to_date -) { - return { - {"action", {{"type", "Details"}}}, - {"filterExpression", - {{"type", "DateBetween"}, {"column", column}, {"from", from_date}, {"to", to_date}}} - }; -} - const nlohmann::json EXPECTED_RESULT = { {{"primaryKey", "id"}, {"sorted_date", SORTED_DATE_VALUE}, {"unsorted_date", UNSORTED_DATE_VALUE} } @@ -64,77 +48,49 @@ const nlohmann::json EXPECTED_RESULT = { const QueryTestScenario SORTED_DATE_WITH_TO_AND_FROM_SCENARIO = { .name = "sortedDateWithToEqualsFrom", - .query = createDateBetweenQuery("sorted_date", SORTED_DATE_VALUE, SORTED_DATE_VALUE), + .query = "default.filter(sorted_date.between('2020-12-24'::date, '2020-12-24'::date))", .expected_query_result = EXPECTED_RESULT }; const QueryTestScenario SORTED_DATE_WITH_TO_ONLY_SCENARIO = { .name = "sortedDateWithToOnly", - .query = createDateBetweenQuery("sorted_date", nullptr, SORTED_DATE_VALUE), + .query = "default.filter(sorted_date <= '2020-12-24'::date)", .expected_query_result = EXPECTED_RESULT }; const QueryTestScenario SORTED_DATE_WITH_FROM_ONLY_SCENARIO = { .name = "sortedDateWithFromOnly", - .query = createDateBetweenQuery("sorted_date", SORTED_DATE_VALUE, nullptr), + .query = "default.filter(sorted_date >= '2020-12-24'::date)", .expected_query_result = EXPECTED_RESULT }; const QueryTestScenario UNSORTED_DATE_WITH_TO_AND_FROM_SCENARIO = { .name = "unsortedDateWithToEqualsFrom", - .query = createDateBetweenQuery("unsorted_date", UNSORTED_DATE_VALUE, UNSORTED_DATE_VALUE), + .query = "default.filter(unsorted_date.between('2023-01-20'::date, '2023-01-20'::date))", .expected_query_result = EXPECTED_RESULT }; const QueryTestScenario UNSORTED_DATE_WITH_TO_ONLY_SCENARIO = { .name = "unsortedDateWithToOnly", - .query = createDateBetweenQuery("unsorted_date", nullptr, UNSORTED_DATE_VALUE), + .query = "default.filter(unsorted_date <= '2023-01-20'::date)", .expected_query_result = EXPECTED_RESULT }; const QueryTestScenario 
UNSORTED_DATE_WITH_FROM_ONLY_SCENARIO = { .name = "unsortedDateWithFromOnly", - .query = createDateBetweenQuery("unsorted_date", UNSORTED_DATE_VALUE, nullptr), + .query = "default.filter(unsorted_date >= '2023-01-20'::date)", .expected_query_result = EXPECTED_RESULT }; const QueryTestScenario UNSORTED_DATE_WITH_COLUMN_NOT_IN_DB = { .name = "UNSORTED_DATE_WITH_COLUMN_NOT_IN_DB", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details" - }, - "filterExpression": { - "type": "DateBetween", - "column": "something_not_in_database", - "from": null, - "to": null - } -} -)" - ), + .query = "default.filter(something_not_in_database >= '2000-01-01'::date)", .expected_error_message = "The database does not contain the column 'something_not_in_database'" }; const QueryTestScenario UNSORTED_DATE_WITH_NON_DATE_COLUMN = { .name = "UNSORTED_DATE_WITH_NON_DATE_COLUMN", - .query = nlohmann::json::parse( - R"( -{ - "action": { - "type": "Details" - }, - "filterExpression": { - "type": "DateBetween", - "column": "primaryKey", - "from": null, - "to": null - } -} -)" - ), + .query = "default.filter(primaryKey >= '2020-01-01'::date)", .expected_error_message = "The column 'primaryKey' is not of type date" }; diff --git a/src/silo/test/date_equals.test.cpp b/src/silo/test/date_equals.test.cpp index a118905c8..c849806a5 100644 --- a/src/silo/test/date_equals.test.cpp +++ b/src/silo/test/date_equals.test.cpp @@ -14,51 +14,35 @@ const std::string DATE_2023 = "2023-01-20"; const nlohmann::json DATA_ROW1 = { {"primaryKey", "row1"}, {"sorted_date", DATE_2020}, - {"unsorted_date", DATE_2023}, - {"segment1", nullptr}, - {"unaligned_segment1", nullptr}, - {"gene1", nullptr} + {"unsorted_date", DATE_2023} }; const nlohmann::json DATA_ROW2 = { {"primaryKey", "row2"}, {"sorted_date", DATE_2021}, - {"unsorted_date", DATE_2020}, - {"segment1", nullptr}, - {"unaligned_segment1", nullptr}, - {"gene1", nullptr} + {"unsorted_date", DATE_2020} }; const nlohmann::json DATA_ROW3 = { 
{"primaryKey", "row3"}, {"sorted_date", DATE_2020}, - {"unsorted_date", DATE_2021}, - {"segment1", nullptr}, - {"unaligned_segment1", nullptr}, - {"gene1", nullptr} + {"unsorted_date", DATE_2021} }; const nlohmann::json DATA_NULL1 = { {"primaryKey", "null1"}, {"sorted_date", nullptr}, - {"unsorted_date", nullptr}, - {"segment1", nullptr}, - {"unaligned_segment1", nullptr}, - {"gene1", nullptr} + {"unsorted_date", nullptr} }; const nlohmann::json DATA_NULL2 = { {"primaryKey", "null2"}, {"sorted_date", nullptr}, - {"unsorted_date", DATE_2023}, - {"segment1", nullptr}, - {"unaligned_segment1", nullptr}, - {"gene1", nullptr} + {"unsorted_date", DATE_2023} }; const auto DATABASE_CONFIG = R"( -defaultNucleotideSequence: "segment1" schema: instanceName: "dummy name" metadata: @@ -72,8 +56,8 @@ defaultNucleotideSequence: "segment1" )"; const auto REFERENCE_GENOMES = ReferenceGenomes{ - {{"segment1", "A"}}, - {{"gene1", "*"}}, + {}, + {}, }; const QueryTestData TEST_DATA{ @@ -82,11 +66,12 @@ const QueryTestData TEST_DATA{ .reference_genomes = REFERENCE_GENOMES }; -nlohmann::json createDateEqualsQuery(const std::string& column, const nlohmann::json& value) { - return { - {"action", {{"type", "Details"}}}, - {"filterExpression", {{"type", "DateEquals"}, {"column", column}, {"value", value}}} - }; +std::string createDateEqualsQuery(const std::string& column, const std::string& date_value) { + return fmt::format("default.filter({} = '{}'::date)", column, date_value); +} + +std::string createDateEqualsNullQuery(const std::string& column) { + return fmt::format("default.filter({} = null)", column); } // Matches row1 and row3 (both have sorted_date = 2020-12-24) @@ -126,7 +111,7 @@ const QueryTestScenario UNSORTED_DATE_SINGLE_MATCH = // Matches null1 and null2 (both have sorted_date = null) const QueryTestScenario SORTED_DATE_NULL = {.name = "SORTED_DATE_NULL", - .query = createDateEqualsQuery("sorted_date", nullptr), + .query = createDateEqualsNullQuery("sorted_date"), 
.expected_query_result = { {{"primaryKey", "null1"}, {"sorted_date", nullptr}, {"unsorted_date", nullptr}}, {{"primaryKey", "null2"}, {"sorted_date", nullptr}, {"unsorted_date", DATE_2023}}, @@ -135,7 +120,7 @@ const QueryTestScenario SORTED_DATE_NULL = // Matches only null1 (unsorted_date = null) const QueryTestScenario UNSORTED_DATE_NULL = {.name = "UNSORTED_DATE_NULL", - .query = createDateEqualsQuery("unsorted_date", nullptr), + .query = createDateEqualsNullQuery("unsorted_date"), .expected_query_result = { {{"primaryKey", "null1"}, {"sorted_date", nullptr}, {"unsorted_date", nullptr}}, }}; @@ -146,32 +131,19 @@ const QueryTestScenario DATE_EQUALS_NO_MATCH = { .expected_query_result = nlohmann::json::array() }; -const QueryTestScenario DATE_EQUALS_COLUMN_NOT_IN_DB = { - .name = "DATE_EQUALS_COLUMN_NOT_IN_DB", - .query = createDateEqualsQuery("something_not_in_database", "2020-01-01"), - .expected_error_message = "The database does not contain the column 'something_not_in_database'" -}; - -const QueryTestScenario DATE_EQUALS_WRONG_COLUMN_TYPE = { - .name = "DATE_EQUALS_WRONG_COLUMN_TYPE", - .query = createDateEqualsQuery("primaryKey", "2020-01-01"), - .expected_error_message = "The column 'primaryKey' is not of type date" -}; - const QueryTestScenario DATE_EQUALS_WRONG_FORMAT = { .name = "DATE_EQUALS_WRONG_FORMAT", - .query = createDateEqualsQuery("primaryKey", "2021-03-00018"), + .query = "default.filter(sorted_date = '2021-03-00018'::date)", .expected_error_message = - "The value for the DateEquals expression is not a valid date: Invalid date format " - "'2021-03-00018': expected exactly YYYY-MM-DD" + "invalid date '2021-03-00018' at 1:45: Invalid date format '2021-03-00018': " + "expected exactly YYYY-MM-DD" }; const QueryTestScenario DATE_EQUALS_WRONG_VALUE_TYPE = { .name = "DATE_EQUALS_WRONG_VALUE_TYPE", - .query = createDateEqualsQuery("primaryKey", "asdf"), + .query = "default.filter(sorted_date = 'asdf'::date)", .expected_error_message = - "The value for 
the DateEquals expression is not a valid date: Invalid date format 'asdf': " - "expected exactly YYYY-MM-DD" + "invalid date 'asdf' at 1:36: Invalid date format 'asdf': expected exactly YYYY-MM-DD" }; } // namespace @@ -187,8 +159,6 @@ QUERY_TEST( SORTED_DATE_NULL, UNSORTED_DATE_NULL, DATE_EQUALS_NO_MATCH, - DATE_EQUALS_COLUMN_NOT_IN_DB, - DATE_EQUALS_WRONG_COLUMN_TYPE, DATE_EQUALS_WRONG_FORMAT, DATE_EQUALS_WRONG_VALUE_TYPE ) diff --git a/src/silo/test/default_sequence.test.cpp b/src/silo/test/default_sequence.test.cpp index 1afb8b422..63d585c60 100644 --- a/src/silo/test/default_sequence.test.cpp +++ b/src/silo/test/default_sequence.test.cpp @@ -54,112 +54,86 @@ const QueryTestData TEST_DATA{ .lineage_trees = {{"test", silo::common::LineageTreeAndIdMap()}} }; -nlohmann::json createQueryWithFilter(const nlohmann::json filter) { - return {{"action", {{"type", "Details"}}}, {"filterExpression", filter}}; -} - const nlohmann::json EXPECTED_RESULT = {{{"primaryKey", "id"}}}; const QueryTestScenario NUCLEOTIDE_EQUALS_NO_SEQUENCE_NAME = { .name = "nucleotideEqualsWithoutSegmentTakesDefaultSequence", - .query = createQueryWithFilter( - {{"type", "NucleotideEquals"}, {"position", 1}, {"symbol", VALUE_SEGMENT_1}} - ), + .query = "default.filter(nucleotideEquals(position:=1, symbol:='A')).project(primaryKey)", .expected_query_result = EXPECTED_RESULT }; const QueryTestScenario NUCLEOTIDE_EQUALS_NO_SEQUENCE_NAME_FILTER_BY_WRONG_VALUE = { .name = "nucleotideEqualsWithoutSegmentFilterByWrongValue", - .query = createQueryWithFilter( - {{"type", "NucleotideEquals"}, {"position", 1}, {"symbol", VALUE_SEGMENT_2}} - ), + .query = "default.filter(nucleotideEquals(position:=1, symbol:='C')).project(primaryKey)", .expected_query_result = nlohmann::json::array() }; const QueryTestScenario NUCLEOTIDE_EQUALS_SEGMENT_1 = { .name = "nucleotideEqualsSegment1", - .query = createQueryWithFilter( - {{"type", "NucleotideEquals"}, - {"sequenceName", "segment1"}, - {"position", 1}, - {"symbol", 
VALUE_SEGMENT_1}} - ), + .query = + "default.filter(nucleotideEquals(position:=1, symbol:='A', " + "sequenceName:='segment1')).project(primaryKey)", .expected_query_result = EXPECTED_RESULT }; const QueryTestScenario NUCLEOTIDE_EQUALS_SEGMENT_2 = { .name = "nucleotideEqualsSegment2", - .query = createQueryWithFilter( - {{"type", "NucleotideEquals"}, - {"sequenceName", "segment2"}, - {"position", 1}, - {"symbol", VALUE_SEGMENT_2}} - ), + .query = + "default.filter(nucleotideEquals(position:=1, symbol:='C', " + "sequenceName:='segment2')).project(primaryKey)", .expected_query_result = EXPECTED_RESULT }; const QueryTestScenario AMINO_ACID_EQUALS_NO_SEQUENCE_NAME = { .name = "aminoAcidEqualsWithoutSequenceNameTakesDefaultSequence", - .query = createQueryWithFilter( - {{"type", "AminoAcidEquals"}, {"position", 1}, {"symbol", VALUE_SEGMENT_1}} - ), + .query = "default.filter(aminoAcidEquals(position:=1, symbol:='A')).project(primaryKey)", .expected_query_result = EXPECTED_RESULT }; const QueryTestScenario AMINO_ACID_EQUALS_NO_SEQUENCE_NAME_FILTER_BY_WRONG_VALUE = { .name = "aminoAcidEqualsWithoutSequenceNameFilterByWrongValue", - .query = createQueryWithFilter( - {{"type", "AminoAcidEquals"}, {"position", 1}, {"symbol", VALUE_SEGMENT_2}} - ), + .query = "default.filter(aminoAcidEquals(position:=1, symbol:='C')).project(primaryKey)", .expected_query_result = nlohmann::json::array() }; const QueryTestScenario AMINO_ACID_EQUALS_GENE_1 = { .name = "aminoAcidEqualsGene1", - .query = createQueryWithFilter( - {{"type", "AminoAcidEquals"}, - {"sequenceName", "gene1"}, - {"position", 1}, - {"symbol", VALUE_SEGMENT_1}} - ), + .query = + "default.filter(aminoAcidEquals(position:=1, symbol:='A', " + "sequenceName:='gene1')).project(primaryKey)", .expected_query_result = EXPECTED_RESULT }; const QueryTestScenario AMINO_ACID_EQUALS_GENE_2 = { .name = "aminoAcidEqualsGene2", - .query = createQueryWithFilter( - {{"type", "AminoAcidEquals"}, - {"sequenceName", "gene2"}, - {"position", 
1}, - {"symbol", VALUE_SEGMENT_2}} - ), + .query = + "default.filter(aminoAcidEquals(position:=1, symbol:='C', " + "sequenceName:='gene2')).project(primaryKey)", .expected_query_result = EXPECTED_RESULT }; const QueryTestScenario HAS_NUCLEOTIDE_MUTATION_WITHOUT_SEQUENCE_NAME = { .name = "hasNucleotideMutationWithoutSequenceName", - .query = createQueryWithFilter({{"type", "HasNucleotideMutation"}, {"position", 1}}), + .query = "default.filter(hasMutation(position:=1)).project(primaryKey)", .expected_query_result = EXPECTED_RESULT }; const QueryTestScenario HAS_AMINO_ACID_MUTATION_WITHOUT_SEQUENCE_NAME = { .name = "hasAminoAcidMutationWithoutSequenceName", - .query = createQueryWithFilter({{"type", "HasAminoAcidMutation"}, {"position", 1}}), + .query = "default.filter(hasAAMutation(position:=1)).project(primaryKey)", .expected_query_result = EXPECTED_RESULT }; const QueryTestScenario NUCLEOTIDE_INSERTION_CONTAINS_WITHOUT_SEQUENCE_NAME = { .name = "nucleotideInsertionContainsWithoutSequenceName", - .query = - createQueryWithFilter({{"type", "InsertionContains"}, {"value", "AAA"}, {"position", 1}}), + .query = "default.filter(insertionContains(position:=1, value:='AAA')).project(primaryKey)", .expected_query_result = EXPECTED_RESULT }; const QueryTestScenario AMINO_ACID_INSERTION_CONTAINS_WITHOUT_SEQUENCE_NAME = { .name = "aminoAcidInsertionContainsWithoutSequenceName", - .query = createQueryWithFilter( - {{"type", "AminoAcidInsertionContains"}, {"value", "AAA"}, {"position", 1}} - ), + .query = + "default.filter(aminoAcidInsertionContains(position:=1, value:='AAA')).project(primaryKey)", .expected_query_result = EXPECTED_RESULT }; diff --git a/src/silo/test/fasta.test.cpp b/src/silo/test/fasta.test.cpp index 91fc3ea94..c69f08dfb 100644 --- a/src/silo/test/fasta.test.cpp +++ b/src/silo/test/fasta.test.cpp @@ -58,31 +58,17 @@ const QueryTestData TEST_DATA{ .reference_genomes = REFERENCE_GENOMES }; -nlohmann::json createFastaAlignedQuery(const std::string& primaryKey) { 
- return nlohmann::json::parse(fmt::format( - R"( -{{ - "action": {{ - "type": "Fasta", - "sequenceNames": [ - "unaligned_segment1", - "unaligned_segment2" - ] - }}, - "filterExpression": {{ - "type": "StringEquals", - "column": "primaryKey", - "value": "{}" - }} -}} -)", +std::string createFastaQuery(const std::string& primaryKey) { + return fmt::format( + "default.filter(primaryKey = '{}').project({{primaryKey, unaligned_segment1, " + "unaligned_segment2}})", primaryKey - )); + ); } const QueryTestScenario SEQUENCE_WITH_BOTH_SEGMENTS_SCENARIO = { .name = "sequenceWithBothSegments", - .query = createFastaAlignedQuery("bothSegments"), + .query = createFastaQuery("bothSegments"), .expected_query_result = nlohmann::json( {{{"primaryKey", "bothSegments"}, {"unaligned_segment1", "A"}, {"unaligned_segment2", "G"}}} ) @@ -90,7 +76,7 @@ const QueryTestScenario SEQUENCE_WITH_BOTH_SEGMENTS_SCENARIO = { const QueryTestScenario SEQUENCE_WITH_ONLY_FIRST_SEGMENT_SCENARIO = { .name = "sequenceWithOnlyFirstSegment", - .query = createFastaAlignedQuery("onlySegment1"), + .query = createFastaQuery("onlySegment1"), .expected_query_result = nlohmann::json( {{{"primaryKey", "onlySegment1"}, {"unaligned_segment1", "T"}, {"unaligned_segment2", nullptr} }} @@ -99,7 +85,7 @@ const QueryTestScenario SEQUENCE_WITH_ONLY_FIRST_SEGMENT_SCENARIO = { const QueryTestScenario SEQUENCE_WITH_ONLY_SECOND_SEGMENT_SCENARIO = { .name = "sequenceWithOnlySecondSegment", - .query = createFastaAlignedQuery("onlySegment2"), + .query = createFastaQuery("onlySegment2"), .expected_query_result = nlohmann::json( {{{"primaryKey", "onlySegment2"}, {"unaligned_segment1", nullptr}, {"unaligned_segment2", "T"} }} @@ -108,7 +94,7 @@ const QueryTestScenario SEQUENCE_WITH_ONLY_SECOND_SEGMENT_SCENARIO = { const QueryTestScenario SEQUENCE_WITH_NO_SEGMENT_SCENARIO = { .name = "sequenceWithNoSegment", - .query = createFastaAlignedQuery("noSegment"), + .query = createFastaQuery("noSegment"), .expected_query_result = 
nlohmann::json( {{{"primaryKey", "noSegment"}, {"unaligned_segment1", nullptr}, @@ -118,23 +104,8 @@ const QueryTestScenario SEQUENCE_WITH_NO_SEGMENT_SCENARIO = { const QueryTestScenario DOWNLOAD_ALL_SEQUENCES_SCENARIO = { .name = "downloadAllSequences", - .query = nlohmann::json::parse(R"( -{ - "action": { - "type": "Fasta", - "orderByFields": [ - "primaryKey" - ], - "sequenceNames": [ - "unaligned_segment1", - "unaligned_segment2" - ] - }, - "filterExpression": { - "type": "True" - } -} -)"), + .query = + "default.project({primaryKey, unaligned_segment1, unaligned_segment2}).orderBy({primaryKey})", .expected_query_result = nlohmann::json( {{{"primaryKey", "1"}, {"unaligned_segment1", nullptr}, {"unaligned_segment2", "A"}}, {{"primaryKey", "2"}, {"unaligned_segment1", nullptr}, {"unaligned_segment2", nullptr}}, @@ -153,26 +124,9 @@ const QueryTestScenario DOWNLOAD_ALL_SEQUENCES_SCENARIO = { const QueryTestScenario DOWNLOAD_ALL_DATA = { .name = "DOWNLOAD_ALL_DATA", - .query = nlohmann::json::parse(R"( -{ - "action": { - "type": "Fasta", - "orderByFields": [ - "primaryKey" - ], - "sequenceNames": [ - "unaligned_segment1", - "unaligned_segment2" - ], - "additionalFields": [ - "date" - ] - }, - "filterExpression": { - "type": "True" - } -} -)"), + .query = + "default.project({primaryKey, unaligned_segment1, unaligned_segment2, date})" + ".orderBy({primaryKey})", .expected_query_result = nlohmann::json::parse(R"( [{"date":"2024-08-05","primaryKey":"1","unaligned_segment1":null,"unaligned_segment2":"A"}, {"date":"2024-08-03","primaryKey":"2","unaligned_segment1":null,"unaligned_segment2":null}, @@ -186,28 +140,9 @@ const QueryTestScenario DOWNLOAD_ALL_DATA = { const QueryTestScenario DUPLICATE_FIELDS = { .name = "DUPLICATE_FIELDS", - .query = nlohmann::json::parse(R"( -{ - "action": { - "type": "Fasta", - "orderByFields": [ - "primaryKey" - ], - "sequenceNames": [ - "unaligned_segment1", - "unaligned_segment2", - "unaligned_segment1" - ], - "additionalFields": [ - 
"date", - "date" - ] - }, - "filterExpression": { - "type": "True" - } -} -)"), + .query = + "default.project({primaryKey, unaligned_segment1, unaligned_segment2, unaligned_segment1, " + "date, date}).orderBy({primaryKey})", .expected_query_result = nlohmann::json::parse(R"( [{"date":"2024-08-05","primaryKey":"1","unaligned_segment1":null,"unaligned_segment2":"A"}, {"date":"2024-08-03","primaryKey":"2","unaligned_segment1":null,"unaligned_segment2":null}, @@ -221,26 +156,7 @@ const QueryTestScenario DUPLICATE_FIELDS = { const QueryTestScenario ORDER_BY_NOT_IN_OUTPUT = { .name = "ORDER_BY_NOT_IN_OUTPUT", - .query = nlohmann::json::parse(R"( -{ - "action": { - "sequenceNames": [ - "unaligned_segment1" - ], - "limit": 1, - "orderByFields": [ - { - "field": "date", - "order": "descending" - } - ], - "type": "Fasta" - }, - "filterExpression": { - "type": "True" - } -} -)"), + .query = "default.project({primaryKey, unaligned_segment1}).orderBy({date.desc()})", .expected_error_message = "OrderByField date is not contained in the result of this operation. " "Allowed values are primaryKey, unaligned_segment1." 
@@ -248,29 +164,9 @@ const QueryTestScenario ORDER_BY_NOT_IN_OUTPUT = { const QueryTestScenario ORDER_BY_ADDITIONAL_FIELD = { .name = "ORDER_BY_ADDITIONAL_FIELD", - .query = nlohmann::json::parse(R"( -{ - "action": { - "sequenceNames": [ - "unaligned_segment1", - "unaligned_segment2" - ], - "additionalFields": [ - "date" - ], - "orderByFields": [ - { - "field": "date", - "order": "ascending" - } - ], - "type": "Fasta" - }, - "filterExpression": { - "type": "True" - } -} -)"), + .query = + "default.project({primaryKey, unaligned_segment1, unaligned_segment2, date})" + ".orderBy({date.asc()})", .expected_query_result = nlohmann::json::parse(R"( [{"date":"2024-08-01","primaryKey":"bothSegments","unaligned_segment1":"A","unaligned_segment2":"G"}, {"date":"2024-08-02","primaryKey":"onlySegment2","unaligned_segment1":null,"unaligned_segment2":"T"}, diff --git a/src/silo/test/float_equals_and_between.test.cpp b/src/silo/test/float_equals_and_between.test.cpp index a9ba24459..945473a9f 100644 --- a/src/silo/test/float_equals_and_between.test.cpp +++ b/src/silo/test/float_equals_and_between.test.cpp @@ -3,7 +3,6 @@ #include "silo/test/query_fixture.test.h" using silo::ReferenceGenomes; -using silo::test::negateFilter; using silo::test::QueryTestData; using silo::test::QueryTestScenario; @@ -12,8 +11,6 @@ namespace { const double VALUE_IN_FILTER = 1.23; const double VALUE_BELOW_FILTER = 0.345; const double VALUE_ABOVE_FILTER = 2.345; -const double BELOW_FILTER = 0.5; -const double ABOVE_FILTER = 1.5; nlohmann::json createDataWithFloatValue(const std::string& primaryKey, double value) { return { @@ -67,28 +64,9 @@ const QueryTestData TEST_DATA{ .reference_genomes = REFERENCE_GENOMES }; -nlohmann::json createFloatEqualsQuery(const std::string& column, const nlohmann::json value) { - return { - {"action", {{"type", "Details"}}}, - {"filterExpression", {{"type", "FloatEquals"}, {"column", column}, {"value", value}}} - }; -} - -nlohmann::json createFloatBetweenQuery( - const 
std::string& column, - const nlohmann::json from_value, - const nlohmann::json to_value -) { - return { - {"action", {{"type", "Details"}}}, - {"filterExpression", - {{"type", "FloatBetween"}, {"column", column}, {"from", from_value}, {"to", to_value}}} - }; -} - const QueryTestScenario FLOAT_EQUALS_VALUE_SCENARIO = { .name = "FLOAT_EQUALS_VALUE_SCENARIO", - .query = createFloatEqualsQuery("float_value", VALUE_IN_FILTER), + .query = "default.filter(float_value = 1.23).project({primaryKey, float_value})", .expected_query_result = nlohmann::json( {{{"primaryKey", "id_0"}, {"float_value", VALUE_IN_FILTER}}, {{"primaryKey", "id_1"}, {"float_value", VALUE_IN_FILTER}}} @@ -97,7 +75,7 @@ const QueryTestScenario FLOAT_EQUALS_VALUE_SCENARIO = { const QueryTestScenario NEGATED_FLOAT_EQUALS_VALUE_SCENARIO = { .name = "NEGATED_FLOAT_EQUALS_VALUE_SCENARIO", - .query = negateFilter(createFloatEqualsQuery("float_value", VALUE_IN_FILTER)), + .query = "default.filter(!(float_value = 1.23)).project({primaryKey, float_value})", .expected_query_result = nlohmann::json( {{{"primaryKey", "id_2"}, {"float_value", VALUE_BELOW_FILTER}}, {{"primaryKey", "id_3"}, {"float_value", VALUE_ABOVE_FILTER}}, @@ -107,13 +85,13 @@ const QueryTestScenario NEGATED_FLOAT_EQUALS_VALUE_SCENARIO = { const QueryTestScenario FLOAT_EQUALS_NULL_SCENARIO = { .name = "FLOAT_EQUALS_NULL_SCENARIO", - .query = createFloatEqualsQuery("float_value", nullptr), + .query = "default.filter(float_value = null).project({primaryKey, float_value})", .expected_query_result = nlohmann::json({{{"primaryKey", "id_4"}, {"float_value", nullptr}}}) }; const QueryTestScenario NEGATED_FLOAT_EQUALS_NULL_SCENARIO = { .name = "NEGATED_FLOAT_EQUALS_NULL_SCENARIO", - .query = negateFilter(createFloatEqualsQuery("float_value", nullptr)), + .query = "default.filter(!(float_value = null)).project({primaryKey, float_value})", .expected_query_result = nlohmann::json( {{{"primaryKey", "id_0"}, {"float_value", VALUE_IN_FILTER}}, {{"primaryKey", 
"id_1"}, {"float_value", VALUE_IN_FILTER}}, @@ -124,7 +102,7 @@ const QueryTestScenario NEGATED_FLOAT_EQUALS_NULL_SCENARIO = { const QueryTestScenario FLOAT_BETWEEN_WITH_FROM_AND_TO_SCENARIO = { .name = "FLOAT_BETWEEN_WITH_FROM_AND_TO_SCENARIO", - .query = createFloatBetweenQuery("float_value", BELOW_FILTER, ABOVE_FILTER), + .query = "default.filter(float_value.between(0.5, 1.5)).project({primaryKey, float_value})", .expected_query_result = nlohmann::json({ {{"primaryKey", "id_0"}, {"float_value", VALUE_IN_FILTER}}, {{"primaryKey", "id_1"}, {"float_value", VALUE_IN_FILTER}}, @@ -133,7 +111,7 @@ const QueryTestScenario FLOAT_BETWEEN_WITH_FROM_AND_TO_SCENARIO = { const QueryTestScenario NEGATED_FLOAT_BETWEEN_WITH_FROM_AND_TO_SCENARIO = { .name = "NEGATED_FLOAT_BETWEEN_WITH_FROM_AND_TO_SCENARIO", - .query = negateFilter(createFloatBetweenQuery("float_value", BELOW_FILTER, ABOVE_FILTER)), + .query = "default.filter(!(float_value.between(0.5, 1.5))).project({primaryKey, float_value})", .expected_query_result = nlohmann::json( {{{"primaryKey", "id_2"}, {"float_value", VALUE_BELOW_FILTER}}, {{"primaryKey", "id_3"}, {"float_value", VALUE_ABOVE_FILTER}}, @@ -143,7 +121,7 @@ const QueryTestScenario NEGATED_FLOAT_BETWEEN_WITH_FROM_AND_TO_SCENARIO = { const QueryTestScenario FLOAT_BETWEEN_WITH_FROM_SCENARIO = { .name = "FLOAT_BETWEEN_WITH_FROM_SCENARIO", - .query = createFloatBetweenQuery("float_value", BELOW_FILTER, nullptr), + .query = "default.filter(float_value >= 0.5).project({primaryKey, float_value})", .expected_query_result = nlohmann::json( {{{"primaryKey", "id_0"}, {"float_value", VALUE_IN_FILTER}}, {{"primaryKey", "id_1"}, {"float_value", VALUE_IN_FILTER}}, @@ -153,7 +131,7 @@ const QueryTestScenario FLOAT_BETWEEN_WITH_FROM_SCENARIO = { const QueryTestScenario NEGATED_FLOAT_BETWEEN_WITH_FROM_SCENARIO = { .name = "NEGATED_FLOAT_BETWEEN_WITH_FROM_SCENARIO", - .query = negateFilter(createFloatBetweenQuery("float_value", BELOW_FILTER, nullptr)), + .query = 
"default.filter(!(float_value >= 0.5)).project({primaryKey, float_value})", .expected_query_result = nlohmann::json( {{{"primaryKey", "id_2"}, {"float_value", VALUE_BELOW_FILTER}}, {{"primaryKey", "id_4"}, {"float_value", nullptr}}} @@ -162,7 +140,7 @@ const QueryTestScenario NEGATED_FLOAT_BETWEEN_WITH_FROM_SCENARIO = { const QueryTestScenario FLOAT_BETWEEN_WITH_TO_SCENARIO = { .name = "FLOAT_BETWEEN_WITH_TO_SCENARIO", - .query = createFloatBetweenQuery("float_value", nullptr, ABOVE_FILTER), + .query = "default.filter(float_value <= 1.5).project({primaryKey, float_value})", .expected_query_result = nlohmann::json( {{{"primaryKey", "id_0"}, {"float_value", VALUE_IN_FILTER}}, {{"primaryKey", "id_1"}, {"float_value", VALUE_IN_FILTER}}, @@ -172,7 +150,7 @@ const QueryTestScenario FLOAT_BETWEEN_WITH_TO_SCENARIO = { const QueryTestScenario NEGATED_FLOAT_BETWEEN_WITH_TO_SCENARIO = { .name = "NEGATED_FLOAT_BETWEEN_WITH_TO_SCENARIO", - .query = negateFilter(createFloatBetweenQuery("float_value", nullptr, ABOVE_FILTER)), + .query = "default.filter(!(float_value <= 1.5)).project({primaryKey, float_value})", .expected_query_result = nlohmann::json( {{{"primaryKey", "id_3"}, {"float_value", VALUE_ABOVE_FILTER}}, {{"primaryKey", "id_4"}, {"float_value", nullptr}}} @@ -181,7 +159,7 @@ const QueryTestScenario NEGATED_FLOAT_BETWEEN_WITH_TO_SCENARIO = { const QueryTestScenario FLOAT_BETWEEN_WITH_FROM_AND_TO_NULL_SCENARIO = { .name = "FLOAT_BETWEEN_WITH_FROM_AND_TO_NULL_SCENARIO", - .query = createFloatBetweenQuery("float_value", nullptr, nullptr), + .query = "default.filter(float_value.isNotNull()).project({primaryKey, float_value})", .expected_query_result = nlohmann::json( {{{"primaryKey", "id_0"}, {"float_value", VALUE_IN_FILTER}}, {{"primaryKey", "id_1"}, {"float_value", VALUE_IN_FILTER}}, @@ -192,27 +170,10 @@ const QueryTestScenario FLOAT_BETWEEN_WITH_FROM_AND_TO_NULL_SCENARIO = { const QueryTestScenario NEGATED_FLOAT_BETWEEN_WITH_FROM_AND_TO_NULL_SCENARIO = { .name = 
"NEGATED_FLOAT_BETWEEN_WITH_FROM_AND_TO_NULL_SCENARIO", - .query = negateFilter(createFloatBetweenQuery("float_value", nullptr, nullptr)), + .query = "default.filter(!(float_value.isNotNull())).project({primaryKey, float_value})", .expected_query_result = nlohmann::json({{{"primaryKey", "id_4"}, {"float_value", nullptr}}}) }; -const QueryTestScenario FLOAT_EQUALS_WITH_INVALID_VALUE = { - .name = "FLOAT_EQUALS_WITH_INVALID_VALUE", - .query = createFloatEqualsQuery("float_value", "not_a_number"), - .expected_error_message = "The field 'value' in a FloatEquals expression must be a float or null" -}; - -const QueryTestScenario FLOAT_BETWEEN_WITH_INVALID_FROM_VALUE = { - .name = "FLOAT_BETWEEN_WITH_INVALID_FROM_VALUE", - .query = createFloatBetweenQuery("float_value", false, 1.0), - .expected_error_message = "The field 'from' in a FloatBetween expression must be a float or null" -}; - -const QueryTestScenario FLOAT_BETWEEN_WITH_INVALID_TO_VALUE = { - .name = "FLOAT_BETWEEN_WITH_INVALID_TO_VALUE", - .query = createFloatBetweenQuery("float_value", 0.0, "test"), - .expected_error_message = "The field 'to' in a FloatBetween expression must be a float or null" -}; } // namespace QUERY_TEST( @@ -230,9 +191,6 @@ QUERY_TEST( FLOAT_BETWEEN_WITH_TO_SCENARIO, NEGATED_FLOAT_BETWEEN_WITH_TO_SCENARIO, FLOAT_BETWEEN_WITH_FROM_AND_TO_NULL_SCENARIO, - NEGATED_FLOAT_BETWEEN_WITH_FROM_AND_TO_NULL_SCENARIO, - FLOAT_EQUALS_WITH_INVALID_VALUE, - FLOAT_BETWEEN_WITH_INVALID_FROM_VALUE, - FLOAT_BETWEEN_WITH_INVALID_TO_VALUE + NEGATED_FLOAT_BETWEEN_WITH_FROM_AND_TO_NULL_SCENARIO ) -); \ No newline at end of file +); diff --git a/src/silo/test/has_mutation.test.cpp b/src/silo/test/has_mutation.test.cpp index f8f2c5c20..f2bd14090 100644 --- a/src/silo/test/has_mutation.test.cpp +++ b/src/silo/test/has_mutation.test.cpp @@ -53,61 +53,58 @@ const QueryTestData TEST_DATA{ .reference_genomes = REFERENCE_GENOMES }; -nlohmann::json createHasNucleotideMutationQuery(int position) { - return { - {"action", 
{{"type", "Aggregated"}}}, - {"filterExpression", - {{"type", "HasNucleotideMutation"}, {"position", position}, {"sequenceNames", {"segment1"}}}} - }; -} - -nlohmann::json createHasAminoAcidMutationQuery(int position) { - return { - {"action", {{"type", "Aggregated"}}}, - {"filterExpression", - {{"type", "HasAminoAcidMutation"}, {"position", position}, {"sequenceNames", {"gene1"}}}} - }; -} - const QueryTestScenario HAS_NUCLEOTIDE_MUTATION = { .name = "HAS_NUCLEOTIDE_MUTATION", - .query = createHasNucleotideMutationQuery(1), + .query = + "default.filter(hasMutation(position:=1, " + "sequenceName:='segment1')).groupBy({count:=count()})", .expected_query_result = nlohmann::json::parse(R"([{"count": 1}])") }; const QueryTestScenario HAS_AMINO_ACID_MUTATION = { .name = "HAS_AMINO_ACID_MUTATION", - .query = createHasAminoAcidMutationQuery(1), + .query = + "default.filter(hasAAMutation(position:=1, sequenceName:='gene1')).groupBy({count:=count()})", .expected_query_result = nlohmann::json::parse(R"([{"count": 1}])") }; const QueryTestScenario HAS_NUCLEOTIDE_MUTATION_OUT_OF_RANGE = { .name = "HAS_NUCLEOTIDE_MUTATION_OUT_OF_RANGE", - .query = createHasNucleotideMutationQuery(2000), + .query = + "default.filter(hasMutation(position:=2000, " + "sequenceName:='segment1')).groupBy({count:=count()})", .expected_error_message = "HasNucleotideMutation position is out of bounds 2000 > 5" }; const QueryTestScenario HAS_NUCLEOTIDE_MUTATION_OUT_OF_RANGE_EDGE_LOW = { .name = "HAS_NUCLEOTIDE_MUTATION_OUT_OF_RANGE_EDGE_HIGH", - .query = createHasNucleotideMutationQuery(0), + .query = + "default.filter(hasMutation(position:=0, " + "sequenceName:='segment1')).groupBy({count:=count()})", .expected_error_message = "The field 'position' is 1-indexed. Value of 0 not allowed." 
}; const QueryTestScenario HAS_NUCLEOTIDE_MUTATION_OUT_OF_RANGE_EDGE_HIGH = { .name = "HAS_NUCLEOTIDE_MUTATION_OUT_OF_RANGE_EDGE_LOW", - .query = createHasNucleotideMutationQuery(6), + .query = + "default.filter(hasMutation(position:=6, " + "sequenceName:='segment1')).groupBy({count:=count()})", .expected_error_message = "HasNucleotideMutation position is out of bounds 6 > 5" }; const QueryTestScenario HAS_NUCLEOTIDE_MUTATION_IN_RANGE_EDGE = { .name = "HAS_NUCLEOTIDE_MUTATION_IN_RANGE_EDGE", - .query = createHasNucleotideMutationQuery(5), + .query = + "default.filter(hasMutation(position:=5, " + "sequenceName:='segment1')).groupBy({count:=count()})", .expected_query_result = nlohmann::json::parse(R"([{"count": 1}])") }; const QueryTestScenario HAS_AMINO_ACID_MUTATION_OUT_OF_RANGE = { .name = "HAS_AMINO_ACID_MUTATION_OUT_OF_RANGE", - .query = createHasAminoAcidMutationQuery(1000), + .query = + "default.filter(hasAAMutation(position:=1000, " + "sequenceName:='gene1')).groupBy({count:=count()})", .expected_error_message = "HasAminoAcidMutation position is out of bounds 1000 > 2" }; diff --git a/src/silo/test/insertion_contains.test.cpp b/src/silo/test/insertion_contains.test.cpp index 301b67144..2ddfad366 100644 --- a/src/silo/test/insertion_contains.test.cpp +++ b/src/silo/test/insertion_contains.test.cpp @@ -56,58 +56,34 @@ const QueryTestData TEST_DATA{ .reference_genomes = REFERENCE_GENOMES }; -nlohmann::json createInsertionContainsQuery( - const nlohmann::json& sequenceName, - int position, - const std::string& insertedSymbols -) { - return { - {"action", {{"type", "Details"}}}, - {"filterExpression", - {{"type", "InsertionContains"}, - {"position", position}, - {"value", insertedSymbols}, - {"sequenceName", sequenceName}}} - }; -} - -nlohmann::json createInsertionContainsQueryWithEmptySequenceName( - int position, - const std::string& insertedSymbols -) { - return { - {"action", {{"type", "Details"}}}, - {"filterExpression", - { - {"type", "InsertionContains"}, 
- {"position", position}, - {"value", insertedSymbols}, - }} - }; -} - const QueryTestScenario INSERTION_CONTAINS_SCENARIO = { .name = "INSERTION_CONTAINS_SCENARIO", - .query = createInsertionContainsQuery("segment1", 12, "A"), + .query = + "default.filter(insertionContains(position:=12, value:='A', " + "sequenceName:='segment1')).project(primaryKey)", .expected_query_result = nlohmann::json({{{"primaryKey", "id_0"}}, {{"primaryKey", "id_1"}}}) }; const QueryTestScenario INSERTION_CONTAINS_WITH_EMPTY_SEGMENT_SCENARIO = { .name = "INSERTION_CONTAINS_WITH_EMPTY_SEGMENT_SCENARIO", - .query = createInsertionContainsQueryWithEmptySequenceName(12, "A"), + .query = "default.filter(insertionContains(position:=12, value:='A')).project(primaryKey)", .expected_query_result = nlohmann::json({{{"primaryKey", "id_0"}}, {{"primaryKey", "id_1"}}}) }; const QueryTestScenario INSERTION_CONTAINS_WITH_UNKNOWN_SEGMENT_SCENARIO = { .name = "INSERTION_CONTAINS_WITH_UNKNOWN_SEGMENT_SCENARIO", - .query = createInsertionContainsQuery("unknownSegmentName", 12, "A"), + .query = + "default.filter(insertionContains(position:=12, value:='A', " + "sequenceName:='unknownSegmentName'))", .expected_error_message = "Database does not contain the Nucleotide Sequence with name: 'unknownSegmentName'" }; const QueryTestScenario INSERTION_CONTAINS_POSITION_OUT_OF_RANGE = { .name = "INSERTION_CONTAINS_POSITION_OUT_OF_RANGE", - .query = createInsertionContainsQuery("segment2", 100, "A"), + .query = + "default.filter(insertionContains(position:=100, value:='A', sequenceName:='segment2'))" + "", .expected_error_message = "the requested insertion position (100) is larger than the length of the reference sequence " "(32) for sequence 'segment2'" @@ -115,7 +91,7 @@ const QueryTestScenario INSERTION_CONTAINS_POSITION_OUT_OF_RANGE = { const QueryTestScenario INSERTION_CONTAINS_POSITION_OUT_OF_RANGE_DEFAULT_SEQUENCE = { .name = "INSERTION_CONTAINS_POSITION_OUT_OF_RANGE_DEFAULT_SEQUENCE", - .query = 
createInsertionContainsQueryWithEmptySequenceName(100, "A"), + .query = "default.filter(insertionContains(position:=100, value:='A'))", .expected_error_message = "the requested insertion position (100) is larger than the length of the reference sequence " "(32) for sequence 'segment1'" diff --git a/src/silo/test/int_equals_and_between.test.cpp b/src/silo/test/int_equals_and_between.test.cpp index 5cd184606..5ad4b08d6 100644 --- a/src/silo/test/int_equals_and_between.test.cpp +++ b/src/silo/test/int_equals_and_between.test.cpp @@ -4,15 +4,12 @@ namespace { using silo::ReferenceGenomes; -using silo::test::negateFilter; using silo::test::QueryTestData; using silo::test::QueryTestScenario; const int VALUE_IN_FILTER = 3; const int VALUE_BELOW_FILTER = 1; const int VALUE_ABOVE_FILTER = 5; -const int BELOW_FILTER = 2; -const int ABOVE_FILTER = 4; nlohmann::json createDataWithIntValue(const std::string& primaryKey, int value) { return { @@ -66,158 +63,240 @@ const QueryTestData TEST_DATA{ .reference_genomes = REFERENCE_GENOMES }; -nlohmann::json createIntEqualsQuery(const std::string& column, const nlohmann::json value) { - return { - {"action", {{"type", "Details"}}}, - {"filterExpression", {{"type", "IntEquals"}, {"column", column}, {"value", value}}} - }; -} - -nlohmann::json createIntBetweenQuery( - const std::string& column, - const nlohmann::json from_value, - const nlohmann::json to_value -) { - return { - {"action", {{"type", "Details"}}}, - {"filterExpression", - {{"type", "IntBetween"}, {"column", column}, {"from", from_value}, {"to", to_value}}} - }; -} - const QueryTestScenario INT_EQUALS_VALUE_SCENARIO = { .name = "INT_EQUALS_VALUE_SCENARIO", - .query = createIntEqualsQuery("int_value", VALUE_IN_FILTER), + .query = "default.filter(int_value = 3)", .expected_query_result = nlohmann::json( - {{{"primaryKey", "id_0"}, {"int_value", VALUE_IN_FILTER}}, - {{"primaryKey", "id_1"}, {"int_value", VALUE_IN_FILTER}}} + {{{"primaryKey", "id_0"}, + {"int_value", 
VALUE_IN_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, + {{"primaryKey", "id_1"}, + {"int_value", VALUE_IN_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}} ) }; const QueryTestScenario NEGATED_INT_EQUALS_VALUE_SCENARIO = { .name = "NEGATED_INT_EQUALS_VALUE_SCENARIO", - .query = negateFilter(createIntEqualsQuery("int_value", VALUE_IN_FILTER)), + .query = "default.filter(!(int_value = 3))", .expected_query_result = nlohmann::json( - {{{"primaryKey", "id_2"}, {"int_value", VALUE_BELOW_FILTER}}, - {{"primaryKey", "id_3"}, {"int_value", VALUE_ABOVE_FILTER}}, - {{"primaryKey", "id_4"}, {"int_value", nullptr}}} + {{{"primaryKey", "id_2"}, + {"int_value", VALUE_BELOW_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, + {{"primaryKey", "id_3"}, + {"int_value", VALUE_ABOVE_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, + {{"primaryKey", "id_4"}, + {"int_value", nullptr}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}} ) }; const QueryTestScenario INT_EQUALS_NULL_SCENARIO = { .name = "INT_EQUALS_NULL_SCENARIO", - .query = createIntEqualsQuery("int_value", nullptr), - .expected_query_result = nlohmann::json({{{"primaryKey", "id_4"}, {"int_value", nullptr}}}) + .query = "default.filter(int_value = null)", + .expected_query_result = nlohmann::json( + {{{"primaryKey", "id_4"}, + {"int_value", nullptr}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}} + ) }; const QueryTestScenario NEGATED_INT_EQUALS_NULL_SCENARIO = { .name = "NEGATED_INT_EQUALS_NULL_SCENARIO", - .query = negateFilter(createIntEqualsQuery("int_value", nullptr)), + .query = "default.filter(!(int_value = null))", .expected_query_result = nlohmann::json( - {{{"primaryKey", "id_0"}, {"int_value", VALUE_IN_FILTER}}, - {{"primaryKey", "id_1"}, {"int_value", VALUE_IN_FILTER}}, - 
{{"primaryKey", "id_2"}, {"int_value", VALUE_BELOW_FILTER}}, - {{"primaryKey", "id_3"}, {"int_value", VALUE_ABOVE_FILTER}}} + {{{"primaryKey", "id_0"}, + {"int_value", VALUE_IN_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, + {{"primaryKey", "id_1"}, + {"int_value", VALUE_IN_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, + {{"primaryKey", "id_2"}, + {"int_value", VALUE_BELOW_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, + {{"primaryKey", "id_3"}, + {"int_value", VALUE_ABOVE_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}} ) }; const QueryTestScenario INT_BETWEEN_WITH_FROM_AND_TO_SCENARIO = { .name = "INT_BETWEEN_WITH_FROM_AND_TO_SCENARIO", - .query = createIntBetweenQuery("int_value", BELOW_FILTER, ABOVE_FILTER), + .query = "default.filter(int_value.between(2, 4))", .expected_query_result = nlohmann::json({ - {{"primaryKey", "id_0"}, {"int_value", VALUE_IN_FILTER}}, - {{"primaryKey", "id_1"}, {"int_value", VALUE_IN_FILTER}}, + {{"primaryKey", "id_0"}, + {"int_value", VALUE_IN_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, + {{"primaryKey", "id_1"}, + {"int_value", VALUE_IN_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, }) }; const QueryTestScenario NEGATED_INT_BETWEEN_WITH_FROM_AND_TO_SCENARIO = { .name = "NEGATED_INT_BETWEEN_WITH_FROM_AND_TO_SCENARIO", - .query = negateFilter(createIntBetweenQuery("int_value", BELOW_FILTER, ABOVE_FILTER)), + .query = "default.filter(!(int_value.between(2, 4)))", .expected_query_result = nlohmann::json( - {{{"primaryKey", "id_2"}, {"int_value", VALUE_BELOW_FILTER}}, - {{"primaryKey", "id_3"}, {"int_value", VALUE_ABOVE_FILTER}}, - {{"primaryKey", "id_4"}, {"int_value", nullptr}}} + {{{"primaryKey", "id_2"}, + {"int_value", VALUE_BELOW_FILTER}, + {"segment1", nullptr}, 
+ {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, + {{"primaryKey", "id_3"}, + {"int_value", VALUE_ABOVE_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, + {{"primaryKey", "id_4"}, + {"int_value", nullptr}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}} ) }; const QueryTestScenario INT_BETWEEN_WITH_FROM_SCENARIO = { .name = "INT_BETWEEN_WITH_FROM_SCENARIO", - .query = createIntBetweenQuery("int_value", BELOW_FILTER, nullptr), + .query = "default.filter(int_value >= 2)", .expected_query_result = nlohmann::json( - {{{"primaryKey", "id_0"}, {"int_value", VALUE_IN_FILTER}}, - {{"primaryKey", "id_1"}, {"int_value", VALUE_IN_FILTER}}, - {{"primaryKey", "id_3"}, {"int_value", VALUE_ABOVE_FILTER}}} + {{{"primaryKey", "id_0"}, + {"int_value", VALUE_IN_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, + {{"primaryKey", "id_1"}, + {"int_value", VALUE_IN_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, + {{"primaryKey", "id_3"}, + {"int_value", VALUE_ABOVE_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}} ) }; const QueryTestScenario NEGATED_INT_BETWEEN_WITH_FROM_SCENARIO = { .name = "NEGATED_INT_BETWEEN_WITH_FROM_SCENARIO", - .query = negateFilter(createIntBetweenQuery("int_value", BELOW_FILTER, nullptr)), + .query = "default.filter(!(int_value >= 2))", .expected_query_result = nlohmann::json( - {{{"primaryKey", "id_2"}, {"int_value", VALUE_BELOW_FILTER}}, - {{"primaryKey", "id_4"}, {"int_value", nullptr}}} + {{{"primaryKey", "id_2"}, + {"int_value", VALUE_BELOW_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, + {{"primaryKey", "id_4"}, + {"int_value", nullptr}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}} ) }; const QueryTestScenario INT_BETWEEN_WITH_TO_SCENARIO = { .name = 
"INT_BETWEEN_WITH_TO_SCENARIO", - .query = createIntBetweenQuery("int_value", nullptr, ABOVE_FILTER), + .query = "default.filter(int_value <= 4)", .expected_query_result = nlohmann::json( - {{{"primaryKey", "id_0"}, {"int_value", VALUE_IN_FILTER}}, - {{"primaryKey", "id_1"}, {"int_value", VALUE_IN_FILTER}}, - {{"primaryKey", "id_2"}, {"int_value", VALUE_BELOW_FILTER}}} + {{{"primaryKey", "id_0"}, + {"int_value", VALUE_IN_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, + {{"primaryKey", "id_1"}, + {"int_value", VALUE_IN_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, + {{"primaryKey", "id_2"}, + {"int_value", VALUE_BELOW_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}} ) }; const QueryTestScenario NEGATED_INT_BETWEEN_WITH_TO_SCENARIO = { .name = "NEGATED_INT_BETWEEN_WITH_TO_SCENARIO", - .query = negateFilter(createIntBetweenQuery("int_value", nullptr, ABOVE_FILTER)), + .query = "default.filter(!(int_value <= 4))", .expected_query_result = nlohmann::json( - {{{"primaryKey", "id_3"}, {"int_value", VALUE_ABOVE_FILTER}}, - {{"primaryKey", "id_4"}, {"int_value", nullptr}}} + {{{"primaryKey", "id_3"}, + {"int_value", VALUE_ABOVE_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, + {{"primaryKey", "id_4"}, + {"int_value", nullptr}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}} ) }; const QueryTestScenario INT_BETWEEN_WITH_FROM_AND_TO_NULL_SCENARIO = { .name = "INT_BETWEEN_WITH_FROM_AND_TO_NULL_SCENARIO", - .query = createIntBetweenQuery("int_value", nullptr, nullptr), + .query = "default.filter(int_value.isNotNull())", .expected_query_result = nlohmann::json( - {{{"primaryKey", "id_0"}, {"int_value", VALUE_IN_FILTER}}, - {{"primaryKey", "id_1"}, {"int_value", VALUE_IN_FILTER}}, - {{"primaryKey", "id_2"}, {"int_value", VALUE_BELOW_FILTER}}, - {{"primaryKey", "id_3"}, 
{"int_value", VALUE_ABOVE_FILTER}}} + {{{"primaryKey", "id_0"}, + {"int_value", VALUE_IN_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, + {{"primaryKey", "id_1"}, + {"int_value", VALUE_IN_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, + {{"primaryKey", "id_2"}, + {"int_value", VALUE_BELOW_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}, + {{"primaryKey", "id_3"}, + {"int_value", VALUE_ABOVE_FILTER}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}} ) }; const QueryTestScenario NEGATED_INT_BETWEEN_WITH_FROM_AND_TO_NULL_SCENARIO = { .name = "NEGATED_INT_BETWEEN_WITH_FROM_AND_TO_NULL_SCENARIO", - .query = negateFilter(createIntBetweenQuery("int_value", nullptr, nullptr)), - .expected_query_result = nlohmann::json({{{"primaryKey", "id_4"}, {"int_value", nullptr}}}) -}; - -const QueryTestScenario INT_EQUALS_WITH_INVALID_VALUE = { - .name = "INT_EQUALS_WITH_INVALID_VALUE", - .query = createIntEqualsQuery("int_value", 0.3), - .expected_error_message = - "The field 'value' in an IntEquals expression must be an integer in [-2147483648; " - "2147483647] or null" -}; - -const QueryTestScenario INT_BETWEEN_WITH_INVALID_FROM_VALUE = { - .name = "INT_BETWEEN_WITH_INVALID_FROM_VALUE", - .query = createIntBetweenQuery("int_value", false, 1), - .expected_error_message = - "The field 'from' in an IntBetween expression must be an integer in [-2147483648; " - "2147483647] or null" + .query = "default.filter(!(int_value.isNotNull()))", + .expected_query_result = nlohmann::json( + {{{"primaryKey", "id_4"}, + {"int_value", nullptr}, + {"segment1", nullptr}, + {"gene1", nullptr}, + {"unaligned_segment1", nullptr}}} + ) }; -const QueryTestScenario INT_BETWEEN_WITH_INVALID_TO_VALUE = { - .name = "INT_BETWEEN_WITH_INVALID_TO_VALUE", - .query = createIntBetweenQuery("int_value", 0, "test"), - .expected_error_message = - "The 
field 'to' in an IntBetween expression must be an integer in [-2147483648; 2147483647] " - "or null" -}; } // namespace QUERY_TEST( @@ -235,9 +314,6 @@ QUERY_TEST( INT_BETWEEN_WITH_TO_SCENARIO, NEGATED_INT_BETWEEN_WITH_TO_SCENARIO, INT_BETWEEN_WITH_FROM_AND_TO_NULL_SCENARIO, - NEGATED_INT_BETWEEN_WITH_FROM_AND_TO_NULL_SCENARIO, - INT_EQUALS_WITH_INVALID_VALUE, - INT_BETWEEN_WITH_INVALID_FROM_VALUE, - INT_BETWEEN_WITH_INVALID_TO_VALUE + NEGATED_INT_BETWEEN_WITH_FROM_AND_TO_NULL_SCENARIO ) ); diff --git a/src/silo/test/nucleotide_symbol_equals.test.cpp b/src/silo/test/nucleotide_symbol_equals.test.cpp index 78c0ffb90..fb24df06c 100644 --- a/src/silo/test/nucleotide_symbol_equals.test.cpp +++ b/src/silo/test/nucleotide_symbol_equals.test.cpp @@ -52,24 +52,13 @@ const QueryTestData TEST_DATA{ .reference_genomes = REFERENCE_GENOMES }; -nlohmann::json createNucleotideSymbolEqualsQuery(const std::string& symbol, int position) { - return nlohmann::json::parse(fmt::format( - R"( -{{ - "action": {{ - "type": "Aggregated" - }}, - "filterExpression": {{ - "type": "NucleotideEquals", - "position": {}, - "symbol": "{}", - "sequenceName": "segment1" - }} -}} -)", +std::string createNucleotideSymbolEqualsQuery(const std::string& symbol, int position) { + return fmt::format( + "default.filter(nucleotideEquals(position:={}, symbol:='{}', " + "sequenceName:='segment1')).groupBy({{count:=count()}})", position, symbol - )); + ); } const QueryTestScenario NUCLEOTIDE_EQUALS_WITH_SYMBOL = { diff --git a/src/silo/test/query_fixture.test.cpp b/src/silo/test/query_fixture.test.cpp index cf01efa35..238fcd4a1 100644 --- a/src/silo/test/query_fixture.test.cpp +++ b/src/silo/test/query_fixture.test.cpp @@ -1,16 +1,32 @@ #include "silo/test/query_fixture.test.h" +#include +#include +#include + +#include "silo/query_engine/exec_node/ndjson_sink.h" + namespace silo::test { std::string printScenarioName(const ::testing::TestParamInfo& scenario) { return scenario.param.name; } -nlohmann::json 
negateFilter(const nlohmann::json& query) { - return nlohmann::json{ - {"action", query["action"]}, - {"filterExpression", {{"type", "Not"}, {"child", query["filterExpression"]}}} - }; +nlohmann::json executeQueryToJsonArray( + query_engine::QueryPlan& query_plan, + uint64_t timeout_in_seconds +) { + std::stringstream buffer; + query_engine::exec_node::NdjsonSink output_sink{&buffer, query_plan.results_schema}; + query_plan.executeAndWrite(output_sink, timeout_in_seconds); + nlohmann::json result = nlohmann::json::array(); + std::string line; + while (std::getline(buffer, line)) { + auto line_object = nlohmann::json::parse(line); + std::cout << line_object.dump() << '\n'; + result.push_back(line_object); + } + return result; } } // namespace silo::test diff --git a/src/silo/test/query_fixture.test.h b/src/silo/test/query_fixture.test.h index 5292d4e9c..249467043 100644 --- a/src/silo/test/query_fixture.test.h +++ b/src/silo/test/query_fixture.test.h @@ -15,11 +15,11 @@ #include "silo/config/database_config.h" #include "silo/database.h" #include "silo/initialize/initializer.h" -#include "silo/query_engine/action_query.h" -#include "silo/query_engine/binder.h" #include "silo/query_engine/exec_node/ndjson_sink.h" #include "silo/query_engine/planner.h" #include "silo/query_engine/query_plan.h" +#include "silo/query_engine/saneql/ast_to_query.h" +#include "silo/query_engine/saneql/parser.h" #include "silo/storage/reference_genomes.h" namespace silo::test { @@ -74,6 +74,13 @@ struct QueryTestScenario { std::string printScenarioName(const ::testing::TestParamInfo& scenario); +nlohmann::json executeQueryToJsonArray( + query_engine::QueryPlan& query_plan, + uint64_t timeout_in_seconds = 3 +); + +nlohmann::json negateFilter(const nlohmann::json& query); + template class QueryTestFixture : public ::testing::TestWithParam { public: @@ -110,43 +117,25 @@ class QueryTestFixture : public ::testing::TestWithParam { } const auto query_options = 
scenario.query_options.value_or(config::RuntimeConfig::withDefaults().query_options); + const auto query_string = scenario.query.get(); if (!scenario.expected_error_message.empty()) { try { - auto query = query_engine::ActionQuery::parseQuery(scenario.query.dump()); - auto bound_query = - query_engine::Binder::bindQuery(std::move(query), shared_database->tables); - auto query_plan = query_engine::Planner::planQuery( - std::move(bound_query), shared_database->tables, query_options, "some_id" + auto query_plan = query_engine::Planner::planSaneqlQuery( + query_string, shared_database->tables, query_options, "some_id" ); - std::stringstream buffer; - query_engine::exec_node::NdjsonSink output_sink{&buffer, query_plan.results_schema}; - query_plan.executeAndWrite(output_sink, /*timeout_in_seconds=*/3); + executeQueryToJsonArray(query_plan); FAIL() << "Expected an error in test case, but nothing was thrown"; } catch (const std::exception& e) { EXPECT_EQ(std::string(e.what()), scenario.expected_error_message); } } else { - auto query = query_engine::ActionQuery::parseQuery(scenario.query.dump()); - auto bound_query = - query_engine::Binder::bindQuery(std::move(query), shared_database->tables); - auto query_plan = query_engine::Planner::planQuery( - std::move(bound_query), shared_database->tables, query_options, "some_id" + auto query_plan = query_engine::Planner::planSaneqlQuery( + query_string, shared_database->tables, query_options, "some_id" ); - std::stringstream buffer; - query_engine::exec_node::NdjsonSink output_sink{&buffer, query_plan.results_schema}; - query_plan.executeAndWrite(output_sink, /*timeout_in_seconds=*/3); - nlohmann::json actual_ndjson_result_as_array = nlohmann::json::array(); - std::string line; - while (std::getline(buffer, line)) { - auto line_object = nlohmann::json::parse(line); - std::cout << line_object.dump() << '\n'; - actual_ndjson_result_as_array.push_back(line_object); - } + auto actual_ndjson_result_as_array = 
executeQueryToJsonArray(query_plan); ASSERT_EQ(actual_ndjson_result_as_array, scenario.expected_query_result); } } }; -nlohmann::json negateFilter(const nlohmann::json& query); - } // namespace silo::test diff --git a/src/silo/test/randomize.test.cpp b/src/silo/test/randomize.test.cpp index 565cd0295..25aabecb1 100644 --- a/src/silo/test/randomize.test.cpp +++ b/src/silo/test/randomize.test.cpp @@ -82,10 +82,7 @@ const QueryTestData TEST_DATA{ const QueryTestScenario RANDOMIZE_SEED = { .name = "RANDOMIZE_SEED", - .query = json::parse( - R"({"action": {"type": "Details", "fields": ["key"], "randomize": {"seed": 1231}}, - "filterExpression": {"type": "True"}})" - ), + .query = "default.project(key).randomize(seed:=1231)", .expected_query_result = json::parse( R"([{"key": "id5"}, {"key": "id1"}, @@ -97,10 +94,7 @@ const QueryTestScenario RANDOMIZE_SEED = { const QueryTestScenario RANDOMIZE_INDEPENDENT_ON_COL_NUMS = { .name = "RANDOMIZE_INDEPENDENT_ON_COL_NUMS", - .query = json::parse( - R"({"action": {"type": "Details", "fields": ["key", "col"], "randomize": {"seed": 1231}}, - "filterExpression": {"type": "True"}})" - ), + .query = "default.project({key, col}).randomize(seed:=1231)", .expected_query_result = json::parse( R"( [{"col":"A","key":"id5"}, @@ -114,10 +108,7 @@ const QueryTestScenario RANDOMIZE_INDEPENDENT_ON_COL_NUMS = { const QueryTestScenario RANDOMIZE_INDEPENDENT_ON_BATCH_SIZE = { .name = "RANDOMIZE_INDEPENDENT_ON_BATCH_SIZE", - .query = json::parse( - R"({"action": {"type": "Details", "fields": ["key"], "randomize": {"seed": 1231}}, - "filterExpression": {"type": "True"}})" - ), + .query = "default.project(key).randomize(seed:=1231)", .expected_query_result = json::parse( R"([{"key": "id5"}, {"key": "id1"}, @@ -130,10 +121,7 @@ const QueryTestScenario RANDOMIZE_INDEPENDENT_ON_BATCH_SIZE = { const QueryTestScenario DIFFERENT_RANDOMIZE_SEED_DIFFERENT_RESULT = { .name = "DIFFERENT_RANDOMIZE_SEED_DIFFERENT_RESULT", - .query = json::parse( - R"({"action": 
{"type": "Details", "fields": ["key"], "randomize": {"seed": 12312}}, - "filterExpression": {"type": "True"}})" - ), + .query = "default.project(key).randomize(seed:=12312)", .expected_query_result = json::parse( R"([{"key": "id1"}, {"key": "id3"}, @@ -145,10 +133,7 @@ const QueryTestScenario DIFFERENT_RANDOMIZE_SEED_DIFFERENT_RESULT = { const QueryTestScenario EXPLICIT_DO_NOT_RANDOMIZE = { .name = "EXPLICIT_DO_NOT_RANDOMIZE", - .query = json::parse( - R"({"action": {"type": "Details", "fields": ["key"], "randomize": false}, - "filterExpression": {"type": "True"}})" - ), + .query = "default.project(key)", .expected_query_result = json::parse( R"([{"key": "id1"}, {"key": "id2"}, @@ -160,10 +145,7 @@ const QueryTestScenario EXPLICIT_DO_NOT_RANDOMIZE = { const QueryTestScenario AGGREGATE_RANDOMIZE = { .name = "AGGREGATE_RANDOMIZE", - .query = json::parse( - R"({"action": {"type": "Aggregated", "groupByFields": ["key"], "randomize": {"seed": 12321}}, - "filterExpression": {"type": "True"}})" - ), + .query = "default.groupBy({count:=count()},{key}).randomize(seed:=12321)", .expected_query_result = json::parse( R"([ {"count": 1, "key": "id4"}, @@ -177,10 +159,7 @@ const QueryTestScenario AGGREGATE_RANDOMIZE = { const QueryTestScenario ORDER_BY_PRECEDENCE = { .name = "orderByTakePrecedenceOverRandomize", - .query = json::parse( - R"({"action": {"type": "Details", "fields": ["key", "col"], "randomize": {"seed": 12321}, "orderByFields": ["col"]}, - "filterExpression": {"type": "True"}})" - ), + .query = "default.project({key, col}).randomize(seed:=12321).orderBy({col})", .expected_query_result = json::parse( R"([ {"key": "id5", "col": "A"}, @@ -194,10 +173,7 @@ const QueryTestScenario ORDER_BY_PRECEDENCE = { const QueryTestScenario ORDER_BY_AGGREGATE_RANDOMIZE = { .name = "orderingByAggregatedCount", - .query = json::parse( - R"({"action": {"type": "Aggregated", "groupByFields": ["col"], "randomize": true, "orderByFields": ["count"]}, - "filterExpression": {"type": 
"True"}})" - ), + .query = "default.groupBy({count:=count()},{col}).randomize().orderBy({count})", .expected_query_result = json::parse( R"([{"count": 2, "col": "B"}, {"count": 3, "col": "A"}])" @@ -206,38 +182,28 @@ const QueryTestScenario ORDER_BY_AGGREGATE_RANDOMIZE = { const QueryTestScenario LIMIT_2_RANDOMIZE = { .name = "detailsWithLimit2AndOffsetRandomized", - .query = json::parse( - R"({"action": {"type": "Details", "fields": ["key", "col"], "randomize": true, - "orderByFields": ["col", "key"], "limit": 2, "offset": 2}, - "filterExpression": {"type": "True"}})" - ), + .query = + "default.project({key, col}).randomize(seed:=42).offset(2).limit(2).orderBy({col, key})", .expected_query_result = json::parse( - R"([{"key": "id5", "col": "A"}, - {"key": "id2", "col": "B"}])" + R"([{"key": "id1", "col": "A"}, + {"key": "id5", "col": "A"}])" ) }; const QueryTestScenario LIMIT_3_RANDOMIZE = { .name = "detailsWithLimit3AndOffsetRandomized", - .query = json::parse( - R"({"action": {"type": "Details", "fields": ["key", "col"], "randomize": true, - "orderByFields": ["col", "key"], "limit": 3, "offset": 2}, - "filterExpression": {"type": "True"}})" - ), + .query = + "default.project({key, col}).randomize(seed:=42).offset(2).limit(3).orderBy({col, key})", .expected_query_result = json::parse( - R"([{"key": "id5", "col": "A"}, - {"key": "id2", "col": "B"}, + R"([{"key": "id1", "col": "A"}, + {"key": "id5", "col": "A"}, {"key": "id4", "col": "B"}])" ) }; const QueryTestScenario AGGREGATE_LIMIT_RANDOMIZE = { .name = "aggregateWithLimitAndOffsetRandomized", - .query = json::parse( - R"({"action": {"type": "Aggregated", "groupByFields": ["key"], "randomize": {"seed": 12321}, -"limit": 2, "offset": 1}, - "filterExpression": {"type": "True"}})" - ), + .query = "default.groupBy({count:=count()},{key}).randomize(seed:=12321).offset(1).limit(2)", .expected_query_result = json::parse( R"([{"count": 1, "key": "id5"}, {"count": 1, "key": "id1"}])" diff --git 
a/src/silo/test/string_search.test.cpp b/src/silo/test/string_search.test.cpp index 75aa96924..8be9ddf41 100644 --- a/src/silo/test/string_search.test.cpp +++ b/src/silo/test/string_search.test.cpp @@ -66,14 +66,6 @@ const QueryTestData TEST_DATA{ .reference_genomes = REFERENCE_GENOMES }; -nlohmann::json createStringSearchQuery(const std::string& column, const nlohmann::json value) { - return { - {"action", {{"type", "Details"}, {"fields", {"primaryKey"}}}}, - {"filterExpression", - {{"type", "StringSearch"}, {"column", column}, {"searchExpression", value}}} - }; -} - nlohmann::json createExpectedResult(const std::vector& primary_keys) { nlohmann::json result = nlohmann::json::array(); for (const auto& primary_key : primary_keys) { @@ -84,62 +76,63 @@ nlohmann::json createExpectedResult(const std::vector& primary_keys const QueryTestScenario FILTER_FOR_AA = { .name = "filterForAA", - .query = createStringSearchQuery(TEST_COLUMN, "AA"), + .query = "default.filter(test_column.like('AA')).project(primaryKey)", .expected_query_result = createExpectedResult({"id1", "id2", "id3", "id5"}) }; const QueryTestScenario FILTER_FOR_AA_AT_THE_BEGINNING = { .name = "filterForAAatTheBeginning", - .query = createStringSearchQuery(TEST_COLUMN, "^AA"), + .query = "default.filter(test_column.like('^AA')).project(primaryKey)", .expected_query_result = createExpectedResult({"id1", "id3", "id5"}) }; const QueryTestScenario FILTER_FOR_SOMETHING_THAT_DOES_NOT_OCCUR = { .name = "filterForSomethingThatDoesNotOccur", - .query = createStringSearchQuery(TEST_COLUMN, "should not match on anything"), + .query = "default.filter(test_column.like('should not match on anything')).project(primaryKey)", .expected_query_result = createExpectedResult({}) }; const QueryTestScenario FILTER_FOR_AA_ON_INDEXED_COLUMN = { .name = "filterForAAOnIndexedColumn", - .query = createStringSearchQuery(INDEXED_TEST_COLUMN, "AA"), + .query = "default.filter(indexed_test_column.like('AA')).project(primaryKey)", 
.expected_query_result = createExpectedResult({"id1", "id2", "id3", "id5"}) }; const QueryTestScenario FILTER_FOR_AA_AT_THE_BEGINNING_ON_INDEXED_COLUMN = { .name = "filterForAAatTheBeginningOnIndexedColumn", - .query = createStringSearchQuery(INDEXED_TEST_COLUMN, "^AA"), + .query = "default.filter(indexed_test_column.like('^AA')).project(primaryKey)", .expected_query_result = createExpectedResult({"id1", "id3", "id5"}) }; const QueryTestScenario FILTER_FOR_SOMETHING_THAT_DOES_NOT_OCCUR_ON_INDEXED_COLUMN = { .name = "filterForSomethingThatDoesNotOccurOnIndexedColumn", - .query = createStringSearchQuery(INDEXED_TEST_COLUMN, "should not match on anything"), + .query = + "default.filter(indexed_test_column.like('should not match on anything'))" + ".project(primaryKey)", .expected_query_result = createExpectedResult({}) }; const QueryTestScenario INVALID_REGULAR_EXPRESSION = { .name = "invalidRegularExpressionShouldReturnProperError", - .query = createStringSearchQuery(TEST_COLUMN, "^("), + .query = "default.filter(test_column.like('^(')).project(primaryKey)", .expected_error_message = "Invalid Regular Expression. The parsing of the regular expression failed with the error " "'missing ): ^('. See https://github.com/google/re2/wiki/Syntax for a Syntax specification." 
}; -const QueryTestScenario FILTER_FOR_NULL_IS_NOT_POSSIBLE = { - .name = "filterForNullIsNotPossible", - .query = createStringSearchQuery(TEST_COLUMN, nullptr), - .expected_error_message = - "The field 'searchExpression' in an StringSearch expression needs to be a string" -}; - const QueryTestScenario FILTER_FOR_COLUMN_THAT_DOES_NOT_EXIST = { .name = "filterForColumnThatDoesNotExist", - .query = createStringSearchQuery("column_that_does_not_exist", "some value"), + .query = "default.filter(column_that_does_not_exist.like('some value')).project(primaryKey)", .expected_error_message = "The database does not contain the string column 'column_that_does_not_exist'" }; +const QueryTestScenario TABLE_NOT_FOUND = { + .name = "tableNotFound", + .query = "nonexistent.filter(test_column.like('AA')).project(primaryKey)", + .expected_error_message = "table 'nonexistent' not found in database" +}; + } // namespace QUERY_TEST( @@ -153,7 +146,7 @@ QUERY_TEST( FILTER_FOR_AA_AT_THE_BEGINNING_ON_INDEXED_COLUMN, FILTER_FOR_SOMETHING_THAT_DOES_NOT_OCCUR_ON_INDEXED_COLUMN, INVALID_REGULAR_EXPRESSION, - FILTER_FOR_NULL_IS_NOT_POSSIBLE, - FILTER_FOR_COLUMN_THAT_DOES_NOT_EXIST + FILTER_FOR_COLUMN_THAT_DOES_NOT_EXIST, + TABLE_NOT_FOUND ) ); From 9d479852ea14a6b101901e7dec3d150dd9a2de99 Mon Sep 17 00:00:00 2001 From: Alexander Taepper Date: Mon, 27 Apr 2026 12:16:39 +0200 Subject: [PATCH 2/4] fixup! 
feat(silo)!: change query interface to saneql --- CMakeLists.txt | 2 ++ conanfile.py | 1 + src/silo/query_engine/saneql/lexer.cpp | 4 +++- 3 files changed, 6 insertions(+), 1 deletion(-) diff --git a/CMakeLists.txt b/CMakeLists.txt index cdbb06d0c..993e8540c 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -60,6 +60,7 @@ add_compile_definitions(SIMDJSON_EXCEPTIONS=0) find_package(Arrow REQUIRED COMPONENTS Acero XYZ) find_package(Boost REQUIRED COMPONENTS system serialization iostreams) +find_package(FastFloat REQUIRED) find_package(LibLZMA REQUIRED) find_package(nlohmann_json REQUIRED) find_package(Poco REQUIRED COMPONENTS Net Util JSON) @@ -137,6 +138,7 @@ target_link_libraries( Arrow::arrow_static ArrowAcero::arrow_acero_static ${Boost_LIBRARIES} + ${FastFloat_LIBRARIES} ${duckdb_LIBRARIES} nlohmann_json::nlohmann_json ${roaring_LIBRARIES} diff --git a/conanfile.py b/conanfile.py index f0f84ff80..609689a58 100644 --- a/conanfile.py +++ b/conanfile.py @@ -8,6 +8,7 @@ class SiloRecipe(ConanFile): requires = [ "arrow/22.0.0", "boost/1.85.0", + "fast_float/8.1.0", "gtest/1.17.0", "mimalloc/2.2.4", "nlohmann_json/3.12.0", diff --git a/src/silo/query_engine/saneql/lexer.cpp b/src/silo/query_engine/saneql/lexer.cpp index 675926de5..d799998c8 100644 --- a/src/silo/query_engine/saneql/lexer.cpp +++ b/src/silo/query_engine/saneql/lexer.cpp @@ -3,6 +3,8 @@ #include #include +#include + #include "silo/query_engine/saneql/parse_exception.h" namespace silo::query_engine::saneql { @@ -154,7 +156,7 @@ Token Lexer::readNumber() { if (is_float) { double val = 0; - auto [ptr, ec] = std::from_chars(num_str.data(), num_str.data() + num_str.size(), val); + auto [ptr, ec] = fast_float::from_chars(num_str.data(), num_str.data() + num_str.size(), val); if (ec != std::errc()) { throw ParseException("Invalid float literal", start); } From bab3183d3569d8e52edc1797d3c937a11f00bbcb Mon Sep 17 00:00:00 2001 From: Alexander Taepper Date: Mon, 27 Apr 2026 12:24:43 +0200 Subject: [PATCH 3/4] 
fixup! feat(silo)!: change query interface to saneql --- src/silo/query_engine/planner.cpp | 8 ++++++-- src/silo/query_engine/saneql/ast_to_query.cpp | 7 ++++--- 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/src/silo/query_engine/planner.cpp b/src/silo/query_engine/planner.cpp index 23df3ffa3..ef394f2f0 100644 --- a/src/silo/query_engine/planner.cpp +++ b/src/silo/query_engine/planner.cpp @@ -79,13 +79,17 @@ struct ExtractedScanInfo { std::optional extractScanInfo(operators::QueryNodePtr& node) { auto* scan = dynamic_cast(node.get()); if (scan != nullptr) { - return ExtractedScanInfo{.table_name=scan->table_name, .filter=std::make_unique()}; + return ExtractedScanInfo{ + .table_name = scan->table_name, .filter = std::make_unique() + }; } auto* filter = dynamic_cast(node.get()); if (filter != nullptr) { auto* inner_scan = dynamic_cast(filter->child.get()); if (inner_scan != nullptr) { - return ExtractedScanInfo{.table_name=inner_scan->table_name, .filter=std::move(filter->filter)}; + return ExtractedScanInfo{ + .table_name = inner_scan->table_name, .filter = std::move(filter->filter) + }; } } return std::nullopt; diff --git a/src/silo/query_engine/saneql/ast_to_query.cpp b/src/silo/query_engine/saneql/ast_to_query.cpp index 0099c5afd..baeab3928 100644 --- a/src/silo/query_engine/saneql/ast_to_query.cpp +++ b/src/silo/query_engine/saneql/ast_to_query.cpp @@ -637,9 +637,10 @@ operators::QueryNodePtr handleProject( const ChildConverter& convert_child ) { const auto& field_argument = args.at("fields"); - const std::vector field_names = holds_alternative(field_argument.value) - ? std::vector{extractIdentifierName(field_argument)} - : extractSetOfIdentifiers(field_argument); + const std::vector field_names = + holds_alternative(field_argument.value) + ? 
std::vector{extractIdentifierName(field_argument)} + : extractSetOfIdentifiers(field_argument); auto child = convert_child(args.at("input"), tables); auto child_schema = child->getOutputSchema(); std::vector fields; From 026370f57462d0eace25ddadd9352f3315ea1e8c Mon Sep 17 00:00:00 2001 From: Fabian Engelniederhammer Date: Thu, 30 Apr 2026 12:02:21 +0200 Subject: [PATCH 4/4] commit AI review results --- 1238-review/aggregate_node.md | 40 ++++++++++++ 1238-review/ast.md | 56 ++++++++++++++++ 1238-review/ast_to_query.md | 52 +++++++++++++++ 1238-review/database.md | 51 +++++++++++++++ 1238-review/database_pyx.md | 51 +++++++++++++++ 1238-review/function_registry.md | 49 ++++++++++++++ 1238-review/lexer.md | 55 ++++++++++++++++ 1238-review/new_operator_nodes.md | 61 +++++++++++++++++ 1238-review/parser.md | 50 ++++++++++++++ 1238-review/performance.md | 49 ++++++++++++++ 1238-review/phylo_mrca_nodes.md | 28 ++++++++ 1238-review/planner.md | 69 ++++++++++++++++++++ 1238-review/preprocessing_test.md | 59 +++++++++++++++++ 1238-review/query_fixture.md | 25 +++++++ 1238-review/query_handler.md | 63 ++++++++++++++++++ 1238-review/query_test_js.md | 45 +++++++++++++ 1238-review/saneql_examples.md | 105 ++++++++++++++++++++++++++++++ 1238-review/unresolved_nodes.md | 80 +++++++++++++++++++++++ 18 files changed, 988 insertions(+) create mode 100644 1238-review/aggregate_node.md create mode 100644 1238-review/ast.md create mode 100644 1238-review/ast_to_query.md create mode 100644 1238-review/database.md create mode 100644 1238-review/database_pyx.md create mode 100644 1238-review/function_registry.md create mode 100644 1238-review/lexer.md create mode 100644 1238-review/new_operator_nodes.md create mode 100644 1238-review/parser.md create mode 100644 1238-review/performance.md create mode 100644 1238-review/phylo_mrca_nodes.md create mode 100644 1238-review/planner.md create mode 100644 1238-review/preprocessing_test.md create mode 100644 1238-review/query_fixture.md create 
mode 100644 1238-review/query_handler.md create mode 100644 1238-review/query_test_js.md create mode 100644 1238-review/saneql_examples.md create mode 100644 1238-review/unresolved_nodes.md diff --git a/1238-review/aggregate_node.md b/1238-review/aggregate_node.md new file mode 100644 index 000000000..369a06f29 --- /dev/null +++ b/1238-review/aggregate_node.md @@ -0,0 +1,40 @@ +# Review: aggregate_node.h / aggregate_node.cpp (PR #1238) + +## Overall + +Good generalization from hardcoded COUNT → extensible `AggregateFunction` enum + `AggregateDefinition`. Clean Arrow Acero integration. Follows project patterns (public members, `SILO_UNREACHABLE`, `SILO_ASSERT`). Few issues below. + +## Findings + +### aggregate_node.h + +`L22: 🟡 risk: AggregateDefinition::source_column` not validated for COUNT. COUNT ignores it, but caller can pass `source_column="bogus"` silently. When SUM/AVG added, forgetting validation here = silent wrong results. Add `SILO_ASSERT(!source_column.has_value())` in COUNT branch, or validate at construction. + +`L17: 🔵 nit: enum AggregateFunction` — single-value enum fine for extensibility scaffolding, but add brief comment like `// Extended by future PRs (SUM, AVG, etc.)` so readers know it's intentional, not dead code. + +### aggregate_node.cpp + +`L40: 🟡 risk: source_refs` always empty. For COUNT this is correct (`count_all` takes no source), but when SUM/MIN/MAX added, `source_refs` must be populated from `agg.source_column`. Current structure doesn't make this obvious — the empty vector is constructed then moved without any branch populating it. Consider adding a comment or an assert: `SILO_ASSERT(source_refs.empty())` in COUNT branch to make the invariant explicit. + +`L32: 🔵 nit: input_schema` param only used in L67 assert. In release builds with asserts compiled out, this becomes an unused parameter. Either `[[maybe_unused]]` or restructure so the schema validation is always active (return error instead of assert). 
+ +`L67: 🟡 risk: SILO_ASSERT for schema validation.` `CanReferenceFieldByName` check is debug-only. If group_by field doesn't exist in input schema, release build silently passes bad field ref to Arrow → runtime crash in Arrow internals with unhelpful error. Should be a proper error return (`arrow::Status::Invalid(...)`) not an assert. + +`L90: 🔵 nit: uninitialized local` `schema::ColumnType type;` — technically fine because switch covers all enum values and compiler warns on missing cases, but initializing to a sentinel or using a helper function (like `arrowFunctionName` pattern) would be more defensive. If someone adds enum value and forgets this switch, UB from uninitialized read. + +`L4-6: 🔵 nit: unused includes.` ``, ``, `` — these are already transitively included via the header. Not wrong, but the header already includes them. Project style seems to prefer explicit includes so this is fine, just noting. + +### Missing + +`❓ q: No unit tests for AggregateNode.` `aggregate_node.test.cpp` doesn't exist. The generalization from hardcoded COUNT to configurable aggregates is a behavioral change — should have at least: (1) test COUNT with no groups, (2) test COUNT with groups, (3) test empty aggregates vector, (4) test `getOutputSchema` returns correct types. Integration coverage via e2e tests may exist but unit tests catch regressions faster. + +## Summary + +| Severity | Count | +|----------|-------| +| 🔴 Critical | 0 | +| 🟡 Risk | 3 | +| 🔵 Nit | 4 | +| ❓ Question | 1 | + +Main concern: L67 assert-only validation of group_by fields against input schema. Release builds skip this → bad field refs hit Arrow internals. Convert to proper error return. diff --git a/1238-review/ast.md b/1238-review/ast.md new file mode 100644 index 000000000..2966def70 --- /dev/null +++ b/1238-review/ast.md @@ -0,0 +1,56 @@ +# PR #1238 — ast.h / ast.cpp Review + +## Summary + +Clean AST design. 
Variant-based node types, consistent extract/check helpers, good error messages with source locations. Few real issues, mostly precision/consistency concerns. + +--- + +## Findings + +### ast.cpp + +`L143-153: 🟡 risk: extractFloatLiteral silently converts int64_t→double. int64_t values >2^53 lose precision. Callers (minProportion etc.) unlikely to hit this, but no guard exists. Add range check or document assumption.` + +`L241-243 vs L143-153: 🟡 risk: isFloatLiteral() returns false for IntLiteral, but extractFloatLiteral() accepts IntLiteral. Semantic mismatch — caller doing if(isFloatLiteral(x)) extractFloatLiteral(x) works, but if(!isFloatLiteral(x)) doesn't mean extractFloatLiteral will throw. Consider isNumericLiteral() or rename to extractNumericAsFloat().` + +`L9-28: 🔵 nit: binaryOpToString — L28 return "?" after switch covering all enum values. Compiler warns on missing enum case already. Replace with std::unreachable() (C++23) or SILO_ASSERT(false) to catch corruption instead of silently returning "?".` + +`L143-153: 🔵 nit: extractFloatLiteral uses explicit throw while all other extract* fns use CHECK_SILO_QUERY macro. Inconsistent error-handling style. Use CHECK_SILO_QUERY with a combined holds_alternative check, or document why this one is different.` + +`L56-71: 🔵 nit: FunctionCall::toString builds args string via repeated += concatenation. Fine for small arg lists. Consider fmt::join or std::ostringstream if arg counts grow. Not blocking.` + +### ast.h + +`L99-111: ❓ q: ExpressionVariant has 12 types. std::variant visit generates jump table — fine for correctness. Any profiling data on variant dispatch overhead in hot query paths? If toString() is debug-only, no concern. If extract* called per-row, might matter.` + +`L122-130: 🔵 nit: extract* functions return by value (string, vector). Fine for move semantics. extractSetLiteral returns const ref — good. 
Consider returning std::string_view from extractIdentifierName/extractStringLiteral if callers don't need ownership (avoids copy).` + +### ast_to_query.cpp (related — not in review scope but worth noting) + +`ast_to_query.cpp:L74,184,284,287,376,401,411,737: 🟡 risk: static_cast(extractIntLiteral(...)) — int64_t→uint32_t truncation. Negative values or values >UINT32_MAX silently wrap. Should validate range before cast. This is in the caller, not ast.cpp, but the pattern is pervasive and the AST could provide a safe extractUint32Literal().` + +### Testing + +`🟡 risk: No test file found for ast.h/ast.cpp (no ast.test.cpp). Extract functions have non-trivial logic (type coercion, date validation, set extraction). Unit tests needed — especially for extractFloatLiteral int→double edge cases, extractDateValue with invalid dates, and the isX/extractX semantic contract.` + +--- + +## Good Practices + +- SourceLocation in every error message — excellent for user-facing diagnostics +- `[[nodiscard]]` on all query functions — prevents silent discard bugs +- `ExpressionPtr` = unique_ptr — clear ownership, no leaks +- extractDateValue validates via stringToDate32 and propagates error string — thorough +- extractOptionalDateValue cleanly composes with NullLiteral check — nice pattern +- Variant-based AST avoids inheritance hierarchy — good modern C++ choice + +--- + +## Verdict + +Solid code. Main actionable items: +1. **isFloatLiteral/extractFloatLiteral semantic mismatch** — rename or add isNumericLiteral +2. **int64_t→double precision loss** — add guard in extractFloatLiteral +3. **No unit tests** — add ast.test.cpp +4. **binaryOpToString unreachable** — use std::unreachable() instead of return "?" 
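The two numeric guards that the `ast.md` review asks for (a range-checked narrowing in the spirit of the suggested `extractUint32Literal()`, and a precision guard for the int-to-double path in `extractFloatLiteral`) could look roughly like the sketch below. It is an illustration only: it operates on plain `int64_t` values rather than the PR's `Expression` type, and `RangeError`, `checkedUint32`, and `checkedIntToDouble` are hypothetical names that do not appear in the PR. The real helpers would presumably throw `IllegalQueryException` and include the source location.

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the project's IllegalQueryException.
struct RangeError : std::runtime_error {
   using std::runtime_error::runtime_error;
};

// Narrow an already-extracted integer literal to uint32_t, rejecting
// negative values and values above UINT32_MAX instead of wrapping.
inline uint32_t checkedUint32(int64_t value, const std::string& parameter_name) {
   constexpr int64_t MAX_VALUE = std::numeric_limits<uint32_t>::max();
   if (value < 0 || value > MAX_VALUE) {
      throw RangeError(
         "'" + parameter_name + "' must be in [0; " + std::to_string(MAX_VALUE) +
         "], got " + std::to_string(value)
      );
   }
   return static_cast<uint32_t>(value);
}

// Convert an integer literal to double only when the conversion is exact.
// An IEEE-754 double represents every integer with magnitude <= 2^53.
inline double checkedIntToDouble(int64_t value) {
   constexpr int64_t MAX_EXACT = int64_t{1} << 53;
   if (value > MAX_EXACT || value < -MAX_EXACT) {
      throw RangeError(
         "integer literal " + std::to_string(value) +
         " cannot be represented exactly as a float"
      );
   }
   return static_cast<double>(value);
}
```

Throwing rather than clamping keeps the behavior visible to the query author, which matches the review's preference for explicit errors over silent wrap-around.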
diff --git a/1238-review/ast_to_query.md b/1238-review/ast_to_query.md new file mode 100644 index 000000000..03bb979f1 --- /dev/null +++ b/1238-review/ast_to_query.md @@ -0,0 +1,52 @@ +# Review: `src/silo/query_engine/saneql/ast_to_query.cpp` + +## Bugs + +L128-131: 🔴 bug: `LESS_THAN` and `LESS_EQUAL` both produce identical `FloatBetween(col, nullopt, value)`. `FloatBetween::compile` uses `Comparator::LESS` (strict `<`), so `x <= 5.0` silently becomes `x < 5.0`. Same issue L133-136: `GREATER_THAN` and `GREATER_EQUAL` both produce `FloatBetween(col, value, nullopt)` which compiles to `>=`, so `x > 5.0` becomes `x >= 5.0`. Fix: either add a `bool inclusive` flag to `FloatBetween`, or use epsilon adjustment, or add separate `FloatLessThan`/`FloatLessEqual` expressions. + +L107: 🔴 bug: `value > 0 ? std::optional(value - 1) : 0` — when `value == 0`, the ternary returns bare `0` (an `int`), not `std::optional(0)`. This means `x < 0` on unsigned produces `IntBetween(col, nullopt, 0)` which matches rows where `col == 0` — should match nothing. Fix: return `std::nullopt` and wrap in a `False` expression, or return `IntBetween(col, nullopt, std::optional{})` with both bounds empty then intersect with False. + +L112-113: 🔴 bug: `GREATER_THAN` computes `value + 1` — unsigned overflow when `value == UINT32_MAX`. Produces `IntBetween(col, 0, nullopt)` which matches everything. Fix: check for `UINT32_MAX` and return `False` expression. + +## Narrowing casts (int64_t → uint32_t without validation) + +L74: 🟡 risk: `static_cast(extractIntLiteral(value_expr))` — `extractIntLiteral` returns `int64_t`. Negative values or values > UINT32_MAX silently truncate. Fix: add range check + `IllegalQueryException`. + +L184: 🟡 risk: same pattern in `convertComparisonToFilter`. + +L284,287: 🟡 risk: same pattern in `handleBetween`. + +L376,401,411: 🟡 risk: same pattern in `handleSymbolEquals`, `handleHasMutation`, `handleInsertionContains`. 
+ +L433: 🟡 risk: `static_cast(extractIntLiteral(...))` for `nOf` count — `int64_t` → `int` truncation. Negative count also not validated. + +L737: 🟡 risk: same pattern in `handleLimit`. + +**Recommendation:** Extract a helper like `extractUint32Literal(expr, param_name)` that validates range `[0, UINT32_MAX]` and throws `IllegalQueryException` on out-of-range. Use everywhere. `getOptionalUint32` (function_registry.cpp:49) already checks `>= 0` but not `<= UINT32_MAX` — fix that too. + +## Design + +L471-472: ❓ q: bare identifier in filter context becomes `BoolEquals(name, true)`. Intentional? If user writes `.filter(some_column)` where `some_column` is a string column, they get a confusing runtime error instead of a clear "expected boolean expression" message. Consider checking column type or at least documenting this behavior. + +L680,701: 🟡 risk: `handleMutations` and `handleInsertions` dispatch `Nucleotide` vs `AminoAcid` via `args.functionName() == "mutations"` / `"insertions"` string comparison. Fragile — rename the registered function and this silently breaks. Fix: register separate handlers, or use an enum/tag passed at registration time. + +L723: 🔵 nit: `static_cast(std::chrono::system_clock::now().time_since_epoch().count())` — truncates a 64-bit nanosecond count to 32 bits. Works as a seed but loses entropy. Consider `std::random_device` or at least cast from `steady_clock` which is monotonic. + +## Edge cases + +L152: 🟡 risk: `date_val.value() - 1` for `LESS_THAN` on dates — no underflow check. If `date_val` is the minimum representable date, this wraps. Same L160 for `date_val.value() + 1` overflow. + +L377: Good — `position > 0` check exists for 1-indexed positions. But L411 (`handleInsertionContains`) has no such check — insertions may be 0-indexed but this should be documented or validated consistently. + +## Style + +L12-14: 🔵 nit: include order — `parser.h` (internal) appears before `` and `` (external). 
Per AGENTS.md include order: corresponding header → system → external → internal.
+
+L958-959: 🔵 nit: lines exceed 100-char column limit (clang-format should catch this).
+
+## Good practices
+
+- Registry pattern clean and extensible: adding new functions is one line each.
+- `BoundArguments` abstraction keeps handlers focused on semantics.
+- Error messages include source locations, good for user-facing diagnostics.
+- `CHECK_SILO_QUERY` used consistently for validation.
diff --git a/1238-review/database.md b/1238-review/database.md
new file mode 100644
index 000000000..ca7f9f09c
--- /dev/null
+++ b/1238-review/database.md
@@ -0,0 +1,51 @@
+# PR #1238 Review: `src/silo/database.cpp`
+
+## Summary
+
+Migration from JSON-based query interface to SaneQL. Three main changes: `getFilteredBitmap` uses SaneQL parser, `getPrevalentMutations` builds SaneQL query string via `fmt::format`, `executeQueryAsArrowIpc` simplified to single `query_string` param. Overall direction good: removes 11 action/binder includes, simplifies code paths.
+
+---
+
+## Findings
+
+### `getPrevalentMutations` (L276–L312)
+
+L286-293: 🟡 risk: **SaneQL injection via `filter` parameter.** `filter` is user-provided string interpolated raw into SaneQL query via `fmt::format`. If `filter` contains `)` or `.` it can break out of `.filter(...)` and chain arbitrary pipeline operations. Example: `filter = "true).project({primaryKey})"` → `table.filter(true).project({primaryKey}).mutations(...)`. The SaneQL parser will happily parse this as a valid pipeline. Unlike the old code path which parsed `filter` as a standalone JSON expression, here it's string-concatenated into a larger query. Fix: parse `filter` into an AST separately (like `getFilteredBitmap` does at L265-267), then pass the filter expression object directly to the planner instead of round-tripping through string concatenation.
Alternatively, use `planSaneqlQuery` only for the full pipeline and construct the `MutationsNode` directly as the old code did.
+
+L286-293: 🔵 nit: `table_name` and `sequence_name` also interpolated raw. These come from internal callers (L320-321, L331) so lower risk, but same injection class. If any caller ever passes user-controlled table/sequence names, same problem applies.
+
+L303: 🟡 risk: `result_stream >> json_line` reads whitespace-delimited tokens, not lines. Works today because `NdjsonSink` emits compact JSON (no spaces). But if NDJSON output ever includes spaces (e.g. string values with spaces like `"mutation":"A 123 T"`), `>>` will split one JSON object across multiple reads → parse failures. Fix: use `std::getline(result_stream, json_line)` instead.
+
+L306: 🔵 nit: **Type mismatch.** `count` field is `ColumnType::INT32` in `MutationsNode` (mutations_node.h:84), serialized via `Int32Array`. Parsed here as `uint64_t`. Works for positive values but semantically wrong. Should be `int32_t` or `uint32_t` to match Arrow schema. Pre-existing issue (old code used `SymbolMutations::COUNT_FIELD_NAME` but same `uint64_t` type), so not a regression, but worth fixing while touching this code.
+
+L305,307: 🔵 nit: Hardcoded `"count"` and `"mutation"` strings replace `SymbolMutations::COUNT_FIELD_NAME` / `MUTATION_FIELD_NAME` constants. Loses compile-time coupling: if field names change in `MutationsNode`, these will silently break at runtime. Consider using `operators::MutationsNode::COUNT_FIELD_NAME` and `MUTATION_FIELD_NAME` constants (they're still defined in mutations_node.h:21,28).
+
+L283-284: ✅ Good: `constexpr std::string_view` with `if constexpr`-style ternary for selecting mutation function name. Clean pattern.
+
+### `getFilteredBitmap` (L254–L274)
+
+L265-267: ❓ q: `getFilteredBitmap` parses `filter` as standalone SaneQL expression via `Parser` + `convertToFilter`.
This is the safe approach (no injection possible: parser validates syntax, `convertToFilter` only accepts filter-context AST nodes). Why doesn't `getPrevalentMutations` use the same pattern? The asymmetry is suspicious.
+
+L260-262: 🔵 nit: Table-not-found returns empty `Roaring{}` with `SPDLOG_ERROR`. Other methods (L185, L208, L234) throw `std::runtime_error`. Inconsistent error handling. Pre-existing, not introduced by this PR.
+
+### `executeQueryAsArrowIpc` (L480–L496)
+
+L480: ✅ Good: Clean simplification. Single `query_string` param, delegates to `planSaneqlQuery`. No injection concern here because caller passes full query, with no string interpolation.
+
+L489-493: ✅ Good: Proper Arrow status checking with descriptive error message.
+
+### Includes (L1–L38)
+
+L35-36: ✅ Good: Clean swap: 11 old action/binder includes replaced by 2 saneql includes. Reduces coupling.
+
+L30: ❓ q: `filter/expressions/true.h` still included for `printAllData` (L192). Correct, just noting it's not dead.
+
+---
+
+## Verdict
+
+**Main concern:** SaneQL injection in `getPrevalentMutations` (L286-293). `filter` is user-provided and interpolated raw into query string. Should parse filter separately like `getFilteredBitmap` does, or construct operator tree directly.
+
+**Secondary:** `>>` vs `getline` for NDJSON parsing (L303) is fragile. Low risk today, time bomb tomorrow.
+
+Rest of changes clean and well-structured.
diff --git a/1238-review/database_pyx.md b/1238-review/database_pyx.md
new file mode 100644
index 000000000..7e7133b60
--- /dev/null
+++ b/1238-review/database_pyx.md
@@ -0,0 +1,51 @@
+# Review: `database.pyx` + `database.pxd` - PR #1238
+
+## .pxd ↔ C++ Header Signature Match
+
+✅ **`executeQueryAsArrowIpc`**: `.pxd` L31 declares `string executeQueryAsArrowIpc(string query_string)`, which matches C++ `std::string executeQueryAsArrowIpc(const std::string& query_string) const` at `database.h:107`. Single-arg signature correct after `table_name` removal.
+
+✅ **`getFilteredBitmap`**: `.pxd` L29 matches C++ L71. Two args: `table_name`, `filter`.
+
+✅ **`getPrevalentNucMutations` / `getPrevalentAminoAcidMutations`**: `.pxd` L27-28 match C++ L81-93. Four args each.
+
+✅ **Exception handling**: `.pxd` uses `except +handle_silo_exception` for query methods, the correct pattern.
+
+## Findings
+
+### 🟡 risk - `database.pyx:L520`: Inconsistent IPC buffer conversion
+
+`execute_query` passes `ipc_buffer` (C++ `std::string`) directly to `pa.BufferReader(ipc_buffer)`. Cython auto-converts `std::string` → Python `bytes`, so this *works*, but `get_tables` at L76 does explicit `( ipc_buffer.data())[:ipc_buffer.size()]` conversion. Inconsistent pattern. The explicit cast avoids an extra copy in some Cython versions. Pick one pattern, use everywhere.
+
+### 🔵 nit - `database.pyx:L295`: Wrong return type in docstring
+
+`get_amino_acid_reference_sequence` docstring says `Returns: str - The nucleotide reference sequence`. Should say "amino acid reference sequence". Copy-paste from `get_nucleotide_reference_sequence`.
+
+### 🔵 nit - `database.pyx:L315-329,L373-391`: Docstrings don't mention SaneQL
+
+`get_prevalent_nucleotide_mutations` and `get_prevalent_amino_acid_mutations` docstrings say `filter_expression : str, optional - Filter expression to apply (default: "")` but don't specify this is now SaneQL syntax. Compare with `get_filtered_bitmap` L440 which correctly says "SaneQL filter expression". Should be consistent across all filter-accepting methods.
+
+### 🔵 nit - `database.pyx:L347,L405`: Comment says "True filter", which is ambiguous
+
+Comments say `# Default to True filter (returns all rows) if no filter specified`. After SaneQL migration, worth clarifying this is SaneQL `true` literal, not JSON `{"type": "True"}`. Same comment at L456.
+
+### ✅ Good - Default filter `'true'` is valid SaneQL
+
+`'true'` is valid SaneQL boolean literal. Default behavior preserved across JSON→SaneQL transition.
No breaking change for callers using default.
+
+### ✅ Good - `execute_query` docstring updated
+
+L493-494 correctly documents SaneQL syntax with examples like `'sequences.filter(true)'`. This is the right pattern.
+
+### ❓ q - `database.pyx:L431`: `get_filtered_bitmap` takes `table_name` separately
+
+`get_filtered_bitmap` takes `table_name` as separate param + SaneQL `filter_expression`. But `execute_query` takes full SaneQL query where table name is embedded (e.g. `"sequences.filter(true)"`). Is this intentional asymmetry? `getFilteredBitmap` C++ signature confirms it takes separate `table_name` + `filter`, so this is correct, but the two APIs have different mental models. Worth a note in docstring that `filter_expression` here is just the filter part, not a full SaneQL query.
+
+### ✅ Good - Breaking change handling
+
+`executeQueryAsArrowIpc` signature change (removed `table_name`) is correctly reflected. Python `execute_query` now takes single `query_string` with table name embedded in SaneQL. This is a breaking change for Python API users who were passing table_name separately, but since this is a `!` (breaking) PR, that's expected.
+
+## Summary
+
+**No bugs found.** Signatures match. Default `'true'` valid. Two nits on docstrings (copy-paste error, missing SaneQL mention). One minor inconsistency in IPC buffer handling pattern. Code is clean and well-structured.
+
+Verdict: **Ship it** (after docstring fixes).
diff --git a/1238-review/function_registry.md b/1238-review/function_registry.md
new file mode 100644
index 000000000..17353648a
--- /dev/null
+++ b/1238-review/function_registry.md
@@ -0,0 +1,49 @@
+# Review: function_registry.h / function_registry.cpp
+
+**Files:** `src/silo/query_engine/saneql/function_registry.{h,cpp}`
+**PR:** #1238 - SaneQL function/filter registries + argument binding
+
+---
+
+## Findings
+
+### function_registry.cpp
+
+`L49-55: 🟡 risk: getOptionalUint32 checks value >= 0 but not <= UINT32_MAX.
int64_t up to 2^63 silently truncated to uint32_t. Add upper-bound check: value <= std::numeric_limits<uint32_t>::max().`
+
+`L53: 🔵 nit: error message "If the action contains an {}" is copy-pasted from old JSON API style. Doesn't match SaneQL context. Use "{}(): '{}' must be a non-negative integer" with function_name_ and name.`
+
+`L60-122: bindArguments: overall logic correct. Positional-then-named matching, skip non-positional params, detect duplicates, check required. Clean.`
+
+`L95-108: 🟡 risk: duplicate named args from parser not caught. Parser's parseArgList() doesn't reject f(x := 1, x := 2). bindArguments checks bound.contains() but only against positional bindings: second named arg with same name silently overwrites first via bound[named_arg.name]. Should CHECK_SILO_QUERY(!bound.contains(named_arg.name)) before insert, with error message covering both positional-duplicate AND named-duplicate cases. Current message says "already bound positionally" which is wrong when both are named.`
+
+`L102-106: 🔵 nit: error message says "already bound positionally" but could also be duplicate named arg. Change to "already bound" or detect which case.`
+
+### function_registry.h
+
+`L30-51: BoundArguments: good API. at() for required, get() for optional, has() for existence. Clear contract.`
+
+`L46: 🔵 nit: getOptionalUint32 exists but no getRequiredUint32. Callers in ast_to_query.cpp do raw static_cast<uint32_t>(extractIntLiteral(args.at(...))) without range checks (L74, L184, L284, L287, L376, L401, L411, L433, L737). Consider adding getRequiredUint32() to centralize the range validation. Not blocking but would eliminate ~10 unsafe casts.`
+
+`L87,112: instance(): static local in .cpp, C++11 guarantees thread-safe init. Construction calls registerFunction() which only touches own entries_ map. Safe. Good.`
+
+`L74-91, L99-116: 🔵 nit: FunctionRegistry and FilterFunctionRegistry are structurally identical except handler type.
Could be a template class GenericRegistry. Not blocking: two classes is fine for now, but worth noting if more registry types appear.`
+
+`L16: using Tables = std::map<...>: declared in header but only used by FunctionHandler typedef. Fine, but couples header to storage/table.h. Minor.`
+
+### Cross-file (ast_to_query.cpp callers)
+
+`ast_to_query.cpp:L74,L284,L287,L376,L401,L411: 🟡 risk: same truncation bug as getOptionalUint32: raw static_cast<uint32_t>(extractIntLiteral(...)) with no upper-bound check. Negative values also unchecked in some (L74, L284, L287). Systematic issue. getOptionalUint32 at least checks >= 0 but misses upper bound; these callers check neither.`
+
+`ast_to_query.cpp:L433: static_cast<int>(extractIntLiteral(...)) for nOf count: same truncation risk for int (though practically unlikely with small counts).`
+
+---
+
+## Summary
+
+| Severity | Count | Description |
+|----------|-------|-------------|
+| 🟡 risk | 3 | uint32 truncation (getOptionalUint32 + callers), duplicate named arg silent overwrite |
+| 🔵 nit | 4 | error message style, missing getRequiredUint32, registry dedup, error text accuracy |
+
+**Overall:** Solid design. bindArguments logic correct and well-structured. Main concern is int64→uint32 truncation without upper-bound check: values > 4B silently wrap. Duplicate named arg edge case unlikely from normal usage but parser doesn't prevent it. No blockers, but the truncation risk should be fixed.
diff --git a/1238-review/lexer.md b/1238-review/lexer.md
new file mode 100644
index 000000000..478a646ed
--- /dev/null
+++ b/1238-review/lexer.md
@@ -0,0 +1,55 @@
+# Lexer Review - PR #1238
+
+**Files:** `lexer.cpp` (321L), `lexer.h` (38L), `lexer.test.cpp` (264L)
+
+## Summary
+
+Solid lexer. Clean structure, good error messages, proper `unsigned char` casts for `isdigit`/`isalpha`. Tests cover happy paths well. Few issues below.
+
+---
+
+## Findings
+
+### lexer.cpp
+
+`L178-179`: 🔵 nit: `peek() == '.'` in while-condition is dead: L181-183 immediately `break`s on dot. Remove `|| peek() == '.'` from condition, it's misleading. Reader thinks dots get consumed into identifiers.
+
+`L34`: 🟡 risk: `advance()` has no guard against `isAtEnd()`. If called when `position >= input.size()`, `input[position]` is UB. Currently all callers check before calling, but one missed check = crash. Add `SILO_ASSERT(!isAtEnd())` or a bounds check.
+
+`L226-228`: 🔴 bug: `-42` after any token becomes negative number, not minus + number. Input `x -42` → `[IDENTIFIER("x"), INT_LITERAL(-42)]` instead of `[IDENTIFIER("x"), ???, INT_LITERAL(42)]`. No MINUS token exists, so parser can't distinguish `a - b` from `a` followed by `-b`. This is fine **only if** subtraction is not in the grammar. If subtraction ever gets added, this will silently misparse. Add a comment documenting this design decision, or add a MINUS token now.
+
+`L147`: 🔵 nit: `1.` (trailing dot, no digit after) parses as `INT_LITERAL(1)` then `DOT`. This is correct for method-call syntax (`1.toString()`), but worth a test to lock in the behavior.
+
+`L305`: 🔵 nit: `fmt::format` used but `<fmt/format.h>` not directly included; relies on transitive include through `parse_exception.h`. Add explicit include per project style (include what you use).
+
+`L173`: 🟡 risk: `throwsOnUnterminatedString` test creates unused `const Lexer lexer` at L174 before the lambda. Dead code in test: harmless but confusing.
+
+`L93-95`: ❓ q: Unknown escape sequences like `\x` produce literal `\x` (backslash preserved). Is this intentional? SQL standard doesn't have C-style escapes at all. Consider throwing on unknown escapes to catch typos early, or document the pass-through behavior.
+
+`L50-53`: 🔵 nit: Line comment (`--`) consumed inside `skipWhitespace`.
Works, but comment at end of input (no trailing newline) silently works because `isAtEnd()` terminates the inner while. Good, just noting it's correct.
+
+### lexer.h
+
+`L14`: 🔵 nit: `SourceLocation current_location;` is default-initialized via aggregate default `{1,1}`. Correct but could add `{}` or `{1,1}` for explicitness since `position` has `= 0`.
+
+### lexer.test.cpp - Missing test coverage
+
+- No test for negative numbers (`-42`, `-3.14`)
+- No test for `1.` (trailing dot) behavior
+- No test for bare `-` (should it throw? currently throws "Unexpected character")
+- No test for empty quoted identifier `""` (produces empty string IDENTIFIER; intentional?)
+- No test for `0` or `INT64_MIN`/`INT64_MAX` overflow
+- No test for float overflow/underflow
+- No test for comment at end of input with no trailing newline (`"a -- comment"`)
+- No test for `\r\n` line endings (column tracking)
+- No test for `!=` (should throw; documents that `<>` is the only not-equals)
+
+---
+
+## Good stuff
+
+- `static_cast<unsigned char>` on all `isdigit`/`isalpha` calls prevents UB with signed char. 👍
+- `readQuotedIdentifier` correctly handles SQL `""` escaping convention. 👍
+- Error messages include source location, great for user-facing errors. 👍
+- `[[nodiscard]]` on all query methods in header. 👍
+- Test for full method-call chain (`tokenizesMethodCallChain`) is thorough. 👍
diff --git a/1238-review/new_operator_nodes.md b/1238-review/new_operator_nodes.md
new file mode 100644
index 000000000..49457889b
--- /dev/null
+++ b/1238-review/new_operator_nodes.md
@@ -0,0 +1,61 @@
+# PR #1238 - New Operator Nodes Review
+
+## Summary
+
+Three new logical operator nodes: `FilterNode`, `ProjectNode`, `ScanNode`. Intermediate representations between AST conversion and pushdown. Collapsed into `TableScanNode` during pushdown via `pushdownScanFilterProject()`.
+
+Overall: clean, well-structured, follows existing patterns. Few issues below.
+
+---
+
+## Findings
+
+### ProjectNode
+
+`project_node.h:L16`: 🟡 risk: Comment says "Must be eliminated during pushdown" but `ProjectNode` has a real `toQueryPlan()` impl and CAN survive pushdown (planner.cpp:L424-426 recurses into it without collapsing when child isn't Scan/Filter). Comment is misleading. Either remove "Must be eliminated" or say "Collapsed into TableScanNode when possible; otherwise executes as Arrow project."
+
+`project_node.cpp:L27-29`: 🔵 nit: `field.name` used for both expression and output name. Correct for simple column refs, but if `ColumnIdentifier` ever gains qualified names (table.column), `field_ref(field.name)` may break. Fine for now, just noting.
+
+`project_node.cpp:L17-38`: ❓ q: No guard for empty `fields`. Arrow `ProjectNodeOptions` with zero expressions: is that valid? If `fields` is empty, this creates a zero-column projection. Probably unreachable from parser, but a defensive check or assert would be cheap insurance.
+
+### FilterNode
+
+`filter_node.h:L17`: 🟡 risk: Same "Must be eliminated during pushdown" claim, but planner.cpp:L428-431 recurses into `FilterNode` without collapsing when child isn't Scan/Project. If a `FilterNode` survives with e.g. an `AggregateNode` child, `toQueryPlan()` throws `std::runtime_error` at query time. Currently unreachable from parser (filter always wraps scan), but the planner structure allows it. Two options: (a) implement a real `toQueryPlan` like `ProjectNode` does (delegate to child, apply Arrow filter), or (b) add a `SILO_ASSERT` in pushdown that no FilterNode/ScanNode survives, so the invariant is enforced rather than silently assumed.
+
+`filter_node.h:L20-21`: 🔵 nit: Public data members. Consistent with other nodes (`UnresolvedMutationsNode`, `TableScanNode`), so follows project convention. Just noting: if these were new patterns, would suggest accessor methods.
+
+`filter_node.cpp:L19`: 🔵 nit: Throws `std::runtime_error`.
Other "must be eliminated" nodes (`UnresolvedMutationsNode`) also throw `std::runtime_error`. Consistent. But `ScanNode` uses `fmt::format` in its throw while `FilterNode` uses a plain string literal, a minor inconsistency. Consider adding the node type or filter info to the error message for debuggability, like ScanNode does with table name.
+
+### ScanNode
+
+`scan_node.h:L5`: 🔵 nit: `#include <string>`: `string` not directly used in header. `schema::TableName` and `schema::ColumnIdentifier` come from `database_schema.h`. Remove unless needed for transitive reasons.
+
+`scan_node.cpp:L24-27`: ✅ Good: includes table name in error message via `fmt::format`. Better debuggability than FilterNode's plain string.
+
+`scan_node.h:L21`: ❓ q: `output_schema` stores full column list at construction. This is a snapshot of the table schema at AST-conversion time. If table schema could change between parse and execute (hot reload?), this could go stale. Probably not an issue in current architecture, but worth confirming.
+
+### Cross-cutting
+
+**No tests**: No `*.test.cpp` for any of these three nodes. `FilterNode` and `ScanNode` throw in `toQueryPlan` so unit-testing them directly is limited, but `ProjectNode` has real logic worth testing. At minimum:
+- `ProjectNode::toQueryPlan` with a mock/stub child
+- `getOutputSchema()` returns correct fields for all three
+- `FilterNode::toQueryPlan` throws as expected
+
+**Memory ownership**: ✅ Good. `unique_ptr` for child nodes and filter expressions, moved in constructors. No raw owning pointers. Consistent with codebase.
+
+**Include completeness**: `filter_node.h` includes ``, ``, ``, `arrow/result.h`, expression.h, query_node.h, database_schema.h, table.h. The `` and `table.h` are needed for `toQueryPlan` signature inherited from base. `expression.h` needed for `filter` member. All correct. Same analysis holds for other headers.
+
+**Style**: ✅ 3-space indent, `#pragma once`, Chromium braces, `camelBack` methods, `snake_case` members. All correct per AGENTS.md.
+
+---
+
+## Verdict
+
+Solid implementation. Main concern: `FilterNode` comment/behavior mismatch with planner (latent crash path). `ProjectNode` comment inaccurate. Missing tests for `ProjectNode::toQueryPlan` logic.
+
+| Severity | Count |
+|----------|-------|
+| 🔴 Critical | 0 |
+| 🟡 Risk | 2 |
+| 🔵 Nit | 4 |
+| ❓ Question | 2 |
diff --git a/1238-review/parser.md b/1238-review/parser.md
new file mode 100644
index 000000000..acb67d934
--- /dev/null
+++ b/1238-review/parser.md
@@ -0,0 +1,50 @@
+# PR #1238 - Parser Review (`parser.cpp` / `parser.h`)
+
+Overall: clean recursive descent parser, well-structured, good test coverage. Findings below.
+
+---
+
+## Bugs / Risks
+
+`parser.cpp:L243-L250`: 🟡 risk: `parseSetOrRecordExpression` disambiguation is fragile. When first token is IDENTIFIER, it calls `parseExpression()` which can consume far beyond the identifier (e.g. `{a.foo() := 1}`: `parseExpression` parses `a.foo()` which is a FunctionCall, not an Identifier, so `holds_alternative` is false → falls through to SetLiteral path → `:=` becomes trailing garbage → error). This is correct behavior for rejecting bad input, but the error message will be confusing ("Expected RightBrace but got ColonEquals" or similar). Consider: if token after IDENTIFIER is `:=`, peek directly at the two tokens instead of parsing a full expression first. Or: add a targeted error message when `:=` follows a non-identifier expression inside `{}`.
+
+`parser.cpp:L348-L357`: 🟡 risk: same pattern in `parseArgList`. `parseExpression()` called speculatively, then checked for `Identifier` variant. If user writes `f(a.b := 1)`, `a.b` parses as FunctionCall, `:=` not consumed → becomes positional arg `a.b` followed by parse error on `:=`. Error message won't mention "named argument syntax requires simple identifier". Add targeted diagnostic.
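The two disambiguation findings above both suggest deciding "named or positional" from the token stream rather than from a speculatively parsed expression. A rough sketch of that idea follows, under stated assumptions: `TokenType`, `ArgKind`, and `classifyArgument` are hypothetical names invented for illustration, and the PR's lexer and parser use their own types. The sketch only classifies an argument; it does not parse it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical token kinds for illustration; the PR's lexer defines its own.
enum class TokenType {
   IDENTIFIER, INT_LITERAL, DOT, COMMA, COLON_EQUALS,
   LEFT_PAREN, RIGHT_PAREN, LEFT_BRACE, RIGHT_BRACE, END
};

enum class ArgKind { NAMED, POSITIONAL, MALFORMED_NAME };

// Classifies the argument that starts at `pos` without parsing an expression.
inline ArgKind classifyArgument(const std::vector<TokenType>& tokens, std::size_t pos) {
   // `identifier :=` is decided by two-token lookahead.
   if (pos + 1 < tokens.size() && tokens[pos] == TokenType::IDENTIFIER &&
       tokens[pos + 1] == TokenType::COLON_EQUALS) {
      return ArgKind::NAMED;
   }
   // Otherwise scan to the end of this argument. A `:=` outside any nested
   // parentheses or braces means the name was not a simple identifier.
   int depth = 0;
   for (std::size_t i = pos; i < tokens.size(); ++i) {
      const TokenType type = tokens[i];
      if (type == TokenType::END) {
         break;
      }
      if (type == TokenType::LEFT_PAREN || type == TokenType::LEFT_BRACE) {
         ++depth;
      } else if (type == TokenType::RIGHT_PAREN || type == TokenType::RIGHT_BRACE) {
         if (depth == 0) {
            break;  // closes the enclosing argument list
         }
         --depth;
      } else if (depth == 0 && type == TokenType::COMMA) {
         break;  // the next argument starts here
      } else if (depth == 0 && type == TokenType::COLON_EQUALS) {
         return ArgKind::MALFORMED_NAME;
      }
   }
   return ArgKind::POSITIONAL;
}
```

With a classification like this, a parser could raise a targeted message for `MALFORMED_NAME` (for example, that named arguments require a simple identifier) before falling back to the generic "unexpected token" error.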
+
+`parser.cpp:L160-L161`: 🔵 nit: receiver's `location` set to `method_name.location` not receiver expression's own location. For `x.foo()`, the first positional arg (the receiver `x`) gets `foo`'s location. Should use `expr->location` before it's moved. Same issue L184-L185.
+
+## Design
+
+`parser.cpp:L108-L146`: ❓ q: comparison is `if` not `while`, which intentionally disallows chaining like `a < b < c`. This is correct for SaneQL semantics (no Python-style chained comparisons), but worth a comment explaining the deliberate choice since every other binary level uses `while`.
+
+`parser.cpp:L115-L137`: 🔵 nit: switch-on-token-type duplicates the check already done in the `if` guard (L111-L113). Could use a helper `std::optional tryParseComparisonOp()` that returns the op and advances, or returns nullopt. Eliminates the `default: SILO_UNREACHABLE()` branch.
+
+`parser.cpp:L260-L276`: 🔵 nit: SetLiteral parsing for non-identifier-first-element (L269-L275) is identical to the fallthrough path (L260-L266). Could merge: after the record-literal check fails, fall through to a single set-literal loop regardless of whether first element was identifier-started.
+
+## Tests
+
+`parser.test.cpp`: Good coverage of happy paths. Missing:
+- 🟡 No test for `{a.foo() := 1}` or `{42 := 1}` (malformed record literal error paths).
+- 🟡 No test for positional-after-named: `f(a:=1, 2)` should throw.
+- 🟡 No test for `::` type cast chaining: `x::int::string`.
+- 🟡 No test for property access without parens: `x.foo` (no `()`), which should produce `foo(x)`.
+- 🔵 No test for deeply nested NOT: `!!!x`.
+- 🔵 No test for empty arg list edge: `f()`.
+
+## Style
+
+`parser.h:L25`: 🔵 nit: `expect()` mutates state (advances token) but lacks `[[nodiscard]]`, inconsistent with other methods. Return value (the consumed token) is often used, so `[[nodiscard]]` appropriate.
+
+`parser.h:L27`: 🔵 nit: `match()` also returns useful bool but no `[[nodiscard]]`.
Less critical since callers sometimes ignore return.
+
+`parser.cpp:L344`: 🔵 nit: `NOLINTNEXTLINE` on lambda `parse_one` is redundant: the one on L339 already covers `parseArgList`. Verify clang-tidy actually flags the lambda separately; if not, remove.
+
+## Summary
+
+| Severity | Count |
+|----------|-------|
+| 🔴 bug | 0 |
+| 🟡 risk | 4 |
+| 🔵 nit | 6 |
+| ❓ question | 1 |
+
+Parser is solid. Main risks: poor error messages when disambiguation fails in `parseSetOrRecordExpression` and `parseArgList`. Location tracking on desugared receiver args is slightly wrong. Test gaps around error paths and edge cases.
diff --git a/1238-review/performance.md b/1238-review/performance.md
new file mode 100644
index 000000000..69f0fbaff
--- /dev/null
+++ b/1238-review/performance.md
@@ -0,0 +1,49 @@
+# PR #1238 - Performance Benchmark Files Review
+
+## many_short_read_filters.cpp
+
+`L286: 🔵 nit:` Unused variable `input_data_stream`. `NdjsonLineReader` constructed but never used; `appendData` called directly on `input_buffer` at L287. Remove dead code.
+
+`L114-136: 🟡 risk:` `current_generation` and `next_generation` are `vector<string_view>` pointing into `all_generated_sequences` (a `vector<string>`). When `push_back` at L125 triggers reallocation, moved `string` objects preserve heap pointers for non-SSO strings, so this works for 200-char reads. But it's **fragile**: if `DEFAULT_READ_LENGTH` is ever reduced below SSO threshold (~22 chars), all `string_view`s silently dangle → UB. Pre-existing, not introduced by this PR, but worth a `reserve()` or switching to `vector` indices.
+
+`L369-376: ❓ q:` `while(true)` infinite loop: intentional for profiling (attach perf/vtune and Ctrl-C)? If so, a comment explaining purpose would help. Otherwise looks like missing termination condition.
+
+`L311-323: ✅` `{{count:=count()}}` in `fmt::format` correctly escapes to `{count:=count()}`, which matches SaneQL `groupBy` syntax. Verified against parser tests.
+
+`L311-323: ✅` SaneQL query syntax correct: `nucleotideEquals(position:=N, symbol:='X')` matches registered function signature `{named("position"), named("symbol"), named("sequenceName", false)}`.
+
+`L313-320: 🔵 nit:` Duplicate `samplingDate.between(...)` clause at L314 and L320: same filter applied twice in AND. Functionally harmless (optimizer likely deduplicates) but reads like copy-paste artifact. Pre-existing from JSON version.
+
+`L37: 🔵 nit:` `std::filesystem::current_path().string()` → unnecessary round-trip through string. `std::filesystem::path` can be assigned directly. Pre-existing.
+
+## mutation_benchmark.cpp
+
+`L105: ✅` `"default.mutations(minProportion:=0.05, sequenceNames:={main})"` is correct SaneQL. `{main}` is a set literal, matches parser. `minProportion` is named arg matching `ast_to_query.cpp` L888.
+
+`L118: ✅` `"default.filter(!(key = '3')).mutations(minProportion:=0.05, sequenceNames:={main})"`: negation syntax `!(expr)` correct. Clean migration from `Negation(StringEquals(...))`.
+
+`L37: 🟡 risk:` `current_id` is a file-scope mutable global in anonymous namespace. Not thread-safe, and state persists across calls. Fine for single-threaded benchmark, but fragile if someone adds parallelism. Pre-existing.
+
+`L143: 🔵 nit:` Line exceeds 100-char column limit (project style). Pre-existing.
+
+## many_string_equals.cpp
+
+`L131-150: ❓ q:` This file constructs operator trees directly (`ScanNode` → `FilterNode` → `AggregateNode`) via `Planner::planQuery()` instead of using `Planner::planSaneqlQuery()`. Looking at the diff, this is **intentional**: the benchmark specifically tests `StringEquals` vs `StringInSet` filter expression performance at the operator level, bypassing the parser. The `Expression` objects are constructed programmatically. This is valid and correct; `planQuery()` still exists and accepts operator trees.
+
+`L146: ✅` `Planner::planQuery(std::move(root), ...)`: correct API usage per `planner.h` L19-24.
+
+`L236: 🔵 nit:` Line exceeds 100-char column limit. Same for L293.
+
+## Summary
+
+Migration looks correct. SaneQL syntax verified against parser tests and `ast_to_query.cpp` function registrations. Key findings:
+
+| Severity | Count | Notes |
+|----------|-------|-------|
+| 🔴 Critical | 0 | - |
+| 🟡 Risk | 2 | string_view fragility (pre-existing), global mutable state (pre-existing) |
+| 🔵 Nit | 4 | dead code, line length, duplicate filter, unnecessary conversion |
+| ❓ Question | 1 | `while(true)` loop intent |
+| ✅ Good | 4 | SaneQL syntax correct, operator tree approach intentional |
+
+No blockers for merge from these files.
diff --git a/1238-review/phylo_mrca_nodes.md b/1238-review/phylo_mrca_nodes.md
new file mode 100644
index 000000000..2de7e23b3
--- /dev/null
+++ b/1238-review/phylo_mrca_nodes.md
@@ -0,0 +1,28 @@
+# PR #1238 Review: phylo_subtree_node / most_recent_common_ancestor_node
+
+## Critical
+
+`phylo_subtree_node.cpp:L103-109` / `most_recent_common_ancestor_node.cpp:L101-107`: 🔴 bug: `resolved_table` null-deref if constructed with neither `table` nor `table_name`. Both constructors guarantee one-or-the-other, but nothing prevents default-constructing `table` (nullptr) + `table_name` (nullopt) via aggregate init or future refactor. The `if (!resolved_table && table_name.has_value())` guard silently falls through → `*resolved_table` on L109/L107 = UB. Add `SILO_ASSERT(resolved_table)` or `CHECK_SILO_QUERY(resolved_table, "no table resolved")` after the if-block.
+
+## Major
+
+`phylo_subtree_node.cpp:L26-56` / `most_recent_common_ancestor_node.cpp:L26-56`: 🟡 risk: `NodeValuesResult` struct + `getNodeValuesFromTable()` are **identical** copy-paste across both files (same anonymous namespace, same 30 lines). Extract to shared header/cpp (e.g., `compute_filter.h` or new `node_values_util.h`).
Duplication = divergence risk when one gets fixed and other doesn't.
+
+## Minor
+
+`phylo_subtree_node.h:L7` / `most_recent_common_ancestor_node.h:L7`: 🔵 nit: `#include <string_view>` unused in both headers. Neither file uses `std::string_view`. Remove.
+
+`phylo_subtree_node.h:L22-27` / `most_recent_common_ancestor_node.h:L22-26`: 🔵 nit: All members public. `table`, `filter`, `column_name`, `table_name` are construction-time invariants never mutated after ctor. Consider making them private or at least documenting why public (serialization? test access?).
+
+`phylo_subtree_node.cpp:L143` / `most_recent_common_ancestor_node.cpp:L142`: 🟡 risk: `&phylo_tree` captured by reference in lambda. Safe only because `resolved_table` (shared_ptr, captured by value as `table_handle`) keeps the schema alive. Correct but fragile: if someone removes `table_handle` capture, `phylo_tree` dangles. Add comment explaining lifetime dependency.
+
+## Questions
+
+`phylo_subtree_node.cpp:L75-86` / `most_recent_common_ancestor_node.cpp:L73-82`: ❓ q: `TableName` constructor exists but is never called anywhere in codebase (only `shared_ptr` ctor used from planner.cpp:L200/L218). Dead code? If planned for future use, add a test exercising this path. If not, remove to reduce surface area.
+
+## Summary
+
+- 1 critical null-deref risk in table resolution
+- 30 lines of exact duplication across files
+- TableName ctor path untested/unused
+- Minor: unused include, public members, fragile ref capture
diff --git a/1238-review/planner.md b/1238-review/planner.md
new file mode 100644
index 000000000..ae3de3a54
--- /dev/null
+++ b/1238-review/planner.md
@@ -0,0 +1,69 @@
+# Review: `planner.cpp` / `planner.h` (PR #1238)
+
+Overall: solid design. Two-phase planner (pushdown → optimize) clean and readable. Findings below.
+
+---
+
+## Bugs / Risks
+
+`planner.cpp:L286-287`: 🟡 risk: double `dynamic_cast`: first in condition check (L286), second to get pointer (L287).
Wasteful + fragile if condition changes. Assign once.
+```cpp
+auto* table_scan_child = dynamic_cast(node->child.get());
+if (node->group_by_fields.empty() && node->aggregates.size() == 1 &&
+    node->aggregates[0].function == operators::AggregateFunction::COUNT &&
+    table_scan_child != nullptr) {
+```
+
+`planner.cpp:L438-449`: 🟡 risk: same double-cast pattern in `optimize()`. Each branch does `dynamic_cast` to check null, then immediately casts again. Combine into single `if (auto* x = dynamic_cast<...>(...))`.
+```cpp
+if (auto* aggregate = dynamic_cast(node.get())) {
+    return optimizeInstance(aggregate);
+}
+// same for OrderByNode, FetchNode
+```
+
+`planner.cpp:L361-364`: 🟡 risk: dangling pointer comparison. `before` points into old `node`. After `tryReorderProject(std::move(node))`, `node` is reassigned. Comparing `node.get() != before` works only because `tryReorderProject` returns original `node` unchanged (same unique_ptr) when no reorder happens. If `tryReorderProject` ever wraps/copies the node without reordering, this breaks silently. Fragile contract. Consider returning a `bool` or `std::optional` from `tryReorderProject` instead.
+
+`planner.cpp:L79-96`: ❓ q: `extractScanInfo` only handles `ScanNode` and `FilterNode(ScanNode)` - not `FilterNode(FilterNode(ScanNode))` or deeper chains. Is this guaranteed by AST construction? If SaneQL parser can produce nested filters, mutations/insertions pushdown silently fails with "must be applied to a table scan". Worth a comment documenting this invariant.
+
+`planner.cpp:L370-383`: 🟡 risk: `pushdown` handles `FilterNode(ScanNode)` and `FilterNode(ProjectNode)` for collapse, but not `FilterNode(FilterNode(ScanNode))`. If two WHERE clauses get stacked, neither collapses into `TableScanNode`. `pushdownScanFilterProject` loop (L239-253) only peels one of each type. Multiple filters → falls through, returns original node → `FilterNode.toQueryPlan()` throws at runtime.
Either merge stacked filters during pushdown or document this as impossible from parser.
+
+---
+
+## Major
+
+`planner.cpp:L437-451`: 🟡 `optimize()` doesn't recurse into `ProjectNode`, `FilterNode`, `ZstdDecompressNode`, `TableScanNode`, or any unresolved nodes. After pushdown, most should be gone, but `ZstdDecompressNode` wraps `TableScanNode` (L59) and `optimize()` won't look inside it. If `Aggregate(ZstdDecompress(TableScan))` appears, COUNT(*) optimization misses. Probably fine today (COUNT doesn't need decompression), but fragile - add a comment or a catch-all recurse.
+
+`planner.cpp:L462-468`: 🟡 `planQuery` catches Arrow errors via `result.ok()` but throws `std::runtime_error`. `planSaneqlQuery` (L470-478) doesn't catch `IllegalQueryException` from pushdown. Caller gets two different exception types for query errors. Inconsistent error contract. Consider unifying or documenting.
+
+---
+
+## Minor
+
+`planner.cpp:L265`: 🔵 nit: `std::unordered_set seen_names` - dedup by name only. If two `ColumnIdentifier`s have same name but different types (shouldn't happen but defensive), first one wins silently. Fine if schema guarantees uniqueness.
+
+`planner.cpp:L126`: 🔵 nit: `std::vector fields_to_use` stores views into static data (L129-137) and into `unresolved->fields` (L147). If `unresolved` is moved/destroyed before `fields_to_use` consumed, dangling views. Currently safe because `fields_to_use` consumed immediately at L151, but fragile if refactored.
+
+`planner.h:L14`: 🔵 nit: `std::map` and `std::shared_ptr` used in header without `#include ` and `#include `. Compiles because `query_node.h` transitively includes them, but violates include-what-you-use.
+
+`planner.h:L10-31`: 🔵 nit: `Planner` is all static methods, no state. Could be a namespace with free functions instead of a class. Class is fine if you plan to add state later (e.g., optimizer config).
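
The `tryReorderProject` concern raised for `planner.cpp:L361-364` above can be sketched as follows. This is a minimal illustration under stated assumptions, not the planner's actual code: `Node`, `ProjectNode`, `OrderByNode` and `ReorderResult` are hypothetical stand-ins, and the rewrite direction (OrderBy over Project becomes Project over OrderBy) is assumed only for the example. The point is that the caller reads an explicit flag instead of comparing raw pointers taken before the move.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-ins for the planner's node types; not the real SILO classes.
struct Node {
  virtual ~Node() = default;
};
struct ProjectNode : Node {
  std::unique_ptr<Node> child;
};
struct OrderByNode : Node {
  std::unique_ptr<Node> child;
};

// Result of a rewrite attempt: the (possibly new) root plus an explicit flag,
// so callers never need to compare against a pointer saved before the move.
struct ReorderResult {
  std::unique_ptr<Node> node;
  bool reordered;
};

// Assumed rewrite for illustration: OrderBy(Project(x)) -> Project(OrderBy(x)).
// Any other shape is handed back unchanged with reordered == false.
ReorderResult tryReorderProject(std::unique_ptr<Node> node) {
  auto* order_by = dynamic_cast<OrderByNode*>(node.get());
  if (order_by == nullptr) {
    return {std::move(node), false};
  }
  auto* project = dynamic_cast<ProjectNode*>(order_by->child.get());
  if (project == nullptr) {
    return {std::move(node), false};
  }
  // Detach the project node, splice the grandchild under OrderBy,
  // then hang OrderBy under the project node.
  std::unique_ptr<Node> project_owner = std::move(order_by->child);
  order_by->child = std::move(project->child);
  project->child = std::move(node);
  return {std::move(project_owner), true};
}
```

With this shape the call site would branch on `result.reordered`, which stays correct even if a later version of the function wraps or copies nodes without reordering them.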
+
+`planner.cpp:L467`: 🔵 nit: `std::move(result.ValueUnsafe())` - `ValueUnsafe()` skips status check. Already checked `result.ok()` above so safe, but `result.MoveValueUnsafe()` is more idiomatic Arrow for this pattern.
+
+---
+
+## Design
+
+`planner.cpp:L239-253`: The 3-iteration loop for peeling Project/Filter/Scan is clever but non-obvious. Reader must reason about all 6 permutations. The doc comment (L226-228) helps. Consider: if a 4th node type ever needs collapsing, loop bound `3` becomes wrong silently. A `while(true)` with explicit break conditions would be more robust.
+
+`planner.cpp:L355-433`: `pushdown()` is a long if-else chain of `dynamic_cast`. Classic visitor pattern candidate. Not blocking - current approach works and is explicit. But if node types keep growing, consider `std::variant` or virtual `accept()`.
+
+---
+
+## Good
+
+- `pushdownScanFilterProject` loop handles all Scan/Filter/Project orderings cleanly
+- `tryReorderProject` safety check for sort keys is correct and well-documented
+- `wrapWithDecompressIfNeeded` is clean separation of concerns
+- `CHECK_SILO_QUERY` gives good error messages with context
+- Template pushdown for `Nucleotide`/`AminoAcid` avoids duplication
diff --git a/1238-review/preprocessing_test.md b/1238-review/preprocessing_test.md
new file mode 100644
index 000000000..40a680441
--- /dev/null
+++ b/1238-review/preprocessing_test.md
@@ -0,0 +1,59 @@
+# Review: `src/silo/preprocessing/preprocessing.test.cpp` (PR #1238)
+
+## Summary
+
+Migration from JSON query format to SaneQL is **correct**. All 12 success scenarios and 8 error scenarios reviewed. No bugs found. SaneQL syntax valid, semantic equivalence preserved, expected results match.
+
+## Detailed Findings
+
+### SaneQL Query Translations - All Correct ✅
+
+L165-166: `default.project({accessionVersion, someShortGene, secondSegment, country}).orderBy({accessionVersion})` - old `FastaAligned` with `sequenceNames` + `additionalFields` + `orderByFields` correctly mapped to `project` + `orderBy`. Columns match expected result. ✅
+
+L231: `default.groupBy({count:=count()},{group}).orderBy({group})` - `group` unquoted. Verified: SaneQL lexer (`readIdentifierOrKeyword`) only reserves `true`/`false`/`null`. `group` is plain IDENTIFIER, no quoting needed. ✅
+
+L304: `default.groupBy({count:=count()},{"2"}).orderBy({"2"})` - numeric column name correctly quoted with `"2"`. Lexer's `readQuotedIdentifier` handles this. ✅
+
+L356: `default` - bare table reference for empty dataset. Returns all rows (none) with all columns. Expected `[]`. ✅
+
+L401: `default` - same pattern, unpartitioned variant. ✅
+
+L446: `default.groupBy({count:=count()})` - simple aggregation, no group-by columns. ✅
+
+L490: `default.groupBy({count:=count()})` - same, no nucleotide sequences. ✅
+
+L529: `default.groupBy({count:=count()})` - same, no sequences at all. ✅
+
+L654: `default.groupBy({count:=count()})` - diverse sequence names scenario. Only tests count, avoids needing to reference exotic names in SaneQL. Smart choice. ✅
+
+L692: `default.orderBy({accessionVersion})` - old `Details` with `orderByFields`. Returns all metadata (only `accessionVersion` exists). ✅
+
+L722: `default.orderBy({accessionVersion})` - date column scenario. Expected results include `theDate` column correctly. ✅
+
+L783-784: `default.filter(lineage_1.lineage('root_1', includeSublineages:=true)).orderBy({accessionVersion})` - method call syntax: `lineage_1` becomes first positional arg (column), `'root_1'` second positional (value), `includeSublineages:=true` named arg. Matches `FilterFunctionRegistry` signature `(column, value, includeSublineages:=)`.
✅
+
+### Include Changes ✅
+
+L1-17: Removed `action_query.h`, `binder.h`, `exec_node/ndjson_sink.h`. Added `planner.h`, `query_plan.h`, `query_fixture.test.h`. Correct - old JSON parsing/binding/sink code replaced by `planSaneqlQuery` + `executeQueryToJsonArray`.
+
+### Test Fixture (L823) ✅
+
+`Planner::planSaneqlQuery(scenario.assertion.query, database->tables, ...)` - correct API. Uses `executeQueryToJsonArray` from `query_fixture.test.h` instead of manual NDJSON stream parsing. Cleaner.
+
+### Error Scenarios (L838-1131) ✅
+
+All 8 error scenarios unchanged - they test preprocessing failures, not queries. No `.query` field in `Error` struct. Correct.
+
+### Style ✅
+
+L809, L1120: Long `INSTANTIATE_TEST_SUITE_P` lines exceed 100-char limit. Pre-existing, not introduced by this PR.
+
+## Potential Improvements (Optional/FYI)
+
+L231: 🔵 nit: Even though `group` works unquoted in this SaneQL dialect, quoting it as `"group"` would be defensive against future keyword additions and signal intent to readers familiar with SQL. Not required - current code is correct.
+
+L654: ❓ q: `DIVERSE_SEQUENCE_NAMES_NDJSON` only tests `count()`. Could a `project` or `Details`-equivalent query exercise the exotic sequence names (quotes, dots, unicode) through the SaneQL parser? Not a regression - old JSON test also only did `Aggregated`. But a missed opportunity for coverage of quoted identifiers with special chars.
+
+## Verdict
+
+**No bugs. No blocking issues.** Translation is faithful and complete. Code is cleaner post-migration (removed manual NDJSON stream parsing, reuses `executeQueryToJsonArray`).
diff --git a/1238-review/query_fixture.md b/1238-review/query_fixture.md
new file mode 100644
index 000000000..dae587658
--- /dev/null
+++ b/1238-review/query_fixture.md
@@ -0,0 +1,25 @@
+# PR #1238 - Review: `query_fixture.test.h` / `query_fixture.test.cpp`
+
+## query_fixture.test.cpp
+
+`L26: 🔴 bug: std::cout << line_object.dump() left in.
Dumps every query result line to stdout on every test run. Remove or gate behind SPDLOG_DEBUG.`
+
+## query_fixture.test.h
+
+`L82: 🟡 risk: negateFilter() declared but definition removed from .cpp. Dead declaration → linker error if anyone calls it. Remove declaration.`
+
+`L69: 🟡 risk: QueryTestScenario.query typed as nlohmann::json but every caller assigns a std::string (plain SaneQL). L120 immediately calls .get(). Change type to std::string - removes implicit json wrapping, avoids runtime type mismatch if someone passes a json object by mistake, and drops the nlohmann dependency from the struct.`
+
+`L4: 🔵 nit: #include unused in header (no std::cout/cin/cerr here). Remove.`
+
+`L8-9: 🔵 nit: #include and #include unused in this header. Nothing from simdjson or spdlog referenced. Remove - reduces transitive include bloat for all test files.`
+
+`L128: 🔵 nit: catch(const std::exception&) is very broad for error-path tests. If planSaneqlQuery or executeQueryToJsonArray throw an unexpected exception type (e.g. std::bad_alloc), test still passes as long as .what() matches. Consider catching the specific silo exception type(s) and letting unexpected exceptions propagate as test failures.`
+
+## has_mutation.test.cpp (bonus - spotted while tracing callers)
+
+`L79-80: 🔴 bug: Variable HAS_NUCLEOTIDE_MUTATION_OUT_OF_RANGE_EDGE_LOW has .name = "HAS_NUCLEOTIDE_MUTATION_OUT_OF_RANGE_EDGE_HIGH" - names swapped with L87-88 (and vice versa). Tests still pass because name is only used for display, but confusing when debugging failures.`
+
+## Summary
+
+Two bugs (debug stdout, swapped test names), one dead declaration, one type mismatch (json→string), three unnecessary includes. No architectural concerns - fixture design is clean and macro approach is reasonable for parameterized query tests.
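
The `L128` nit about the broad `catch(const std::exception&)` could be addressed along these lines. This is a sketch under assumptions: `IllegalQueryException` here is a local stand-in defined only for the example (the real project type is not reproduced), and `checkQueryError` and `ErrorCheck` are hypothetical names. Only the expected query error is caught; any other exception type leaves the helper and would surface as a test failure.

```cpp
#include <stdexcept>
#include <string>

// Local stand-in for the project's query exception type (assumption for this sketch).
struct IllegalQueryException : std::runtime_error {
  using std::runtime_error::runtime_error;
};

enum class ErrorCheck { kMatched, kWrongMessage, kNoThrow };

// Runs `action` and reports whether it failed with the expected query error.
// Other exception types are deliberately not caught, so they propagate to the
// test framework instead of being mistaken for the expected failure.
template <typename Action>
ErrorCheck checkQueryError(Action&& action, const std::string& expected_message) {
  try {
    action();
  } catch (const IllegalQueryException& error) {
    return error.what() == expected_message ? ErrorCheck::kMatched
                                            : ErrorCheck::kWrongMessage;
  }
  return ErrorCheck::kNoThrow;
}
```

A fixture could then assert on `ErrorCheck::kMatched`, which distinguishes "wrong message" from "did not throw" when a test fails.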
diff --git a/1238-review/query_handler.md b/1238-review/query_handler.md
new file mode 100644
index 000000000..d70bab6ce
--- /dev/null
+++ b/1238-review/query_handler.md
@@ -0,0 +1,63 @@
+# PR #1238 - `query_handler.cpp` / `query_handler.h` Review
+
+## Summary
+
+Endpoint reads SaneQL body, plans query, streams results as NDJSON or Arrow IPC based on Accept header. Error handling delegates to `ErrorRequestHandler` wrapper. Overall structure solid - findings below.
+
+---
+
+## Findings
+
+### query_handler.cpp
+
+`L3: 🔵 nit: #include unused. Only used in error_request_handler.cpp. Remove.`
+
+`L11: 🔵 nit: #include unused. No JSON usage in this file. Remove.`
+
+`L12: 🔵 nit: #include misplaced - system include in external-includes group. Move to group 2 (after ). Per AGENTS.md include order: system → external → internal.`
+
+`L35: 🔵 nit: DEFAULT_TIMEOUT_TWO_MINUTES is uint64_t but name says "two minutes" - magic constant. Consider making configurable via QueryOptions or at least add unit suffix: DEFAULT_TIMEOUT_SECONDS.`
+
+`L48: 🟡 risk: response.get("X-Request-Id") throws Poco::NotFoundException if header missing. Other handlers (logging_request_handler.cpp:L16) do same pattern, so presumably guaranteed by RequestIdHandler middleware - but no defensive check. If middleware chain changes, this crashes. Consider request.has() guard or document the invariant.`
+
+`L52: 🟡 risk: Poco::StreamCopier::copyToString reads entire body into memory with no size limit. Malicious client can send multi-GB body → OOM. Add Content-Length check or cap read size. Even a 10MB limit would prevent trivial DoS.`
+
+`L54: 🟡 risk: Full query string logged at INFO level. If queries contain sensitive data (patient IDs, etc.), this leaks to logs. Consider SPDLOG_DEBUG or truncating.`
+
+`L64-65: 🟡 risk: Accept header parsing via string::find is naive. "application/vnd.apache.arrow.stream" could match inside quality params or comments.
RFC 7231 Accept headers can be "application/vnd.apache.arrow.stream;q=0.5, application/x-ndjson;q=1.0" - find() would match arrow even though ndjson has higher quality. For now probably fine since clients are controlled, but worth a TODO or comment acknowledging the limitation.`
+
+`L69,81: 🟡 risk: response.send() commits HTTP 200 status + headers to wire. If executeAndWrite() then throws (L143 in query_plan.cpp throws std::runtime_error on non-IO errors), ErrorRequestHandler catches it but CANNOT change status code - headers already sent. Client gets HTTP 200 with truncated/corrupt body. This is inherent to streaming and may be acceptable, but:`
+- `For NDJSON: consider writing an error object as final line so clients can detect it.`
+- `For Arrow IPC: stream will be truncated - Arrow readers should detect incomplete stream.`
+- `At minimum: document this behavior so API consumers know to check for complete responses.`
+
+`L72-74: 🟡 risk: ArrowIpcSink::make failure throws std::runtime_error AFTER response.send() already committed 200. Same problem as above - client gets 200 + empty body. Move result check before response.send(), or restructure to validate sink creation first.`
+
+`L80: ❓ q: Default fallback is NDJSON for any Accept header (including "application/json", "text/html", "*/*"). Is this intentional? Might want to return 406 Not Acceptable for explicitly unsupported types, or at least match "*/*" and "application/x-ndjson" specifically.`
+
+`L87-91: Minor - catch blocks only handle ParseException and IllegalQueryException. Other exceptions (std::runtime_error from L73, Arrow errors) propagate to ErrorRequestHandler which handles them as 500. This is correct given the middleware design. No issue here.`
+
+### query_handler.h
+
+`L3-4: 🔵 nit: HTTPServerRequest.h and HTTPServerResponse.h included but only used as reference params in post() declaration. Forward declarations would suffice, reducing header coupling.
Though this matches existing codebase style (rest_resource.h does same), so low priority.`
+
+---
+
+## Architecture Notes
+
+**Error handling chain is sound**: QueryHandler throws BadRequest for user errors → ErrorRequestHandler catches and returns 400. Unexpected exceptions → 500. The `cxxabi.h` usage in error_request_handler.cpp for catch(...) type introspection is clever.
+
+**Streaming-after-commit is the main design tension**: Once `response.send()` is called, HTTP status is locked. Mid-query failures produce corrupt 200 responses. This is a known tradeoff in streaming APIs. Recommend documenting this in API docs and considering error sentinel values in NDJSON output.
+
+---
+
+## Severity Summary
+
+| Severity | Count |
+|----------|-------|
+| 🔴 bug | 0 |
+| 🟡 risk | 6 |
+| 🔵 nit | 5 |
+| ❓ question | 1 |
+
+**Most impactful**: L52 (no body size limit) and L69/81 (error-after-commit).
diff --git a/1238-review/query_test_js.md b/1238-review/query_test_js.md
new file mode 100644
index 000000000..5e9b9491e
--- /dev/null
+++ b/1238-review/query_test_js.md
@@ -0,0 +1,45 @@
+# Review: `endToEndTests/test/query.test.js` (PR #1238)
+
+Overall: solid rewrite. Data-driven test structure good. Few issues below.
+
+---
+
+## Findings
+
+`L43: 🟡 risk: Number(bigint) silently loses precision for values > 2^53. Genomic counts unlikely to hit this, but no guard. Add assertion or comment documenting safe range assumption.`
+
+`L24: 🔵 nit: readFilesRecursively filters .json by extension but invalidQueries at L85 uses readdirSync without filter - non-.json files in that dir would cause parse crash. Inconsistent.`
+
+`L85: 🟡 risk: readdirSync(invalidQueriesPath) returns ALL files (no .json filter, no recursive). Stray file (e.g. .DS_Store) → JSON.parse crash with unhelpful error. Filter to .json like readFilesRecursively does.`
+
+`L125-134: 🟡 risk: invalid queries only tested with NDJSON format. Valid queries test both NDJSON and Arrow IPC.
If server returns different error shape for Arrow Accept header, that path is untested. Should loop over formats like valid queries do, or add explicit Arrow IPC error test.`
+
+`L152-158: 🟡 risk: tests invalid SaneQL returns 400 + correct Content-Type, but does NOT verify response body. Malformed error message or empty body would pass. Add .expect(body => ...) asserting error structure matches {error, message} shape.`
+
+`L161-167: 🟡 risk: same as L152 - empty query test checks status+content-type but not body. Should verify error payload.`
+
+`L14-30: 🔵 nit: readFilesRecursively only used once (L83). queries/ has one subdir (symbolEquals/). Could simplify with glob or flat list. Not blocking, but adds complexity for minimal gain.`
+
+`L56-59: 🔵 nit: split(/\n/) on NDJSON - if server sends \r\n line endings, empty strings between lines survive the filter but parsed JSON would still work. Fine in practice, just noting.`
+
+`L65: ❓ q: Content-Type 'text/plain' for SaneQL - is this the agreed-upon MIME type? No custom media type like 'application/x-saneql'? If server validates Content-Type strictly, this is correct. If server accepts anything, test doesn't verify rejection of wrong Content-Type.`
+
+`L97-111: 👍 good: parameterized test over formats with shared test fixtures. Clean pattern, easy to extend.`
+
+`L115-120: 👍 good: uniqueness check on test case names prevents silent shadowing.`
+
+---
+
+## Summary
+
+| Severity | Count |
+|----------|-------|
+| 🔴 bug | 0 |
+| 🟡 risk | 5 |
+| 🔵 nit | 3 |
+| ❓ question | 1 |
+
+**Key action items:**
+1. Add .json filter to `invalidQueries` file reading (L85) - crash waiting to happen
+2. Invalid query tests should verify error body content, not just status code (L152, L161)
+3.
Consider testing invalid queries with Arrow IPC Accept header too (L125)
diff --git a/1238-review/saneql_examples.md b/1238-review/saneql_examples.md
new file mode 100644
index 000000000..c035fbcc6
--- /dev/null
+++ b/1238-review/saneql_examples.md
@@ -0,0 +1,105 @@
+# Review: `saneql.examples`
+
+## 🔴 Critical: Wrong file - TPC-H queries, not SILO SaneQL
+
+**Entire file (L1-250) contains TPC-H benchmark queries that cannot parse or execute with the SILO SaneQL parser.** These are standard SQL-style analytical queries (TPC-H Q1–Q22) written in a SaneQL dialect from the original academic SaneQL paper, NOT the SILO implementation.
+
+None of these 250 lines will parse with the SILO parser. The file is misleading as documentation.
+
+### Functions used in file but NOT in SILO parser:
+
+| Function | Lines used | Status |
+|----------|-----------|--------|
+| `join` | L22-24, L27-30, L39-40, L49, L56-60, L74-78, L88-94, L103-107, L117-119, L126-127, L138, L145, L155, L167, L176-177, L187-188, L193-196, L203, L219, L222-224, L231-235, L248 | ❌ Not registered |
+| `aggregate` | L24, L69, L130, L156, L168, L184, L208, L244 | ❌ Not registered |
+| `map` | L80, L95, L108, L246 | ❌ Not registered |
+| `let` | L19, L114, L125, L135, L152, L160-161, L183, L212-213, L217, L241 | ❌ Not a keyword/construct |
+| `case` | L96, L139, L156 | ❌ Not in parser |
+| `as` | L77-78, L92-93, L231, L234-235 | ❌ Not registered |
+| `extract` | L80, L95 | ❌ Not registered |
+| `substr` | L243, L246 | ❌ Not registered |
+| `sum` | L6-9, L61, L81, L96, L109, L120, L129-130, L139, L156, L164, L196, L208, L249 | ❌ Only `count` supported |
+| `avg` | L10-12, L184, L244 | ❌ Only `count` supported |
+| `min` | L24 | ❌ Only `count` supported |
+| `max` | L168 | ❌ Only `count` supported |
+| `count(distinct:=true)` | L178 | ❌ `count` takes no args |
+| `::interval` | L4, L48, L56, L68, L116, L137, L154, L163, L215 | ❌ Only `::date` supported |
+| `orderby` with `limit:=` | L32, L42, L121, L197,
L237 | ❌ `orderBy` has no `limit` param |
+| `.desc()` on fields | L32, L42, L62, L110, L121, L131, L148, L197, L237 | ⚠️ Works differently - SILO uses `desc(field)` inside `orderBy({...})` |
+
+### SILO-specific functions NOT demonstrated anywhere:
+
+**Pipeline functions (0 of 12 covered):**
+- `filter` - used but only with TPC-H predicates
+- `groupBy` - used but with unsupported aggregates (sum/avg)
+- `project` - used but in TPC-H context
+- `mutations` ❌
+- `aminoAcidMutations` ❌
+- `insertions` ❌
+- `aminoAcidInsertions` ❌
+- `randomize` ❌
+- `limit` - not as separate function (only as `limit:=` param on orderby)
+- `offset` ❌
+- `orderBy` - used but with wrong syntax
+- `mostRecentCommonAncestor` ❌
+- `phyloSubtree` ❌
+
+**Filter functions (0 of 15 covered in SILO context):**
+- `between` - used but only for TPC-H
+- `in` - used but only for TPC-H
+- `isNull` ❌
+- `isNotNull` ❌
+- `lineage` ❌
+- `phyloDescendantOf` ❌
+- `like` - used but only for TPC-H
+- `nucleotideEquals` ❌
+- `aminoAcidEquals` ❌
+- `hasMutation` ❌
+- `hasAAMutation` ❌
+- `insertionContains` ❌
+- `aminoAcidInsertionContains` ❌
+- `exact` ❌
+- `maybe` ❌
+- `nOf` ❌
+
+### Syntax features NOT demonstrated:
+
+- `'2021-01-01'::date` - used but only `::interval` (unsupported) context
+- Set literal `{...}` - used but only for TPC-H
+- Record literal `{name:=value}` - used but only for TPC-H aggregates
+- `true`/`false` boolean literals ❌
+- `null` literal ❌
+- `!` (NOT) operator ❌
+- `<>` (not-equals) operator - used in TPC-H only
+
+## Summary
+
+`L1-250: 🔴 bug: entire file is TPC-H benchmark queries from academic SaneQL paper. None parse with SILO's SaneQL implementation. Zero SILO-specific functions demonstrated. File misleads users about supported syntax.`
+
+**Recommendation:** Replace with actual SILO genomic query examples. Good examples already exist in `endToEndTests/test/queries/*.json` - extract the `"query"` fields from those.
Example correct queries from the e2e tests:
+
+```
+-- Filter + aggregate
+default.filter(country = 'Switzerland').groupBy({count:=count()})
+
+-- Lineage with sublineages
+default.filter(pango_lineage.lineage('B.1.1.7', includeSublineages:=true)).groupBy({count:=count()})
+
+-- Date between
+default.filter(date.between('2021-01-01'::date, '2021-12-31'::date)).groupBy({count:=count()})
+
+-- Mutations
+default.filter(false).mutations(minProportion:=0.5)
+
+-- Insertions
+default.insertions().orderBy({insertion})
+
+-- Details with limit/offset
+default.filter(country = 'Switzerland').orderBy({primary_key}).offset(9).limit(2).project({age, country})
+
+-- nOf
+default.filter(nOf(2, {nucleotideEquals(position:=241, symbol:='T'), nucleotideEquals(position:=29734, symbol:='T')})).groupBy({count:=count()})
+
+-- Phylo
+default.filter((primary_key = 'key_11') || (primary_key = 'key_22')).mostRecentCommonAncestor('usherTree').orderBy({mrcaNode})
+```
diff --git a/1238-review/unresolved_nodes.md b/1238-review/unresolved_nodes.md
new file mode 100644
index 000000000..1b88764c8
--- /dev/null
+++ b/1238-review/unresolved_nodes.md
@@ -0,0 +1,80 @@
+# PR #1238 - Unresolved Node Headers Review
+
+Files reviewed:
+- `unresolved_phylo_subtree_node.h`
+- `unresolved_most_recent_common_ancestor_node.h`
+- `unresolved_mutations_node.h`
+- `unresolved_insertions_node.h`
+
+---
+
+## Findings
+
+### 🟡 risk: All 4 files - Missing `#include `
+
+All four files throw `std::runtime_error` in `toQueryPlan()` but none include ``.
+Currently compiles only because some transitive include chain (likely through Arrow or ``)
+pulls it in. This is fragile - any include reordering or library upgrade can break compilation.
+
+- `unresolved_phylo_subtree_node.h:36`
+- `unresolved_most_recent_common_ancestor_node.h:33`
+- `unresolved_mutations_node.h:38`
+- `unresolved_insertions_node.h:29`
+
+**Fix:** Add `#include ` to each file.
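
As an illustration of the include-what-you-use point above: in standard C++, `std::runtime_error` is declared in `<stdexcept>`, so a header that throws it can include that header directly rather than rely on a transitive chain. The sketch below is hypothetical; `UnresolvedExampleNode` is an invented placeholder and does not mirror the real unresolved node classes.

```cpp
#include <stdexcept>  // declares std::runtime_error; included directly, not transitively
#include <string>

// Hypothetical placeholder node, loosely modelled on the pattern described above.
class UnresolvedExampleNode {
 public:
  // A placeholder that reaches planning is always an error, so this never returns.
  [[noreturn]] void toQueryPlan() const {
    throw std::runtime_error(std::string(kName) + " must be resolved before planning");
  }

 private:
  static constexpr const char* kName = "UnresolvedExampleNode";
};
```

With the direct include, the header compiles on its own regardless of which other headers happen to be included before it.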
+
+### 🟡 risk: All 4 files - `getOutputSchema()` returns empty `{}`
+
+`getOutputSchema()` returns empty vector in all unresolved nodes. This is called during
+AST construction by `handleGroupBy` (ast_to_query.cpp:610) and `handleProject`
+(ast_to_query.cpp:645) on child nodes. If a user composes e.g.
+`project(mutations(...), fields: ...)` or `groupBy(mutations(...), ...)`, the empty schema
+causes `CHECK_SILO_QUERY` to fail with a confusing "field X not present in output schema"
+error instead of a clear "this node must be resolved first" message.
+
+Also called in `wrapWithDecompressIfNeeded` (planner.cpp:51) - empty `{}` silently skips
+decompression wrapping, which is benign since pushdown replaces the node, but still a
+latent correctness concern if call order ever changes.
+
+**Two options:**
+1. **Throw** like `toQueryPlan()` does - makes the contract explicit: "don't call this before pushdown."
+2. **Document** that empty `{}` is intentional and these nodes must always be leaf/terminal during AST construction (currently true for mutations/insertions/phylo/mrca, but not enforced).
+
+Option 1 preferred - fail loud, fail early.
+
+### 🔵 nit: All 4 files - `std::runtime_error` vs project exception types
+
+Project has `QueryCompilationException` and `IllegalQueryException` (both derive
+`std::runtime_error`). An unresolved node surviving to execution is an internal logic error,
+not a user query error. Consider using `SILO_ASSERT` or a dedicated internal error type
+instead of bare `std::runtime_error`. This would make it easier to distinguish "bug in
+planner" from "bad user query" in error handling.
+
+### 🔵 nit: All 4 files - Public member variables
+
+All members are public. Consistent with resolved counterparts (`MutationsNode`,
+`InsertionsNode`, `PhyloSubtreeNode`, `MostRecentCommonAncestorNode` all have public
+members too), so this follows existing project convention. No action needed - just noting
+for awareness.
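
Option 1 above, combined with the suggestion of a dedicated internal error type, might look roughly like this. All names here (`UnresolvedNodeError`, `ColumnIdentifier`, `UnresolvedMutationsExample`) are hypothetical stand-ins for illustration, not the project's types. Deriving from `std::logic_error` is one possible choice: it keeps the error out of handlers written for `std::runtime_error`-based query exceptions.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical internal error type: signals a planner bug rather than a bad user query.
struct UnresolvedNodeError : std::logic_error {
  using std::logic_error::logic_error;
};

// Hypothetical stand-in for a column identifier in an output schema.
struct ColumnIdentifier {
  std::string name;
};

// Hypothetical placeholder node that fails loudly if its schema is requested
// before pushdown has replaced it with a resolved node.
struct UnresolvedMutationsExample {
  std::vector<ColumnIdentifier> getOutputSchema() const {
    throw UnresolvedNodeError(
        "getOutputSchema() called on an unresolved node; run pushdown first");
  }
};
```

A caller composing `project` or `groupBy` over such a node would then see the explicit message instead of a missing-field error from an empty schema.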
+
+### 🔵 nit: Template vs non-template inconsistency
+
+`UnresolvedMutationsNode` and `UnresolvedInsertionsNode` are
+templates, while `UnresolvedPhyloSubtreeNode` and `UnresolvedMostRecentCommonAncestorNode`
+are not. This mirrors the resolved counterparts (`MutationsNode`, `InsertionsNode`
+are templates; `PhyloSubtreeNode`, `MostRecentCommonAncestorNode` are not). Consistent
+with existing pattern. `SymbolType` is used purely as a type tag for `dynamic_cast`
+dispatch in planner.cpp:388-401 - valid pattern, no issue.
+
+---
+
+## Summary
+
+| Severity | Count | Summary |
+|----------|-------|---------|
+| 🟡 risk | 2 | Missing ``, empty `getOutputSchema()` |
+| 🔵 nit | 2 | Exception type choice, public members (both follow convention) |
+
+**Overall:** Clean, minimal placeholder nodes. Two real risks: missing include (fragile build)
+and silent empty schema (confusing error on composition). Both straightforward fixes.
+No bugs that would cause runtime crashes in normal usage paths.