-
Notifications
You must be signed in to change notification settings - Fork 82
feat: Improve schema-based search accuracy using log_surgeon::Query
.
#1313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
feat: Improve schema-based search accuracy using log_surgeon::Query
.
#1313
Conversation
WalkthroughAdds an interpretation-driven schema/wildcard workflow to GrepCore: normalizes query interpretations, identifies wildcard-encodable positions, and generates logtype strings; removes the ByteLexer-based Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Caller
participant GrepCore
participant Parser as WildcardParser
participant LTD as LogTypeDict
participant VD as VariableDict
Caller->>GrepCore: process_raw_query(raw_query, heuristics, dicts, ignore_case)
GrepCore->>Parser: parse -> QueryInterpretations
GrepCore->>GrepCore: normalize_interpretations(interpretations)
alt schema-driven path
loop per interpretation
GrepCore->>GrepCore: get_wildcard_encodable_positions(interpretation)
GrepCore->>GrepCore: generate_logtype_string(interpretation, wildcardPositions, mask)
GrepCore->>LTD: lookup generated logtype string
loop per variable token
GrepCore->>VD: process_schema_var_token(token, var_dict, ignore_case, is_mask_encoded)
VD-->>GrepCore: variable match(es)
end
GrepCore-->>Caller: emit SubQuery(s)
end
else legacy/heuristic path
GrepCore-->>Caller: fallback processing (existing heuristic logic)
end
note right of GrepCore: process_schema_var_token handles encoded vs non-encoded tokens and wildcard masking
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: ASSERTIVE Plan: Pro 📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (1)**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}⚙️ CodeRabbit configuration file
Files:
🧬 Code graph analysis (1)components/core/src/clp/Query.hpp (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
🔇 Additional comments (3)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 6
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (6)
components/core/src/clp/GrepCore.cpp
(2 hunks)components/core/src/clp/GrepCore.hpp
(5 hunks)components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
(1 hunks)components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
(1 hunks)components/core/tests/test-GrepCore.cpp
(1 hunks)taskfiles/deps/main.yaml
(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
components/core/src/clp/GrepCore.cpp
components/core/tests/test-GrepCore.cpp
components/core/src/clp/GrepCore.hpp
🧠 Learnings (3)
📚 Learning: 2024-10-13T09:27:43.408Z
Learnt from: LinZhihao-723
PR: y-scope/clp#557
File: components/core/tests/test-ir_encoding_methods.cpp:1216-1286
Timestamp: 2024-10-13T09:27:43.408Z
Learning: In the unit test case `ffi_ir_stream_serialize_schema_tree_node_id` in `test-ir_encoding_methods.cpp`, suppressing the `readability-function-cognitive-complexity` warning is acceptable due to the expansion of Catch2 macros in C++ tests, and such test cases may not have readability issues.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-11-01T03:26:26.386Z
Learnt from: LinZhihao-723
PR: y-scope/clp#570
File: components/core/tests/test-ir_encoding_methods.cpp:376-399
Timestamp: 2024-11-01T03:26:26.386Z
Learning: In the test code (`components/core/tests/test-ir_encoding_methods.cpp`), exception handling for `msgpack::unpack` can be omitted because the Catch2 testing framework captures exceptions if they occur.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp
📚 Learning: 2024-10-14T03:42:10.355Z
Learnt from: LinZhihao-723
PR: y-scope/clp#558
File: components/core/tests/test-ffi_KeyValuePairLogEvent.cpp:14-14
Timestamp: 2024-10-14T03:42:10.355Z
Learning: In the file `components/core/tests/test-ffi_KeyValuePairLogEvent.cpp`, including `<json/single_include/nlohmann/json.hpp>` is consistent with the project's coding standards.
Applied to files:
components/core/src/clp/ffi/ir_stream/search/test/utils.cpp
🧬 Code graph analysis (3)
components/core/src/clp/GrepCore.cpp (2)
components/core/src/clp/GrepCore.hpp (2)
interpretation
(193-195)interpretation
(207-210)components/core/src/clp/EncodedVariableInterpreter.hpp (5)
add_dict_var
(258-263)add_dict_var
(533-546)add_dict_var
(533-538)encoded_var
(52-52)encoded_var
(111-111)
components/core/tests/test-GrepCore.cpp (1)
components/core/src/clp/GrepCore.cpp (2)
get_bounds_of_next_potential_var
(28-146)get_bounds_of_next_potential_var
(28-33)
components/core/src/clp/GrepCore.hpp (4)
components/core/src/clp/QueryToken.cpp (2)
is_var
(137-145)is_var
(137-137)components/core/src/clp/EncodedVariableInterpreter.hpp (13)
logtype
(58-60)logtype
(58-58)logtype
(66-68)logtype
(66-66)logtype
(74-76)logtype
(74-74)logtype
(82-84)logtype
(82-82)wildcard_search_dictionary_and_get_encoded_matches
(218-223)wildcard_search_dictionary_and_get_encoded_matches
(486-511)wildcard_search_dictionary_and_get_encoded_matches
(486-491)encoded_var
(52-52)encoded_var
(111-111)components/core/src/clp/GrepCore.cpp (4)
get_wildcard_encodable_positions
(148-164)get_wildcard_encodable_positions
(148-149)generate_logtype_string
(166-233)generate_logtype_string
(166-169)components/core/src/glt/Query.hpp (1)
logtype_entries
(103-105)
🔇 Additional comments (13)
components/core/src/clp/ffi/ir_stream/search/test/utils.cpp (1)
15-15
: LGTM! Required header for fmt::join usage.The addition of
<fmt/ranges.h>
is necessary to enable thefmt::join
function used on line 64. This aligns with the fmt 11.x upgrade.components/core/src/clp/ffi/ir_stream/search/test/test_QueryHandlerImpl.cpp (1)
15-15
: LGTM! Consistent header addition for fmt::join.The addition of
<fmt/ranges.h>
enablesfmt::join
functionality on line 237, consistent with the fmt 11.x upgrade.components/core/tests/test-GrepCore.cpp (2)
12-14
: Good use of brace initialization for consistency.The change to brace initialization enhances code consistency and prevents narrowing conversions.
79-99
: Wildcard handling changes are intentional.GrepCore documents a fallback to "decompression + wildcard matching" for tokens with greedy wildcards (components/core/src/clp/GrepCore.hpp ~lines 472–476), and the updated tests in components/core/tests/test-GrepCore.cpp (wildcard case starting ~line 75) reflect the DFA-based behaviour: escaped '' now delimits "1" while unescaped '' is preserved in "abc*123".
components/core/src/clp/GrepCore.hpp (3)
130-182
: Well-documented schema sub-query generation logic.The documentation clearly explains the complex logic for handling encodable wildcard variables and the binary choices for interpretation. The example effectively illustrates the 2^k possible combinations.
336-346
: Verify thread-safety of static variables in GrepCore.hppinterpretations_generated and interpretations (components/core/src/clp/GrepCore.hpp: lines 336–346) are static and can race under concurrent access. The verification script produced no output, so concurrency of this code path or existing synchronization could not be confirmed. Confirm whether this runs on multiple threads; if so, replace static state with per-call/per-archive state, use thread_local, or guard access with a mutex/atomic.
613-614
: Verify SymbolId underlying type and make casts consistent.SymbolId is defined in external log_surgeon — confirm its underlying integer type and align casts: GrepCore.hpp:613–614 and GrepCore.cpp:156–157,196–197 use static_cast<uint32_t>(TokenInt/TokenFloat) while Utils.cpp (≈140–156) uses static_cast(log_surgeon::SymbolId::TokenInt/TokenFloat). Match the cast to the enum's underlying type (or change var_type) to avoid subtle mismatches.
taskfiles/deps/main.yaml (3)
443-447
: Approved — spdlog v1.15.3 is compatible with fmt v11.2.0.
Release notes show spdlog 1.15.3 bumped the bundled {fmt} to 11.2.0, so no changes required.
253-254
: Verify fmt 11.2.0 compatibility across the codebase.
- Static scan shows multiple fmt call sites; representative hits: components/core/src/clp_s/Utils.cpp:81, components/core/src/clp_s/kv_ir_search.cpp:175, components/core/src/clp_s/indexer/MySQLIndexStorage.cpp:31,58, components/core/tests/test-clp_s-search.cpp:117,215, components/core/src/clp_s/JsonConstructor.cpp:27.
- ast-grep returned no exact matches for custom formatters (no
template<> struct fmt::formatter<...>
found).- Action: build the project with fmt 11.2.0 and run all unit/integration tests; if you get compiler or test failures, update affected call sites or add/update any custom formatters (also search for partial specializations or ADL-based formatters) and re-run tests.
320-321
: Confirm log_surgeon update (a722d07) includes QueryInterpretation.
a722d07 tarball contains src/log_surgeon/wildcard_query_parser/QueryInterpretation.hpp and QueryInterpretation.cpp (also Query.hpp/Query.cpp and tests/test-query-interpretation.cpp), so the log_surgeon::wildcard_query_parser::QueryInterpretation referenced by components/core/src/clp/GrepCore.* is present.components/core/src/clp/GrepCore.cpp (3)
18-26
: LGTM: token/type aliases are clear and scopedThe using-declarations improve readability without polluting too broadly. No issues spotted.
213-230
: LGTM: representable-int/float fast-path with dictionary fallbackThe detection path is clear and orders checks correctly; encoded_var capture is fine as an out-param.
199-211
: C++20 is already enforced for the target that builds this TU — no action required.The clp target sets target_compile_features(... PRIVATE cxx_std_20) (components/core/src/clp/clp/CMakeLists.txt:172) and other CLP targets likewise use cxx_std_20, so std::unordered_map::contains is supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
components/core/src/clp/GrepCore.hpp
(5 hunks)components/core/tests/test-GrepCore.cpp
(1 hunks)taskfiles/deps/main.yaml
(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/core/tests/test-GrepCore.cpp
components/core/src/clp/GrepCore.hpp
🧬 Code graph analysis (2)
components/core/tests/test-GrepCore.cpp (1)
components/core/src/clp/GrepCore.cpp (2)
get_bounds_of_next_potential_var
(28-146)get_bounds_of_next_potential_var
(28-33)
components/core/src/clp/GrepCore.hpp (3)
components/core/src/clp/DictionaryReader.hpp (1)
search_string
(94-94)components/core/src/clp/EncodedVariableInterpreter.hpp (13)
logtype
(63-65)logtype
(63-63)logtype
(71-73)logtype
(71-71)logtype
(79-81)logtype
(79-79)logtype
(87-89)logtype
(87-87)wildcard_search_dictionary_and_get_encoded_matches
(222-227)wildcard_search_dictionary_and_get_encoded_matches
(497-522)wildcard_search_dictionary_and_get_encoded_matches
(497-502)encoded_var
(57-57)encoded_var
(116-116)components/core/src/clp/GrepCore.cpp (4)
get_wildcard_encodable_positions
(148-164)get_wildcard_encodable_positions
(148-149)generate_logtype_string
(166-233)generate_logtype_string
(166-169)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
🔇 Additional comments (7)
components/core/tests/test-GrepCore.cpp (2)
13-16
: LGTM: test initialisation simplified.Brace‑initialising begin/end/is_var improves clarity; signature update aligns with the header.
21-101
: LGTM: expectations preserved after signature change.The call sites correctly drop the lexer argument and still validate tokenisation semantics.
If CI allows, add one integration test covering the new interpretation‑driven path in process_raw_query to ensure schema‑based subquery generation remains sound.
components/core/src/clp/GrepCore.hpp (2)
184-195
: Fix typos in documentation.“indicies” → “indices”; also remove the duplicated spacing.
- * Scans the interpretation and returns the indicies of all encodable * wildcard variables. + * Scans the interpretation and returns the indices of all encodable wildcard variables. @@ - * @return A vector of positions of encodabe wildcard variables. + * @return A vector of positions of encodable wildcard variables.
540-542
: Replace magic number 32 with a named constant.Carry forward the earlier feedback and make the limit self-documenting.
- if (wildcard_encodable_positions.size() > 32) { + if (wildcard_encodable_positions.size() > cMaxEncodableWildcardVariables) { throw std::runtime_error("Too many encodable variables."); }Add near the class’ private section (example placement shown):
// Place inside class GrepCore, private: section static constexpr size_t cMaxEncodableWildcardVariables{32};taskfiles/deps/main.yaml (3)
443-448
: spdlog–fmt ABI compatibility — verify in CIv1.15.3 bundles fmt 11.2.0 and is compatible with fmt 11.x; run a minimal CI build compiling with -DSPDLOG_FMT_EXTERNAL=ON against fmt 11.2.0 (or the latest 11.x) to confirm no platform/toolchain-specific issues.
253-255
: SHA256 verified — run CI to confirm build compatibility
- taskfiles/deps/main.yaml (lines 253–255): TARBALL_SHA256 matches the downloaded fmt 11.2.0 tarball: bc23066d87ab3168f27cef3e97d545fa63314f5c79df5ea444d41d56f962c6af.
- Repository shows widespread fmt usage (find_package(fmt REQUIRED), fmt::fmt target, many fmt::format/format_to/formatter/includes).
- Cannot compile here; run CI or a local full build to ensure all consumers compile cleanly with fmt 11.2.0.
320-322
: APIs present in commit a722d07 — OK to pin.Query::get_all_multi_token_interpretations is in src/log_surgeon/wildcard_query_parser/Query.hpp; QueryInterpretation, VariableQueryToken and StaticQueryToken headers are present under src/log_surgeon/wildcard_query_parser/; SymbolId enum is in src/log_surgeon/Constants.hpp. No missing headers or symbols found.
encoded_variable_t encoded_var{}; | ||
if ((is_int | ||
&& EncodedVariableInterpreter::convert_string_to_representable_integer_var( | ||
raw_string, | ||
encoded_var | ||
)) | ||
|| (is_float | ||
&& EncodedVariableInterpreter::convert_string_to_representable_float_var( | ||
raw_string, | ||
encoded_var | ||
))) | ||
{ | ||
sub_query.add_non_dict_var(encoded_var); | ||
return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this handle when the EncodedVariableInterpreter
fails? It just seems to continue, is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, continuing is the desired behavior.
The idea is to treat it as a dictionary variable if it fails. In the heuristic case these methods succeeding is exclusively what tells you its an integer/float.
In the schema case we can first check if the schema variable type matches, but we still need to check if the int/float is encodable (i.e., its not too long or the user didn't name some malformed regex int/float).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
components/core/src/clp/GrepCore.cpp
(2 hunks)components/core/src/clp/GrepCore.hpp
(6 hunks)components/core/src/clp/Query.hpp
(1 hunks)components/core/tests/test-GrepCore.cpp
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/core/src/clp/Query.hpp
components/core/src/clp/GrepCore.hpp
components/core/tests/test-GrepCore.cpp
components/core/src/clp/GrepCore.cpp
🧠 Learnings (1)
📚 Learning: 2025-09-11T13:15:39.930Z
Learnt from: anlowee
PR: y-scope/clp#1176
File: components/core/src/clp_s/ColumnWriter.cpp:67-72
Timestamp: 2025-09-11T13:15:39.930Z
Learning: In components/core/src/clp_s/ColumnWriter.cpp, when using sizeof with types like variable_dictionary_id_t, always use the fully qualified namespace (clp::variable_dictionary_id_t) to match the variable declaration and maintain consistency with other column writers in the codebase.
Applied to files:
components/core/src/clp/GrepCore.cpp
🧬 Code graph analysis (3)
components/core/src/clp/GrepCore.hpp (3)
components/core/tests/test-GrepCore.cpp (10)
interpretation
(48-51)interpretation
(48-49)interpretation
(53-58)interpretation
(53-56)logtype
(207-216)logtype
(207-208)logtype
(218-228)logtype
(218-222)generate_schema_sub_queries
(81-94)generate_schema_sub_queries
(81-86)components/core/src/clp/EncodedVariableInterpreter.hpp (13)
logtype
(63-65)logtype
(63-63)logtype
(71-73)logtype
(71-71)logtype
(79-81)logtype
(79-79)logtype
(87-89)logtype
(87-87)wildcard_search_dictionary_and_get_encoded_matches
(222-227)wildcard_search_dictionary_and_get_encoded_matches
(497-522)wildcard_search_dictionary_and_get_encoded_matches
(497-502)encoded_var
(57-57)encoded_var
(116-116)components/core/src/clp/GrepCore.cpp (4)
get_wildcard_encodable_positions
(148-164)get_wildcard_encodable_positions
(148-149)generate_logtype_string
(166-233)generate_logtype_string
(166-169)
components/core/tests/test-GrepCore.cpp (4)
components/core/src/clp/GrepCore.hpp (9)
interpretation
(197-199)interpretation
(211-214)process_schema_var_token
(235-241)process_schema_var_token
(606-668)process_schema_var_token
(606-612)generate_schema_sub_queries
(178-185)generate_schema_sub_queries
(535-603)generate_schema_sub_queries
(535-541)value
(75-80)components/core/src/clp/GrepCore.cpp (6)
get_wildcard_encodable_positions
(148-164)get_wildcard_encodable_positions
(148-149)generate_logtype_string
(166-233)generate_logtype_string
(166-169)get_bounds_of_next_potential_var
(28-146)get_bounds_of_next_potential_var
(28-33)components/core/src/clp/EncodedVariableInterpreter.hpp (14)
id
(56-56)value
(97-100)value
(107-110)add_dict_var
(266-271)add_dict_var
(548-561)add_dict_var
(548-553)logtype
(63-65)logtype
(63-63)logtype
(71-73)logtype
(71-71)logtype
(79-81)logtype
(79-79)logtype
(87-89)logtype
(87-87)components/core/src/clp/Query.cpp (2)
add_dict_var
(102-105)add_dict_var
(103-103)
components/core/src/clp/GrepCore.cpp (2)
components/core/src/clp/GrepCore.hpp (2)
interpretation
(197-199)interpretation
(211-214)components/core/tests/test-GrepCore.cpp (7)
interpretation
(48-51)interpretation
(48-49)interpretation
(53-58)interpretation
(53-56)add_int_var
(175-175)add_float_var
(177-177)add_dict_var
(179-179)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: package-image
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (15)
components/core/src/clp/Query.hpp (1)
158-159
: Deterministic default for wildcard flag — good.Default-initializing m_wildcard_match_required to false prevents UB from uninitialized reads. Please confirm SubQuery::clear() also resets this flag to false to maintain invariants across reuse.
components/core/tests/test-GrepCore.cpp (1)
590-718
: Solid coverage for schema subquery generation.Good end-to-end assertions on wildcard-encodable combinations, variable shapes, and logtype IDs.
components/core/src/clp/GrepCore.hpp (8)
9-14
: Add missing include for std::set.This header declares std::set in public APIs; include to avoid relying on transitive includes.
Apply:
#include <string_view> +#include <set> #include <unordered_map> #include <unordered_set> #include <vector>
188-196
: Typos in documentation.Fix “indicies” → “indices” and “encodabe” → “encodable”.
- * Scans the interpretation and returns the indicies of all encodable wildcard variables. + * Scans the interpretation and returns the indices of all encodable wildcard variables. ... - * @return A vector of positions of encodabe wildcard variables. + * @return A vector of positions of encodable wildcard variables.
207-213
: Typo in documentation.“interpetation” → “interpretation”.
- * @param interpretation The interpetation to convert to a logtype string. + * @param interpretation The interpretation to convert to a logtype string.
288-294
: Follow project style: usefalse == expr
instead of!expr
.Aligns with coding guidelines.
As per coding guidelines
- if (!query_token.has_greedy_wildcard_in_middle() && query_token.is_ambiguous_token()) { + if (false == query_token.has_greedy_wildcard_in_middle() + && query_token.is_ambiguous_token()) + {
342-352
: Remove thread-unsafe statics; compute interpretations per call (or add a proper cache).Function-local statics here are shared across calls and archives, race-prone, and nullified by cExecuteForEveryArchive. Construct Query and get interpretations per call.
- static bool interpretations_generated{false}; - static std::set<log_surgeon::wildcard_query_parser::QueryInterpretation> interpretations; - - // TODO: This needs to be done for every archive until we have per schema logic. - constexpr bool cExecuteForEveryArchive{true}; - if (cExecuteForEveryArchive || false == interpretations_generated) { - log_surgeon::wildcard_query_parser::Query const query(search_string); - interpretations.clear(); - interpretations = query.get_all_multi_token_interpretations(lexer); - interpretations_generated = true; - } + log_surgeon::wildcard_query_parser::Query const query(search_string); + auto interpretations = query.get_all_multi_token_interpretations(lexer);If caching is required, introduce a schema-keyed cache with synchronisation.
353-356
: Fix comment typo “Transfrom”.- // Transfrom log-surgeon interpretations into CLP sub-queries. + // Transform log-surgeon interpretations into CLP sub-queries.
543-551
: Avoid undefined/implementation-defined shifts; extract magic number.
- size_t shifting by variable bit width is UB on 32-bit; use uint64_t and guard.
- Replace magic 32 with a named constant.
- if (wildcard_encodable_positions.size() > 32) { - throw std::runtime_error("Too many encodable variables."); - } - size_t const num_combos{static_cast<size_t>(1) << wildcard_encodable_positions.size()}; - for (size_t mask{0}; mask < num_combos; ++mask) { + constexpr size_t cMaxEncodableWildcardVariables{32}; + auto const k = wildcard_encodable_positions.size(); + if (k > cMaxEncodableWildcardVariables) { + throw std::runtime_error("Too many encodable variables."); + } + if (k >= 64) { + throw std::runtime_error("Wildcard encodable variable count exceeds mask width."); + } + uint64_t const num_combos{1ULL << k}; + for (uint64_t mask{0}; mask < num_combos; ++mask) {
549-552
: Reduce per-iteration allocations.Building an unordered_map per mask is expensive. Use a flat container (e.g., vector<uint8_t>) keyed by local index, or reuse a pre-sized map.
Example approach:
- Precompute vector<size_t> positions = wildcard_encodable_positions.
- For each mask, fill vector<uint8_t> encoded(k) with (mask >> i) & 1 and consult positions[i] inside generate_logtype_string via a small adapter.
components/core/src/clp/GrepCore.cpp (5)
6-7
: Include for std::holds_alternative/std::get.Avoid relying on transitive includes.
#include <string_view> +#include <variant> #include <unordered_map> #include <vector>
155-158
: Prefer strongly typed enum comparisons.Compare as SymbolId, not via uint32_t.
- auto const var_type{variable_token.get_variable_type()}; - bool const is_int{static_cast<uint32_t>(TokenInt) == var_type}; - bool const is_float{static_cast<uint32_t>(TokenFloat) == var_type}; + auto const var_symbol = static_cast<decltype(TokenInt)>(variable_token.get_variable_type()); + bool const is_int = (var_symbol == TokenInt); + bool const is_float = (var_symbol == TokenFloat);
172-181
: Reserve logic assumes one byte per placeholder.If EncodedVariableInterpreter ever appends more than one byte, this under-reserves. Consider exposing/using a constexpr placeholder size or asserting the invariant.
184-185
: Fix comment typo.“Generete” → “Generate”.
- // Generete `logtype_string`. + // Generate `logtype_string`.
196-198
: Prefer strongly typed enum comparisons.Repeat: compare as SymbolId, not via uint32_t.
- bool const is_int{static_cast<uint32_t>(TokenInt) == var_type}; - bool const is_float{static_cast<uint32_t>(TokenFloat) == var_type}; + bool const is_int = (var_type == TokenInt); + bool const is_float = (var_type == TokenFloat);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/core/src/clp/GrepCore.cpp
(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/core/src/clp/GrepCore.cpp
🧠 Learnings (1)
📚 Learning: 2025-09-11T13:15:39.930Z
Learnt from: anlowee
PR: y-scope/clp#1176
File: components/core/src/clp_s/ColumnWriter.cpp:67-72
Timestamp: 2025-09-11T13:15:39.930Z
Learning: In components/core/src/clp_s/ColumnWriter.cpp, when using sizeof with types like variable_dictionary_id_t, always use the fully qualified namespace (clp::variable_dictionary_id_t) to match the variable declaration and maintain consistency with other column writers in the codebase.
Applied to files:
components/core/src/clp/GrepCore.cpp
🧬 Code graph analysis (1)
components/core/src/clp/GrepCore.cpp (3)
components/core/src/clp/GrepCore.hpp (2)
interpretation
(197-199)interpretation
(211-214)components/core/tests/test-GrepCore.cpp (7)
interpretation
(48-51)interpretation
(48-49)interpretation
(53-58)interpretation
(53-56)add_int_var
(175-175)add_float_var
(177-177)add_dict_var
(179-179)components/core/src/clp/EncodedVariableInterpreter.hpp (5)
add_dict_var
(266-271)add_dict_var
(548-561)add_dict_var
(548-553)encoded_var
(57-57)encoded_var
(116-116)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: package-image
🔇 Additional comments (5)
components/core/src/clp/GrepCore.cpp (5)
19-23
: LGTM: explicit SymbolId enumerators.Bringing TokenInt/TokenFloat into scope is clear and consistent with usage below.
25-26
: LGTM: local aliases for STL types.Keeps signatures succinct without hiding ownership semantics.
173-183
: Preallocation assumes add_*_var appends one character.If add_int_var/add_float_var/add_dict_var ever append more than one byte, this under-reserves. Either expose constexpr sizes or assert the invariant.
Can you confirm each add_*_var appends exactly one byte today?
185-185
: Fix typo in comment.“Generete” -> “Generate”.
- // Generete `logtype_string`. + // Generate `logtype_string`.
167-232
: Cacheinterpretation.get_logtype()
, iterate by const& and replacecontains
+at
with a singlefind()
Removes double map lookup and drops the C++20unordered_map::contains
dependency; verify your CMakeLists.txt C++ standard to ensure C++17 compatibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/core/src/clp/GrepCore.cpp
(2 hunks)components/core/src/clp/GrepCore.hpp
(6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/core/src/clp/GrepCore.hpp
components/core/src/clp/GrepCore.cpp
🧠 Learnings (1)
📚 Learning: 2025-09-11T13:15:39.930Z
Learnt from: anlowee
PR: y-scope/clp#1176
File: components/core/src/clp_s/ColumnWriter.cpp:67-72
Timestamp: 2025-09-11T13:15:39.930Z
Learning: In components/core/src/clp_s/ColumnWriter.cpp, when using sizeof with types like variable_dictionary_id_t, always use the fully qualified namespace (clp::variable_dictionary_id_t) to match the variable declaration and maintain consistency with other column writers in the codebase.
Applied to files:
components/core/src/clp/GrepCore.hpp
components/core/src/clp/GrepCore.cpp
🧬 Code graph analysis (2)
components/core/src/clp/GrepCore.hpp (3)
components/core/tests/test-GrepCore.cpp (10)
interpretation
(48-51)interpretation
(48-49)interpretation
(53-58)interpretation
(53-56)logtype
(207-216)logtype
(207-208)logtype
(218-228)logtype
(218-222)generate_schema_sub_queries
(81-94)generate_schema_sub_queries
(81-86)components/core/src/clp/EncodedVariableInterpreter.hpp (13)
logtype
(63-65)logtype
(63-63)logtype
(71-73)logtype
(71-71)logtype
(79-81)logtype
(79-79)logtype
(87-89)logtype
(87-87)wildcard_search_dictionary_and_get_encoded_matches
(222-227)wildcard_search_dictionary_and_get_encoded_matches
(497-522)wildcard_search_dictionary_and_get_encoded_matches
(497-502)encoded_var
(57-57)encoded_var
(116-116)components/core/src/clp/GrepCore.cpp (4)
get_wildcard_encodable_positions
(149-165)get_wildcard_encodable_positions
(149-150)generate_logtype_string
(167-232)generate_logtype_string
(167-170)
components/core/src/clp/GrepCore.cpp (3)
components/core/src/clp/GrepCore.hpp (2)
interpretation
(198-200)interpretation
(212-215)components/core/tests/test-GrepCore.cpp (5)
interpretation
(48-51)interpretation
(48-49)interpretation
(53-58)interpretation
(53-56)add_dict_var
(179-179)components/core/src/clp/EncodedVariableInterpreter.hpp (5)
add_dict_var
(266-271)add_dict_var
(548-561)add_dict_var
(548-553)encoded_var
(57-57)encoded_var
(116-116)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: build-macos (macos-14, false)
🔇 Additional comments (13)
components/core/src/clp/GrepCore.hpp (10)
7-7
: Good fix: add missing include.Prevents reliance on transitive includes.
212-215
: LGTM: clear API for logtype-string generation.
235-243
: LGTM: helper API surface reads well.
293-297
: LGTM: project style honoured (false == …).
624-627
: Confirm intent: encoded-wildcard path adds no var constraint.Early-return marks wildcard_match_required without adding an encoded var. If that’s intentional (accept any encoded value and defer to message-level matching), fine; otherwise, add a non-dict var placeholder for clarity/consistency.
28-32
: Gate test-only friend behind a build flag.Expose the friend only in test builds to avoid leaking test details in public headers.
- class GrepCoreTest; + #ifdef CLP_ENABLE_TESTS + class GrepCoreTest; + #endif ... class GrepCore { - friend class GrepCoreTest; + #ifdef CLP_ENABLE_TESTS + friend class GrepCoreTest; + #endif
345-355
: Remove unsafe function-local statics; compute interpretations per call.Statics here are thread-unsafe and nullified by cExecuteForEveryArchive{true}. Build the interpretations locally.
- static bool interpretations_generated{false}; - static std::set<log_surgeon::wildcard_query_parser::QueryInterpretation> interpretations; - - // TODO: This needs to be done for every archive until we have per schema logic. - constexpr bool cExecuteForEveryArchive{true}; - if (cExecuteForEveryArchive || false == interpretations_generated) { - log_surgeon::wildcard_query_parser::Query const query(search_string); - interpretations.clear(); - interpretations = query.get_all_multi_token_interpretations(lexer); - interpretations_generated = true; - } + log_surgeon::wildcard_query_parser::Query const query(search_string); + auto interpretations = query.get_all_multi_token_interpretations(lexer);
356-356
: Fix typo: “Transfrom” → “Transform”.- // Transfrom log-surgeon interpretations into CLP sub-queries. + // Transform log-surgeon interpretations into CLP sub-queries.
545-556
: Fix undefined/implementation-defined shifts; use 64-bit masks and guard k.Shifting size_t by its bit width is UB on 32-bit. Also 2^32 combos is infeasible.
- size_t const num_combos{static_cast<size_t>(1) << wildcard_encodable_positions.size()}; - for (size_t mask{0}; mask < num_combos; ++mask) { + auto const k = wildcard_encodable_positions.size(); + if (k >= 64) { + throw std::runtime_error("Too many encodable variables for mask enumeration (k >= 64)."); + } + uint64_t const num_combos{1ULL << k}; + for (uint64_t mask{0}; mask < num_combos; ++mask) { std::unordered_map<size_t, bool> wildcard_mask_map; - for (size_t i{0}; i < wildcard_encodable_positions.size(); ++i) { - wildcard_mask_map[wildcard_encodable_positions[i]] = mask >> i & 1; + for (size_t i{0}; i < k; ++i) { + wildcard_mask_map[wildcard_encodable_positions[i]] = (mask >> i) & 1ULL; }Additionally, consider lowering the practical cap from 32 to something manageable (e.g., 20) to avoid 2^k explosion.
609-623
: Use strongly typed enum comparisons for SymbolId.Avoid uint32_t-based comparisons; compare as SymbolId.
- auto const var_type{variable_token.get_variable_type()}; - bool const is_int{static_cast<uint32_t>(log_surgeon::SymbolId::TokenInt) == var_type}; - bool const is_float{static_cast<uint32_t>(log_surgeon::SymbolId::TokenFloat) == var_type}; + auto const var_symbol = static_cast<log_surgeon::SymbolId>(variable_token.get_variable_type()); + bool const is_int = (var_symbol == log_surgeon::SymbolId::TokenInt); + bool const is_float = (var_symbol == log_surgeon::SymbolId::TokenFloat);components/core/src/clp/GrepCore.cpp (3)
6-8
: LGTM: explicit STL includes (unordered_map, variant, vector).
19-23
: LGTM: using-declarations improve readability and locality.
149-165
: Avoid variant copies and repeated get_logtype() calls.Cache the logtype and iterate by const reference; reserve capacity.
-auto GrepCore::get_wildcard_encodable_positions(QueryInterpretation const& interpretation) - -> vector<size_t> { - vector<size_t> wildcard_encodable_positions; - for (size_t i{0}; i < interpretation.get_logtype().size(); ++i) { - auto const token{interpretation.get_logtype()[i]}; +auto GrepCore::get_wildcard_encodable_positions(QueryInterpretation const& interpretation) + -> vector<size_t> { + auto const& logtype = interpretation.get_logtype(); + vector<size_t> wildcard_encodable_positions; + wildcard_encodable_positions.reserve(logtype.size()); + for (size_t i{0}; i < logtype.size(); ++i) { + auto const& token = logtype[i];
…esting code for DP algo; Add final unit-test for testing PR end-to-end; Still lots of edge cases and sub-steps that could be tested more rigorously.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (6)
components/core/src/clp/GrepCore.hpp (5)
191-197
: Fix typos in docs.“than” → “that”; “encodabe” → “encodable”.
- * An encodable variable is a variable token than: + * An encodable variable is a variable token that: @@ - * @return A vector of positions of encodabe wildcard variables. + * @return A vector of positions of encodable wildcard variables.
28-32
: Gate test‑only friend with a build flag.Avoid leaking test details to consumers; wrap the friend in CLP_BUILD_TESTING (or your project’s test macro).
-class GrepCoreTest; + #if defined(CLP_BUILD_TESTING) + class GrepCoreTest; + #endif class GrepCore { - friend class GrepCoreTest; + #if defined(CLP_BUILD_TESTING) + friend class GrepCoreTest; + #endif
544-547
: Avoid per‑iteration unordered_map allocations for the mask.This map is rebuilt for every mask; switch to a flat mask (vector<uint8_t>/bitset) or add an overload of generate_logtype_string that takes (positions, mask).
- std::unordered_map<size_t, bool> wildcard_mask_map; - for (size_t i{0}; i < wildcard_encodable_positions.size(); ++i) { - wildcard_mask_map[wildcard_encodable_positions[i]] = mask >> i & 1ULL; - } + // Lightweight mask: local index -> bit + std::vector<uint8_t> mask_bits(wildcard_encodable_positions.size()); + for (size_t i{0}; i < wildcard_encodable_positions.size(); ++i) { + mask_bits[i] = static_cast<uint8_t>((mask >> i) & 1ULL); + } + // Prefer an overload: generate_logtype_string(interpretation, wildcard_encodable_positions, mask_bits)If changing signature isn’t feasible now, at least reserve:
+ wildcard_mask_map.reserve(wildcard_encodable_positions.size());
536-543
: Cap combinatorial explosion (2^k) to a practical limit.k=32 allows 4,294,967,296 combinations; this can hang a search. Lower the cap (e.g., 16) and fail fast.
- constexpr size_t cMaxEncodableWildcardVariables{32}; + constexpr size_t cMaxEncodableWildcardVariables{16}; // practical cap to bound 2^k work @@ - uint64_t const num_combos{1ULL << wildcard_encodable_positions.size()}; + auto const k = wildcard_encodable_positions.size(); + uint64_t const num_combos{1ULL << k};Optionally add telemetry to surface when the cap is hit.
612-614
: Use strongly‑typed enum comparisons for symbol IDs.Avoid underlying‑integer compares; cast once to SymbolId and compare enums.
- auto const var_type{variable_token.get_variable_type()}; - - bool const is_int{static_cast<uint32_t>(log_surgeon::SymbolId::TokenInt) == var_type}; - bool const is_float{static_cast<uint32_t>(log_surgeon::SymbolId::TokenFloat) == var_type}; + auto const var_symbol{ + static_cast<log_surgeon::SymbolId>(variable_token.get_variable_type()) + }; + bool const is_int = (var_symbol == log_surgeon::SymbolId::TokenInt); + bool const is_float= (var_symbol == log_surgeon::SymbolId::TokenFloat);components/core/tests/test-GrepCore.cpp (1)
493-500
: Fix mask width and bit extraction.Shift on size_t with literal 1 risks UB on 32‑bit; also mask bit should use 1ULL.
- size_t const num_combos{static_cast<size_t>(1) << wildcard_encodable_positions.size()}; + uint64_t const num_combos{1ULL << wildcard_encodable_positions.size()}; REQUIRE(num_combos == 4); unordered_set<string> logtype_strings; - for (size_t mask{0}; mask < num_combos; ++mask) { + for (uint64_t mask{0}; mask < num_combos; ++mask) { unordered_map<size_t, bool> wildcard_mask_map; for (size_t i{0}; i < wildcard_encodable_positions.size(); ++i) { - wildcard_mask_map[wildcard_encodable_positions[i]] = mask >> i & 1; + wildcard_mask_map[wildcard_encodable_positions[i]] = (mask >> i) & 1ULL; }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
components/core/src/clp/GrepCore.hpp
(6 hunks)components/core/src/clp/string_utils/string_utils.cpp
(3 hunks)components/core/tests/test-GrepCore.cpp
(1 hunks)components/core/tests/test-string_utils.cpp
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/core/tests/test-string_utils.cpp
components/core/src/clp/GrepCore.hpp
components/core/src/clp/string_utils/string_utils.cpp
components/core/tests/test-GrepCore.cpp
🧠 Learnings (1)
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
components/core/src/clp/GrepCore.hpp
🧬 Code graph analysis (3)
components/core/tests/test-string_utils.cpp (2)
components/core/src/clp/string_utils/string_utils.cpp (2)
wildcard_match_unsafe_case_sensitive
(238-342)wildcard_match_unsafe_case_sensitive
(238-238)components/core/src/clp/string_utils/string_utils.hpp (1)
wildcard_match_unsafe_case_sensitive
(142-142)
components/core/src/clp/GrepCore.hpp (3)
components/core/tests/test-GrepCore.cpp (10)
interpretation
(55-58)interpretation
(55-56)interpretation
(60-65)interpretation
(60-63)logtype
(220-230)logtype
(221-222)logtype
(232-242)logtype
(232-236)generate_schema_sub_queries
(88-101)generate_schema_sub_queries
(88-93)components/core/src/clp/EncodedVariableInterpreter.hpp (13)
logtype
(63-65)logtype
(63-63)logtype
(71-73)logtype
(71-71)logtype
(79-81)logtype
(79-79)logtype
(87-89)logtype
(87-87)wildcard_search_dictionary_and_get_encoded_matches
(222-227)wildcard_search_dictionary_and_get_encoded_matches
(497-522)wildcard_search_dictionary_and_get_encoded_matches
(497-502)encoded_var
(57-57)encoded_var
(116-116)components/core/src/clp/GrepCore.cpp (4)
get_wildcard_encodable_positions
(149-165)get_wildcard_encodable_positions
(149-150)generate_logtype_string
(167-232)generate_logtype_string
(167-170)
components/core/tests/test-GrepCore.cpp (6)
components/core/src/clp/string_utils/string_utils.cpp (2)
wildcard_match_unsafe_case_sensitive
(238-342)wildcard_match_unsafe_case_sensitive
(238-238)components/core/src/clp/GrepCore.hpp (12)
interpretation
(198-200)interpretation
(212-215)process_schema_var_token
(236-242)process_schema_var_token
(601-663)process_schema_var_token
(601-607)generate_schema_sub_queries
(179-186)generate_schema_sub_queries
(529-598)generate_schema_sub_queries
(529-535)value
(76-81)process_raw_query
(56-65)process_raw_query
(248-368)process_raw_query
(248-257)components/core/src/clp/GrepCore.cpp (6)
get_wildcard_encodable_positions
(149-165)get_wildcard_encodable_positions
(149-150)generate_logtype_string
(167-232)generate_logtype_string
(167-170)get_bounds_of_next_potential_var
(29-147)get_bounds_of_next_potential_var
(29-34)components/core/src/clp/EncodedVariableInterpreter.hpp (14)
id
(56-56)value
(97-100)value
(107-110)add_dict_var
(266-271)add_dict_var
(548-561)add_dict_var
(548-553)logtype
(63-65)logtype
(63-63)logtype
(71-73)logtype
(71-71)logtype
(79-81)logtype
(79-79)logtype
(87-89)logtype
(87-87)components/core/src/clp/Query.cpp (2)
add_dict_var
(102-105)add_dict_var
(103-103)components/core/src/clp/QueryToken.cpp (2)
is_var
(137-145)is_var
(137-137)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: package-image
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: rust-checks (macos-15)
- GitHub Check: build-macos (macos-15, false)
🔇 Additional comments (4)
components/core/src/clp/string_utils/string_utils.cpp (3)
17-18
: LGTM! Clear precondition documentation.The added comment clarifies the assumption that
wild_current
has no duplicate greedy wildcards, which aligns with the implementation on line 41-42.
263-269
: LGTM! Improved consecutive wildcard handling.The while loop correctly consumes all consecutive '*' characters and efficiently handles the trailing wildcard case with an early return.
316-319
: LGTM! Correct trailing wildcard handling.The loop correctly consumes all trailing '*' characters in wild when tame ends, and the final comparison properly determines if the match is successful.
components/core/tests/test-string_utils.cpp (1)
148-151
: Good additions: double-asterisk coverage is solid.These cases mirror single-asterisk variants and match the matcher’s semantics for consecutive '*'. No changes requested.
Also applies to: 158-161, 168-171, 178-181, 188-191, 198-201
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
components/core/src/clp/GrepCore.hpp (2)
616-618
: Compare SymbolId values stronglyThis still relies on
uint32_t
casts; please castvar_type
back toSymbolId
and compare strongly typed, matching the.cpp
implementation. Doing so removes the magic numbers and avoids future breakage if the enum underlying type changes.- auto const var_type{variable_token.get_variable_type()}; - - bool const is_int{static_cast<uint32_t>(log_surgeon::SymbolId::TokenInt) == var_type}; - bool const is_float{static_cast<uint32_t>(log_surgeon::SymbolId::TokenFloat) == var_type}; + auto const var_type{ + static_cast<log_surgeon::SymbolId>(variable_token.get_variable_type()) + }; + + bool const is_int{log_surgeon::SymbolId::TokenInt == var_type}; + bool const is_float{log_surgeon::SymbolId::TokenFloat == var_type};
195-201
: Fix typos in the new DoxygenPlease correct “than” → “that” and “encodabe” → “encodable” so the comment reads cleanly.
- * An encodable variable is a variable token than: + * An encodable variable is a variable token that: @@ - * @return A vector of positions of encodabe wildcard variables. + * @return A vector of positions of encodable wildcard variables.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/core/CMakeLists.txt
(1 hunks)components/core/src/clp/GrepCore.hpp
(6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/core/src/clp/GrepCore.hpp
🧠 Learnings (1)
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
components/core/src/clp/GrepCore.hpp
🧬 Code graph analysis (1)
components/core/src/clp/GrepCore.hpp (3)
components/core/tests/test-GrepCore.cpp (10)
interpretation
(55-58)interpretation
(55-56)interpretation
(60-65)interpretation
(60-63)logtype
(220-230)logtype
(221-222)logtype
(232-242)logtype
(232-236)generate_schema_sub_queries
(88-101)generate_schema_sub_queries
(88-93)components/core/src/clp/EncodedVariableInterpreter.hpp (13)
logtype
(63-65)logtype
(63-63)logtype
(71-73)logtype
(71-71)logtype
(79-81)logtype
(79-79)logtype
(87-89)logtype
(87-87)wildcard_search_dictionary_and_get_encoded_matches
(222-227)wildcard_search_dictionary_and_get_encoded_matches
(497-522)wildcard_search_dictionary_and_get_encoded_matches
(497-502)encoded_var
(57-57)encoded_var
(116-116)components/core/src/clp/GrepCore.cpp (4)
get_wildcard_encodable_positions
(149-165)get_wildcard_encodable_positions
(149-150)generate_logtype_string
(167-232)generate_logtype_string
(167-170)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: rust-checks (macos-15)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (1)
components/core/CMakeLists.txt (1)
774-776
: Scoped test macro wiring looks goodDefining
CLP_BUILD_TESTING
only for theunitTest
target is exactly what the header guard expects, so production builds stay untouched. Nicely done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
components/core/src/clp/GrepCore.hpp (1)
141-190
: Minor typo in documentation.Line 151: "We represents" should be "We represent".
- * If there are k encodable wildcard variables, then 2^k logtype strings are possible. As a - * result we limit k <= 16. We represents these alternatives using a bit mask. + * If there are k encodable wildcard variables, then 2^k logtype strings are possible. As a + * result we limit k <= 16. We represent these alternatives using a bit mask.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/core/src/clp/GrepCore.hpp
(6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/core/src/clp/GrepCore.hpp
🧠 Learnings (1)
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
components/core/src/clp/GrepCore.hpp
🧬 Code graph analysis (1)
components/core/src/clp/GrepCore.hpp (3)
components/core/tests/test-GrepCore.cpp (10)
interpretation
(59-62)interpretation
(59-60)interpretation
(64-69)interpretation
(64-67)logtype
(224-234)logtype
(225-226)logtype
(236-246)logtype
(236-240)generate_schema_sub_queries
(92-105)generate_schema_sub_queries
(92-97)components/core/src/clp/EncodedVariableInterpreter.hpp (13)
logtype
(63-65)logtype
(63-63)logtype
(71-73)logtype
(71-71)logtype
(79-81)logtype
(79-79)logtype
(87-89)logtype
(87-87)wildcard_search_dictionary_and_get_encoded_matches
(222-227)wildcard_search_dictionary_and_get_encoded_matches
(497-522)wildcard_search_dictionary_and_get_encoded_matches
(497-502)encoded_var
(57-57)encoded_var
(116-116)components/core/src/clp/GrepCore.cpp (4)
get_wildcard_encodable_positions
(150-169)get_wildcard_encodable_positions
(150-151)generate_logtype_string
(171-238)generate_logtype_string
(171-174)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: package-image
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (macos-15)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
🔇 Additional comments (4)
components/core/src/clp/GrepCore.hpp (4)
7-14
: LGTM!The added includes (
<set>
,<unordered_map>
) and the updated log_surgeon dependency (Query.hpp
) are necessary for the new interpretation-based search functionality.
348-359
: LGTM!The interpretation-based path correctly constructs
log_surgeon::wildcard_query_parser::Query
and generates schema sub-queries. The TODO comment on line 349 appropriately flags the caching optimization opportunity for future work.
540-605
: LGTM!The implementation correctly:
- Caps encodable wildcards at 16 to prevent exponential explosion (line 540, 543-545)
- Uses
uint64_t
for mask operations to avoid undefined behaviour (lines 546-547)- Employs
find()
for single map lookup (lines 575-578)- Reserves capacity for
possible_logtype_ids
(line 597)- Follows project style with
false == expr
comparisons (lines 588, 592)
607-670
: LGTM!The implementation correctly:
- Uses strongly typed enum comparisons (lines 617-619)
- Handles wildcard-masked encoded variables (lines 621-624)
- Falls back to dictionary search for wildcard variables (lines 626-633)
- Attempts encoding for int/float types before dictionary lookup (lines 635-649)
- Handles both single and multiple dictionary entries efficiently (lines 651-669)
- Reserves capacity for both
encoded_vars
andvar_dict_ids
(lines 662-663)
log_surgeon::Query
.log_surgeon::Query
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
components/core/src/clp/GrepCore.cpp
(2 hunks)components/core/src/clp/GrepCore.hpp
(6 hunks)components/core/tests/test-GrepCore.cpp
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/core/src/clp/GrepCore.hpp
components/core/tests/test-GrepCore.cpp
components/core/src/clp/GrepCore.cpp
🧠 Learnings (1)
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
components/core/src/clp/GrepCore.hpp
🧬 Code graph analysis (3)
components/core/src/clp/GrepCore.hpp (3)
components/core/tests/test-GrepCore.cpp (10)
interpretations
(71-74)interpretations
(71-72)interpretation
(95-98)interpretation
(95-96)interpretation
(100-105)interpretation
(100-103)logtype
(262-272)logtype
(263-264)logtype
(274-284)logtype
(274-278)components/core/src/clp/EncodedVariableInterpreter.hpp (13)
logtype
(63-65)logtype
(63-63)logtype
(71-73)logtype
(71-71)logtype
(79-81)logtype
(79-79)logtype
(87-89)logtype
(87-87)wildcard_search_dictionary_and_get_encoded_matches
(225-230)wildcard_search_dictionary_and_get_encoded_matches
(505-530)wildcard_search_dictionary_and_get_encoded_matches
(505-510)encoded_var
(57-57)encoded_var
(116-116)components/core/src/clp/GrepCore.cpp (6)
normalize_interpretations
(153-186)normalize_interpretations
(153-154)get_wildcard_encodable_positions
(188-207)get_wildcard_encodable_positions
(188-189)generate_logtype_string
(209-276)generate_logtype_string
(209-212)
components/core/tests/test-GrepCore.cpp (5)
components/core/src/clp/string_utils/string_utils.cpp (2)
wildcard_match_unsafe_case_sensitive
(236-336)wildcard_match_unsafe_case_sensitive
(236-236)components/core/src/clp/GrepCore.hpp (13)
interpretations
(152-154)generate_schema_sub_queries
(199-206)generate_schema_sub_queries
(553-625)generate_schema_sub_queries
(553-559)interpretation
(218-220)interpretation
(232-235)process_schema_var_token
(256-262)process_schema_var_token
(628-690)process_schema_var_token
(628-634)value
(81-86)process_raw_query
(61-70)process_raw_query
(269-390)process_raw_query
(269-278)components/core/src/clp/GrepCore.cpp (8)
normalize_interpretations
(153-186)normalize_interpretations
(153-154)get_wildcard_encodable_positions
(188-207)get_wildcard_encodable_positions
(188-189)generate_logtype_string
(209-276)generate_logtype_string
(209-212)get_bounds_of_next_potential_var
(33-151)get_bounds_of_next_potential_var
(33-38)components/core/src/clp/EncodedVariableInterpreter.hpp (14)
id
(56-56)value
(97-100)value
(107-110)add_dict_var
(271-276)add_dict_var
(558-571)add_dict_var
(558-563)logtype
(63-65)logtype
(63-63)logtype
(71-73)logtype
(71-71)logtype
(79-81)logtype
(79-79)logtype
(87-89)logtype
(87-87)components/core/src/clp/Query.cpp (2)
add_dict_var
(102-105)add_dict_var
(103-103)
components/core/src/clp/GrepCore.cpp (3)
components/core/src/clp/string_utils/string_utils.cpp (2)
is_wildcard
(146-154)is_wildcard
(146-146)components/core/src/clp/GrepCore.hpp (3)
interpretations
(152-154)interpretation
(218-220)interpretation
(232-235)components/core/src/clp/EncodedVariableInterpreter.hpp (11)
logtype
(63-65)logtype
(63-63)logtype
(71-73)logtype
(71-71)logtype
(79-81)logtype
(79-79)logtype
(87-89)logtype
(87-87)add_dict_var
(271-276)add_dict_var
(558-571)add_dict_var
(558-563)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: package-image
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: antlr-code-committed (macos-15)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: lint-check (macos-15)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-14, true)
🔇 Additional comments (1)
components/core/src/clp/GrepCore.cpp (1)
209-276
: LGTM: efficient logtype constructionPre-sizing, single map lookup, and typed enum compares look solid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
components/core/src/clp/GrepCore.hpp
(6 hunks)components/core/tests/test-GrepCore.cpp
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/core/tests/test-GrepCore.cpp
components/core/src/clp/GrepCore.hpp
🧠 Learnings (1)
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
components/core/src/clp/GrepCore.hpp
🧬 Code graph analysis (2)
components/core/tests/test-GrepCore.cpp (4)
components/core/src/clp/string_utils/string_utils.cpp (2)
wildcard_match_unsafe_case_sensitive
(236-336)wildcard_match_unsafe_case_sensitive
(236-236)components/core/src/clp/GrepCore.hpp (13)
interpretations
(154-156)generate_schema_sub_queries
(201-208)generate_schema_sub_queries
(555-627)generate_schema_sub_queries
(555-561)interpretation
(220-222)interpretation
(234-237)process_schema_var_token
(258-264)process_schema_var_token
(630-692)process_schema_var_token
(630-636)value
(83-88)process_raw_query
(63-72)process_raw_query
(271-392)process_raw_query
(271-280)components/core/src/clp/GrepCore.cpp (8)
normalize_interpretations
(153-186)normalize_interpretations
(153-154)get_wildcard_encodable_positions
(188-207)get_wildcard_encodable_positions
(188-189)generate_logtype_string
(209-276)generate_logtype_string
(209-212)get_bounds_of_next_potential_var
(33-151)get_bounds_of_next_potential_var
(33-38)components/core/src/clp/EncodedVariableInterpreter.hpp (14)
id
(56-56)value
(97-100)value
(107-110)add_dict_var
(271-276)add_dict_var
(558-571)add_dict_var
(558-563)logtype
(63-65)logtype
(63-63)logtype
(71-73)logtype
(71-71)logtype
(79-81)logtype
(79-79)logtype
(87-89)logtype
(87-87)
components/core/src/clp/GrepCore.hpp (4)
components/core/tests/test-GrepCore.cpp (12)
interpretations
(71-74)interpretations
(71-72)interpretation
(95-98)interpretation
(95-96)interpretation
(100-105)interpretation
(100-103)logtype
(263-273)logtype
(264-265)logtype
(275-285)logtype
(275-279)generate_schema_sub_queries
(80-93)generate_schema_sub_queries
(80-85)components/core/src/clp/QueryToken.cpp (2)
is_var
(137-145)is_var
(137-137)components/core/src/clp/EncodedVariableInterpreter.hpp (13)
logtype
(63-65)logtype
(63-63)logtype
(71-73)logtype
(71-71)logtype
(79-81)logtype
(79-79)logtype
(87-89)logtype
(87-87)wildcard_search_dictionary_and_get_encoded_matches
(225-230)wildcard_search_dictionary_and_get_encoded_matches
(505-530)wildcard_search_dictionary_and_get_encoded_matches
(505-510)encoded_var
(57-57)encoded_var
(116-116)components/core/src/clp/GrepCore.cpp (6)
normalize_interpretations
(153-186)normalize_interpretations
(153-154)get_wildcard_encodable_positions
(188-207)get_wildcard_encodable_positions
(188-189)generate_logtype_string
(209-276)generate_logtype_string
(209-212)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: lint-check (macos-15)
- GitHub Check: rust-checks (macos-15)
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (ubuntu-24.04)
🔇 Additional comments (1)
components/core/tests/test-GrepCore.cpp (1)
69-124
: Nice GrepCore test harness.Line 69: The
GrepCoreTest
wrapper cleanly exposes the schema-focused helpers for unit coverage without widening the production surface—exactly what we need to exercise the DFA flow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
components/core/src/clp/GrepCore.cpp
(2 hunks)components/core/src/clp/GrepCore.hpp
(6 hunks)components/core/tests/test-GrepCore.cpp
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/core/tests/test-GrepCore.cpp
components/core/src/clp/GrepCore.cpp
components/core/src/clp/GrepCore.hpp
🧠 Learnings (1)
📚 Learning: 2025-08-09T04:07:27.083Z
Learnt from: Bill-hbrhbr
PR: y-scope/clp#1156
File: components/core/CMakeLists.txt:772-772
Timestamp: 2025-08-09T04:07:27.083Z
Learning: In the CLP project's CMakeLists.txt, when reviewing changes related to the ${zstd_TARGET} variable usage in test linking, the team is planning a refactoring PR to improve this mechanism. Guards for undefined target variables should be deferred to that separate PR rather than being added in focused dependency migration PRs.
Applied to files:
components/core/src/clp/GrepCore.hpp
🧬 Code graph analysis (3)
components/core/tests/test-GrepCore.cpp (5)
components/core/src/clp/string_utils/string_utils.cpp (2)
wildcard_match_unsafe_case_sensitive
(236-336)wildcard_match_unsafe_case_sensitive
(236-236)components/core/src/clp/GrepCore.hpp (13)
interpretations
(154-156)generate_schema_sub_queries
(201-208)generate_schema_sub_queries
(557-636)generate_schema_sub_queries
(557-563)interpretation
(220-222)interpretation
(235-239)process_schema_var_token
(260-266)process_schema_var_token
(639-701)process_schema_var_token
(639-645)value
(83-88)process_raw_query
(63-72)process_raw_query
(273-394)process_raw_query
(273-282)components/core/src/clp/GrepCore.cpp (8)
normalize_interpretations
(153-186)normalize_interpretations
(153-154)get_wildcard_encodable_positions
(188-207)get_wildcard_encodable_positions
(188-189)generate_logtype_string
(209-280)generate_logtype_string
(209-213)get_bounds_of_next_potential_var
(33-151)get_bounds_of_next_potential_var
(33-38)components/core/src/clp/EncodedVariableInterpreter.hpp (14)
id
(56-56)value
(97-100)value
(107-110)add_dict_var
(271-276)add_dict_var
(558-571)add_dict_var
(558-563)logtype
(63-65)logtype
(63-63)logtype
(71-73)logtype
(71-71)logtype
(79-81)logtype
(79-79)logtype
(87-89)logtype
(87-87)components/core/src/clp/Query.cpp (2)
add_dict_var
(102-105)add_dict_var
(103-103)
components/core/src/clp/GrepCore.cpp (3)
components/core/src/clp/string_utils/string_utils.cpp (2)
is_wildcard
(146-154)is_wildcard
(146-146)components/core/src/clp/GrepCore.hpp (3)
interpretations
(154-156)interpretation
(220-222)interpretation
(235-239)components/core/src/clp/EncodedVariableInterpreter.hpp (13)
logtype
(63-65)logtype
(63-63)logtype
(71-73)logtype
(71-71)logtype
(79-81)logtype
(79-79)logtype
(87-89)logtype
(87-87)add_dict_var
(271-276)add_dict_var
(558-571)add_dict_var
(558-563)encoded_var
(57-57)encoded_var
(116-116)
components/core/src/clp/GrepCore.hpp (2)
components/core/src/clp/EncodedVariableInterpreter.hpp (13)
logtype
(63-65)logtype
(63-63)logtype
(71-73)logtype
(71-71)logtype
(79-81)logtype
(79-79)logtype
(87-89)logtype
(87-87)wildcard_search_dictionary_and_get_encoded_matches
(225-230)wildcard_search_dictionary_and_get_encoded_matches
(505-530)wildcard_search_dictionary_and_get_encoded_matches
(505-510)encoded_var
(57-57)encoded_var
(116-116)components/core/src/clp/GrepCore.cpp (6)
normalize_interpretations
(153-186)normalize_interpretations
(153-154)get_wildcard_encodable_positions
(188-207)get_wildcard_encodable_positions
(188-189)generate_logtype_string
(209-280)generate_logtype_string
(209-213)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: build-macos (macos-15, false)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
components/core/src/clp/Query.hpp
(3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
⚙️ CodeRabbit configuration file
- Prefer
false == <expression>
rather than!<expression>
.
Files:
components/core/src/clp/Query.hpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: ubuntu-jammy-dynamic-linked-bins
- GitHub Check: package-image
- GitHub Check: ubuntu-jammy-static-linked-bins
- GitHub Check: manylinux_2_28-x86_64-dynamic-linked-bins
- GitHub Check: ubuntu-jammy-lint
- GitHub Check: manylinux_2_28-x86_64-static-linked-bins
- GitHub Check: musllinux_1_2-x86_64-dynamic-linked-bins
- GitHub Check: musllinux_1_2-x86_64-static-linked-bins
- GitHub Check: centos-stream-9-dynamic-linked-bins
- GitHub Check: centos-stream-9-static-linked-bins
- GitHub Check: lint-check (ubuntu-24.04)
- GitHub Check: build-macos (macos-15, false)
- GitHub Check: build-macos (macos-14, true)
- GitHub Check: build-macos (macos-15, true)
- GitHub Check: build-macos (macos-14, false)
- GitHub Check: rust-checks (ubuntu-24.04)
- GitHub Check: rust-checks (ubuntu-22.04)
- GitHub Check: rust-checks (macos-15)
🔇 Additional comments (1)
components/core/src/clp/Query.hpp (1)
162-162
: LGTM!Default-initializing
m_wildcard_match_required
tofalse
is a good practice that prevents undefined behaviour and ensures deterministic behaviour whenSubQuery
objects are compared. The default value is semantically correct.
Description
This PR updates schema-based search in CLP to leverage the new
log_surgeon::Query
class to improve accuracy on schema compressed archives. This class improves accuracy by performing DFA intersections between a user's query and the variables in the schema. When generating subqueries using this class the following must be done:QueryInterpretation
matches the logtype and variable dictionaries.QueryInterpretation
into a set ofSubQuery
s if it matches the dictionaries.To compare against the dictionaries, CLP must consider the various combinations of encodable variable types a single
QueryInterpretation
can have. For example:Key changes:
process_raw_query
: updated to use thelog_surgeon::Query
class for DFA-based matching.generate_schema_sub_queries
: iterate over all encodable combinations of interpretations and add a sub query if it matches the dictionaries.generate_logtype_string
: generates a logtype string for a given interpretation and encodable variable maskget_wildcard_encodable_positions
: converts mask for encodable variables to array positions.process_schema_var_token
: adds variable to the sub query.get_bounds_of_next_potential_var
overload (schema case) andSearchToken
.Validation performed
Summary by CodeRabbit