-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add more debugging to test-lexer.cpp; Add lexer schema to NFA and DFA tests. #92
base: main
Are you sure you want to change the base?
feat: Add more debugging to test-lexer.cpp; Add lexer schema to NFA and DFA tests. #92
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
src/log_surgeon/Lexer.tpp (3)
246-247
:⚠️ Potential issueAdd null check before calling
is_accepting()
The code calls
dest_state->is_accepting()
without verifying thatdest_state
is non-null.Apply this diff to add the necessary null check:
auto* dest_state{state->get_dest_state(byte)}; -if (false == dest_state->is_accepting()) { +if (nullptr == dest_state || false == dest_state->is_accepting()) { // handle accordingly }
280-283
:⚠️ Potential issueAdd null check before pushing to
unvisited_states
The code pushes
dest_state
intounvisited_states
without verifying it's non-null.Apply this diff to add the necessary null check:
TypedDfaState const* dest_state{current_state->get_dest_state(byte)}; +if (nullptr == dest_state) { + continue; +} if (false == visited_states.contains(dest_state)) { unvisited_states.push(dest_state); }
87-98
:⚠️ Potential issueAdd null check before dereferencing
dest_state
At line 93,
dest_state
is dereferenced without verifying it's non-null. This could lead to a crash ifget_dest_state
returnsnullptr
.Apply this diff to add the necessary null check:
if (m_has_delimiters && !m_match) { dest_state = m_dfa->get_root()->get_dest_state(next_char); + if (nullptr == dest_state) { + continue; + } m_match = true; m_type_ids = &(dest_state->get_matching_variable_ids());
🧹 Nitpick comments (6)
tests/test-lexer.cpp (2)
185-185
: Track the TODO comment for register values verification.The TODO comment indicates that register values verification is pending the implementation of DFA simulation.
Would you like me to create an issue to track this TODO item? I can help implement the register values verification once the DFA simulation is complete.
151-186
: Enhance error handling in test_scanning_input.The function assumes successful token scanning but should handle potential error cases more robustly.
Consider adding test cases for error scenarios:
auto test_scanning_input(ByteLexer& lexer, std::string_view input, std::string_view rule_name) -> void { CAPTURE(input); - CAPTURE(symbol); + CAPTURE(rule_name); lexer.reset(); CAPTURE(serialize_id_symbol_map(lexer.m_id_symbol)); log_surgeon::ParserInputBuffer input_buffer; string token_string{input}; input_buffer.set_storage(token_string.data(), token_string.size(), 0, true); lexer.prepend_start_of_file_char(input_buffer); log_surgeon::Token token; auto error_code{lexer.scan(input_buffer, token)}; + CAPTURE(error_code); REQUIRE(nullptr != token.m_type_ids_ptr); CAPTURE(token.to_string_view()); CAPTURE(*token.m_type_ids_ptr); REQUIRE(log_surgeon::ErrorCode::Success == error_code);
src/log_surgeon/finite_automata/NfaState.hpp (1)
221-222
: Address the TODO comment regarding UTF-8 handling.The TODO comment indicates missing UTF-8 case handling in the BFS traversal. This could lead to incomplete traversal for UTF-8 characters.
Would you like me to help implement the UTF-8 case handling or create an issue to track this task?
src/log_surgeon/finite_automata/Nfa.hpp (1)
119-121
: Consider addressing the TODO comment about capture group naming.The TODO comment highlights a limitation in the current implementation where capture groups must have unique names. This could restrict valid use cases.
Would you like me to help design a solution that allows for non-unique capture group names?
src/log_surgeon/Lexer.tpp (2)
380-408
: Add validation for empty capture namesWhile the changes improve capture handling, consider adding validation to ensure capture names are not empty.
Add this validation before inserting into
m_symbol_id
:for (auto const& rule : m_rules) { for (auto const* capture : rule.get_captures()) { std::string const capture_name{capture->get_name()}; + if (capture_name.empty()) { + throw std::invalid_argument("Capture names cannot be empty"); + } if (m_symbol_id.contains(capture_name)) { throw std::invalid_argument("`m_rules` contains capture names that are not unique."); }
406-407
: Track DFA's capture handling limitationThe TODO comment indicates that DFA ignores captures, which could lead to functionality loss.
Would you like me to create an issue to track this limitation and propose a solution for handling captures in DFA?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
CMakeLists.txt
(2 hunks)src/log_surgeon/Aliases.hpp
(1 hunks)src/log_surgeon/Lexer.hpp
(5 hunks)src/log_surgeon/Lexer.tpp
(10 hunks)src/log_surgeon/LexicalRule.hpp
(3 hunks)src/log_surgeon/UniqueIdGenerator.hpp
(1 hunks)src/log_surgeon/finite_automata/Dfa.hpp
(5 hunks)src/log_surgeon/finite_automata/Nfa.hpp
(6 hunks)src/log_surgeon/finite_automata/NfaState.hpp
(6 hunks)src/log_surgeon/finite_automata/PrefixTree.hpp
(1 hunks)src/log_surgeon/finite_automata/RegexAST.hpp
(20 hunks)src/log_surgeon/finite_automata/TaggedTransition.hpp
(3 hunks)tests/CMakeLists.txt
(2 hunks)tests/test-lexer.cpp
(7 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/log_surgeon/Aliases.hpp
🚧 Files skipped from review as they are similar to previous changes (4)
- src/log_surgeon/UniqueIdGenerator.hpp
- src/log_surgeon/finite_automata/PrefixTree.hpp
- src/log_surgeon/finite_automata/Dfa.hpp
- src/log_surgeon/Lexer.hpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: - Prefer `false ==
**/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/LexicalRule.hpp
tests/test-lexer.cpp
src/log_surgeon/finite_automata/TaggedTransition.hpp
src/log_surgeon/finite_automata/NfaState.hpp
src/log_surgeon/finite_automata/Nfa.hpp
src/log_surgeon/finite_automata/RegexAST.hpp
**/*.{cpp,hpp,java,js,jsx,ts,tsx}
: - Prefer false == <expression>
rather than !<expression>
.
src/log_surgeon/LexicalRule.hpp
tests/test-lexer.cpp
src/log_surgeon/finite_automata/TaggedTransition.hpp
src/log_surgeon/finite_automata/NfaState.hpp
src/log_surgeon/finite_automata/Nfa.hpp
src/log_surgeon/finite_automata/RegexAST.hpp
🧠 Learnings (4)
src/log_surgeon/finite_automata/TaggedTransition.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/Nfa.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/RegexAST.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🔇 Additional comments (16)
tests/test-lexer.cpp (2)
1-1
: Replace deprecated UTF conversion utilities.The code still uses deprecated
codecvt_utf8
andwstring_convert
. Consider using modern alternatives like the UTF-8 facilities in C++20's<text_encoding>
header or a third-party library like ICU.Also applies to: 108-109
338-396
: Well-structured test cases with improved debugging!The new test cases effectively validate lexer functionality with and without capture groups. The use of CAPTURE macros enhances debugging output, making it easier to identify failure causes.
src/log_surgeon/LexicalRule.hpp (1)
28-30
: LGTM! Well-designed getter method.The new
get_captures()
method is well-implemented with proper const correctness and efficient return by const reference.src/log_surgeon/finite_automata/TaggedTransition.hpp (2)
23-24
: LGTM! Improved type safety with tag IDs.The transition from
Tag const*
totag_id_t
inPositiveTaggedTransition
eliminates potential null pointer issues and improves type safety.Also applies to: 43-43
54-55
: LGTM! Enhanced vector handling.The transition from
std::vector<Tag const*>
tostd::vector<tag_id_t>
inNegativeTaggedTransition
simplifies memory management and improves type safety. The use ofstd::move
for the vector is efficient.Also applies to: 74-74
src/log_surgeon/finite_automata/NfaState.hpp (2)
41-41
: LGTM! Well-structured state management.The new member variables and constructor changes provide better state tracking with:
- Unique ID generation for states and tags
- Clear mapping of state relationships
- Proper memory management with smart pointers
Also applies to: 119-124
183-199
: Consider adding error handling for tag pair creation.The
new_start_and_end_states_for_capture
method assumes tag pair creation will always succeed. Consider adding error handling for potential failure cases.❌ Verification inconclusive
🏁 Script executed:
#!/bin/bash # Check for error handling in similar methods rg -A 5 "create.*tag" .
Length of output: 2360
Subject: Review Comment on Error Handling for Tag Pair Creation
The review comment suggests that the
new_start_and_end_states_for_capture
method should include error handling in case tag pair creation fails. Our search for similar methods (namely,get_or_create_capture_tag_pair
insrc/log_surgeon/finite_automata/Nfa.hpp
) confirms that error handling is not currently implemented there either—indicating that tag generation is assumed always to succeed.Since both methods rely on
m_tag_id_generator.generate_id()
without explicit error checks, we acknowledge the concern. Although similar patterns exist elsewhere, this area may benefit from robust error handling to cover potential failure paths (e.g., if ID generation fails or returns an invalid value).Please consider revising the
new_start_and_end_states_for_capture
method to include error handling logic to verify that valid tag pairs are created. This might include checking the result of each call togenerate_id()
and handling any exceptions or error codes accordingly.src/log_surgeon/finite_automata/Nfa.hpp (2)
136-144
: LGTM! Well-implemented capture tag pair management.The
get_or_create_capture_tag_pair
method efficiently manages tag pairs with:
- Proper caching of existing pairs
- Thread-safe ID generation
- Clear error conditions
235-252
: LGTM! Improved error handling in serialization.The serialization method now properly handles potential failures by:
- Using std::optional for error propagation
- Clear error conditions
- Efficient string building with fmt::format
src/log_surgeon/Lexer.tpp (2)
361-367
: LGTM! Improved type safety withrule_id_t
The change from
uint32_t const& id
torule_id_t const rule_id
enhances type safety by using a more specific type for rule IDs.
369-378
: LGTM! Consistent use ofrule_id_t
The change to use
rule_id_t
inget_rule
maintains consistency with the type system changes.src/log_surgeon/finite_automata/RegexAST.hpp (3)
33-39
: LGTM! Clear and accurate documentationThe updated documentation clearly explains the role of captures and their relationship with the AST nodes.
632-634
: LGTM! Well-documented invariantsThe documentation clearly specifies the non-null requirements for
m_capture
andm_capture_regex_ast
, which helps prevent null pointer issues.
652-657
: LGTM! Robust null pointer validationThe constructor properly validates the non-null requirements with clear error messages.
tests/CMakeLists.txt (1)
35-43
: LGTM! Improved test organizationThe changes improve readability with a better format and add necessary test coverage for new components.
CMakeLists.txt (1)
64-116
: LGTM! Well-organized source filesThe changes properly integrate new components while maintaining a logical organization.
References
Description
test-lexer.cpp
failed it was not possible to tell where it failed or why it failed.CAPTURE
s totest-lexer.cpp
.userId=(?<uID>123)
to NFA and DFA tests.Validation performed
test-lexer.cpp
.Summary by CodeRabbit
New Features
Refactor
Tests