feat: separate the concept of captures and tags; lexer now tracks mapping from variables to capture to tags to registers. #72
base: main
…g_id; Remove error checking in favor of using .at().
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (3)
tests/test-lexer.cpp (2)
69-78: Fix typo in documentation.
There's a typo in the documentation: "inptut" should be "input".
- * end of inptut token.
+ * end of input token.
200-200: Fix clang-format violation.
The line length exceeds the formatting rules. Consider breaking the line into multiple lines.
- auto const* regex_ast_cat_ptr = dynamic_cast<RegexASTCatByte*>(schema_var_ast.m_regex_ptr.get());
+ auto const* regex_ast_cat_ptr
+         = dynamic_cast<RegexASTCatByte*>(schema_var_ast.m_regex_ptr.get());
🧰 Tools
🪛 GitHub Actions: lint
[error] 200-200: code should be clang-formatted [-Wclang-format-violations]
src/log_surgeon/finite_automata/RegexAST.hpp (1)
113-126: Rename method to align with new terminology.
The method name `add_to_nfa_with_negative_tags` should be renamed to `add_to_nfa_with_negative_captures` to maintain consistency with the new capture-based terminology.
Apply this diff to rename the method:
- add_to_nfa_with_negative_tags(Nfa<TypedNfaState>* nfa, TypedNfaState* end_state) const -> void {
+ add_to_nfa_with_negative_captures(Nfa<TypedNfaState>* nfa, TypedNfaState* end_state) const -> void {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
src/log_surgeon/Lexer.tpp (3 hunks)
src/log_surgeon/LexicalRule.hpp (1 hunks)
src/log_surgeon/finite_automata/Dfa.hpp (1 hunks)
src/log_surgeon/finite_automata/RegexAST.hpp (21 hunks)
src/log_surgeon/finite_automata/TaggedTransition.hpp (3 hunks)
tests/test-lexer.cpp (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/log_surgeon/finite_automata/Dfa.hpp
- src/log_surgeon/Lexer.tpp
- src/log_surgeon/finite_automata/TaggedTransition.hpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
src/log_surgeon/LexicalRule.hpp
tests/test-lexer.cpp
src/log_surgeon/finite_automata/RegexAST.hpp
🧠 Learnings (1)
src/log_surgeon/finite_automata/RegexAST.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
🪛 GitHub Actions: lint
tests/test-lexer.cpp
[error] 200-200: code should be clang-formatted [-Wclang-format-violations]
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build (ubuntu-22.04, Release)
- GitHub Check: build (ubuntu-22.04, Debug)
🔇 Additional comments (12)
tests/test-lexer.cpp (4)
1-30: LGTM! The includes and using declarations are well-organized and appropriate for the changes.
138-163: LGTM! The implementation thoroughly validates token scanning and includes a clear TODO for future work.
306-320: LGTM! The test case is well-structured and effectively tests basic lexer functionality.
322-360: LGTM! The test case thoroughly validates capture group functionality and includes a clear TODO for future work.
src/log_surgeon/LexicalRule.hpp (1)
26-28: LGTM! The implementation is correct and follows best practices:
- Uses `[[nodiscard]]` to prevent ignoring the return value.
- Returns a const reference to avoid unnecessary copying.
- Correctly forwards the call to `get_subtree_positive_captures()`.
src/log_surgeon/finite_automata/RegexAST.hpp (7)
22-24: LGTM! The header file changes and documentation updates correctly reflect the transition from tags to captures.
Also applies to: 32-38
86-106: LGTM! The member functions are correctly updated to handle captures:
- `get_subtree_positive_captures()` returns a const reference to avoid copying.
- `set_subtree_positive_captures()` and `add_subtree_positive_captures()` correctly manage the capture list.
- `set_negative_captures()` correctly handles negative captures.
134-152: LGTM! The serialization changes effectively use modern C++ features:
- Range-based transformations for clean iteration.
- `fmt::format` for string formatting.
- Proper handling of empty captures.
155-156: LGTM! The member variables are correctly updated to use `Capture` pointers.
647-664: Move null checks to constructor body.
Throwing exceptions in the member initializer list can lead to issues with object construction and exception safety.
Apply this diff to move the null checks into the constructor body:
- : m_group_regex_ast{(
-         nullptr == group_regex_ast
-                 ? throw std::invalid_argument("Group regex AST cannot be null")
-                 : std::move(group_regex_ast)
- )},
-   m_capture{
-         nullptr == capture ? throw std::invalid_argument("Capture cannot be null")
-                            : std::move(capture)
-   } {
+ {
+     if (nullptr == group_regex_ast) {
+         throw std::invalid_argument("Group regex AST cannot be null");
+     }
+     if (nullptr == capture) {
+         throw std::invalid_argument("Capture cannot be null");
+     }
+     m_group_regex_ast = std::move(group_regex_ast);
+     m_capture = std::move(capture);
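For illustration, the constructor-body form suggested above might look like the minimal sketch below. `AstStub`, `CaptureStub`, and `CaptureNodeSketch` are hypothetical stand-ins, not the real `RegexAST` or `Capture` types from this PR; only the two error messages are taken from the diff.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-ins for the real AST and capture types.
struct AstStub {};

struct CaptureStub {
    std::string name;
};

class CaptureNodeSketch {
public:
    CaptureNodeSketch(std::unique_ptr<AstStub> ast, std::unique_ptr<CaptureStub> capture) {
        // Validate both arguments before taking ownership of either one.
        if (nullptr == ast) {
            throw std::invalid_argument("Group regex AST cannot be null");
        }
        if (nullptr == capture) {
            throw std::invalid_argument("Capture cannot be null");
        }
        m_ast = std::move(ast);
        m_capture = std::move(capture);
    }

    [[nodiscard]] auto get_capture_name() const -> std::string const& { return m_capture->name; }

private:
    // Both members start out null and are assigned in the constructor body.
    std::unique_ptr<AstStub> m_ast;
    std::unique_ptr<CaptureStub> m_capture;
};
```

One trade-off of this form: the members are default-initialized to null and then assigned, rather than being initialized directly in the member initializer list.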
712-717: Rename methods to align with new naming convention.
For consistency with the new naming convention, apply this diff to rename the methods:
- [[nodiscard]] auto get_group_name() const -> std::string_view { return m_capture->get_name(); }
+ [[nodiscard]] auto get_capture_name() const -> std::string_view { return m_capture->get_name(); }
- [[nodiscard]] auto get_group_regex_ast(
+ [[nodiscard]] auto get_capture_regex_ast(
908-939: LGTM! The NFA construction diagram accurately represents the structure with:
- Positive tagged start transition for capture start.
- Epsilon transition to nested NFA.
- Negative tagged transition with negative captures.
- Positive tagged end transition for capture end.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/log_surgeon/Lexer.hpp (2)
149-156: Consider using `contains` and `at` for map access.
Replace iterator-based map access with `contains` and `at` for consistency with other functions.
- auto const tag_id_pair{m_capture_id_to_tag_id_pair.find(capture_id)};
- if (m_capture_id_to_tag_id_pair.end() == tag_id_pair) {
-     return std::nullopt;
- }
- return tag_id_pair->second;
+ if (m_capture_id_to_tag_id_pair.contains(capture_id)) {
+     return m_capture_id_to_tag_id_pair.at(capture_id);
+ }
+ return std::nullopt;
164-171: Consider using `contains` and `at` for map access.
Replace iterator-based map access with `contains` and `at` for consistency with other functions.
- auto const it{m_tag_to_register_id.find(tag_id)};
- if (m_tag_to_register_id.end() == it) {
-     return std::nullopt;
- }
- return it->second;
+ if (m_tag_to_register_id.contains(tag_id)) {
+     return m_tag_to_register_id.at(tag_id);
+ }
+ return std::nullopt;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/log_surgeon/Lexer.hpp (5 hunks)
tests/test-lexer.cpp (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
tests/test-lexer.cpp
src/log_surgeon/Lexer.hpp
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build (ubuntu-22.04, Release)
- GitHub Check: build (ubuntu-22.04, Debug)
🔇 Additional comments (6)
src/log_surgeon/Lexer.hpp (2)
136-142: LGTM! The function is well-documented and uses safe map access methods.
178-197: LGTM! The function is well-documented, follows coding guidelines, and uses modern C++ features effectively.
tests/test-lexer.cpp (4)
99-136: LGTM! The function is well-documented, follows coding guidelines, and uses safe map operations.
138-163: LGTM! Note the TODO comment about register values.
The function is well-documented and has comprehensive assertions. The TODO comment indicates that additional tests for register values will be needed when simulation is implemented.
Please track the TODO comment at line 162 to ensure it's addressed when simulation is implemented.
307-321: LGTM! The test case effectively verifies basic lexer functionality with both successful and unsuccessful matches.
323-361: LGTM! Note the TODO comment about TDFA's determinization.
The test case effectively verifies lexer functionality with capture groups. The TODO comment indicates that additional tests for register-related functions will be needed when TDFA's determinization is implemented.
Please track the TODO comment at lines 355-356 to ensure it's addressed when TDFA's determinization is implemented.
Actionable comments posted: 1
🧹 Nitpick comments (5)
tests/test-lexer.cpp (5)
104-108: Simplify the delimiter collection loop using a range-based for loop.
The current loop can be simplified for better readability.
- for (uint32_t i{0}; i < log_surgeon::cSizeOfByte; ++i) {
-     if (lexer.is_delimiter(i)) {
-         lexer_delimiters.push_back(i);
-     }
- }
+ for (uint32_t i : std::views::iota(0U, log_surgeon::cSizeOfByte)) {
+     if (lexer.is_delimiter(i)) {
+         lexer_delimiters.push_back(i);
+     }
+ }
70-71: Fix typo in function documentation.
There's a typo in the documentation: "inptut" should be "input".
- * end of inptut token.
+ * end of input token.
162-162: Enhance TODO comment clarity.
The TODO comment should be more descriptive about what needs to be checked and why.
- // TODO: add check for register values when simulation is implemented.
+ // TODO: Add verification of register values after implementing the NFA simulation to ensure correct state tracking.
355-356: Enhance TODO comment clarity in capture groups test.
The TODO comment should be more descriptive about what needs to be checked and why.
- // TODO: Add check for `get_reg_id_from_tag_id` and `get_reg_ids_from_capture_id` when TDFA's
- // determinization is implemented.
+ // TODO: Add verification of register ID mappings after implementing TDFA determinization to ensure
+ // correct register allocation for capture groups.
62-67: Enhance function documentation.
The documentation could be more detailed about:
- The purpose of constant delimiters
- The expected state of the lexer after initialization
- Any side effects or assumptions
  /**
   * Initializes the lexer with the constant delimiters and the given schema.
+  * The constant delimiters (space and newline) are used to separate tokens in the input.
+  * The lexer's symbol mappings are initialized based on the schema variables.
+  * Assumes the lexer is in a clean state before initialization.
   * @param schema Contains the variables to add to the lexer.
   * @param lexer Returns the initialized parser.
   */
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/log_surgeon/finite_automata/RegexAST.hpp (21 hunks)
tests/test-lexer.cpp (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
tests/test-lexer.cpp
src/log_surgeon/finite_automata/RegexAST.hpp
🧠 Learnings (1)
src/log_surgeon/finite_automata/RegexAST.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🪛 GitHub Actions: lint
src/log_surgeon/finite_automata/RegexAST.hpp
[error] 712-712: code should be clang-formatted
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build (ubuntu-22.04, Release)
- GitHub Check: build (ubuntu-22.04, Debug)
🔇 Additional comments (8)
tests/test-lexer.cpp (1)
307-321: Consider adding edge cases to basic lexer tests.
The current test case only verifies basic token matching. Consider adding tests for:
- Empty input
- Input with multiple delimiters
- Input with special characters
src/log_surgeon/finite_automata/RegexAST.hpp (7)
23-23: LGTM! Documentation updates align with the new capture-based system.
The header file change and documentation updates accurately reflect the transition from tags to captures.
Also applies to: 33-38
86-102: LGTM! Methods updated consistently to use captures.
The changes maintain the same functionality while transitioning from tags to captures, with consistent naming conventions.
Also applies to: 104-106, 115-117
633-634: LGTM! Documentation clearly states non-null requirements.
The documentation accurately reflects the requirements for both `m_capture` and `m_capture_regex_ast`.
648-664: Move exception handling from initializer list to constructor body.
Throwing exceptions in the member initializer list can lead to issues with object construction and exception safety.
Apply this diff to move the null checks into the constructor body:
- : m_capture_regex_ast{(
-         nullptr == capture_regex_ast
-                 ? throw std::invalid_argument("Group regex AST cannot be null")
-                 : std::move(capture_regex_ast)
- )},
-   m_capture{
-         nullptr == capture ? throw std::invalid_argument("Capture cannot be null")
-                            : std::move(capture)
-   } {
+ {
+     if (nullptr == capture_regex_ast) {
+         throw std::invalid_argument("Group regex AST cannot be null");
+     }
+     if (nullptr == capture) {
+         throw std::invalid_argument("Capture cannot be null");
+     }
+     m_capture_regex_ast = std::move(capture_regex_ast);
+     m_capture = std::move(capture);
      RegexAST<TypedNfaState>::set_subtree_positive_captures(
              m_capture_regex_ast->get_subtree_positive_captures()
      );
      RegexAST<TypedNfaState>::add_subtree_positive_captures({m_capture.get()});
  }
668-673: LGTM! Copy constructor correctly handles deep copying.
The copy constructor properly clones both the regex AST and capture objects, maintaining the integrity of the capture hierarchy.
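The deep-copy behaviour described above can be sketched as follows. This is an illustration under assumptions, not the PR's code: `NodeStub`, `LiteralStub`, `CaptureStub`, and `CaptureNodeSketch` are hypothetical types, and the virtual `clone()` method is an assumed mechanism for copying a polymorphic subtree.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical polymorphic AST node with a virtual clone method.
class NodeStub {
public:
    virtual ~NodeStub() = default;
    [[nodiscard]] virtual auto clone() const -> std::unique_ptr<NodeStub> = 0;
};

class LiteralStub : public NodeStub {
public:
    [[nodiscard]] auto clone() const -> std::unique_ptr<NodeStub> override {
        return std::make_unique<LiteralStub>(*this);
    }
};

// Hypothetical capture type holding only a name.
struct CaptureStub {
    std::string name;
};

class CaptureNodeSketch {
public:
    CaptureNodeSketch(std::unique_ptr<NodeStub> child, std::unique_ptr<CaptureStub> capture)
            : m_child{std::move(child)},
              m_capture{std::move(capture)} {}

    // Deep copy: clone the child subtree and duplicate the capture, so the copy owns
    // objects that are distinct from the ones owned by `rhs`.
    CaptureNodeSketch(CaptureNodeSketch const& rhs)
            : m_child{rhs.m_child->clone()},
              m_capture{std::make_unique<CaptureStub>(*rhs.m_capture)} {}

    [[nodiscard]] auto get_child() const -> NodeStub const* { return m_child.get(); }

    [[nodiscard]] auto get_capture() const -> CaptureStub const* { return m_capture.get(); }

private:
    std::unique_ptr<NodeStub> m_child;
    std::unique_ptr<CaptureStub> m_capture;
};
```

The copy constructor dereferences both members without checking them, which is only safe under the non-null invariant that the review notes for `m_capture` and `m_capture_regex_ast`.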
914-934: LGTM! NFA construction diagram and code updated consistently.
The diagram and implementation correctly reflect the new capture-based transition system.
Also applies to: 941-941, 947-947
953-960: LGTM! Serialization correctly handles capture names.
The serialization logic properly converts capture names to UTF-32 strings and maintains consistent formatting.
Actionable comments posted: 1
🧹 Nitpick comments (4)
src/log_surgeon/Lexer.tpp (1)
408-408: Address the TODO comment about DFA ignoring captures.
The TODO comment indicates that DFA treats capture groups differently than intended. This could lead to incorrect parsing behavior.
Do you want me to help track this issue by opening a new issue?
src/log_surgeon/finite_automata/TaggedTransition.hpp (1)
17-17: Documentation needs update.
The comment should be updated to reflect that a tag ID (not a tag) has been matched.
- * Represents an NFA transition indicating that a tag has been matched.
+ * Represents an NFA transition indicating that a tag ID has been matched.
src/log_surgeon/finite_automata/RegexAST.hpp (2)
633-634: Documentation needs update.
The note about non-null expectations should be more specific.
- * - `m_capture` is always expected to be non-null.
- * - `m_capture_regex_ast` is always expected to be non-null.
+ * - `m_capture` must be non-null as it represents the capture group being matched.
+ * - `m_capture_regex_ast` must be non-null as it contains the regex pattern for the capture group.
916-916: Update diagram comments.
The diagram comments should reflect the transition from tags to captures.
- | `m_capture` start
+ | `m_capture` start ID
- | `m_capture` end
- | (positive tagged end transition)
+ | `m_capture` end ID
+ | (positive capture end transition)
Also applies to: 935-936
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
CMakeLists.txt (2 hunks)
src/log_surgeon/Aliases.hpp (1 hunks)
src/log_surgeon/Lexer.hpp (5 hunks)
src/log_surgeon/Lexer.tpp (3 hunks)
src/log_surgeon/finite_automata/Nfa.hpp (3 hunks)
src/log_surgeon/finite_automata/NfaState.hpp (3 hunks)
src/log_surgeon/finite_automata/RegexAST.hpp (21 hunks)
src/log_surgeon/finite_automata/TaggedTransition.hpp (3 hunks)
tests/CMakeLists.txt (2 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/log_surgeon/Aliases.hpp
🚧 Files skipped from review as they are similar to previous changes (3)
- tests/CMakeLists.txt
- src/log_surgeon/finite_automata/NfaState.hpp
- src/log_surgeon/finite_automata/Nfa.hpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
src/log_surgeon/finite_automata/TaggedTransition.hpp
src/log_surgeon/Lexer.hpp
src/log_surgeon/finite_automata/RegexAST.hpp
🧠 Learnings (2)
src/log_surgeon/finite_automata/TaggedTransition.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
src/log_surgeon/finite_automata/RegexAST.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🔇 Additional comments (17)
src/log_surgeon/Lexer.hpp (7)
7-7: LGTM! The added includes are necessary for the new functionality.
Also applies to: 10-10, 13-13
130-136: LGTM! The function uses safe map access patterns and follows the coding guidelines.
143-150: LGTM! The function uses safe map access patterns and follows the coding guidelines.
158-165: LGTM! The function uses safe map access patterns and follows the coding guidelines.
172-191: LGTM! The function uses safe map access patterns, follows the coding guidelines, and has proper error handling.
193-194: Move public member variables to private scope.
These member variables should be private to maintain encapsulation. This issue is tracked in #93.
219-221: LGTM! The member variables use appropriate types and follow naming conventions.
src/log_surgeon/Lexer.tpp (3)
7-7: LGTM! The added include is necessary for throwing std::invalid_argument.
363-364: LGTM! The method signatures now use the more specific rule_id_t type instead of uint32_t.
Also applies to: 370-371
382-407: LGTM! The method properly processes captures and their IDs, uses safe map access patterns, follows the coding guidelines, and has proper error handling.
CMakeLists.txt (1)
65-86: LGTM! Source files list updated correctly.
The changes properly reflect the transition from tag-based to capture-based system by:
- Adding new files (Aliases.hpp, Capture.hpp) for the new functionality
- Removing obsolete files (Tag.hpp)
- Maintaining correct file order
Also applies to: 111-111
src/log_surgeon/finite_automata/TaggedTransition.hpp (3)
8-9: LGTM! Header includes updated correctly.
Added necessary headers for the new implementation:
- `<utility>` for std::move
- `<vector>` for std::vector
- `log_surgeon/Aliases.hpp` for tag_id_t
Also applies to: 13-13
23-25: LGTM! PositiveTaggedTransition updated correctly.
The changes properly transition from Tag pointers to tag IDs:
- Constructor now takes tag_id_t
- Member variable uses tag_id_t
Also applies to: 44-44
55-57: LGTM! NegativeTaggedTransition updated correctly.
The changes properly transition from Tag pointers to tag IDs:
- Constructor now takes vector of tag_id_t
- Member variable uses vector<tag_id_t>
- Correctly uses std::move for efficiency
Also applies to: 76-76
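The shape of the two transition classes described above can be illustrated with a short sketch. The class names `PositiveTransitionSketch` and `NegativeTransitionSketch`, the `StateStub` placeholder, and the choice of `std::uint32_t` for `tag_id_t` are all assumptions made for the example; the real classes in `TaggedTransition.hpp` may differ in members and template parameters.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Assumption: the PR defines tag_id_t in Aliases.hpp; its underlying type is a guess here.
using tag_id_t = std::uint32_t;

// Hypothetical placeholder for an NFA state.
struct StateStub {};

// A positive transition carries a single tag ID by value.
class PositiveTransitionSketch {
public:
    PositiveTransitionSketch(tag_id_t tag_id, StateStub const* dest_state)
            : m_tag_id{tag_id},
              m_dest_state{dest_state} {}

    [[nodiscard]] auto get_tag_id() const -> tag_id_t { return m_tag_id; }

    [[nodiscard]] auto get_dest_state() const -> StateStub const* { return m_dest_state; }

private:
    tag_id_t m_tag_id;
    StateStub const* m_dest_state;
};

// A negative transition carries a list of tag IDs; the by-value parameter is moved into
// the member so that callers passing a temporary avoid an extra copy.
class NegativeTransitionSketch {
public:
    NegativeTransitionSketch(std::vector<tag_id_t> tag_ids, StateStub const* dest_state)
            : m_tag_ids{std::move(tag_ids)},
              m_dest_state{dest_state} {}

    [[nodiscard]] auto get_tag_ids() const -> std::vector<tag_id_t> const& { return m_tag_ids; }

    [[nodiscard]] auto get_dest_state() const -> StateStub const* { return m_dest_state; }

private:
    std::vector<tag_id_t> m_tag_ids;
    StateStub const* m_dest_state;
};
```

Storing plain IDs rather than `Tag` pointers means the transitions no longer depend on the lifetime of any tag object.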
src/log_surgeon/finite_automata/RegexAST.hpp (3)
23-23: LGTM! Header includes updated correctly.
Replaced Tag.hpp with Capture.hpp to reflect the new implementation.
33-38: LGTM! Documentation updated correctly.
The class documentation properly reflects the transition from tags to captures.
86-101: LGTM! Base class methods updated correctly.
Methods properly handle Capture pointers instead of Tag pointers:
- get_subtree_positive_captures
- set_subtree_positive_captures
- add_subtree_positive_captures
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/log_surgeon/finite_automata/RegexAST.hpp (2)
118-118: Consider using `empty()` instead of comparing with `false`.
The condition `false == m_negative_captures.empty()` can be simplified to `!m_negative_captures.empty()` for better readability.
- if (false == m_negative_captures.empty()) {
+ if (!m_negative_captures.empty()) {
910-940: Update diagram comments to reflect capture-based terminology.
The diagram's comments still use tag-based terminology in some places. Consider updating them to consistently use capture-based terminology.
- // The NFA constructed for a capture group follows the structure below, with tagged transitions
+ // The NFA constructed for a capture group follows the structure below, with capture transitions
- // | (positive tagged start transition)
+ // | (positive capture start transition)
- // | (negative tagged transition)
+ // | (negative capture transition)
- // | (positive tagged end transition)
+ // | (positive capture end transition)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
src/log_surgeon/Lexer.hpp (5 hunks)
src/log_surgeon/finite_automata/RegexAST.hpp (21 hunks)
src/log_surgeon/finite_automata/TaggedTransition.hpp (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/log_surgeon/finite_automata/TaggedTransition.hpp
- src/log_surgeon/Lexer.hpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
src/log_surgeon/finite_automata/RegexAST.hpp
🧠 Learnings (1)
src/log_surgeon/finite_automata/RegexAST.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: build (ubuntu-22.04, Release)
- GitHub Check: build (ubuntu-22.04, Debug)
🔇 Additional comments (2)
src/log_surgeon/finite_automata/RegexAST.hpp (2)
23-23: LGTM! Documentation and includes updated to reflect the new capture-based system.
The changes consistently replace tag-based terminology with capture-based terminology throughout the documentation and includes.
Also applies to: 32-38
728-728: LGTM! Consistent use of capture-based serialization across all derived classes.
The changes to use `serialize_negative_captures` are applied consistently throughout all derived classes.
Also applies to: 745-745, 776-776, 807-807, 840-840, 903-903, 962-962, 1118-1118
Last few comments and we're close to merge.
References
Description
`*_id_t` aliases to make it more clear what the intended purpose of the maps are.
Validation performed
Summary by CodeRabbit
New Features
`Lexer` class for improved capture and tag management.
`Capture` class to replace the previous `Tag` class, streamlining capture handling.
Tests
`Capture` class and enhanced tests for the lexer.
Chores