refactor: Unify epsilon, positive start, positive end, and negative transitions into spontaneous transitions. #76
Co-authored-by: Lin Zhihao <[email protected]>
Co-authored-by: Lin Zhihao <[email protected]>
…ative position test as this is what is seen in practice when using negative positions.
…ionality was copied over correctly.
Co-authored-by: Lin Zhihao <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (9)
tests/test-lexer.cpp (2)
141-166: Consider adding register values verification.
The TODO comment indicates missing verification of register values. This should be addressed after implementing the DFA simulation.
Would you like me to help create an issue to track the implementation of register values verification after the DFA simulation is complete?
326-364: Consider adding register-related checks.
The TODO comment indicates missing checks for `get_reg_id_from_tag_id` and `get_reg_ids_from_capture_id`. These should be added after implementing TDFA's determinization.
Would you like me to help create an issue to track the implementation of these additional checks after the TDFA determinization is complete?
src/log_surgeon/finite_automata/TaggedTransition.hpp (1)
48-48: Update documentation for consistency.
For consistency with the rest of the codebase, consider updating the documentation to mention "tag IDs" instead of "tags":
- * Represents an NFA transition indicating that multiple tags have been unmatched.
+ * Represents an NFA transition indicating that multiple tag IDs have been unmatched.
src/log_surgeon/finite_automata/NfaState.hpp (1)
184-214: Consider adding error details for serialization failures.
The method currently returns `std::nullopt` without any context about why serialization failed. Consider enhancing error reporting by:
- Using a custom error type or
- Adding logging to capture failure reasons
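A minimal sketch of the first option, assuming a small result type that carries a failure reason alongside the optional value. `SerializeResult`, `serialize_state`, and the ID map are hypothetical names for illustration; they are not taken from `NfaState.hpp`.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical result type: holds either the serialized text or a failure reason.
struct SerializeResult {
    std::optional<std::string> m_value;
    std::string m_error;  // Empty on success.

    [[nodiscard]] auto has_value() const -> bool { return m_value.has_value(); }
};

// Hypothetical stand-in for a serialization routine that looks up a state's ID
// before formatting it. A missing ID is reported with context instead of a bare
// std::nullopt.
inline auto serialize_state(
        std::unordered_map<int, std::uint32_t> const& state_ids,
        int state_key
) -> SerializeResult {
    auto const it = state_ids.find(state_key);
    if (state_ids.end() == it) {
        return {std::nullopt,
                "no ID registered for state key " + std::to_string(state_key)};
    }
    return {"state " + std::to_string(it->second), {}};
}
```

Callers can still test `has_value()` as they would with `std::optional`, and read `m_error` only on the failure path.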
src/log_surgeon/Lexer.hpp (2)
130-189: Enhance const correctness in map lookups.
Consider using `std::as_const()` when accessing the maps to ensure const correctness:
- if (m_rule_id_to_capture_ids.contains(rule_id)) {
-     return m_rule_id_to_capture_ids.at(rule_id);
+ if (std::as_const(m_rule_id_to_capture_ids).contains(rule_id)) {
+     return std::as_const(m_rule_id_to_capture_ids).at(rule_id);
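As an alternative sketch, the lookup can go through a const reference, which selects the const overloads without `std::as_const()`, and a single `find` avoids the second lookup that `contains` followed by `at` performs. The free function `find_capture_ids`, its return type, and the alias definitions below are assumptions for illustration; the actual method signature in `Lexer.hpp` is not shown in this review.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

using rule_id_t = std::uint32_t;     // Assumed underlying type.
using capture_id_t = std::uint32_t;  // Assumed underlying type.

// Hypothetical lookup helper: the map is taken by const reference, so only
// const member functions can be called on it.
inline auto find_capture_ids(
        std::unordered_map<rule_id_t, std::vector<capture_id_t>> const& rule_id_to_capture_ids,
        rule_id_t rule_id
) -> std::optional<std::vector<capture_id_t>> {
    auto const it = rule_id_to_capture_ids.find(rule_id);
    if (rule_id_to_capture_ids.end() == it) {
        return std::nullopt;
    }
    return it->second;  // Copies the vector into the optional.
}
```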
217-219: Add documentation for member variables.
Consider adding documentation comments for the new member variables to explain their purpose and relationships:
+ /// Maps rule IDs to their associated capture IDs
  std::unordered_map<rule_id_t, std::vector<capture_id_t>> m_rule_id_to_capture_ids;
+ /// Maps capture IDs to their corresponding start and end tag IDs
  std::unordered_map<capture_id_t, std::pair<tag_id_t, tag_id_t>> m_capture_id_to_tag_id_pair;
+ /// Maps tag IDs to their associated register IDs
  std::unordered_map<tag_id_t, reg_id_t> m_tag_to_reg_id;
src/log_surgeon/finite_automata/Nfa.hpp (2)
143-148: Optimize vector allocation.
Consider reserving space for the tags vector upfront to avoid reallocations:
  std::vector<tag_id_t> tags;
+ tags.reserve(captures.size() * 2); // Each capture needs 2 tags
  for (auto const capture : captures) {
106-108: Address TODO about capture group naming limitations.
The current design limits use cases by enforcing unique naming across capture groups. Would you like me to help design a solution that allows for scoped capture names?
src/log_surgeon/Lexer.tpp (1)
386-387: Improve the error message for duplicate capture names.
The current error message could be more descriptive by including the duplicate capture name.
- throw std::invalid_argument("`m_rules` contains capture names that are not unique."
+ throw std::invalid_argument(fmt::format(
+         "`m_rules` contains duplicate capture name: '{}'", capture_name)
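The same idea can be sketched with only the standard library, assuming the capture names are available as a list of strings. `check_unique_capture_names` is a hypothetical helper and plain string concatenation stands in for `fmt::format`; neither is taken from `Lexer.tpp`.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: throws if any capture name appears more than once and
// names the first duplicate found in the message.
inline auto check_unique_capture_names(std::vector<std::string> const& capture_names) -> void {
    std::unordered_set<std::string> seen;
    for (auto const& name : capture_names) {
        // insert() returns {iterator, bool}; the bool is false when the name
        // was already present.
        if (false == seen.insert(name).second) {
            throw std::invalid_argument(
                    "`m_rules` contains duplicate capture name: '" + name + "'"
            );
        }
    }
}
```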
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
CMakeLists.txt (2 hunks)
src/log_surgeon/Aliases.hpp (1 hunks)
src/log_surgeon/Lexer.hpp (5 hunks)
src/log_surgeon/Lexer.tpp (3 hunks)
src/log_surgeon/LexicalRule.hpp (2 hunks)
src/log_surgeon/UniqueIdGenerator.hpp (1 hunks)
src/log_surgeon/finite_automata/Dfa.hpp (1 hunks)
src/log_surgeon/finite_automata/Nfa.hpp (5 hunks)
src/log_surgeon/finite_automata/NfaState.hpp (6 hunks)
src/log_surgeon/finite_automata/RegexAST.hpp (20 hunks)
src/log_surgeon/finite_automata/TaggedTransition.hpp (3 hunks)
tests/CMakeLists.txt (2 hunks)
tests/test-lexer.cpp (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- src/log_surgeon/finite_automata/Dfa.hpp
- src/log_surgeon/UniqueIdGenerator.hpp
- src/log_surgeon/LexicalRule.hpp
- tests/CMakeLists.txt
- CMakeLists.txt
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: Prefer `false == <expression>` rather than `!<expression>`.
src/log_surgeon/Aliases.hpp
tests/test-lexer.cpp
src/log_surgeon/finite_automata/TaggedTransition.hpp
src/log_surgeon/Lexer.hpp
src/log_surgeon/finite_automata/NfaState.hpp
src/log_surgeon/finite_automata/Nfa.hpp
src/log_surgeon/finite_automata/RegexAST.hpp
🧠 Learnings (4)
src/log_surgeon/finite_automata/TaggedTransition.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/NfaState.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
src/log_surgeon/finite_automata/Nfa.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
src/log_surgeon/finite_automata/RegexAST.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: build (ubuntu-22.04, Release)
- GitHub Check: build (ubuntu-22.04, Debug)
- GitHub Check: build (macos-latest, Release)
- GitHub Check: build (macos-latest, Debug)
🔇 Additional comments (18)
tests/test-lexer.cpp (3)
2-23: LGTM! Well-organized includes and using declarations.
The new includes and using declarations are properly organized and necessary for the added functionality.
102-139: LGTM! Well-structured initialization function.
The function properly initializes the lexer with delimiters and schema variables, following coding guidelines and best practices.
310-324: LGTM! Comprehensive basic lexer test case.
The test case properly validates both successful and unsuccessful token matching scenarios.
src/log_surgeon/finite_automata/TaggedTransition.hpp (3)
8-9: LGTM! Header includes updated appropriately.
The added headers support the transition from Tag pointers to tag IDs, and unnecessary headers have been removed.
Also applies to: 13-13
17-17: LGTM! Successfully transitioned from Tag pointers to tag IDs.
The changes effectively decouple the class from the Tag type while maintaining functionality.
Also applies to: 23-24, 37-37, 43-43
54-55: LGTM! Successfully transitioned to using tag IDs.
The implementation changes effectively maintain functionality while using the new tag ID type.
Also applies to: 68-68, 74-74
src/log_surgeon/Aliases.hpp (1)
1-14: LGTM! Well-structured type aliases.
The file is well-organized with proper include guards, namespace usage, and clear type aliases that enhance code readability and maintainability.
src/log_surgeon/finite_automata/NfaState.hpp (3)
16-17: LGTM! Clean integration of spontaneous transitions.
The includes and member variable changes align well with the PR's objective to unify various types of transitions into spontaneous transitions.
Also applies to: 111-111
35-41: LGTM! Well-designed transition management.
The constructor and methods provide a clean and type-safe interface for managing spontaneous transitions.
Also applies to: 49-51, 53-64
177-179: LGTM! Proper boolean comparison pattern.
The method correctly follows the project's coding guideline of using the `false == <expression>` pattern.
src/log_surgeon/finite_automata/Nfa.hpp (1)
110-110: Ensure thread safety of UniqueIdGenerator.
The UniqueIdGenerator is used to generate unique IDs in `get_or_create_capture_tag_pair`. If this code will be used in a multi-threaded context, consider making UniqueIdGenerator thread-safe.
Also applies to: 124-128
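If thread safety does turn out to be needed, one possible shape is an atomic counter. This is a sketch only: `AtomicUniqueIdGenerator` and `generate_id` are hypothetical names, and the interface of the real `UniqueIdGenerator` is not shown in this review.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical thread-safe ID generator built on an atomic counter.
class AtomicUniqueIdGenerator {
public:
    // fetch_add returns the value held before the increment, so IDs start at 0
    // and no two callers can receive the same value. Relaxed ordering is enough
    // because only the uniqueness of the counter value matters here.
    [[nodiscard]] auto generate_id() -> std::uint32_t {
        return m_next_id.fetch_add(1, std::memory_order_relaxed);
    }

private:
    std::atomic<std::uint32_t> m_next_id{0};
};
```

This only protects ID generation itself; any map that `get_or_create_capture_tag_pair` updates would need its own synchronization.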
src/log_surgeon/Lexer.tpp (3)
361-367: LGTM! Type safety improvement.
The change from `uint32_t` to `rule_id_t` enhances type safety and makes the code more maintainable.
369-378: LGTM! Consistent type usage.
The parameter type change maintains consistency with the `add_rule` method and improves type safety.
406-406: Verify the impact of DFA ignoring captures.
The TODO comment indicates that DFA ignores captures, which could lead to incorrect behavior.
✅ Verification successful
🏁 Script executed:
#!/bin/bash
# Description: Search for capture-related code in DFA implementation.
# Test: Search for capture-related code in DFA files
rg -A 5 "capture" --glob "src/log_surgeon/finite_automata/Dfa*"
Length of output: 63
I'll run a broader search to locate any implementation details or references to the "capture:user=" pattern in the repository. This will help us determine if any DFA code is handling captures or if they are indeed being ignored.
🏁 Script executed:
#!/bin/bash
# Search for the "capture:user=" pattern in the entire repository to locate potential capture handling code
rg "capture:user=" .
Length of output: 149
DFA Captures Verification Outcome
We have confirmed that the DFA implementation does not handle capture groups. Our search for any capture-handling code returned only the original TODO comment (in src/log_surgeon/Lexer.tpp, line 406), and no additional logic was found in the DFA files. This indicates that the DFA currently ignores captures, which could cause issues if named capture groups are expected to be processed.
- src/log_surgeon/Lexer.tpp (line 406) contains the related TODO comment.
- No evidence of capture group processing was found in any DFA-related files.
src/log_surgeon/finite_automata/RegexAST.hpp (4)
33-38: LGTM! Consistent terminology update.
The renaming from tags to captures is consistent throughout the class and aligns with the PR objective.
Also applies to: 86-106
632-663: LGTM! Robust parameter validation.
The constructor correctly validates both parameters, maintaining the non-null requirement previously established for tags.
909-947: LGTM! Well-documented NFA structure.
The changes maintain the same NFA structure while transitioning to capture-based terminology. The documentation clearly explains the structure and transitions.
950-960: LGTM! Consistent serialization format.
The serialization logic maintains the same format while transitioning to capture-based terminology.