feat: separate the concept of captures and tags; lexer now tracks mapping from variables to capture to tags to registers. #72

Open
wants to merge 577 commits into
base: main

Conversation

SharafMohamed
Contributor

@SharafMohamed SharafMohamed commented Jan 13, 2025

References

Description

  • Previously, "tag" referred both to a single capture group and to the start and end markers for a capture group's position in the NFA.
    • The former is now called a capture.
    • The latter is now stored as a unique unsigned integer.
  • To simplify information tracking and ownership transfer, the lexer is now responsible for keeping track of all the relational information it will need after parsing. This includes:
    • A map from each variable ID to the capture IDs for the groups the variable contains.
    • A map from each capture to its start and end tags.
    • A map from each tag to its final register.
  • Fix file order in CMake.
  • Use *_id_t aliases to make the intended purpose of each map clearer.
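
The three maps listed above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's implementation: the class name `CaptureMaps`, the aliases `capture_id_t` and `reg_id_t`, the underlying integer types, and the `add_capture` and `set_register` helpers are hypothetical. Only `tag_id_t`, `rule_id_t`, `m_capture_id_to_tag_id_pair`, `m_tag_to_register_id`, and the two getter names appear elsewhere in this conversation.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Assumed aliases for illustration; the real definitions may differ.
using rule_id_t = std::uint32_t;
using capture_id_t = std::uint32_t;
using tag_id_t = std::uint32_t;
using reg_id_t = std::uint32_t;

// Hypothetical stand-in for the bookkeeping the lexer keeps after parsing.
class CaptureMaps {
public:
    // Records that variable `var_id` contains `capture_id`, delimited by a start and an end tag.
    auto add_capture(
            rule_id_t var_id,
            capture_id_t capture_id,
            tag_id_t start_tag,
            tag_id_t end_tag
    ) -> void {
        m_var_id_to_capture_ids[var_id].push_back(capture_id);
        m_capture_id_to_tag_id_pair[capture_id] = std::make_pair(start_tag, end_tag);
    }

    // Records the final register assigned to a tag.
    auto set_register(tag_id_t tag_id, reg_id_t reg_id) -> void {
        m_tag_to_register_id[tag_id] = reg_id;
    }

    [[nodiscard]] auto get_reg_id_from_tag_id(tag_id_t tag_id) const -> std::optional<reg_id_t> {
        auto const it{m_tag_to_register_id.find(tag_id)};
        if (m_tag_to_register_id.end() == it) {
            return std::nullopt;
        }
        return it->second;
    }

    // Chains capture -> (start tag, end tag) -> (start register, end register).
    [[nodiscard]] auto get_reg_ids_from_capture_id(capture_id_t capture_id) const
            -> std::optional<std::pair<reg_id_t, reg_id_t>> {
        auto const it{m_capture_id_to_tag_id_pair.find(capture_id)};
        if (m_capture_id_to_tag_id_pair.end() == it) {
            return std::nullopt;
        }
        auto const start_reg{get_reg_id_from_tag_id(it->second.first)};
        auto const end_reg{get_reg_id_from_tag_id(it->second.second)};
        if (false == start_reg.has_value() || false == end_reg.has_value()) {
            return std::nullopt;
        }
        return std::make_pair(start_reg.value(), end_reg.value());
    }

private:
    std::map<rule_id_t, std::vector<capture_id_t>> m_var_id_to_capture_ids;
    std::map<capture_id_t, std::pair<tag_id_t, tag_id_t>> m_capture_id_to_tag_id_pair;
    std::map<tag_id_t, reg_id_t> m_tag_to_register_id;
};
```

Returning `std::optional` lets a caller distinguish "no register assigned yet" from a valid register ID, which matters while the tag-to-register map cannot yet be populated.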

Validation performed

  • Added a new unit test for the lexer's base functionality.
  • Added a new unit test for the lexer's capture group functionality. This includes testing the maps that can currently be assigned.

Summary by CodeRabbit

  • New Features

    • Enhanced regular expression processing and capture handling for improved error reporting and consistent token management.
    • Upgraded lexical analysis to provide more robust recognition and assignment of tokens, ensuring smoother operation.
    • Introduced a mechanism for generating unique identifiers and clear type definitions, bolstering overall system reliability.
    • Added new methods to the Lexer class for improved capture and tag management.
    • Introduced a new Capture class to replace the previous Tag class, streamlining capture handling.
  • Tests

    • Expanded test coverage to validate the new capture and lexing improvements, ensuring higher quality and stability.
    • Introduced unit tests for the new Capture class and enhanced tests for the lexer.
  • Chores

    • Streamlined build configuration and source management for improved development efficiency.
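
The summary mentions a mechanism for generating unique identifiers, and the description says each tag is stored as a unique unsigned integer. One simple way to produce such IDs is a monotonically increasing counter, sketched below. The class name, method name, and integer width are assumptions for illustration and are not taken from the PR's diff.

```cpp
#include <cstdint>

// Hypothetical generator: hands out consecutive unsigned integers starting at 0.
class UniqueIdGenerator {
public:
    // Returns the next unused ID and advances the counter.
    [[nodiscard]] auto generate_id() -> std::uint32_t { return m_next_id++; }

private:
    std::uint32_t m_next_id{0};
};
```

With a generator like this, a capture's start and end tags can be given two consecutive calls, so no two tags in the same lexer share an ID.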

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/test-lexer.cpp (2)

69-78: Fix typo in documentation.

There's a typo in the documentation: "inptut" should be "input".

- * end of inptut token.
+ * end of input token.

200-200: Fix clang-format violation.

The line length exceeds the formatting rules. Consider breaking the line into multiple lines.

-        auto const* regex_ast_cat_ptr = dynamic_cast<RegexASTCatByte*>(schema_var_ast.m_regex_ptr.get());
+        auto const* regex_ast_cat_ptr
+                = dynamic_cast<RegexASTCatByte*>(schema_var_ast.m_regex_ptr.get());
🧰 Tools
🪛 GitHub Actions: lint

[error] 200-200: code should be clang-formatted [-Wclang-format-violations]

src/log_surgeon/finite_automata/RegexAST.hpp (1)

113-126: Rename method to align with new terminology.

The method name add_to_nfa_with_negative_tags should be renamed to add_to_nfa_with_negative_captures to maintain consistency with the new capture-based terminology.

Apply this diff to rename the method:

-    add_to_nfa_with_negative_tags(Nfa<TypedNfaState>* nfa, TypedNfaState* end_state) const -> void {
+    add_to_nfa_with_negative_captures(Nfa<TypedNfaState>* nfa, TypedNfaState* end_state) const -> void {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 439895b and 27ef30b.

📒 Files selected for processing (6)
  • src/log_surgeon/Lexer.tpp (3 hunks)
  • src/log_surgeon/LexicalRule.hpp (1 hunks)
  • src/log_surgeon/finite_automata/Dfa.hpp (1 hunks)
  • src/log_surgeon/finite_automata/RegexAST.hpp (21 hunks)
  • src/log_surgeon/finite_automata/TaggedTransition.hpp (3 hunks)
  • tests/test-lexer.cpp (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/log_surgeon/finite_automata/Dfa.hpp
  • src/log_surgeon/Lexer.tpp
  • src/log_surgeon/finite_automata/TaggedTransition.hpp
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • src/log_surgeon/LexicalRule.hpp
  • tests/test-lexer.cpp
  • src/log_surgeon/finite_automata/RegexAST.hpp
🧠 Learnings (1)
src/log_surgeon/finite_automata/RegexAST.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
🪛 GitHub Actions: lint
tests/test-lexer.cpp

[error] 200-200: code should be clang-formatted [-Wclang-format-violations]

⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build (ubuntu-22.04, Release)
  • GitHub Check: build (ubuntu-22.04, Debug)
🔇 Additional comments (12)
tests/test-lexer.cpp (4)

1-30: LGTM!

The includes and using declarations are well-organized and appropriate for the changes.


138-163: LGTM!

The implementation thoroughly validates token scanning and includes a clear TODO for future work.


306-320: LGTM!

The test case is well-structured and effectively tests basic lexer functionality.


322-360: LGTM!

The test case thoroughly validates capture group functionality and includes a clear TODO for future work.

src/log_surgeon/LexicalRule.hpp (1)

26-28: LGTM!

The implementation is correct and follows best practices:

  • Uses [[nodiscard]] to prevent ignoring the return value.
  • Returns a const reference to avoid unnecessary copying.
  • Correctly forwards the call to get_subtree_positive_captures().
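
The three points above can be illustrated with a small forwarding getter. This is a sketch under assumptions: the simplified `Capture` and `RegexAst` types and the getter name `get_captures` are hypothetical, while `get_subtree_positive_captures` and `add_subtree_positive_captures` are the names mentioned in this review.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the real classes.
struct Capture {
    std::string name;
};

class RegexAst {
public:
    auto add_subtree_positive_captures(std::vector<Capture const*> const& captures) -> void {
        m_subtree_positive_captures.insert(
                m_subtree_positive_captures.end(),
                captures.cbegin(),
                captures.cend()
        );
    }

    [[nodiscard]] auto get_subtree_positive_captures() const
            -> std::vector<Capture const*> const& {
        return m_subtree_positive_captures;
    }

private:
    std::vector<Capture const*> m_subtree_positive_captures;
};

class LexicalRule {
public:
    explicit LexicalRule(std::unique_ptr<RegexAst> regex) : m_regex{std::move(regex)} {}

    // Forwards to the AST and returns a const reference, so no vector copy is made.
    [[nodiscard]] auto get_captures() const -> std::vector<Capture const*> const& {
        return m_regex->get_subtree_positive_captures();
    }

private:
    std::unique_ptr<RegexAst> m_regex;
};
```

The returned reference stays valid only as long as the rule (and the AST it owns) is alive, which is the usual trade-off of returning by const reference instead of by value.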
src/log_surgeon/finite_automata/RegexAST.hpp (7)

22-24: LGTM!

The header file changes and documentation updates correctly reflect the transition from tags to captures.

Also applies to: 32-38


86-106: LGTM!

The member functions are correctly updated to handle captures:

  • get_subtree_positive_captures() returns a const reference to avoid copying.
  • set_subtree_positive_captures() and add_subtree_positive_captures() correctly manage the capture list.
  • set_negative_captures() correctly handles negative captures.

134-152: LGTM!

The serialization changes effectively use modern C++ features:

  • Range-based transformations for clean iteration.
  • fmt::format for string formatting.
  • Proper handling of empty captures.

155-156: LGTM!

The member variables are correctly updated to use Capture pointers.


647-664: Move null checks to constructor body.

Throwing exceptions in the member initializer list can lead to issues with object construction and exception safety.

Apply this diff to move the null checks into the constructor body:

-        : m_group_regex_ast{(
-              nullptr == group_regex_ast
-                      ? throw std::invalid_argument("Group regex AST cannot be null")
-                      : std::move(group_regex_ast)
-          )},
-          m_capture{
-              nullptr == capture ? throw std::invalid_argument("Capture cannot be null")
-                                 : std::move(capture)
-          } {
+        {
+            if (nullptr == group_regex_ast) {
+                throw std::invalid_argument("Group regex AST cannot be null");
+            }
+            if (nullptr == capture) {
+                throw std::invalid_argument("Capture cannot be null");
+            }
+            m_group_regex_ast = std::move(group_regex_ast);
+            m_capture = std::move(capture);
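
As a self-contained illustration of the constructor-body form suggested above, the sketch below rejects null arguments before taking ownership. The `GroupAst`, `Capture`, and `CaptureNode` types are hypothetical simplifications, not the PR's classes. Either placement of the check stops a null pointer from reaching the members; this only shows the body-based variant.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical minimal types standing in for the regex AST node and the capture.
struct GroupAst {};

struct Capture {
    std::string name;
};

class CaptureNode {
public:
    CaptureNode(std::unique_ptr<GroupAst> group_regex_ast, std::unique_ptr<Capture> capture) {
        // Validate both arguments first, then move them into the members.
        if (nullptr == group_regex_ast) {
            throw std::invalid_argument("Group regex AST cannot be null");
        }
        if (nullptr == capture) {
            throw std::invalid_argument("Capture cannot be null");
        }
        m_group_regex_ast = std::move(group_regex_ast);
        m_capture = std::move(capture);
    }

    [[nodiscard]] auto get_capture_name() const -> std::string const& { return m_capture->name; }

private:
    std::unique_ptr<GroupAst> m_group_regex_ast;
    std::unique_ptr<Capture> m_capture;
};
```

One cost of this form is that the members are default-initialized and then assigned, which is cheap for `std::unique_ptr` but would not work for members that lack a default constructor.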

712-717: Rename methods to align with new naming convention.

For consistency with the new naming convention:

Apply this diff to rename the methods:

-    [[nodiscard]] auto get_group_name() const -> std::string_view { return m_capture->get_name(); }
+    [[nodiscard]] auto get_capture_name() const -> std::string_view { return m_capture->get_name(); }

-    [[nodiscard]] auto get_group_regex_ast(
+    [[nodiscard]] auto get_capture_regex_ast(

908-939: LGTM!

The NFA construction diagram accurately represents the structure with:

  • Positive tagged start transition for capture start.
  • Epsilon transition to nested NFA.
  • Negative tagged transition with negative captures.
  • Positive tagged end transition for capture end.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/log_surgeon/Lexer.hpp (2)

149-156: Consider using contains and at for map access.

Replace iterator-based map access with contains and at for consistency with other functions.

-    auto const tag_id_pair{m_capture_id_to_tag_id_pair.find(capture_id)};
-    if (m_capture_id_to_tag_id_pair.end() == tag_id_pair) {
-        return std::nullopt;
-    }
-    return tag_id_pair->second;
+    if (m_capture_id_to_tag_id_pair.contains(capture_id)) {
+        return m_capture_id_to_tag_id_pair.at(capture_id);
+    }
+    return std::nullopt;

164-171: Consider using contains and at for map access.

Replace iterator-based map access with contains and at for consistency with other functions.

-    auto const it{m_tag_to_register_id.find(tag_id)};
-    if (m_tag_to_register_id.end() == it) {
-        return std::nullopt;
-    }
-    return it->second;
+    if (m_tag_to_register_id.contains(tag_id)) {
+        return m_tag_to_register_id.at(tag_id);
+    }
+    return std::nullopt;
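
The two lookup styles discussed in these nitpicks return the same result; they differ in how many times the map is searched. The sketch below compares them on a plain `std::map`. The alias types and function names are assumptions for illustration, and `count` stands in for `contains`, which requires C++20.

```cpp
#include <cstdint>
#include <map>
#include <optional>

using tag_id_t = std::uint32_t;  // assumed underlying type
using reg_id_t = std::uint32_t;  // assumed underlying type

// Iterator-based form: `find` searches the map once.
[[nodiscard]] inline auto lookup_with_find(
        std::map<tag_id_t, reg_id_t> const& tag_to_reg,
        tag_id_t tag_id
) -> std::optional<reg_id_t> {
    auto const it{tag_to_reg.find(tag_id)};
    if (tag_to_reg.end() == it) {
        return std::nullopt;
    }
    return it->second;
}

// Membership-test form: one search for the check and a second one inside `at`.
[[nodiscard]] inline auto lookup_with_count_and_at(
        std::map<tag_id_t, reg_id_t> const& tag_to_reg,
        tag_id_t tag_id
) -> std::optional<reg_id_t> {
    if (0 < tag_to_reg.count(tag_id)) {
        return tag_to_reg.at(tag_id);
    }
    return std::nullopt;
}
```

For small maps the extra search is unlikely to matter, so the choice is mostly about consistency with the surrounding code, which is the reason given in the suggestion.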
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 27ef30b and 2d77ee6.

📒 Files selected for processing (2)
  • src/log_surgeon/Lexer.hpp (5 hunks)
  • tests/test-lexer.cpp (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • tests/test-lexer.cpp
  • src/log_surgeon/Lexer.hpp
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build (ubuntu-22.04, Release)
  • GitHub Check: build (ubuntu-22.04, Debug)
🔇 Additional comments (6)
src/log_surgeon/Lexer.hpp (2)

136-142: LGTM!

The function is well-documented and uses safe map access methods.


178-197: LGTM!

The function is well-documented, follows coding guidelines, and uses modern C++ features effectively.

tests/test-lexer.cpp (4)

99-136: LGTM!

The function is well-documented, follows coding guidelines, and uses safe map operations.


138-163: LGTM! Note the TODO comment about register values.

The function is well-documented and has comprehensive assertions. The TODO comment indicates that additional tests for register values will be needed when simulation is implemented.

Please track the TODO comment at line 162 to ensure it's addressed when simulation is implemented.


307-321: LGTM!

The test case effectively verifies basic lexer functionality with both successful and unsuccessful matches.


323-361: LGTM! Note the TODO comment about TDFA's determinization.

The test case effectively verifies lexer functionality with capture groups. The TODO comment indicates that additional tests for register-related functions will be needed when TDFA's determinization is implemented.

Please track the TODO comment at lines 355-356 to ensure it's addressed when TDFA's determinization is implemented.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (5)
tests/test-lexer.cpp (5)

104-108: Simplify the delimiter collection loop using a range-based for loop.

The current loop can be simplified for better readability.

-    for (uint32_t i{0}; i < log_surgeon::cSizeOfByte; ++i) {
-        if (lexer.is_delimiter(i)) {
-            lexer_delimiters.push_back(i);
-        }
-    }
+    for (uint32_t i : std::views::iota(0U, log_surgeon::cSizeOfByte)) {
+        if (lexer.is_delimiter(i)) {
+            lexer_delimiters.push_back(i);
+        }
+    }
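
For context, the loop in question collects every byte value the lexer treats as a delimiter. The sketch below reproduces that pattern against a hypothetical `DelimiterTable` class so it can run on its own; the class, the `collect_delimiters` helper, and the value 256 for `cSizeOfByte` are assumptions. It keeps the index-based loop because `std::views::iota` needs C++20 and `<ranges>`.

```cpp
#include <array>
#include <cstdint>
#include <vector>

constexpr std::uint32_t cSizeOfByte{256};  // assumed value for illustration

// Hypothetical stand-in for the lexer's delimiter lookup.
class DelimiterTable {
public:
    auto set_delimiters(std::vector<std::uint32_t> const& delimiters) -> void {
        m_is_delimiter.fill(false);
        for (auto const delimiter : delimiters) {
            if (delimiter < cSizeOfByte) {
                m_is_delimiter[delimiter] = true;
            }
        }
    }

    [[nodiscard]] auto is_delimiter(std::uint32_t byte) const -> bool {
        return byte < cSizeOfByte && m_is_delimiter[byte];
    }

private:
    std::array<bool, cSizeOfByte> m_is_delimiter{};
};

// Scans every byte value and keeps the ones marked as delimiters, in ascending order.
[[nodiscard]] inline auto collect_delimiters(DelimiterTable const& table)
        -> std::vector<std::uint32_t> {
    std::vector<std::uint32_t> delimiters;
    for (std::uint32_t i{0}; i < cSizeOfByte; ++i) {
        if (table.is_delimiter(i)) {
            delimiters.push_back(i);
        }
    }
    return delimiters;
}
```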

70-71: Fix typo in function documentation.

There's a typo in the documentation: "inptut" should be "input".

- * end of inptut token.
+ * end of input token.

162-162: Enhance TODO comment clarity.

The TODO comment should be more descriptive about what needs to be checked and why.

-    // TODO: add check for register values when simulation is implemented.
+    // TODO: Add verification of register values after implementing the NFA simulation to ensure correct state tracking.

355-356: Enhance TODO comment clarity in capture groups test.

The TODO comment should be more descriptive about what needs to be checked and why.

-    // TODO: Add check for `get_reg_id_from_tag_id` and `get_reg_ids_from_capture_id` when TDFA's
-    // determinization is implemented.
+    // TODO: Add verification of register ID mappings after implementing TDFA determinization to ensure
+    // correct register allocation for capture groups.

62-67: Enhance function documentation.

The documentation could be more detailed about:

  • The purpose of constant delimiters
  • The expected state of the lexer after initialization
  • Any side effects or assumptions
 /**
  * Initializes the lexer with the constant delimiters and the given schema.
+ * The constant delimiters (space and newline) are used to separate tokens in the input.
+ * The lexer's symbol mappings are initialized based on the schema variables.
+ * Assumes the lexer is in a clean state before initialization.
  * @param schema Contains the variables to add to the lexer.
  * @param lexer Returns the initialized parser.
  */
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d77ee6 and 40cab9d.

📒 Files selected for processing (2)
  • src/log_surgeon/finite_automata/RegexAST.hpp (21 hunks)
  • tests/test-lexer.cpp (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • tests/test-lexer.cpp
  • src/log_surgeon/finite_automata/RegexAST.hpp
🧠 Learnings (1)
src/log_surgeon/finite_automata/RegexAST.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🪛 GitHub Actions: lint
src/log_surgeon/finite_automata/RegexAST.hpp

[error] 712-712: code should be clang-formatted


⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build (ubuntu-22.04, Release)
  • GitHub Check: build (ubuntu-22.04, Debug)
🔇 Additional comments (8)
tests/test-lexer.cpp (1)

307-321: Consider adding edge cases to basic lexer tests.

The current test case only verifies basic token matching. Consider adding tests for:

  • Empty input
  • Input with multiple delimiters
  • Input with special characters
src/log_surgeon/finite_automata/RegexAST.hpp (7)

23-23: LGTM! Documentation updates align with the new capture-based system.

The header file change and documentation updates accurately reflect the transition from tags to captures.

Also applies to: 33-38


86-102: LGTM! Methods updated consistently to use captures.

The changes maintain the same functionality while transitioning from tags to captures, with consistent naming conventions.

Also applies to: 104-106, 115-117


633-634: LGTM! Documentation clearly states non-null requirements.

The documentation accurately reflects the requirements for both m_capture and m_capture_regex_ast.


648-664: Move exception handling from initializer list to constructor body.

Throwing exceptions in the member initializer list can lead to issues with object construction and exception safety.

Apply this diff to move the null checks into the constructor body:

-            : m_capture_regex_ast{(
-                      nullptr == capture_regex_ast
-                              ? throw std::invalid_argument("Group regex AST cannot be null")
-                              : std::move(capture_regex_ast)
-              )},
-              m_capture{
-                      nullptr == capture ? throw std::invalid_argument("Capture cannot be null")
-                                         : std::move(capture)
-              } {
+            {
+            if (nullptr == capture_regex_ast) {
+                throw std::invalid_argument("Group regex AST cannot be null");
+            }
+            if (nullptr == capture) {
+                throw std::invalid_argument("Capture cannot be null");
+            }
+            m_capture_regex_ast = std::move(capture_regex_ast);
+            m_capture = std::move(capture);

             RegexAST<TypedNfaState>::set_subtree_positive_captures(
                     m_capture_regex_ast->get_subtree_positive_captures()
             );
             RegexAST<TypedNfaState>::add_subtree_positive_captures({m_capture.get()});
         }

668-673: LGTM! Copy constructor correctly handles deep copying.

The copy constructor properly clones both the regex AST and capture objects, maintaining the integrity of the capture hierarchy.


914-934: LGTM! NFA construction diagram and code updated consistently.

The diagram and implementation correctly reflect the new capture-based transition system.

Also applies to: 941-941, 947-947


953-960: LGTM! Serialization correctly handles capture names.

The serialization logic properly converts capture names to UTF-32 strings and maintains consistent formatting.

src/log_surgeon/finite_automata/RegexAST.hpp (outdated, resolved)
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
src/log_surgeon/Lexer.tpp (1)

408-408: Address the TODO comment about DFA ignoring captures.

The TODO comment indicates that DFA treats capture groups differently than intended. This could lead to incorrect parsing behavior.

Do you want me to help track this issue by opening a new issue?

src/log_surgeon/finite_automata/TaggedTransition.hpp (1)

17-17: Documentation needs update.

The comment should be updated to reflect that a tag ID (not a tag) has been matched.

- * Represents an NFA transition indicating that a tag has been matched.
+ * Represents an NFA transition indicating that a tag ID has been matched.
src/log_surgeon/finite_automata/RegexAST.hpp (2)

633-634: Documentation needs update.

The note about non-null expectations should be more specific.

- * - `m_capture` is always expected to be non-null.
- * - `m_capture_regex_ast` is always expected to be non-null.
+ * - `m_capture` must be non-null as it represents the capture group being matched.
+ * - `m_capture_regex_ast` must be non-null as it contains the regex pattern for the capture group.

916-916: Update diagram comments.

The diagram comments should reflect the transition from tags to captures.

-                    | `m_capture` start
+                    | `m_capture` start ID
-                    | `m_capture` end
-                    | (positive tagged end transition)
+                    | `m_capture` end ID
+                    | (positive capture end transition)

Also applies to: 935-936

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 40cab9d and ef3bc8e.

📒 Files selected for processing (9)
  • CMakeLists.txt (2 hunks)
  • src/log_surgeon/Aliases.hpp (1 hunks)
  • src/log_surgeon/Lexer.hpp (5 hunks)
  • src/log_surgeon/Lexer.tpp (3 hunks)
  • src/log_surgeon/finite_automata/Nfa.hpp (3 hunks)
  • src/log_surgeon/finite_automata/NfaState.hpp (3 hunks)
  • src/log_surgeon/finite_automata/RegexAST.hpp (21 hunks)
  • src/log_surgeon/finite_automata/TaggedTransition.hpp (3 hunks)
  • tests/CMakeLists.txt (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • src/log_surgeon/Aliases.hpp
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/CMakeLists.txt
  • src/log_surgeon/finite_automata/NfaState.hpp
  • src/log_surgeon/finite_automata/Nfa.hpp
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • src/log_surgeon/finite_automata/TaggedTransition.hpp
  • src/log_surgeon/Lexer.hpp
  • src/log_surgeon/finite_automata/RegexAST.hpp
🧠 Learnings (2)
src/log_surgeon/finite_automata/TaggedTransition.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
src/log_surgeon/finite_automata/RegexAST.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
🔇 Additional comments (17)
src/log_surgeon/Lexer.hpp (7)

7-7: LGTM!

The added includes are necessary for the new functionality.

Also applies to: 10-10, 13-13


130-136: LGTM!

The function uses safe map access patterns and follows the coding guidelines.


143-150: LGTM!

The function uses safe map access patterns and follows the coding guidelines.


158-165: LGTM!

The function uses safe map access patterns and follows the coding guidelines.


172-191: LGTM!

The function uses safe map access patterns, follows the coding guidelines, and has proper error handling.


193-194: Move public member variables to private scope.

These member variables should be private to maintain encapsulation. This issue is tracked in #93.


219-221: LGTM!

The member variables use appropriate types and follow naming conventions.

src/log_surgeon/Lexer.tpp (3)

7-7: LGTM!

The added include is necessary for throwing std::invalid_argument.


363-364: LGTM!

The method signatures now use the more specific rule_id_t type instead of uint32_t.

Also applies to: 370-371


382-407: LGTM!

The method properly processes captures and their IDs, uses safe map access patterns, follows the coding guidelines, and has proper error handling.

CMakeLists.txt (1)

65-86: LGTM! Source files list updated correctly.

The changes properly reflect the transition from tag-based to capture-based system by:

  1. Adding new files (Aliases.hpp, Capture.hpp) for the new functionality
  2. Removing obsolete files (Tag.hpp)
  3. Maintaining correct file order

Also applies to: 111-111

src/log_surgeon/finite_automata/TaggedTransition.hpp (3)

8-9: LGTM! Header includes updated correctly.

Added necessary headers for the new implementation:

  • <utility> for std::move
  • <vector> for std::vector
  • log_surgeon/Aliases.hpp for tag_id_t

Also applies to: 13-13


23-25: LGTM! PositiveTaggedTransition updated correctly.

The changes properly transition from Tag pointers to tag IDs:

  1. Constructor now takes tag_id_t
  2. Member variable uses tag_id_t

Also applies to: 44-44


55-57: LGTM! NegativeTaggedTransition updated correctly.

The changes properly transition from Tag pointers to tag IDs:

  1. Constructor now takes vector of tag_id_t
  2. Member variable uses vector<tag_id_t>
  3. Correctly uses std::move for efficiency
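
A rough picture of transitions that carry tag IDs instead of `Tag` pointers is sketched below. It is an illustration under assumptions, not the contents of TaggedTransition.hpp: the `NfaState` placeholder, the getter names, and the underlying type of `tag_id_t` are hypothetical.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using tag_id_t = std::uint32_t;  // assumed underlying type

struct NfaState {};  // hypothetical placeholder for the destination state type

// Marks that a single tag has been matched on the way to `dest_state`.
class PositiveTaggedTransition {
public:
    PositiveTaggedTransition(tag_id_t tag_id, NfaState const* dest_state)
            : m_tag_id{tag_id},
              m_dest_state{dest_state} {}

    [[nodiscard]] auto get_tag_id() const -> tag_id_t { return m_tag_id; }

    [[nodiscard]] auto get_dest_state() const -> NfaState const* { return m_dest_state; }

private:
    tag_id_t m_tag_id;
    NfaState const* m_dest_state;
};

// Marks that a set of tags has been left unmatched on the way to `dest_state`.
class NegativeTaggedTransition {
public:
    NegativeTaggedTransition(std::vector<tag_id_t> tag_ids, NfaState const* dest_state)
            : m_tag_ids(std::move(tag_ids)),  // take the vector by value, then move it in
              m_dest_state{dest_state} {}

    [[nodiscard]] auto get_tag_ids() const -> std::vector<tag_id_t> const& { return m_tag_ids; }

    [[nodiscard]] auto get_dest_state() const -> NfaState const* { return m_dest_state; }

private:
    std::vector<tag_id_t> m_tag_ids;
    NfaState const* m_dest_state;
};
```

Storing plain integer IDs means a transition no longer depends on the lifetime of a separate tag object, which fits the description's goal of simplifying ownership transfer.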

Also applies to: 76-76

src/log_surgeon/finite_automata/RegexAST.hpp (3)

23-23: LGTM! Header includes updated correctly.

Replaced Tag.hpp with Capture.hpp to reflect the new implementation.


33-38: LGTM! Documentation updated correctly.

The class documentation properly reflects the transition from tags to captures.


86-101: LGTM! Base class methods updated correctly.

Methods properly handle Capture pointers instead of Tag pointers:

  1. get_subtree_positive_captures
  2. set_subtree_positive_captures
  3. add_subtree_positive_captures

src/log_surgeon/finite_automata/RegexAST.hpp (outdated, resolved)
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/log_surgeon/finite_automata/RegexAST.hpp (2)

118-118: Consider using empty() instead of comparing with false.

The condition false == m_negative_captures.empty() can be simplified to !m_negative_captures.empty() for better readability.

-        if (false == m_negative_captures.empty()) {
+        if (!m_negative_captures.empty()) {

910-940: Update diagram comments to reflect capture-based terminology.

The diagram's comments still use tag-based terminology in some places. Consider updating them to consistently use capture-based terminology.

-    // The NFA constructed for a capture group follows the structure below, with tagged transitions
+    // The NFA constructed for a capture group follows the structure below, with capture transitions
-    //                    | (positive tagged start transition)
+    //                    | (positive capture start transition)
-    //                    | (negative tagged transition)
+    //                    | (negative capture transition)
-    //                    | (positive tagged end transition)
+    //                    | (positive capture end transition)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5e2e53 and ddae224.

📒 Files selected for processing (3)
  • src/log_surgeon/Lexer.hpp (5 hunks)
  • src/log_surgeon/finite_automata/RegexAST.hpp (21 hunks)
  • src/log_surgeon/finite_automata/TaggedTransition.hpp (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/log_surgeon/finite_automata/TaggedTransition.hpp
  • src/log_surgeon/Lexer.hpp
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • src/log_surgeon/finite_automata/RegexAST.hpp
🧠 Learnings (1)
src/log_surgeon/finite_automata/RegexAST.hpp (3)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#50
File: src/log_surgeon/finite_automata/Tag.hpp:0-0
Timestamp: 2024-11-18T16:45:46.074Z
Learning: The class `TagPositions` was removed from `src/log_surgeon/finite_automata/Tag.hpp` as it is no longer needed.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: build (ubuntu-22.04, Release)
  • GitHub Check: build (ubuntu-22.04, Debug)
🔇 Additional comments (2)
src/log_surgeon/finite_automata/RegexAST.hpp (2)

23-23: LGTM! Documentation and includes updated to reflect the new capture-based system.

The changes consistently replace tag-based terminology with capture-based terminology throughout the documentation and includes.

Also applies to: 32-38


728-728: LGTM! Consistent use of capture-based serialization across all derived classes.

The changes to use serialize_negative_captures are applied consistently throughout all derived classes.

Also applies to: 745-745, 776-776, 807-807, 840-840, 903-903, 962-962, 1118-1118

Member

@LinZhihao-723 LinZhihao-723 left a comment

Last few comments and we're close to merge.

tests/test-lexer.cpp (outdated, resolved)
src/log_surgeon/Aliases.hpp (outdated, resolved)