Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Rename next to dest_state and next() to get_dest_state(). #86

Open
wants to merge 633 commits into
base: main
Choose a base branch
from

Conversation

SharafMohamed
Copy link
Contributor

@SharafMohamed SharafMohamed commented Feb 3, 2025

References

git fetch upstream pull/85/head:pr-85
git fetch upstream pull/86/head:pr-86
git diff pr-85 pr-86

Description

  • Rename next and next() to dest_state and get_dest_state(), respectively.

Validation performed

  • Existing tests pass.

Summary by CodeRabbit

  • New Features

    • Introduced enhanced automata transitions with a new mechanism for handling spontaneous transitions and tagging, offering improved flexibility.
  • Refactor

    • Streamlined state transition and acceptance processes for increased clarity and robustness.
    • Improved error handling in input parsing for a smoother user experience.
  • Tests

    • Updated output validation and test coverage to align with the new transition and tagging mechanisms.
    • Enforced stricter immutability in test variables to ensure consistent outcomes.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (5)
src/log_surgeon/finite_automata/NfaState.hpp (2)

88-88: Doc string alignment.
The new comment references SpontaneousTransition::serialize. Double-check that the intended usage matches the updated method names and that the doc lines remain accurate.


92-92: Returning a reference to bool is risky.
Returning bool const& can be confusing, as copying a bool is typically preferred. Consider returning by value instead to avoid possible misuse.

- [[nodiscard]] auto is_accepting() const -> bool const& { return m_accepting; }
+ [[nodiscard]] auto is_accepting() const -> bool { return m_accepting; }
src/log_surgeon/Lexer.hpp (1)

172-191: Consider simplifying the register ID retrieval logic.

While the implementation is correct, it could be more concise by combining the optional checks:

auto get_reg_ids_from_capture_id(capture_id_t const capture_id
) const -> std::optional<std::pair<reg_id_t, reg_id_t>> {
    auto const optional_tag_id_pair{get_tag_id_pair_from_capture_id(capture_id)};
-   if (false == optional_tag_id_pair.has_value()) {
-       return std::nullopt;
-   }
-   auto const [start_tag_id, end_tag_id]{optional_tag_id_pair.value()};
-
-   auto const optional_start_reg_id{get_reg_id_from_tag_id(start_tag_id)};
-   if (false == optional_start_reg_id.has_value()) {
-       return std::nullopt;
-   }
-
-   auto const optional_end_reg_id{get_reg_id_from_tag_id(end_tag_id)};
-   if (false == optional_end_reg_id.has_value()) {
-       return std::nullopt;
-   }
-
-   return {optional_start_reg_id.value(), optional_end_reg_id.value()};
+   if (optional_tag_id_pair) {
+       auto const [start_tag_id, end_tag_id]{optional_tag_id_pair.value()};
+       auto const optional_start_reg_id{get_reg_id_from_tag_id(start_tag_id)};
+       auto const optional_end_reg_id{get_reg_id_from_tag_id(end_tag_id)};
+       if (optional_start_reg_id && optional_end_reg_id) {
+           return {optional_start_reg_id.value(), optional_end_reg_id.value()};
+       }
+   }
+   return std::nullopt;
}
tests/CMakeLists.txt (1)

12-14: Good architectural improvement!

The replacement of TaggedTransition.hpp with SpontaneousTransition.hpp and TagOperation.hpp suggests a better separation of concerns, aligning with previous feedback about tagged transitions not sharing enough functionality.

CMakeLists.txt (1)

81-81: LGTM! The build system changes reflect a good architectural improvement.

The replacement of TaggedTransition.hpp with SpontaneousTransition.hpp and TagOperation.hpp suggests a better separation of concerns, which should make the codebase more maintainable.

This change aligns well with the Single Responsibility Principle by separating transition logic from tag operations.

Also applies to: 83-83

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6904e51 and 2ff9f3c.

📒 Files selected for processing (16)
  • CMakeLists.txt (1 hunks)
  • src/log_surgeon/Lexer.hpp (2 hunks)
  • src/log_surgeon/Lexer.tpp (8 hunks)
  • src/log_surgeon/LexicalRule.hpp (2 hunks)
  • src/log_surgeon/SchemaParser.cpp (1 hunks)
  • src/log_surgeon/finite_automata/Dfa.hpp (1 hunks)
  • src/log_surgeon/finite_automata/DfaState.hpp (2 hunks)
  • src/log_surgeon/finite_automata/DfaStatePair.hpp (1 hunks)
  • src/log_surgeon/finite_automata/Nfa.hpp (7 hunks)
  • src/log_surgeon/finite_automata/NfaState.hpp (6 hunks)
  • src/log_surgeon/finite_automata/RegexAST.hpp (4 hunks)
  • src/log_surgeon/finite_automata/SpontaneousTransition.hpp (1 hunks)
  • src/log_surgeon/finite_automata/TagOperation.hpp (1 hunks)
  • tests/CMakeLists.txt (1 hunks)
  • tests/test-nfa.cpp (2 hunks)
  • tests/test-register-handler.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,ts,tsx}`: - Prefer `false ==

**/*.{cpp,hpp,java,js,jsx,ts,tsx}: - Prefer false == <expression> rather than !<expression>.

  • src/log_surgeon/finite_automata/Dfa.hpp
  • src/log_surgeon/finite_automata/DfaStatePair.hpp
  • src/log_surgeon/finite_automata/DfaState.hpp
  • tests/test-register-handler.cpp
  • tests/test-nfa.cpp
  • src/log_surgeon/finite_automata/SpontaneousTransition.hpp
  • src/log_surgeon/Lexer.hpp
  • src/log_surgeon/LexicalRule.hpp
  • src/log_surgeon/finite_automata/TagOperation.hpp
  • src/log_surgeon/finite_automata/RegexAST.hpp
  • src/log_surgeon/finite_automata/NfaState.hpp
  • src/log_surgeon/SchemaParser.cpp
  • src/log_surgeon/finite_automata/Nfa.hpp
🧠 Learnings (5)
tests/CMakeLists.txt (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
CMakeLists.txt (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#47
File: src/log_surgeon/finite_automata/TaggedTransition.hpp:16-37
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In `src/log_surgeon/finite_automata/TaggedTransition.hpp`, the classes `PositiveTaggedTransition` and `NegativeTaggedTransition` currently do not share enough functionality to justify refactoring into a common base class.
src/log_surgeon/Lexer.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#42
File: src/log_surgeon/finite_automata/RegexNFA.hpp:37-90
Timestamp: 2024-11-10T16:46:58.543Z
Learning: In this codebase, prefer code clarity over efficiency optimizations unless efficiency is a critical concern.
src/log_surgeon/finite_automata/RegexAST.hpp (2)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexAST.hpp:700-700
Timestamp: 2024-11-13T22:38:19.472Z
Learning: In `RegexASTCapture`, `m_tag` must always be non-null.
src/log_surgeon/finite_automata/NfaState.hpp (1)
Learnt from: SharafMohamed
PR: y-scope/log-surgeon#48
File: src/log_surgeon/finite_automata/RegexNFAState.hpp:0-0
Timestamp: 2024-11-13T20:02:13.737Z
Learning: In `src/log_surgeon/finite_automata/RegexNFAState.hpp`, the constructor `RegexNFAState(std::set<Tag const*> tags, RegexNFAState const* dest_state)` has been updated to use `std::vector<Tag const*> tags` instead of `std::set`.
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build (ubuntu-22.04, Release)
  • GitHub Check: build (ubuntu-22.04, Debug)
  • GitHub Check: build (macos-latest, Release)
🔇 Additional comments (48)
src/log_surgeon/finite_automata/NfaState.hpp (12)

4-24: Header inclusions updated.
The new headers support spontaneous transitions and tagging logic. There are no obvious issues with ordering or redundant includes.


40-44: Constructor sets m_accepting.
This constructor explicitly marks the state as accepting. Ensure that any usage aligns with your design intent (i.e., that all states created with this constructor should indeed be accepting).


46-52: Chained constructor usage.
Calling add_spontaneous_transition within the constructor is clear and reduces duplication. This matches the new spontaneous transition model well.


54-65: Validate pointer lifetime.
add_spontaneous_transition stores dest_state in a vector. Confirm that the lifetime of dest_state is at least as long as the lifetime of this NfaState to avoid dangling references.


98-101: Providing read-only access to spontaneous transitions.
Returning a const reference is appropriate if the caller only needs inspection. This design is consistent with the rest of the interface.


103-105: Consistent style for byte transitions.
The getter mirrors the spontaneous transitions approach and returns a reference for outside inspection. This appears coherent with the new code structure.


107-107: Getter for tree transitions.
Exposing m_tree_transitions via a const reference is consistent with the pattern used for other transitions.


112-112: New member m_spontaneous_transitions.
This vector neatly encapsulates the transitions. No concerns regarding the container choice.


178-179: Epsilon closure with spontaneous transitions.
This replaces older tagged or epsilon transitions effectively. The logic of pushing get_dest_state() onto the stack is straightforward and consistent.


188-190: Conditional acceptance string.
Building accepting_tag_string if m_accepting is true is a convenient approach, ensuring minimal overhead when the state is not accepting.


201-208: Serializing spontaneous transitions.
All transitions are collected in serialized_spontaneous_transitions. Returning std::nullopt upon failure is consistent with the rest of the design.


211-215: Overall serial format.
Including both byte transitions and spontaneous transitions in the same format string is clear and conforms to the new approach.

src/log_surgeon/finite_automata/Nfa.hpp (12)

4-21: Refreshed includes and doc clarifications.
The updated includes and doc lines align with the new spontaneous transitions model and usage of optional return types.


43-44: New doc block for new_state.
The explanation that the returned state has no spontaneous transitions is helpful.


47-52: new_accepting_state creation
Creating an accepting state with a dedicated method is more explicit, improving readability. Confirm that all references to your older approach are removed.


54-62: Handling negative captures.
The doc string describing how tags are negated is consistent, and new_state_from_negative_captures helps clarify usage.


65-76: Positive capture method documentation.
Renamed to reflect “start and end states from positive capture.” The new name is clearer for future maintainers.


85-88: Serialization with std::optional<std::string>.
Returning an optional indicates possible failure, which is a good practice. Double-check that all call sites handle the empty case gracefully.


96-96: Retrieving m_root.
The direct getter is concise. No issues found so far.


148-152: Implementation of new_accepting_state.
The approach of constructing the TypedNfaState with matching_variable_id inside m_states is consistent and well contained.


155-170: new_state_from_negative_captures implementation details.
Moves all relevant tag IDs into the new state. This logic looks correct for negating multiple captures.


173-185: new_start_and_end_states_from_positive_capture usage.
Attaching transitions for start and end tags in a single step is a clean approach, avoiding confusion from older tagged transitions.


188-216: BFS traversal including spontaneous transitions.
Expanding BFS to queue spontaneous transitions ensures no states remain inaccessible. This approach closely mirrors the old epsilon concept.


221-238: Optional-based serialize.
Aborting on the first serialization failure is an appropriate strategy. The rest of the method remains straightforward.

src/log_surgeon/finite_automata/RegexAST.hpp (3)

24-24: Additional include for TagOperation.hpp.
All references to tag operations appear consolidated, which aids clarity in the capturing logic.


117-125: Refined negative captures approach.
The doc clarifies that negative captures generate a spontaneous transition to apply negation. This consistent terminology aligns well with the updated code.


911-942: Capture group NFA documentation.
The updated notes on spontaneous transitions clearly illustrate the new approach, from setting start tags to negating alternate tags, then ending with capturing final tags. This is an excellent update for maintainability.

src/log_surgeon/LexicalRule.hpp (2)

4-5: LGTM!

The addition of the <vector> include is necessary as it's used in the get_captures() method.


45-48: LGTM!

The refactoring improves code clarity by consolidating state creation and variable ID assignment into a single method call.

src/log_surgeon/finite_automata/TagOperation.hpp (2)

13-17: LGTM!

The enum class is well-defined with clear, descriptive values.


25-31: LGTM!

Efficient implementation of comparison operators using std::tie for member-wise comparison.

src/log_surgeon/finite_automata/SpontaneousTransition.hpp (2)

26-30: LGTM!

Well-designed constructors with clear parameter handling and move semantics.


53-67: LGTM!

The implementation is robust with:

  • Proper error handling using std::optional
  • Efficient use of C++20 ranges for transformation
  • Clear string formatting using fmt
src/log_surgeon/finite_automata/DfaState.hpp (2)

49-49: LGTM!

The method rename improves clarity and the const return type enhances const correctness.


60-75: LGTM!

The implementation:

  • Preserves the existing logic while improving const correctness
  • Maintains efficient handling of both Byte and Utf8 state types
src/log_surgeon/finite_automata/DfaStatePair.hpp (2)

72-73: LGTM! Method renaming improves clarity.

The renaming from next to get_dest_state makes the method's purpose more explicit and self-documenting.


74-74: LGTM! Condition checks follow coding guidelines.

The conditions have been updated to follow the coding guidelines:

  • Using nullptr != x pattern for pointer checks
  • Using false == x pattern for boolean expressions

Also applies to: 76-76

tests/test-nfa.cpp (1)

65-67: LGTM! Improved error handling with optional type.

The change to use std::optional for the serialization result improves error handling by explicitly checking for success before using the value.

tests/test-register-handler.cpp (1)

38-38: LGTM! Added const qualifier for immutability.

Adding the const qualifier to empty_handler better expresses the intent that this test variable should not be modified.

src/log_surgeon/finite_automata/Dfa.hpp (2)

8-8: LGTM! Added required include for std::stack.

The addition of <stack> header is necessary for using std::stack in the implementation.


62-62: LGTM! Condition checks follow coding guidelines.

The conditions have been updated to follow the coding guidelines by using the false == x pattern for boolean expressions.

Also applies to: 117-117

src/log_surgeon/Lexer.hpp (2)

6-17: LGTM! The new header inclusions are appropriate.

The addition of <map> and NfaState.hpp headers supports the new functionality while maintaining proper header organization.


132-138: Well-implemented optional return with safe map access!

The method properly checks for key existence before access and returns an optional value, following best practices.

src/log_surgeon/Lexer.tpp (4)

87-88: LGTM! The state transition renaming improves clarity.

The change from next to get_dest_state better describes the method's purpose.


218-219: Good consistency in state transition handling!

The explicit type declaration and consistent method naming improve code clarity.


280-283: Well-implemented modern C++ practices!

The use of contains for set membership and consistent state transition naming improve code quality.


340-340: LGTM! Consistent method naming.

The change maintains consistency with the new state transition naming convention.

src/log_surgeon/SchemaParser.cpp (1)

3-23:

❓ Verification inconclusive

Verify the impact of replacing Lexer with Reader.

The switch from Lexer.hpp to Reader.hpp suggests a significant change in input handling. Please ensure all lexer functionality previously used is available through the Reader interface.


🏁 Script executed:

#!/bin/bash
# Search for any remaining Lexer usage that might need updating
rg -l "Lexer" --type cpp --type hpp

Length of output: 63


Action Required: Re-verify the Lexer Replacement Impact

The recent switch from Lexer.hpp to Reader.hpp appears to be a significant refactoring of input handling. Our initial automated search for the term “Lexer” ran into issues with header file type recognition. Please re-run your search using the following command to accurately target both header and source files:

#!/bin/bash
# Re-run the search for "Lexer" in all .hpp and .cpp files
rg -l Lexer -g '*.hpp' -g '*.cpp'
  • Confirm that there are no lingering references to Lexer in the codebase.
  • Verify that all functionality previously provided by the lexer is now properly supported by the Reader interface.

Comment on lines +40 to +49
[[nodiscard]] auto serialize() const -> std::string {
switch (m_type) {
case TagOperationType::Set:
return fmt::format("{}{}", m_tag_id, "p");
case TagOperationType::Negate:
return fmt::format("{}{}", m_tag_id, "n");
case TagOperationType::None:
return "none";
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add a default case to the switch statement.

The switch statement should handle all possible enum values, including future additions.

Apply this diff:

         case TagOperationType::None:
             return "none";
+        default:
+            return "unknown";
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
[[nodiscard]] auto serialize() const -> std::string {
switch (m_type) {
case TagOperationType::Set:
return fmt::format("{}{}", m_tag_id, "p");
case TagOperationType::Negate:
return fmt::format("{}{}", m_tag_id, "n");
case TagOperationType::None:
return "none";
}
}
[[nodiscard]] auto serialize() const -> std::string {
switch (m_type) {
case TagOperationType::Set:
return fmt::format("{}{}", m_tag_id, "p");
case TagOperationType::Negate:
return fmt::format("{}{}", m_tag_id, "n");
case TagOperationType::None:
return "none";
default:
return "unknown";
}
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant