Rework cfg #14

Open
wants to merge 30 commits into
base: rework_architecture
Conversation

bygu4
Collaborator

@bygu4 bygu4 commented Dec 1, 2024

Here we plan to rework cfg and some related modules, removing import cycles and adding type annotations. The modules to consider are:

  • cfg;
  • pda;
  • fcfg.

Changes by module:

objects:

  • Move all formal object representations (State, Terminal, etc.) to a separate module to improve the structure of the project and handle some design issues.
  • Unify objects by adding a common FormalObject base class.
  • Unify terminal and epsilon representations.
  • Rework object equivalence methods.
  • Inherit PDA StackSymbol from Symbol and Epsilon from StackSymbol to handle some type mismatches.
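A minimal sketch of what such a hierarchy could look like. Only the names FormalObject, Symbol, StackSymbol and Epsilon come from the list above; the class bodies, the `value` property, the equality rule and the `push` helper are illustrative assumptions, not the code in this PR.

```python
from typing import Any, Hashable, List


class FormalObject:
    """Hypothetical common base: wraps a hashable value."""

    def __init__(self, value: Hashable) -> None:
        self._value = value

    @property
    def value(self) -> Hashable:
        return self._value

    def __eq__(self, other: Any) -> bool:
        # Assumption: objects compare by wrapped value; raw values are accepted too.
        if isinstance(other, FormalObject):
            return self._value == other._value
        return self._value == other

    def __hash__(self) -> int:
        return hash(self._value)

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self._value!r})"


class Symbol(FormalObject):
    """Input symbol of an automaton."""


class StackSymbol(Symbol):
    """Stack symbol; usable wherever a Symbol is expected."""


class Epsilon(StackSymbol):
    """Empty symbol; usable as both an input symbol and a stack symbol."""

    def __init__(self) -> None:
        super().__init__("epsilon")


def push(stack: List[StackSymbol], symbol: StackSymbol) -> None:
    # Passing Epsilon type-checks because Epsilon is a StackSymbol.
    if not isinstance(symbol, Epsilon):
        stack.append(symbol)
```

With this layout a function annotated with StackSymbol can take Epsilon directly, which is the kind of type mismatch the inheritance change is meant to avoid.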

cfg:

  • Remove to_pda method from CFG class, add from_cfg method for pda instead.
  • Add FormalGrammar abstract class to handle import cycles in cfg module, generalize some methods and properties.
  • Avoid storing the normal form of the grammar and other computed return values, for safety and simplicity.
  • Add add_production method for grammars, add methods for changing start symbol of the grammar.
  • Add copying methods for grammars.
  • Move PDA object creator to pda module.
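To illustrate why not storing computed results is the safer choice once grammars become mutable, here is a toy sketch. The `Grammar` class, its tuple-based productions and `get_nullable_symbols` are assumptions made for this example and do not mirror the actual CFG implementation.

```python
from typing import Iterable, Optional, Set, Tuple

Production = Tuple[str, Tuple[str, ...]]  # (head, body); an empty body means epsilon


class Grammar:
    """Toy mutable grammar, not pyformlang's CFG."""

    def __init__(self, start_symbol: Optional[str] = None,
                 productions: Iterable[Production] = ()) -> None:
        self.start_symbol = start_symbol
        self._productions: Set[Production] = set(productions)

    def add_production(self, head: str, body: Iterable[str]) -> None:
        self._productions.add((head, tuple(body)))

    def copy(self) -> "Grammar":
        # The constructor builds a fresh set, so the copy is independent.
        return Grammar(self.start_symbol, self._productions)

    def get_nullable_symbols(self) -> Set[str]:
        # Recomputed on every call: there is no cache that could go stale
        # after add_production changes the grammar.
        nullable: Set[str] = set()
        changed = True
        while changed:
            changed = False
            for head, body in self._productions:
                if head not in nullable and all(s in nullable for s in body):
                    nullable.add(head)
                    changed = True
        return nullable
```

If the nullable set were cached on first use, a later `add_production("B", [])` would leave the cached value wrong; recomputing trades some speed for correctness.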

fcfg:

  • Define from_text method for FCFG using generics in the FormalGrammar class.
  • Define add_production by converting the given rule to FeatureProduction class, use that conversion in the constructor.
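The two points above can be sketched as follows. This is a simplified illustration under stated assumptions: the one-rule-per-line `Head -> sym sym` syntax, the `Production` fields and the empty feature dictionary are invented for the example, and only the class and method names are taken from the description.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Optional, Tuple, Type, TypeVar

GrammarT = TypeVar("GrammarT", bound="FormalGrammar")


class Production:
    def __init__(self, head: str, body: Tuple[str, ...]) -> None:
        self.head = head
        self.body = body


class FeatureProduction(Production):
    def __init__(self, head: str, body: Tuple[str, ...],
                 features: Optional[Dict[str, str]] = None) -> None:
        super().__init__(head, body)
        self.features = dict(features or {})


class FormalGrammar(ABC):
    def __init__(self, productions: Iterable[Production] = ()) -> None:
        self.productions: List[Production] = []
        for production in productions:
            # The constructor reuses the subclass-specific conversion.
            self.add_production(production)

    @abstractmethod
    def add_production(self, production: Production) -> None:
        ...

    @classmethod
    def from_text(cls: Type[GrammarT], text: str) -> GrammarT:
        # Written once in the base class; returns an instance of the calling class.
        grammar = cls()
        for line in text.splitlines():
            if "->" not in line:
                continue
            head, body = line.split("->", 1)
            grammar.add_production(Production(head.strip(), tuple(body.split())))
        return grammar


class CFG(FormalGrammar):
    def add_production(self, production: Production) -> None:
        self.productions.append(production)


class FCFG(FormalGrammar):
    def add_production(self, production: Production) -> None:
        # Convert plain productions to FeatureProduction before storing them.
        if not isinstance(production, FeatureProduction):
            production = FeatureProduction(production.head, production.body)
        self.productions.append(production)
```

Because `from_text` is typed with a bound TypeVar, `FCFG.from_text(...)` is seen as returning an FCFG by a type checker without redefining the method in the subclass.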

pda:

  • Add methods for transition removal.
  • Add methods for copying the PDA.
  • Rework methods of transition iteration.
  • Add methods for transition containment checks and transition function calls.
  • Move CFG variable converter to cfg module.
  • Move all PDA object converters to utils file.
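As a rough picture of the transition-function operations listed above (removal, copying, iteration, containment checks and calls), consider this sketch. The tuple layout of keys and values and every method name here are assumptions for illustration, not the signatures used in the pda module.

```python
from typing import Dict, Iterator, Set, Tuple

Key = Tuple[str, str, str]            # (state, input symbol, stack top)
Value = Tuple[str, Tuple[str, ...]]   # (next state, symbols to push)
Transition = Tuple[Key, Value]


class TransitionFunction:
    """Toy nondeterministic PDA transition function."""

    def __init__(self) -> None:
        self._transitions: Dict[Key, Set[Value]] = {}

    def add_transition(self, key: Key, value: Value) -> None:
        self._transitions.setdefault(key, set()).add(value)

    def remove_transition(self, key: Key, value: Value) -> None:
        targets = self._transitions.get(key, set())
        targets.discard(value)
        if not targets:
            # Drop empty entries so iteration never sees a dangling key.
            self._transitions.pop(key, None)

    def copy(self) -> "TransitionFunction":
        new = TransitionFunction()
        for key, value in self:
            new.add_transition(key, value)
        return new

    def __call__(self, state: str, input_symbol: str,
                 stack_symbol: str) -> Set[Value]:
        # Return a copy so callers cannot mutate internal state.
        return set(self._transitions.get((state, input_symbol, stack_symbol), set()))

    def __contains__(self, transition: Transition) -> bool:
        key, value = transition
        return value in self._transitions.get(key, set())

    def __iter__(self) -> Iterator[Transition]:
        for key, values in self._transitions.items():
            for value in values:
                yield key, value
```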

github-actions bot commented Dec 2, 2024

Coverage Report
File | Stmts | Miss | Cover
pyformlang
   __init__.py | 9 | 0 | 100%
pyformlang/cfg
   __init__.py | 3 | 0 | 100%
   cfg.py | 470 | 2 | 99%
   cfg_variable_converter.py | 65 | 4 | 94%
   cyk_table.py | 79 | 0 | 100%
   formal_grammar.py | 69 | 1 | 99%
   llone_parser.py | 163 | 3 | 98%
   parse_tree.py | 65 | 1 | 98%
   recursive_decent_parser.py | 61 | 2 | 97%
   set_queue.py | 15 | 0 | 100%
   utils.py | 25 | 0 | 100%
pyformlang/cfg/tests
   __init__.py | 0 | 0 | 100%
   test_cfg.py | 626 | 2 | 99%
   test_llone_parser.py | 117 | 1 | 99%
   test_production.py | 21 | 0 | 100%
   test_recursive_decent_parser.py | 25 | 1 | 96%
   test_terminal.py | 31 | 0 | 100%
   test_variable.py | 16 | 0 | 100%
pyformlang/fcfg
   __init__.py | 4 | 0 | 100%
   fcfg.py | 131 | 1 | 99%
   feature_production.py | 25 | 0 | 100%
   feature_structure.py | 191 | 3 | 98%
   state.py | 36 | 0 | 100%
pyformlang/fcfg/tests
   __init__.py | 0 | 0 | 100%
   test_fcfg.py | 169 | 0 | 100%
   test_feature_structure.py | 159 | 0 | 100%
pyformlang/finite_automaton
   __init__.py | 7 | 0 | 100%
   deterministic_finite_automaton.py | 183 | 3 | 98%
   deterministic_transition_function.py | 24 | 1 | 96%
   doubly_linked_list.py | 35 | 0 | 100%
   doubly_linked_node.py | 10 | 0 | 100%
   epsilon_nfa.py | 220 | 1 | 99%
   finite_automaton.py | 231 | 1 | 99%
   hopcroft_processing_list.py | 24 | 0 | 100%
   nondeterministic_finite_automaton.py | 40 | 0 | 100%
   nondeterministic_transition_function.py | 48 | 0 | 100%
   partition.py | 40 | 0 | 100%
   transition_function.py | 32 | 0 | 100%
   utils.py | 30 | 0 | 100%
pyformlang/finite_automaton/tests
   __init__.py | 0 | 0 | 100%
   test_deterministic_finite_automaton.py | 296 | 0 | 100%
   test_deterministic_transition_function.py | 89 | 5 | 94%
   test_epsilon.py | 13 | 0 | 100%
   test_epsilon_nfa.py | 595 | 0 | 100%
   test_nondeterministic_finite_automaton.py | 160 | 0 | 100%
   test_nondeterministic_transition_function.py | 79 | 0 | 100%
   test_state.py | 27 | 0 | 100%
   test_symbol.py | 28 | 0 | 100%
pyformlang/fst
   __init__.py | 2 | 0 | 100%
   fst.py | 242 | 0 | 100%
pyformlang/fst/tests
   __init__.py | 0 | 0 | 100%
   test_fst.py | 165 | 0 | 100%
pyformlang/indexed_grammar
   __init__.py | 7 | 0 | 100%
   consumption_rule.py | 34 | 0 | 100%
   duplication_rule.py | 30 | 0 | 100%
   end_rule.py | 30 | 0 | 100%
   indexed_grammar.py | 257 | 2 | 99%
   production_rule.py | 32 | 0 | 100%
   reduced_rule.py | 25 | 0 | 100%
   rule_ordering.py | 70 | 0 | 100%
   rules.py | 69 | 0 | 100%
pyformlang/indexed_grammar/tests
   __init__.py | 0 | 0 | 100%
   test_indexed_grammar.py | 224 | 0 | 100%
   test_rules.py | 35 | 0 | 100%
pyformlang/objects
   __init__.py | 5 | 0 | 100%
   base_epsilon.py | 15 | 1 | 93%
   base_terminal.py | 7 | 0 | 100%
   formal_object.py | 24 | 0 | 100%
pyformlang/objects/cfg_objects
   __init__.py | 6 | 0 | 100%
   cfg_object.py | 5 | 0 | 100%
   epsilon.py | 3 | 0 | 100%
   production.py | 41 | 1 | 98%
   terminal.py | 10 | 1 | 90%
   utils.py | 14 | 1 | 93%
   variable.py | 13 | 0 | 100%
pyformlang/objects/finite_automaton_objects
   __init__.py | 5 | 0 | 100%
   epsilon.py | 3 | 0 | 100%
   finite_automaton_object.py | 5 | 0 | 100%
   state.py | 7 | 1 | 86%
   symbol.py | 5 | 1 | 80%
   utils.py | 14 | 0 | 100%
pyformlang/objects/pda_objects
   __init__.py | 6 | 0 | 100%
   epsilon.py | 3 | 0 | 100%
   pda_object.py | 5 | 0 | 100%
   stack_symbol.py | 7 | 0 | 100%
   state.py | 7 | 0 | 100%
   symbol.py | 5 | 0 | 100%
   utils.py | 21 | 1 | 95%
pyformlang/objects/regex_objects
   __init__.py | 2 | 0 | 100%
   regex_objects.py | 63 | 0 | 100%
   utils.py | 22 | 0 | 100%
pyformlang/pda
   __init__.py | 4 | 0 | 100%
   pda.py | 328 | 2 | 99%
   transition_function.py | 41 | 3 | 93%
   utils.py | 53 | 2 | 96%
pyformlang/pda/tests
   __init__.py | 0 | 0 | 100%
   test_pda.py | 299 | 0 | 100%
pyformlang/regular_expression
   __init__.py | 3 | 0 | 100%
   python_regex.py | 269 | 6 | 98%
   regex.py | 281 | 14 | 95%
   regex_reader.py | 159 | 4 | 97%
pyformlang/regular_expression/tests
   __init__.py | 0 | 0 | 100%
   test_python_regex.py | 278 | 0 | 100%
   test_regex.py | 411 | 0 | 100%
pyformlang/rsa
   __init__.py | 3 | 0 | 100%
   box.py | 51 | 25 | 51%
   recursive_automaton.py | 72 | 9 | 88%
pyformlang/rsa/tests
   __init__.py | 0 | 0 | 100%
   test_rsa.py | 37 | 0 | 100%
pyformlang/tests
   __init__.py | 0 | 0 | 100%
TOTAL | 8620 | 106 | 99%

Tests Skipped Failures Errors Time
305 0 💤 0 ❌ 0 🔥 7.749s ⏱️

@bygu4
Collaborator Author

bygu4 commented Dec 2, 2024

Significant architecture changes are:

  • Remove the import cycle between pda and cfg by adding a from_cfg method to PDA;
  • Move object representations and related utilities to a separate module to improve the structure of the library. This lets objects share common features and carry stricter annotations. It also makes it simpler to reuse existing objects in Indexed Grammars and FST.
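A minimal sketch of the first point, assuming toy `CFG` and `PDA` dataclasses rather than the library's classes. It uses the textbook single-state construction (expand variables on epsilon, match terminals against the input); the field names and the string-based symbols are assumptions for the example. The point is the dependency direction: the grammar no longer needs to know about PDA at all.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

EPSILON = "epsilon"


@dataclass
class CFG:
    """Toy grammar: has no to_pda method, so it never imports the pda module."""
    variables: Set[str]
    terminals: Set[str]
    start_symbol: str
    productions: List[Tuple[str, Tuple[str, ...]]]


@dataclass
class PDA:
    """Toy single-state PDA accepting by empty stack."""
    start_stack_symbol: str
    # (input symbol, stack top) -> strings that may replace the stack top
    transitions: Dict[Tuple[str, str], List[Tuple[str, ...]]] = field(
        default_factory=dict)

    @classmethod
    def from_cfg(cls, cfg: CFG) -> "PDA":
        pda = cls(start_stack_symbol=cfg.start_symbol)
        for head, body in cfg.productions:
            # Expand a variable on top of the stack without reading input.
            pda.transitions.setdefault((EPSILON, head), []).append(body)
        for terminal in cfg.terminals:
            # Pop a terminal from the stack when it matches the input symbol.
            pda.transitions.setdefault((terminal, terminal), []).append(())
        return pda
```

With `to_pda` on the grammar, cfg would import pda while pda imports cfg for the reverse conversion; putting both conversions on the PDA side leaves a one-way dependency.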

@bygu4 bygu4 requested a review from gsvgit December 4, 2024 18:50

-fa_type = TypeVar("fa_type", bound="FiniteAutomaton")
+AutomatonT = TypeVar("AutomatonT", bound="FiniteAutomaton")
Member

Does T stand for Type? If so, let's use the full word Type.

Member

Why not just FiniteAutomaton?

Collaborator Author

The T suffix is a common convention for generic type variables. Here we use generics to define copying for all existing automaton types.
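A small sketch of the pattern being discussed, with simplified stand-in classes (the `states` attribute and `is_deterministic` method are assumptions for illustration). Annotating `self` with a bound TypeVar lets one `copy` in the base class return the caller's concrete type.

```python
from typing import Set, TypeVar

AutomatonT = TypeVar("AutomatonT", bound="FiniteAutomaton")


class FiniteAutomaton:
    def __init__(self) -> None:
        self.states: Set[str] = set()

    def copy(self: AutomatonT) -> AutomatonT:
        # type(self) keeps the concrete subclass at runtime; the TypeVar
        # tells the type checker the same thing statically.
        new = type(self)()
        new.states = set(self.states)
        return new


class DeterministicFiniteAutomaton(FiniteAutomaton):
    def is_deterministic(self) -> bool:
        return True


dfa = DeterministicFiniteAutomaton()
dfa.states.add("q0")
clone = dfa.copy()
```

Had `copy` been annotated as returning plain `FiniteAutomaton`, a checker would reject `clone.is_deterministic()` even though it works at runtime.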

@@ -100,7 +115,7 @@ def _get_parse_tree_sub(self, word, current_expansion, left=True):
 return True
 return False

-def is_parsable(self, word, left=True):
+def is_parsable(self, word: Iterable[Hashable], left: bool = True) -> bool:
 """
 Whether a word is parsable or not

Member

Update docstring

productions: Iterable[FeatureProduction] = None):
variables: AbstractSet[Hashable] = None,
terminals: AbstractSet[Hashable] = None,
start_symbol: Hashable = None,
Member

What prevents start_symbol from having type Variable? The current type does not look specific enough.
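One possible middle ground, sketched here under assumptions: accept any Hashable at the constructor boundary for convenience, but convert once and store a Variable so the rest of the class has a specific type. The `to_variable` helper, the simplified `Variable` class and the `Grammar` class below are hypothetical and written for this example only.

```python
from typing import Any, Hashable, Optional


class Variable:
    """Simplified stand-in for a grammar variable."""

    def __init__(self, value: Hashable) -> None:
        self.value = value

    def __eq__(self, other: Any) -> bool:
        return isinstance(other, Variable) and self.value == other.value

    def __hash__(self) -> int:
        return hash(self.value)


def to_variable(given: Hashable) -> Variable:
    # Hypothetical converter: pass Variables through, wrap anything else.
    if isinstance(given, Variable):
        return given
    return Variable(given)


class Grammar:
    def __init__(self, start_symbol: Optional[Hashable] = None) -> None:
        # Loose type at the boundary, specific type in storage.
        self.start_symbol: Optional[Variable] = (
            None if start_symbol is None else to_variable(start_symbol)
        )
```

Callers can still write `Grammar("S")`, while code reading `grammar.start_symbol` gets `Optional[Variable]` instead of an arbitrary Hashable.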

from ..formal_object import FormalObject


class Terminal(CFGObject):
Member

Can it be unified across regexps, grammars, etc.? A terminal is common to all language-related formalisms.


Parameters
-----------
value : any
Member

Can Variable be unified across all CFG-related formalisms?


Parameters
----------
given : any
Member

Hashable?

bygu4 added 26 commits December 29, 2024 22:09
2 participants