Commit c5f0c19

Fix per symbol first set computation (#6)
Fix per symbol first set computation (Spec._firstSets()) for a non-epsilon production to omit epsilon unless all RHS symbols' first sets contain epsilon. Prior to this fix, epsilon was merged into the first set under the weaker condition that a prefix of the RHS contained epsilon in its first set. Note that computeFirstSet(), which computes the first set for symbol strings (e.g. of RHS prefixes during item set closure computation), does not contain this flaw.

In order for epsilon to be in a symbol's first set, the entire syntax subtree rooted at the symbol must be capable of being constructed without containing any tokens. The simplest case is for the symbol to have an epsilon production; the more involved case (that this change fixes) is for the symbol to reduce one or more RHS symbols which transitively contain only epsilon-reduced symbols at the leaves.

Impacts:
- Per symbol first sets could erroneously contain epsilon.
- Per symbol follow sets could erroneously contain additional tokens. Follow sets are not used for any purpose other than logging.
- Parser tables were affected only if a symbol had multiple productions which could reduce on epsilon. But such productions would induce a parsing ambiguity, so it was impossible for LR(1)-compatible parsers to be affected.
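The difference between the two conditions can be seen on a toy grammar. The following is a minimal standalone sketch, not the code from parsing/automaton.py: the grammar, the `EPSILON` marker, the `first_sets` function and its `strict` flag are all hypothetical names introduced for illustration. With `strict=False` it mimics the pre-fix rule, under which epsilon leaks in from any nullable RHS prefix. With `strict=True` epsilon is merged only when every RHS symbol's first set contains it.

```python
from typing import Dict, List, Set

EPSILON = "<e>"  # hypothetical marker for the empty string

# Hypothetical toy grammar: nonterminal -> list of right-hand sides.
#   A -> B c
#   B -> (empty)
GRAMMAR: Dict[str, List[List[str]]] = {
    "A": [["B", "c"]],
    "B": [[]],
}


def first_sets(grammar: Dict[str, List[List[str]]],
               strict: bool = True) -> Dict[str, Set[str]]:
    """Fixed-point first-set computation over nonterminals.

    strict=True  merges epsilon only if every RHS symbol's first set has it.
    strict=False mimics the pre-fix rule: epsilon is copied from each
                 RHS symbol visited, so a nullable prefix is enough.
    """
    first: Dict[str, Set[str]] = {nt: set() for nt in grammar}

    def first_of(sym: str) -> Set[str]:
        # A terminal's first set is the terminal itself.
        return first[sym] if sym in grammar else {sym}

    changed = True
    while changed:
        changed = False
        for nt, prods in grammar.items():
            for rhs in prods:
                new: Set[str] = set()
                all_nullable = True
                for sym in rhs:
                    f = first_of(sym)
                    new |= (f - {EPSILON}) if strict else f
                    if EPSILON not in f:
                        # Stop at the first symbol that cannot vanish.
                        all_nullable = False
                        break
                if all_nullable:
                    new.add(EPSILON)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


fixed = first_sets(GRAMMAR, strict=True)
buggy = first_sets(GRAMMAR, strict=False)
```

Here `fixed["A"]` is `{"c"}`, while `buggy["A"]` also contains `EPSILON`, even though every derivation of `A` ends in the token `c`.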
1 parent 5606949 commit c5f0c19

1 file changed, +11 -10 lines changed

parsing/automaton.py

@@ -1396,23 +1396,24 @@ def _firstSets(self) -> None:
             for name in self._nonterms:
                 nonterm = self._nonterms[name]
                 for prod in nonterm.productions:
-                    # Merge epsilon if there is an empty production.
-                    if len(prod.rhs) == 0:
-                        if not nonterm.firstSetMerge(epsilon):
-                            done = False
-
+                    mergeEpsilon = True
                     # Iterate through the RHS and merge the first sets into
                     # this symbol's, until a preceding symbol's first set does
                     # not contain epsilon.
                     for elm in prod.rhs:
-                        containsEpsilon = False
+                        hasEpsilon = False
                         for elmSym in elm.firstSet:
-                            if not nonterm.firstSetMerge(elmSym):
-                                done = False
                             if elmSym == epsilon:
-                                containsEpsilon = True
-                        if not containsEpsilon:
+                                hasEpsilon = True
+                            elif not nonterm.firstSetMerge(elmSym):
+                                done = False
+                        if not hasEpsilon:
+                            mergeEpsilon = False
                             break
+                    # Merge epsilon if it was in the first set of every symbol.
+                    if mergeEpsilon:
+                        if not nonterm.firstSetMerge(epsilon):
+                            done = False
 
     # Compute the follow sets for all symbols.
     def _followSets(self) -> None:
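
The commit message notes that follow sets could pick up additional tokens as a consequence of the flaw. The sketch below illustrates how that can happen, assuming the textbook follow-set rules; it is not taken from `_followSets()`, whose implementation is not shown in this diff. All names (`GRAMMAR`, `FIRST_FIXED`, `FIRST_BUGGY`, `follow_sets`, `EOI`) are hypothetical, the first sets are written out by hand, and for brevity the sketch tracks follow sets for nonterminals only.

```python
from typing import Dict, List, Set

EPSILON = "<e>"  # hypothetical empty-string marker
EOI = "<$>"      # hypothetical end-of-input marker

# Hypothetical toy grammar:
#   S -> X A
#   X -> x
#   A -> B c
#   B -> (empty)
GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["X", "A"]],
    "X": [["x"]],
    "A": [["B", "c"]],
    "B": [[]],
}

# Hand-written first sets: the correct ones, and the ones the weaker
# pre-fix condition would yield (epsilon wrongly present in first(A)).
FIRST_FIXED: Dict[str, Set[str]] = {
    "S": {"x"}, "X": {"x"}, "A": {"c"}, "B": {EPSILON},
}
FIRST_BUGGY: Dict[str, Set[str]] = {**FIRST_FIXED, "A": {EPSILON, "c"}}


def follow_sets(grammar: Dict[str, List[List[str]]],
                first: Dict[str, Set[str]],
                start: str) -> Dict[str, Set[str]]:
    """Textbook fixed-point follow-set computation for nonterminals."""
    follow: Dict[str, Set[str]] = {nt: set() for nt in grammar}
    follow[start].add(EOI)
    changed = True
    while changed:
        changed = False
        for lhs, prods in grammar.items():
            for rhs in prods:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:
                        continue  # terminals are skipped in this sketch
                    new: Set[str] = set()
                    tail_nullable = True
                    for nxt in rhs[i + 1:]:
                        f = first[nxt] if nxt in grammar else {nxt}
                        new |= f - {EPSILON}
                        if EPSILON not in f:
                            tail_nullable = False
                            break
                    if tail_nullable:
                        # Everything after sym can vanish, so whatever
                        # follows the LHS can also follow sym.
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


follow_fixed = follow_sets(GRAMMAR, FIRST_FIXED, "S")
follow_buggy = follow_sets(GRAMMAR, FIRST_BUGGY, "S")
```

With the correct first sets, `follow_fixed["X"]` is `{"c"}`. With the spurious epsilon in `first(A)`, `A` looks nullable, so `follow_buggy["X"]` additionally inherits `EOI` from `follow(S)`.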
