COMS W4115
Programming Languages and Translators
Lecture 10: Implementing a Parser with Yacc
February 23, 2015
Lecture Outline
- Issues arising in implementing a parser with Yacc
- Shift/reduce conflicts
- Using Yacc to generate LALR(1) parsers
- Using Yacc with ambiguous grammars
- Error recovery in Yacc
1. Issues Arising in Implementing a Parser with Yacc
- A number of issues may arise in using Yacc to create a parser for a grammar.
This lecture discusses some of the most common issues
and how to resolve them if they arise.
- If a grammar is not LALR(1), Yacc will produce one or more
multiply defined entries in the parsing table action function.
These entries are reported as shift/reduce conflicts or
reduce/reduce conflicts.
- These conflicts can often be successfully resolved by rewriting the
grammar or giving Yacc additional information about the associativities
and precedence levels of operators in expressions or leaving them
knowing the rules Yacc uses to deal with parsing action conflicts.
2. Shift/Reduce Conflicts
- One common source of shift/reduce conflicts is using an ambiguous grammar
for expressions that does not specify the associativities and precedence
levels of its operators.
- Example. Consider the following augmented grammar G':
(0) E' → E
(1) E → E+E
(2) E → E*E
(3) E → id
FOLLOW(E
) = { +, *, $
}
Note that this grammar is ambiguous: it does not specify the associativities
of the +
and *
operators
or their relative precedence.
C, the canonical collection of sets of LR(0) items for G', is
I0: E' → ·E
E → ·E+E
E → ·E*E
E → ·id
I1: E' → E·
E → E·+E
E → E·*E
I2: E → id·
I3: E → E+·E
E → ·E+E
E → ·E*E
E → ·id
I4: E → E*·E
E → ·E+E
E → ·E*E
E → ·id
I5: E → E+E·
E → E·+E
E → E·*E
I6: E → E*E·
E → E·+E
E → E·*E
From C we can construct the following SLR(1) parsing table for G':
state |
action |
goto |
id |
+ |
* |
$ |
E |
0 |
s2 |
|
|
|
1 |
1 |
|
s3 |
s4 |
acc |
|
2 |
|
r3 |
r3 |
r3 |
|
3 |
s2 |
|
|
|
5 |
4 |
s2 |
|
|
|
6 |
5 |
|
s3
r1 |
s4
r1 |
r1 |
|
6 |
|
s3
r2 |
s4
r2 |
r2 |
|
Notes
- There is a shift/reduce conflict in
action[5,+]
between
"shift 3
" and "reduce by E → E+E
"
because the associativity of the operator +
is not defined
by the grammar. If we want +
to be left associative, the parser needs to resolve this conflict in favor of
"reduce by E → E+E
".
- There is a shift/reduce conflict in
action[6,*]
between
"shift 4
" and "reduce by E → E*E
"
because the associativity of the operator *
is not defined
by the grammar. If we want *
to be
left associative, the parser needs to resolve this conflict in favor of
"reduce by E → E*E
".
- There is a shift/reduce conflict in
action[5,*]
between "shift 4
" and "reduce by E → E+E
"
because the relative precedence of the operators +
and
*
is not defined by the grammar. This conflict should be
resolved in favor of "shift 4
" if we want *
to have higher precedence than +
.
- Likewise, there is a shift/reduce conflict in
action[6,+]
between "shift 3
" and "reduce by E → E*E
"
because the relative precedence of the operators +
and
*
is not defined by the grammar. This conflict should be
resolved in favor of "reduce by E → E*E
" if we want
*
to have higher precedence than +
.
3. Using Yacc with Ambiguous Grammars
- We can still use the ambiguous expression grammar above in a yacc specification
if we resolve the associativities
and relative precedence
of the
+
and *
operators by adding statements to
the declarations section of the specification. As an example, consider the yacc
specification:
%token id
%left '+'
%left '*'
%%
E : E '+' E
| E '*' E
| id
;
The statement %left '+'
makes the operator +
left associative
by instructing yacc to resolve the shift/reduce conflict in
action[5,+]
in favor of "reduce by E → E+E
".
Similarly, the statement %left '*'
makes the operator *
left associative
by instructing yacc to resolve the shift/reduce conflict in
action[6,*]
in favor of "reduce by E → E*E
".
Putting the statement %left '+'
before the statement
left '*'
, instructs yacc to make +
have lower precedence than
*
. Yacc does this by resolving
the shift/reduce conflict in
action[5,*]
in favor of "shift 4"
and resolving
the shift/reduce conflict in
action[6,+]
in favor of "reduce by E → E*E"
.
You can invoke the command yacc -v yaccSpec.y
to
generate the output file y.output
to see the states
generated by yacc and where the parsing action conflicts, if any, occur.
4. The "Dangling-Else" Ambiguity
- Unless otherwise instructed, yacc resolves all parsing
action conflicts using the following two rules:
- A reduce/reduce conflict is resolved by choosing the
conflicting production listed first in the Yacc
specification.
- A shift/reduce conflict is resolved in favor of shift.
Note that this rule correctly resolves the shift/reduce conflict
arising from the dangling-else ambiguity.
- This behavior of Yacc can be exploited to gracefully handle the
dangling-else ambiguity found in many programming languages.
- Consider the following simplified ambiguous grammar for if- and if-else statements:
S' → S
S → i S e S | i S | a
Here the symbol i
stands for if expr then
,
the symbol e
stands for else
, and
the symbol a
stands for all other productions.
We have also added an augmenting production S' → S
.
The canonical collections of sets of LR(0) items for this grammar is as follows:
I0: S' → .S I3: S → a.
S → .iSeS
S → .iS I4: S → iS.eS
S → .a S → iS.
I1: S' → S. I5: S → iSe.S
S → .iSeS
I2: S → i.SeS S → .iS
S → i.S S → .a
S → .iSeS
S → .iS I6: S → iSeS.
S → .a
The set of items I4
gives rise to a shift/reduce
conflict. The item S → iS.eS
calls for a shift on
e
and since FOLLOW(S
} = {e
, $},
the item S → iS.
calls for a reduction by production
S → iS.
on e
. To associate each e
with the closest unelsed if, we should resolve the conflict in favor of shift
to state 5. Because Yacc favors shift over reduce in a shift/reduce conflict,
Yacc will parse nested if- and if-else statements correctly.
See Section 4.8.2 of ALSU for a more detailed discussion.
5. Error Recovery in Yacc
- In Yacc, error recovery can be performed with error productions.
- To process errors at the level of a nonterminal
A
,
add an error production A → error α
where error
is a Yacc reserved word, and α is a string
of grammar symbols, possibly empty. Yacc will generate a parser
including this production.
- If Yacc encounters an error during parsing, it pops symbols from its stack
until it finds the topmost state on its stack whose underlying set of items
includes an item of the form
-
A → . error α
The parser then shifts the special token error
onto the stack
as though it saw the token error
on the input.
- If α is not empty, Yacc skips ahead on the input looking for a
substring that can be "reduced" to α on the stack. Now the parser
will have
error α
on top of its stack, which it will
reduce to A
. The parser then resumes normal parsing.
- Example: Consider the Lex-Yacc desk calculator in Section 5 of Lecture 6.
Suppose we add to the Yacc program the translation rule
lines : error '\n' { yyerror("reenter previous line:"); yyerrok; }
Now, on encountering an error, the parser starts popping symbols from
its stack until it encounters a state with a shift action on error
.
The parser shifts the token error
on to the stack and
then skips ahead in the input until it finds a newline which it shifts
onto the stack. The parser reduces error '\n'
to
lines
and emits the error message "reenter previous line:".
The special Yacc routine yyerrok
resets the parser to its
normal mode of operation.
6. Practice Problems
- Consider the following grammar G:
(1) S → a S b S
(2) S → b S a S
(3) S → ε
- What language does G generate?
- Construct an SLR(1) parsing table for G.
- Explain why the parsing action conflicts arise in the parsing table.
- Construct an equivalent LALR(1) grammar for L(G).
- Show that your grammar is LALR(1) by using yacc to
construct an LALR(1) parsing table for it.
- Is your grammar LL(1)?
7. Reading
aho@cs.columbia.edu