COMS W4115
Programming Languages and Translators
Lecture 9: Bottom-Up Parsing
February 18, 2015
Lecture Outline
- Bottom-up parsing
- LR(1) parsing
- Constructing a simple LR(1) parser
- DFA for viable prefixes
1. Bottom-up Parsing
- Bottom-up parsing can be viewed as trying to find a
rightmost derivation in reverse for an input string.
- A handle is a rightmost substring in right-sentential form
that matches the body of a production such that
replacing that right-hand side by the nonterminal on the left-hand
side of that production represents one step in
the reverse of the rightmost derivation.
(1) S → S ( S )
(2) S → ε
The handles in a rightmost derivation for the input string ( ) ( )
:
S ⇒ S ( S ) // handle is S ( S )
⇒ S ( ) // handle is the empty string between ( )
⇒ S ( S ) ( ) // handle is S ( S )
⇒ S ( ) ( ) // handle is the empty string between first ( )
⇒ ( ) ( ) // handle is the empty string prefix
Shift-reduce parsing is a form of bottom-up parsing
in which we shift terminal symbols of the string to be parsed onto
a stack until a handle appears on top of the stack. We then
replace the handle by the nonterminal symbol on the left-hand
side of the associated production (this is a "reduce" action). We keep
repeating this process until we have reduced the input string
to the start symbol of the grammar. This process simulates the
reverse of a rightmost derivation for the input string.
Thus, we can think of shift-reduce parsing as "handle pruning."
2. LR(1) Parsing
- Model of an LR(1) parser (Fig. 4.35).
- "L" means left-to-right scanning of the input, the "R" means
constructing a rightmost derivation in reverse, and the "1"
means one symbol of lookahead in making parsing decisions.
- LR parsing table for G:
state |
action |
goto |
( |
) |
$ |
S |
0 |
r2 |
r2 |
r2 |
1 |
1 |
s2 |
|
acc |
|
2 |
r2 |
r2 |
r2 |
3 |
3 |
s2 |
s4 |
|
|
4 |
r1 |
r1 |
r1 |
|
- The entry
action[1,(] = s2
means when in state 1 scanning the token '(',
shift the token '(' on top of the stack and then
push state 2 on top of the stack. The input pointer is advanced to the next token.
- The entry
action[0,(] = r2
means when in state 0 scanning the token '(',
reduce the handle consisting of the empty string on top of the stack
according to production (2)
S → ε
. Put the nonterminal S
on the left side of the production on top of the stack and then put
state 1 = goto[0,S]
on top of the stack. The input pointer is
not advanced.
- The entry
action[4,(] = r1
means when in state 4 scanning the token '(',
reduce the handle consisting of the string S(S)
on top of the stack
according to production (1)
S → S(S)
. The handle and the intervening states are removed.
The nonterminal S
on the left side of the production is then put on top of the stack and then
state 1 = goto[0,S]
is put on top of the stack. The input pointer
is not advanced.
acc
means accept and stop parsing.
- A blank entry means report a syntax error and stop parsing.
- Moves made by an LR(1) parser on input
( ) ( )
[Alg. 4.44].
Stack Input Action
0 ()()$ reduce by (2) S → ε and push state 1 on stack
0S1 ()()$ shift ( on stack; push state 2 on stack
0S1(2 )()$ reduce by (2) S → ε and push state 3
0S1(2S3 )()$ shift ) and push state 4
0S1(2S3)4 ()$ reduce by (1) S → S(S) and push state 1
0S1 ()$ shift ( and push state 2
0S1(2 )$ reduce by (2) S → ε and push state 3
0S1(2S3 )$ shift ) and push state 4
0S1(2S3)4 $ reduce by (1) S → S(S) and push state 1
0S1 $ accept
Note that an LR parser is a shift-reduce parser that traces out a
rightmost derivation in reverse.
3. Constructing a Simple LR(1) Parsing Table for a Grammar
- An LR(0) item of a grammar is a production of the grammar with a dot
at some position of the right side.
E.g.,
S → ·S(S)
, S → S·(S)
,
or S → S(S)·
.
- We will use two functions to construct the sets of items for a grammar:
- closure(I), where I is a set of items, is the set of items
constructed by the following two rules:
- Initially, put every item in I into closure(I).
- If A → α·Bβ is in closure(I)
and B → γ is a production, then add the item
B → ·γ to closure(I) if it is not already there.
Keep repeating this step until no more new items can be added to I.
- goto(I, X), where I is a set of items and
X is a grammar symbol, is the closure of the set of all items
A → αX·β where
A → α·Xβ is in I.
- An augmented grammar G' is one to which we have added
a new starting production S' → S where S is
the start symbol of the given grammar G.
Reducing by the new starting production signals acceptance of the input
string being parsed. We will always augment a grammar when we construct
an SLR parsing table for it.
- The sets-of-items construction
- Input: An augmented grammar G'.
- Output: C, the canonical collection of sets of LR(0) items for G'.
- Method:
I0 = closure({[S' → ·S]});
C = {I0};
repeat
for each set of items I in C and grammar symbol X such that
goto(I,X) is not empty and not in C do
add goto(I,X) to C;
until no more sets of items can be added to C;
Example: Given the augmented grammar G'
S' → S
S → S(S)
S → ε
C, the canonical collection of sets of LR(0) items for G', is
I0: S' → ·S
S → ·S(S)
S → ·
I1: S' → S·
S → S·(S)
I2: S → S(·S)
S → ·S(S)
S → ·
I3: S → S(S·)
S → S·(S)
I4: S' → S(S)·
Algorithm to construct the SLR(1) parsing table from C
,
the canonical collection of sets of LR(0) items for an augmented grammar
G'
- Input:
C = {I0, I1, ... , In
}.
- Output: The SLR parsing table functions
action
and
goto
.
- Method:
- State
i
and its action
and goto
functions are constructed from Ii
as follows:
- If item [
A→ α·aβ
] is in
Ii
and goto(Ii, a) =
Ij
, then add "shift j
" to
action[i, a]
. Here a
is a terminal.
- If item [
A → α·
] is in Ii
,
then add "reduce A → α
" to action[i, a]
for all a
in FOLLOW(A
).
Here A
cannot be S'
.
- If item [
S' → S·
] is in Ii
,
then add "accept
" to action[i, $]
.
- If
goto(Ii, A) = Ij
, then in
the parsing table set goto[i, A] = j
.
- The initial state of the parser is constructed from the set of items
containing [
S' → ·S
].
- Notes:
- If each parsing table entry has at most one action, then the grammar
is said to be SLR(1). If any entry has more than one action,
then the algorithm fails to produce a parser.
- All undefined entries are made
error
.
- Example: the LR parsing table above is an SLR(1) parsing table for the
balanced-parentheses grammar.
4. DFA for Viable Prefixes
- A viable prefix
is a prefix of a right sentential form that does not continue past the
right end of the rightmost handle of that sentential form.
- The shift and goto functions of the canonical collection of sets
of LR(0) items for a grammar G define a DFA that recognizes the
viable prefixes of G.
- An item [
A → β·γ
] is valid for a viable prefix αβ
if there is a rightmost derivation from S'
to αAw
to αβγw
.
5. Practice Problems
- Consider the following grammar G:
(1) S → S S +
(2) S → S S *
(3) S → a
- Construct a rightmost derivation and parse tree for the input string
aaa*+$
.
- Show the handle in each sentential form
in the derivation.
- Construct the canonical collection of sets of LR(0) items for the augmented grammar.
- Construct an SLR(1) parsing table for G.
- Show how your SLR(1) parser processes the input string
aaa*+$
.
6. Reading
aho@cs.columbia.edu