COMS W4115
Programming Languages and Translators
Lecture 9: Bottom-Up Parsing
February 18, 2015

Lecture Outline

Bottom-up parsing
LR(1) parsing
Constructing a simple LR(1) parser
DFA for viable prefixes

1. Bottom-up Parsing

Bottom-up parsing can be viewed as trying to find a rightmost derivation in reverse for an input string.
A handle is a rightmost substring in right-sentential form that matches the body of a production such that replacing that right-hand side by the nonterminal on the left-hand side of that production represents one step in the reverse of the rightmost derivation.

Consider the grammar G:


      (1) S → S ( S )
      (2) S → ε

The handles in a rightmost derivation for the input string ( ) ( ):


      S ⇒ S ( S )      // handle is S ( S )
        ⇒ S ( )        // handle is the empty string between ( )
        ⇒ S ( S ) ( )  // handle is S ( S )
        ⇒ S ( ) ( )    // handle is the empty string between first ( )
        ⇒ ( ) ( )      // handle is the empty string prefix

Shift-reduce parsing is a form of bottom-up parsing in which we shift terminal symbols of the string to be parsed onto a stack until a handle appears on top of the stack. We then replace the handle by the nonterminal symbol on the left-hand side of the associated production (this is a "reduce" action). We keep repeating this process until we have reduced the input string to the start symbol of the grammar. This process simulates the reverse of a rightmost derivation for the input string. Thus, we can think of shift-reduce parsing as "handle pruning."

2. LR(1) Parsing

Model of an LR(1) parser (Fig. 4.35).
"L" means left-to-right scanning of the input, the "R" means constructing a rightmost derivation in reverse, and the "1" means one symbol of lookahead in making parsing decisions.
LR parsing table for G:

`state`	`action`	`goto`
`(`	`)`	`$`	`S`
`0`	`r2`	`r2`	`r2`	`1`
`1`	`s2`		`acc`
`2`	`r2`	`r2`	`r2`	`3`
`3`	`s2`	`s4`
`4`	`r1`	`r1`	`r1`

The entry action[1,(] = s2 means when in state 1 scanning the token '(', shift the token '(' on top of the stack and then push state 2 on top of the stack. The input pointer is advanced to the next token.
The entry action[0,(] = r2 means when in state 0 scanning the token '(', reduce the handle consisting of the empty string on top of the stack according to production (2) S → ε. Put the nonterminal S on the left side of the production on top of the stack and then put state 1 = goto[0,S] on top of the stack. The input pointer is not advanced.
The entry action[4,(] = r1 means when in state 4 scanning the token '(', reduce the handle consisting of the string S(S) on top of the stack according to production (1) S → S(S). The handle and the intervening states are removed. The nonterminal S on the left side of the production is then put on top of the stack and then state 1 = goto[0,S] is put on top of the stack. The input pointer is not advanced.
acc means accept and stop parsing.
A blank entry means report a syntax error and stop parsing.

Moves made by an LR(1) parser on input ( ) ( ) [Alg. 4.44].


    Stack      Input           Action
    0          ()()$    reduce by (2) S → ε and push state 1 on stack
    0S1        ()()$    shift ( on stack; push state 2 on stack
    0S1(2       )()$    reduce by (2) S → ε and push state 3
    0S1(2S3     )()$    shift ) and push state 4
    0S1(2S3)4    ()$    reduce by (1) S → S(S) and push state 1
    0S1          ()$    shift ( and push state 2
    0S1(2         )$    reduce by (2) S → ε and push state 3
    0S1(2S3       )$    shift ) and push state 4
    0S1(2S3)4      $    reduce by (1) S → S(S) and push state 1
    0S1            $    accept

Note that an LR parser is a shift-reduce parser that traces out a rightmost derivation in reverse.

3. Constructing a Simple LR(1) Parsing Table for a Grammar

An LR(0) item of a grammar is a production of the grammar with a dot at some position of the right side. E.g., S → ·S(S), S → S·(S), or S → S(S)·.
We will use two functions to construct the sets of items for a grammar:

closure(I), where I is a set of items, is the set of items constructed by the following two rules:

Initially, put every item in I into closure(I).
If A → α·Bβ is in closure(I) and B → γ is a production, then add the item B → ·γ to closure(I) if it is not already there. Keep repeating this step until no more new items can be added to I.

goto(I, X), where I is a set of items and X is a grammar symbol, is the closure of the set of all items A → αX·β where A → α·Xβ is in I.

An augmented grammar G' is one to which we have added a new starting production S' → S where S is the start symbol of the given grammar G. Reducing by the new starting production signals acceptance of the input string being parsed. We will always augment a grammar when we construct an SLR parsing table for it.

The sets-of-items construction

Input: An augmented grammar G'.
Output: C, the canonical collection of sets of LR(0) items for G'.
Method:


   I₀ = closure({[S' → ·S]});
   C = {I₀};
   repeat
     for each set of items I in C and grammar symbol X such that
       goto(I,X) is not empty and not in C do
         add goto(I,X) to C;
   until no more sets of items can be added to C;

Example: Given the augmented grammar G'


  S' → S
  S  → S(S)
  S  → ε

C, the canonical collection of sets of LR(0) items for G', is


  I₀: S' → ·S
      S → ·S(S)
      S → ·

  I₁: S' → S·
      S → S·(S)

  I₂: S → S(·S)
      S → ·S(S)
      S → ·

  I₃: S → S(S·)
      S → S·(S)

  I₄: S' → S(S)·

Algorithm to construct the SLR(1) parsing table from C, the canonical collection of sets of LR(0) items for an augmented grammar G'

Input: C = {I₀, I₁, ... , I_n}.
Output: The SLR parsing table functions action and goto.
Method:

State i and its action and goto functions are constructed from I_i as follows:

If item [A→ α·aβ] is in I_i and goto(I_i, a) = I_j, then add "shift j" to action[i, a]. Here a is a terminal.
If item [A → α·] is in I_i, then add "reduce A → α" to action[i, a] for all a in FOLLOW(A). Here A cannot be S'.
If item [S' → S·] is in I_i, then add "accept" to action[i, $].

If goto(I_i, A) = I_j, then in the parsing table set goto[i, A] = j.
The initial state of the parser is constructed from the set of items containing [S' → ·S].

Notes:

If each parsing table entry has at most one action, then the grammar is said to be SLR(1). If any entry has more than one action, then the algorithm fails to produce a parser.
All undefined entries are made error.

Example: the LR parsing table above is an SLR(1) parsing table for the balanced-parentheses grammar.

4. DFA for Viable Prefixes

A viable prefix is a prefix of a right sentential form that does not continue past the right end of the rightmost handle of that sentential form.
The shift and goto functions of the canonical collection of sets of LR(0) items for a grammar G define a DFA that recognizes the viable prefixes of G.
An item [A → β·γ] is valid for a viable prefix αβ if there is a rightmost derivation from S' to αAw to αβγw.

5. Practice Problems

Consider the following grammar G:


     (1) S → S S +
     (2) S → S S *
     (3) S → a

Construct a rightmost derivation and parse tree for the input string aaa*+$.
Show the handle in each sentential form in the derivation.
Construct the canonical collection of sets of LR(0) items for the augmented grammar.
Construct an SLR(1) parsing table for G.
Show how your SLR(1) parser processes the input string aaa*+$.

6. Reading

ALSU, Sects. 4.5, 4.6.

aho@cs.columbia.edu

COMS W4115 Programming Languages and Translators Lecture 9: Bottom-Up Parsing February 18, 2015

Lecture Outline

1. Bottom-up Parsing

2. LR(1) Parsing

3. Constructing a Simple LR(1) Parsing Table for a Grammar

4. DFA for Viable Prefixes

5. Practice Problems

6. Reading

COMS W4115
Programming Languages and Translators
Lecture 9: Bottom-Up Parsing
February 18, 2015