COMS W4115
Programming Languages and Translators
Lecture 8: Predictive Top-Down Parsers
February 16, 2015
Lecture Outline
- Top-down parsing
- FIRST
- FOLLOW
- How to construct a predictive parsing table
- LL(1) grammars
- Transformations on grammars
1. Top-Down Parsing
- Throughout this lecture, we assume we have been given a grammar G and an input string w of nonterminals (token names) that we want to parse.
- Top-down parsing consists of trying to construct a parse tree
for w
starting from the root of the tree and creating
the nodes of the parse tree in preorder.
- Equivalently, top-down parsing consists of trying to find a
leftmost derivation for w from the start symbol of G.
- Recursive-descent parsing is a top-down method of syntax
analysis in which a set of recursive procedures is used
to process the input string with a
procedure associated with each nonterminal of
the grammar. See Fig. 4.13, p. 219.
- A nonrecursive predictive parser uses an explicit stack and
a parsing table to do deterministic top-down parsing.
- In this class we will develop an algorithm to construct
predictive parsing tables for a large class of useful
grammars called LL(1) grammars.
- To implement this algorithm we need to define two functions on grammar symbols,
FIRST and FOLLOW.
2. FIRST
- FIRST(α), where α is a string of terminal and nonterminal symbols,
is the set of terminal symbols that are prefixes of the
strings
derivable from α.
If α can derive ε, then we add ε to FIRST(α).
- Algorithm to compute FIRST(X):
- If X is a terminal, then FIRST(X) = { X }.
- If X → ε is a production, then add ε to FIRST(X).
- If X
→ Y1 Y2 ... Yk
is a production for k ≥ 1, and
for some i ≤ k,
Y1Y2 ... Yi-1 derives the empty string,
and a
is in FIRST(Yi),
then add a
to FIRST(X).
If Y1Y2 ... Yk
derives the empty string,
then add ε to FIRST(X).
- Example. Consider the grammar G:
- For G, FIRST(
S
) = {(, ε
}.
3. FOLLOW
- FOLLOW(A), where A is a nonterminal of G, is the set of terminals that
can appear immediately
to the right of A in some sentential form of G.
Let us assume the string to be parsed is terminated by an end-of-string
endmarker $.
If A can be the rightmost symbol in some sentential form,
then we add the right endmarker $ to FOLLOW(A).
- Algorithm to compute FOLLOW(A) for all nonterminals A of a grammar:
- Place $ in FOLLOW(S) where S is the start symbol of the grammar.
- If A → αBβ is a production,
then add every terminal symbol
a
in FIRST(β)
to FOLLOW(B).
- If there is a production A → αB,
or a production A → αBβ,
where FIRST(β) contains ε,
then add every symbol in FOLLOW(A) to FOLLOW(B).
- Example. For G above, FOLLOW(
S
) = {)
, $}.
4. How to Construct a Predictive Parsing Table
- Input: Grammar G.
- Output: Predictive parsing table M.
- Method:
for (each production A → α in G) {
for (each terminal a in FIRST(α))
add A → α to M[A, a];
if (ε is in FIRST(α))
for (each symbol b in FOLLOW(A))
add A → α to M[A, b];
}
make each undefined entry of M be error;
Example 1. Predictive parsing table for the grammar:
FIRST(S
) = {+, *, a
}
FOLLOW(S
) = {+, *, a, $
}
Input Symbol
Nonterminal a + * $
S S → a S → +SS S → *SS error
Example 2. Predictive parsing table for the grammar:
FIRST(S
) = {(, ε
}
FOLLOW(S
) = {), $
}
Input Symbol
Nonterminal ( ) $
S S → (S)S S → ε S → ε
Example 3. Predictive parsing table for the grammar:
FIRST(S
) = {(, ε
}
FOLLOW(S
) = {(, ), $
}
Input Symbol
Nonterminal ( ) $
S S → S(S) S → ε S → ε
S → ε
5. LL(1) Grammars
- A grammar is LL(1) iff whenever
A → α | β
are two distinct productions, the following three conditions hold:
- For no terminal
a
do both α and β derive
strings beginning with a
.
- At most one of α and β can derive the empty string.
- If β derives the empty string, then α does not derive any string
beginning with a terminal in FOLLOW(
A
).
Likewise, if α derives the empty string, then β does not derive
any string beginning with a terminal in FOLLOW(A
).
- We can use the algorithm above to construct a predictive parsing
table with uniquely defined entries for any LL(1) grammar.
- The first "L" in LL(1) means scanning the input from left to right,
the second "L" for producing a leftmost derivation, and the "1" for
using one symbol of lookahead to make each parsing action decision.
6. Transformations on Grammars
- Two common language-preserving transformations are often applied to
grammars to try to make them parsable by top-down methods.
These are eliminating left recursion and left factoring.
- Eliminating left recursion:
expr → expr + term
| term
by
expr → term expr'
expr' → + term expr'
| ε
Left factoring:
stmt → if ( expr ) stmt else stmt
| if (expr) stmt
| other
by
stmt → if ( expr ) stmt stmt'
| other
stmt' → else stmt
| ε
7. Practice Problems
- Consider the following grammar G for Boolean expressions:
B → B or T | T
T → T and F | F
F → not B | ( B ) | true | false
- What precedence and associativity does this grammar give to the operators
and, or, not
?
- Compute FIRST and FOLLOW for each nonterminal in G.
- Transform G into an equivalent LL(1) grammar G'.
- Construct a predictive parsing table for G'.
- Show how your predictive parser processes the input string
true and not false or true
- Draw the parse tree traced out by your parser.
8. Reading
aho@cs.columbia.edu