COMS W4115
Programming Languages and Translators
Lecture 7: Parsing Context-Free Grammars
February 11, 2015
Outline
- Proving a language is not context free
- Closure and decidability results for context-free languages
- The parsing problem for context-free grammars
- Top-down parsing
- Transformations on grammars
1. Proving a Language is Not Context Free
- In Lecture 5, Section 5, we stated the pumping lemma for regular languages.
This lemma can be used to prove certain languages are not regular.
- There is an analogous result for context-free languages, called
the pumping lemma for context-free languages, that can be used to show certain
languages are not context free.
- The pumping lemma for context-free languages says: If L is a context-free language,
then there exists a
constant n such that if z is any string in L of length n or more, then
z can be written as uvwxy subject to the following conditions:
- The length of vwx is less than or equal to n.
- The length of vx is one or more. (That is, not both of v and x can be empty.)
- For all i ≥ 0, uviwxiy is in L.
- A typical proof using the pumping lemma to show a language L is not context free
proceeds by assuming L is context free, and then finding a long string in L
which, when pumped, yields a string not in L, thereby deriving a contradiction.
- Examples of non-context-free languages:
- {
anbncn
| n
≥ 0 }
- {
ww
| w
is in (a|b)* }
- {
ambnambn
|
n
≥ 0 }
2. Closure and Decidability Results for Context-Free Languages
- Like the regular languages, the context-free languages are closed under
the following operations:
- union
- reversal
- Kleene star
- homomorphism
- inverse homomorphism
- Unlike the regular languages, the context-free languages are not closed
under intersection or complement.
- Decidability results
- Given a CFG G and a string w, it is decidable whether w is in L(G).
- Given a CFG G, it is decidable whether L(G) is empty.
- Given a CFG G, it is undecidable whether G is ambiguous.
- Given two CFGs G1 and G2, it is undecidable whether
L(G1) = L(G2).
3. The Parsing Problem for Context-Free Grammars
- The parsing problem for context-free grammars
is given a CFG G and an input string w
to construct all parse trees for w according to G, if w is in L(G).
- The Cocke-Younger-Kasami algorithm is a dynamic programming algorithm
that given a Chomsky Normal Form grammar G and an input string w will
create in O(|w|3) time a table from which all parse trees
for w according to G can be constructed.
- For compiler applications two styles of parsing algorithms are common:
top-down parsing and bottom-up parsing.
4. Top-Down Parsing
- Top-down parsing consists of trying to construct a parse tree
for an input string starting from the root and creating
the nodes of the parse tree in preorder.
- Equivalently, top-down parsing consists of trying to find a
leftmost derivation for the input string.
- Consider grammar G:
S → + S S | * S S | a
Leftmost derivation for + a * a a:
S ⇒ + S S
⇒ + a S
⇒ + a * S S
⇒ + a * a S
⇒ + a * a a
Recursive-descent parsing
- Recursive-descent parsing is a top-down method of syntax
analysis in which a set of recursive procedures is used
to process the input string.
- One procedure is associated with each nonterminal of
the grammar. See Fig. 4.13, p. 219.
- The sequence of successful procedure calls defines the parse tree.
Nonrecursive predictive parsing
- A nonrecursive predictive parser uses an explicit stack.
- See Fig. 4.19, p. 227, for a model of table-driven predictive
parser.
- Parsing table for G:
Input Symbol
Nonterminal a + * $
S S → a S → +SS S → *SS
Moves made by this predictive parser on input +a*aa
.
(The top of the stack is to the left.)
Stack Input Output
S$ +a*aa$
+SS$ +a*aa$ S → +SS
SS$ a*aa$
aS$ a*aa$ S → a
S$ *aa$
*SS$ *aa$ S → *SS
SS$ aa$
aS$ aa$ S → a
S$ a$
a$ a$ S → a
$ $
Note that these moves trace out a leftmost derivation for the input.
5. Transformations on Grammars
- Two common language-preserving transformations are often applied to
grammars to try to make them parsable by top-down methods.
These are eliminating left recursion and left factoring.
- Eliminating left recursion:
expr → expr + term
| term
by
expr → term expr'
expr' → + term expr'
| ε
Left factoring:
stmt → if ( expr ) stmt else stmt
| if (expr) stmt
| other
by
stmt → if ( expr ) stmt stmt'
| other
stmt' → else stmt
| ε
6. Practice Problems
- Write down a CFG for regular expressions over the alphabet
{
a
, b
}.
Show a parse tree for the regular expression
a | b*a
.
- Using the nonterminals
stmt
and expr
,
design context-free grammar productions to model
- C while-statements
- C for-statements
- C do-while statements
- Consider grammar G:
S → S S + | S S * | a
- What language does this grammar generate?
- Eliminate the left recursion from this grammar.
- Use the pumping lemma to show that
{
anbncn
| n
≥ 0 }
is not context free.
7. Reading
- ALSU, Sects. 2.4, 4.3, 4.4
aho@cs.columbia.edu