COMS W3261 CS Theory
Lecture 10: Pumping Lemma for CFL's
1. The Pumping Lemma for CFL's
- Analogous to the pumping lemma for regular languages,
there is a pumping lemma for context-free languages.
The pumping lemma for CFL's can be used to show certain languages
are not context free.
- The pumping lemma for CFL's states that
for every infinite context-free language L,
there exists a constant n that depends on L such that
for all sentences z in L of length n or more, we can write
z as uvwxy where
- |vwx| ≤ n,
- vx ≠ ε (that is, v and x cannot both be empty), and
- for all i ≥ 0, the string uviwxiy
is in L.
- Outline of proof:
- The starting point is a Chomsky Normal Form grammar for L.
- An important property of a parse tree in a CNF grammar
is that if the length of a longest path in a parse tree for a sentence
in L is p,
then the length of the sentence is at most 2p-1.
This can be easily proven by induction on p. See HMU, Sect. 7.2.1, p. 280.
- Suppose a CNF grammar for L has m variables.
Let n = 2m and
consider a sentence z in L such that
|z| ≥ n-1. From the observation above, a parse tree for z
must have a path longer than m. Because the grammar has only m
variables, this means there must be at least two identical variables on a longest path
in that parse tree. Let
- A0, A1, …, Ai, …,
Aj, …, Ak
be the sequence of variables on this path where A0 = S
and Ai = Aj is the last repeated variable
on this path.
- This means the parse tree represents a derivation of the form
- A0 ⇒* uAiy
⇒* uvAjxy
⇒* uvwxy
- Since Ai is the last repeated variable along this path,
the length of vwx
must be less than or equal to n.
- Since the grammar is in CNF, at least one of v and x must be nonempty.
- Since Ai = Aj,
the portion of the derivation
Ai ⇒* vAjx
can be repeated in a derivation zero or more times.
- For more details, see HMU, Sect. 7.2.2, pp. 281-282.
- One important use of the pumping lemma is to prove certain
languages are not context free.
- Example: Let us use the pumping lemma to show that the language L =
{ anbncn | n ≥ 0 }
is not context free.
- The proof will be by contradiction. Assume L is context free.
Then by the pumping lemma there is a constant n associated with L
such that for all z in L with |z| ≥ n,
z can be written as uvwxy
such that
- |vwx| ≤ n,
- vx ≠ ε, and
- for all i ≥ 0, the string
uviwxiy is in L.
- Consider the string z =
anbncn.
- From condition (1), vwx cannot contain both a's and c's.
- Two cases arise:
- vwx has no c's. But then uwy cannot be in L
since at least one of v or x is nonempty.
- vwx has no a's. Again similarly, uwy cannot be in L.
- In both cases we have a contradiction, so we must conclude L cannot be context free.
The details of the proof can be found in HMU, p. 284.
- Note that the pumping lemma can be treated as an "adversarial game."
See HMU, Sect. 7.2.3, p. 283.
2. Cocke-Younger-Kasami Algorithm for Testing Membership in a CFL
- The Cocke-Younger-Kasami algorithm can be used to determine whether
a given input string w is generated by a given CFG G.
It does so by determining whether a parse tree exists for w in G.
- Input: a Chomsky Normal Form CFG G =
(V, T, P, S) and a string w =
a1a2 … an
in T*.
- Output: "yes" if w is in L(G), "no" otherwise.
- Method: The CYK algorithm is a dynamic programming algorithm that fills in
the cells
Xij
of
a triangular parsing table with nonterminals A
iff A ⇒*
aiai+1 ...
aj.
for i = 1 to n do
if A → ai is in P then
add A to Xii
fill in the table, row-by-row, from row 2 to row n
fill in the cells in each row from left-to-right
if (A → BC is in P) and for some i ≤ k < j
(B is in Xik) and (C is in Xk+1,j) then
add A to Xij
if S is in X1n then
output "yes"
else
output "no"
The algorithm starts by adding a nonterminal A to the cell
Xii
in the bottom row of the table if
there is a production A → a in P
where a = ai
for 1 ≤ i
≤ n
.
Then going up the table a row at a time, the
algorithm adds the nonterminal A to the cell
Xij
iff there is a
production A → BC in P and for some k
between
i
≤ k
< j
,
B is in Xik
and
C is in Xk+1,j
.
Since B is in Xik
, we already know B ⇒*
aiai+1 … ak, and since
C is in Xk+1,j
, we already know C ⇒*
ak+1ak+2 … aj.
This means we now know there is a derivation A ⇒*
aiai+1 … aj.
Note that to compute entry Xij
, the algorithm examines at most
n pairs of entries:
(Xii
, Xi+1,j
),
(Xi,i+1
, Xi+2,j
),
and so on up to pair
(Xi,j-1
, Xj,j
).
When the parsing table has been filled in,
the algorithm outputs "yes" if S is in cell X1n
;
"no" otherwise.
The running time of the algorithm is O(n3).
See HMU, p. 306, for a worked-out example of the CYK algorithm.
The Cocke-Younger-Kasami algorithm can be extended to construct from the
parsing table all parse trees for
the input string w if
w is in L(G).
3. Practice Problems
- Show that the language
{ ww | ww is a string of a's and b's }
is not context free.
- Show that the language
{ anbnci |
n ≥ 0 and i ≤ n }
is not context free.
- Show that the language
{ wwRw | w is a string
of a's and b's } is not context free.
- Construct the CYK parsing table for the input string
baaba
using
the following CNF grammar:
S → AB | BC
A → BA | a
B → CC | b
C → AB | a
- Construct a parse tree for the input string
baaba
from the parsing table created in problem (4).
4. Reading
aho@cs.columbia.edu
verma@cs.columbia.edu