COMS W3261 CS Theory
Lecture 9: CFG's and PDA's
1. From a CFG to an Equivalent PDA
- Given a CFG G, we can construct a PDA P such that
N(P) = L(G).
- The PDA will simulate leftmost derivations of G.
- Algorithm to construct a PDA for a CFG
- Input: a CFG G = (V, T, Q, S).
- Method: Let P =
({q}, T, V ∪ T, δ, q, S) where
- δ(q, ε, A) =
{(q, β) | A → β is in Q }
for each nonterminal A in V.
- δ(q, a, a) = {(q, ε)}
for each terminal a in T.
- For a given input string w, the PDA can simulate a leftmost derivation
for w in G.
- We can prove that N(P) = L(G) by showing that
w is in N(P) iff w is in L(G):
- If part: If w is in L(G), then there is a leftmost derivation
S = γ1 ⇒ γ2 ⇒ … ⇒ γi ⇒ … ⇒ γn = w
Suppose the sentential
form γi = xiαi
where xi is a prefix of w and αi is
a sequence of input and stack symbols
for 1 ≤ i ≤ n.
We can show by induction on i that if P simulates this leftmost
derivation by the sequence of moves
- (q, w, S) |–*
(q, yi, αi),
then xiyi = w.
This shows that if S ⇒* w, then (q, w, S) |–*
(q, ε, ε). Thus, L(G) is contained in N(P).
Only-if part: If
(q, x, A) |–*
(q, ε, ε), then
A ⇒* x.
We can prove this statement by induction on the number of moves made
by P.
This shows that if (q, w, S) |–*
(q, ε, ε), then
S ⇒* w. Thus, N(P) is contained in L(G).
We can now conclude N(P) = L(G). Thus, every context-free language can
be recognized by a PDA.
2. From a PDA to an Equivalent CFG
- We now show that every language recognized by a PDA can be generated
by a context-free grammar.
- Given a PDA P, we can construct a CFG G such that
L(G) = N(P).
- The basic idea of the proof is to generate the strings that cause
P to go from state q to state p,
popping a symbol X off the
stack, using a nonterminal of the form [q X p].
- Algorithm to construct a CFG for a PDA
- Input: a PDA P =
(Q, Σ, Γ, δ, q0,
Z0, F).
- Output: a CFG G = (V, Σ, R, S)
such that L(G) = N(P).
- Method:
- Let the nonterminal S be the start symbol of G.
The other nonterminals in V will be symbols of the form
[p X q] where p and q are states in Q,
and X is a stack symbol in Γ.
- The set of productions R is constructed as follows:
- For all states p, R has the production
S → [q0 Z0 p].
- If δ(q, a, X) contains
(r, Y1 Y2 … Yk),
then R has the productions
- [q X rk] →
a[r Y1 r1]
[r1 Y2 r2] …
[rk-1 Yk rk]
- for all lists of states r1, r2, … ,
rk.
- We can prove that [q X p] ⇒* w iff
(q, w, X) |–* (p, ε, ε).
- From this, we have
[q0 Z0 p] ⇒* w iff
(q0, w, Z0) |–*
(p, ε, ε), so we can conclude
L(G) = N(P).
- Sections 6 and 7 allow us to conclude that family of languages generated by context-free
grammars is the same as the family of languages recognized by pushdown automata.
- In summary, the regular languages are a proper subset of the deterministic CFL’s which are a proper
subset of all CFL’s.
3. Eliminating Useless Symbols from a CFG
- A symbol X is useful for a CFG if there is a derivation of the
form S ⇒* αXβ ⇒* w
for some string of terminals w.
- If X is not useful, then we say X is useless.
- To be useful, a symbol X needs to be
- generating; that is, X needs to be able to derive
some string of terminals.
- reachable; that is, there needs to be a derivation of the form
S ⇒* αXβ where α and β
are strings of nonterminals and terminals.
- To eliminate useless symbols from a grammar, we
- identify
the nongenerating symbols and eliminate all productions containing
one or more of these symbols, and then
- eliminate all productions containing symbols that are not reachable
from the start symbol.
- In the grammar
S → AB | a
A → b
S
, A
, a
, and
b
are generating. B
is not generating.
Eliminating the productions containing the nongenerating symbols we get
S → a
A → b
Now we see A
is not reachable from S
, so
we can eliminate the second production to get
S → a
The generating symbols can be computed inductively bottom-up from the
set of terminal symbols.
The reachable symbols can be computed inductively starting from S
.
4. Eliminating ε-productions from a CFG
- If a language L has a CFG, then L - { ε } has a CFG without
any ε-productions.
- A nonterminal A in a grammar is nullable if
A ⇒* ε.
- The nullable nonterminals can be determined iteratively.
- We can eliminate all ε-productions in a grammar as follows:
- Eliminate all productions with ε bodies.
- Suppose A → X1X2 ... Xk
is a production and m of the k Xi's
are nullable. Then add the 2m versions of this
production where the nullable Xi's are present
or absent. (But if all symbols are nullable, do not add an
ε-production.)
- Let us eliminate the ε-productions from the grammar G
S → AB
A → aAA | ε
B → bBB | ε
S
, A
and B
are nullable.
For the production S → AB
we add the productions S → A | B
For the production A → aAA
we add the productions A → aA | a
For the production B → bBB
we add the productions B → bB | b
The resulting grammar H with no ε-productions is
S → AB | A | B
A → aAA | aA | a
B → bBB | bB | b
We can prove that L(H) = L(G) - { ε }.
5. Eliminating Unit Productions from a CFG
- A unit production is one of the form A → B
where both A and B are nonterminals.
- Let us assume we are given a grammar G with no ε-productions.
- From G we can create an equivalent grammar H with no unit productions
as follows.
- Define (A, B) to be a unit pair if
A ⇒* B in G.
- We can inductively construct all unit pairs for G.
- For each unit pair (A, B) in G, we add to H the productions
A → α where B → α is a
nonunit production of G.
- Consider our standard grammar G for arithmetic expressions:
E → E + T | T
T → T * F | F
F → ( E ) | a
The unit pairs are (E,E), (E,T), (E,F), (T,T), (T,F), (F,F)
.
The equivalent grammar H with no unit productions is:
E → E + T | T * F | ( E ) | a
T → T * F | ( E ) | a
F → ( E ) | a
6. Putting a CFG into Chomsky Normal Form
- A grammar G is in Chomsky Normal Form if each production in G is
one of forms:
- A → BC where A, B, and C are nonterminals except
B and C may not be the start symbol, or
- A → a where a is a terminal.
- If ε is in L(G), then G contains the production
S → ε.
- We will further assume G has no useless symbols.
- This is slight generalization of the definition of Chomsky Normal Form in
HMU to permit a CNF grammar to generate the empty string.
- Every context-free language can be generated by
a Chomsky Normal Form grammar.
- Let us assume we have a CFG G with no useless symbols.
We can transform G into an equivalent Chomsky Normal
Form grammar as follows:
- If L(G) contains ε, add the new starting production
S' → S where S' is a new start symbol
and S is the old start symbol and the new ε-production
S' → ε.
- Eliminate all ε productions except S' → ε
if it was added in step (1).
- Eliminate all unit productions.
- Arrange that all bodies of length two or more consist only of nonterminals
by replacing each terminal a in a body of length two or more
with a new nonterminal A' and adding the new production
A' → a.
- Replace bodies of length three or more with a cascade of productions, each with
a body of two nonterminals.
For example, we can replace the production A → BCDE
with the cascade of productions
- where C' and D' are new nonterminals.
- We can put the grammar H above into Chomsky Normal Form to get:
E → EA | TB | LC | a
A → PT
P → +
B → MF
M → *
L → (
C → ER
R → )
T → TB | LC | a
F → LC | a
7. Practice Problems
- Convert the following grammar
- to an equivalent PDA that accepts by empty stack.
- Convert your PDA from problem (1) to an equivalent PDA
that accepts by final state.
- Convert your PDA from problem (1) to an equivalent CFG.
- Eliminate useless symbols from the following grammar:
S → AB | CA
A → a
B → BC | AB
C → aB | b
- Put the following grammar into Chomsky Normal Form:
S → ASB | ε
A → aAS | a
B → BbS | A | bb
C → aB | b
8. Reading
aho@cs.columbia.edu
verma@cs.columbia.edu