COMS W3261 CS Theory
Lecture 4: Regular Expressions and Finite Automata
1. Equivalence of DFA's, NFA's, ε-NFA's, and Regular Expressions
- We say that two finite automata are equivalent if they define the same language.
- We have shown the subset construction can be used to convert
either an NFA or an ε-NFA into an equivalent DFA.
- We say that a regular expression and a finite automaton
are equivalent if they both define the same language.
- Using the McNaughton-Yamada-Thompson algorithm, we can construct
an equivalent ε-NFA from a regular expression
to show that regular expressions are no more powerful in defining
languages than finite automata.
- We can also show how to construct an equivalent regular expression from a DFA.
- These algorithms allow us to conclude that DFA's, NFA's, ε-NFA's, and regular
expressions are all equivalent in definitional power -- they
all define precisely the regular languages.
2. McNaughton-Yamada-Thompson Algorithm:
From an RE to an Equivalent ε-NFA
- Let R be a regular expression. Using the algorithm in
Section 3.2.3 of HMU, pp. 102-107, we can construct
an ε-NFA N such that L(N) = L(R).
- The algorithm takes as input
a regular expression R of length n and recursively constructs from
it an equivalent ε-NFA that has
- exactly one start state and one final state,
- at most 2n states,
- no arcs coming into its start state,
- no arcs leaving its final state,
- at most two arcs leaving any nonfinal state
- This algorithm was discovered by McNaughton and Yamada, and
independently by Ken Thompson who used it in the string-matching program
grep
on Unix. On an input string of length m,
an n-state MYT ε-NFA can be efficiently simulated
in time O(mn) using a two-stack algorithm.
(See pp. 157-159, A. Aho, M. Lam, R. Sethi, J. Ullman,
Compilers: Principles, Techniques, and Tools, Second Edition,
Addison Wesley, 2007, or Russ Cox's paper cited below for details.)
3. From a DFA to an Equivalent RE
- We will show that every regular language can be defined by a regular expression
by constructing from a DFA an equivalent regular expression.
- To facilitate this construction we introduce another form of epsilon-NFA
that we shall call a generalized NFA (GNFA). A GNFA is a just an epsilon-NFA in which
the labels on the arcs are regular expressions rather than epsilons or symbols from the
input alphabet. A GNFA accepts an input string w iff there is a path of
transitions from its start state to one of its final states in which
the language denoted by
the concatenation of the regular expressions along that path contains the
string w.
- Here is an algorithm
(sometimes called the "ripping algorithm") for constructing an equivalent
regular expression
from a DFA D.
- Transform D into a GNFA G by creating
the new start state S and the new distinct accept state A.
- In G, add an ε-transition from S to the old start
state of D.
There will be no transitions in G into S.
There will be a transition out of S to every other state in G,
labeled by ∅ if it has no other label.
- In G, add an ε-transition from each old final state of
D to A.
There will be no transitions in G out of A.
- If an arc has multiple labels, say a, b, c, replace these
labels by the regular expression a+b+c.
- If there is no arc from a state s to a state t,
then add an arc from s to t in G with the label ∅.
(We won't normally show these arcs.)
- As long as old DFA states remain in G, repeatedly
remove a DFA state r from G.
For every pair of states p and q in G, let
Rpr, Rrr, Rrq, and Rpq be the current
regular expressions labeling the arcs from p to r, r to r,
r to q, and p to q,
respectively.
Then add the regular expression
- (Rpr)(Rrr)*(Rrq) + (Rpq)
as the new label on the arc from state p to state q.
We do this even when p = q.
- When only S, the new start state, and A, the new accept state, of G remain,
the regular expression on the arc from S to A is the equivalent regular
expression for D.
- We can show that as each DFA state is removed, the resulting
GNFA is equivalent to the previous one and hence to the original DFA.
- This algorithm can be easily generalized to transform an
epsilon-NFA into an equivalent regular expression.
- This result shows that DFAs, NFAs, epsilon-NFAs, GNFAs, and regular expressions
are all equivalent in their defining power. They all define exactly
the class of regular languages.
4. Practice Problems
- Using structural induction prove that the MYT algorithm produces an
ε-NFA with at most 2n states from a regular expression of
length n.
- Consider the regular expression R =
a+b*a
.
- Use the MYT algorithm to construct an equivalent ε-NFA
for the regular expression R.
- Simulate the behavior of your ε-NFA on the input string
bba
using the two-stack algorithm.
- Use the subset construction to convert your ε-NFA into a DFA.
- Show the behavior of your DFA on the input string
bba
.
- Convert the following DFA into an equivalent regular expression.
A
is the start state and {B
} is the set of final states.
State |
Input Symbol |
0 |
1 |
A |
A |
B |
B |
B |
B |
- Convert the following DFA into an equivalent regular expression.
A
is the start state and the only final state.
State |
Input Symbol |
0 |
1 |
A |
B |
A |
B |
A |
B |
- Convert the following DFA into an equivalent regular expression.
A
is the start state and
{B
, C
} is the set of final states.
State |
Input Symbol |
0 |
1 |
A |
B |
C |
B |
A |
B |
C |
B |
A |
- Extend the ripping algorithm to apply to epsilon-NFAs.
- Prove that the ripping algorithm constructs an equivalent regular expression for the DFA to which it is applied.
5. Reading
aho@cs.columbia.edu
verma@cs.columbia.edu