COMS W3261 CS Theory
Lecture 2: Finite Automata and Regular Languages
1. The Chomsky Hierarchy of Classes of Languages
- In this course we will study the following classes of languages in the Chomsky hierarchy:
- The regular languages
- The context-free languages
- The recursive languages
- The recursively enumerable languages
- We will show that this is a proper hierarchy of classes of languages.
2. Deterministic Finite Automata
- A deterministic finite automaton (DFA) is one of the simplest
and most useful models of computation. A DFA is used to define or recognize
a language, called a regular language.
- A DFA has five components:
- A finite nonempty set of states Q.
- An input alphabet consisting of a finite set of symbols Σ.
- A transition function δ that maps Q × Σ to Q.
This transition function can be represented by a transition diagram
in which the nodes are labeled by states and arcs by input symbols.
- A start state which is one of the states in Q.
- A set of final (or accepting) states F that is a subset
(possibly empty) of Q.
- A DFA accepts an input string x if and only if there is a sequence of transitions
from the initial state to a final state that spells out x.
We will often say a DFA matches a string x if the DFA accepts x.
- L(D), the language defined by a DFA D, is the set of
strings accepted by D.
- A language defined by a DFA is said to be regular.
3. Example 1: DFA for Binary Strings with an Even Number of 0's
- Consider the following DFA D with:
- The finite set of states Q = {
A
, B
}
- The input alphabet Σ = {
0
, 1
}
- The transition function δ as represented by the following
transition table:
State |
Input Symbol |
0 |
1 |
A |
B |
A |
B |
A |
B |
- Start state
A
- The set of final states F = {
A
}
- Given the input string
10101
, D makes the following
sequence of transitions
from its start state A
:
- 1 0 1 0 1
- A A B B A A
- This sequence of transitions takes D from its start state
A
to the final state A
.
Thus, the input 10101
is accepted by D.
- Given the input string
101
, D makes the following transitions
from its start state A
:
- 1 0 1
- A A B B
- This sequence of transitions takes D from its start state
A
to the nonfinal state B
.
Therefore, the input 101
is not accepted by D.
- We can prove the language defined by D is the set of all strings of 0's and 1's having an even number of 0's.
4. Example 2: DFA for Binary Numbers Divisible by 3
- We can create a DFA to recognize all strings of
0's and 1's representing binary numbers divisible by three.
We assume the binary string 0 represents the number 0, 1 represents 1,
00 represents 0, 01 represents 1, 10 represents 2, 11 represents 3, and so on.
We will also assume the empty string represents the number 0.
- Consider the following DFA D with:
- The finite set of states Q = {
A
, B
, C
}
- The input alphabet Σ = {
0
, 1
}
- The transition function δ as represented by the following
transition table:
State |
Input Symbol |
0 |
1 |
A |
A |
B |
B |
C |
A |
C |
B |
C |
- Start state
A
- The set of final states F = {
A
}
- State
A
represents all prefixes of binary strings that are congruent to 0 mod 3,
state B
all prefixes congruent to 1 mod 3, and
state C
all prefixes congruent to 2 mod 3.
- Given the input string
1001
, D makes the following transitions
from its start state A
:
- 1 0 0 1
- A B C B A
- In this sequence of transitions D goes from its start state
A
to the final state A
.
Thus, the binary string 1001
, which represents the number 9, is accepted by
D.
- We can prove the language defined by D is the set of all binary strings representing integers divisible by three.
5. Nondeterministic Finite Automata
- A nondeterministic finite automaton (NFA) consists of
- A finite nonempty set of states Q.
- An input alphabet consisting of a finite set of symbols Σ.
- A transition function δ that maps Q × Σ to
P(Q), the set of subsets of Q.
Like a DFA, this transition function can be represented by a transition diagram
in which the nodes are labeled by states and arcs by input symbols.
Unlike a DFA, an NFA may have transitions to zero or more states from
a given state on a given input symbol.
- A start state that is one of the states in Q.
- A set of final (or accepting) states F
that is a subset of Q.
- An NFA accepts an input string x if there is a path of transitions
from the initial state to a final state that spells out x.
Note that, unlike for a DFA, in an NFA there may be more than one path from
the initial state to a final state that spells out x.
- Note the definition of an extended transition function
for an NFA. See Sect 2.3.3 of HMU.
- L(N), the language defined by an NFA N, is the set of
strings accepted by N.
6. Equivalence of DFA's and NFA's: the Subset Construction
- The subset construction transforms an NFA into an equivalent DFA (one that defines
the same language). The subset construction is one of the fundamental algorithms
in automata theory. It shows that NFAs are no more powerful than DFAs
in their ability to define languages.
- From an NFA N =
(QN, Σ, δN, q0,
FN),
the subset construction builds a DFA D =
(QD, Σ, δD, {q0},
FD)
that simulates in parallel all possible moves of N on any given input as follows:
- QD is the set of all subsets of
QN; that is,
QD is the power set of
QN.
- The input alphabet of D is the same as that of N.
- For each state S in QN and each input symbol
a in Σ,
- δD(S, a) is the union of all states in
δN(p, a) for all p in S.
- The start state of D is the set of states {q0}.
- FD, the set of final states of D, is the set of
subsets of states of QD that
contain a state in FN.
- We can prove by induction on the length of an input string w that D can reach
deterministic state S in QD on w
iff D can reach each
nondeterministic state p contained in S on w. See Sect. 2.3.5 of HMU.
7. A Bad Case for the Subset Construction
- Let L be the language
(a+b)*a(a+b)n-1, that is,
the set of
all strings of a's and b's in which the nth
character from the end is an a.
We can prove that every smallest DFA,
one that contains the fewest number of states, that accepts L must
have at least 2n states.
See Sect. 2.3.6 of HMU.
8. Practice Problems
- Let L be the language
{
w
| w
is any string of a
's and
b
's containing at least one a
and
at least one b
}.
- Construct a DFA for L.
- Show the behavior of your DFA processing the input string
aabaa
.
- Let L be the language
{
ab
xba
| x is any string of a
's,
b
's, and c
's not containing the substring ba
}
over the alphabet {a
, b
, c
}.
This language models comments in the programming language C.
- Construct a DFA for L.
- Show the behavior of your DFA processing the input string
abcbaba
.
- Let L be the language consisting of all strings of
a
's
and b
's containing the substring abba
.
Construct a DFA for L.
- Let L be the language consisting of all strings of
0
's
and 1
's having an even number of 0
's and an
even number of 1
's.
- Construct a DFA for L.
- Show the behavior of your DFA processing the input string
01100101
.
- Prove that your DFA is correct.
- Construct an NFA that accepts all strings of
a
's
and b
's ending in abb
.
- Show all sequences of moves that your NFA can make on the input string ababb.
- Use the subset construction to convert your NFA into an equivalent DFA.
- Construct an NFA that accepts all strings of
a
's
and b
's in which the third last character is an a
.
- Use the subset construction to convert your NFA into an equivalent DFA.
- Prove that any DFA recognizing this language must have at least eight states.
- Prove that the DFA in section 3 above accepts a string w iff w has an even number of 0's.
- Prove that the DFA in section 4 above accepts a binary string w iff w represents an integer divisible by three.
9. Reading
aho@cs.columbia.edu
verma@cs.columbia.edu