COMS W4115
Programming Languages and Translators
Lecture 5: Implementing a Lexical Analyzer
February 4, 2015
Outline
- Finite automata
- Converting an NFA to a DFA
- Equivalence of regular expressions and finite automata
- Simulating an NFA
- The pumping lemma for regular languages
- Closure and decision properties of regular languages
1. Finite Automata
- Variants of finite automata are commonly used to match regular expression patterns.
This lecture summarizes the relevant results linking finite automata and regular
expressions usually covered in computer science theory courses.
Of particular relevance to the implementation of lexers are the algorithms that
use deterministic and nondeterministic
finite automata to match regular expression patterns.
- A nondeterministic finite automaton (NFA) consists of
- A finite set of states S.
- An input alphabet consisting of a finite set of symbols Σ.
- A transition function δ that maps S × (Σ ∪ {ε})
to subsets of S. This transition function can be represented
by a transition graph in which the nodes are labeled by states
and there is a directed edge labeled a from node w to node v
if δ(w, a) contains v.
- An initial state s0 in S.
- F, a subset of S, called the final (or accepting) states.
- An NFA accepts an input string x iff there is a path in the transition
graph from the initial state to a final state that spells out x.
- The language defined by an NFA is the set of strings accepted by the NFA.
- A deterministic finite automaton (DFA) is an NFA in which
- There are no ε moves, and
- For each state s and input symbol a there is exactly one transition
out of s labeled a.
2. Converting an NFA to a DFA
- Every NFA can be converted to an equivalent DFA using the subset construction
(Algorithm 3.20, ALSU, pp. 153-154).
- Every DFA can be converted into an equivalent minimum-state DFA
Using Algorithm 3.39, ALSU, pp. 181-183.
All equivalent minimum-state DFAs are isomorphic up to state renaming.
3. Equivalence of Regular Expressions and Finite Automata
- Regular expressions and finite automata define the same
class of languages, namely the regular sets.
- Every regular expression can be converted into an equivalent
NFA using the McNaughton-Yamada-Thompson algorithm
(Algorithm 3.23, ALSU, pp. 159-161).
- Every finite automaton can be converted into a regular expression
using Kleene's algorithm.
4. Simulating an NFA
- Two-stack simulation of an NFA: Algorithm 3.22, ALSU, pp. 156-159.
5. The Pumping Lemma for Regular Languages
- The pumping lemma allows us to prove certain languages, like
{
a
nb
n |
n ≥ 0 }, are not regular.
- The pumping lemma. If L is a regular language, then there exists a
constant n associated with L such that for every string w in L where
|w| ≥ n, we can partition w into three strings
xyz (i.e., w = xyz) such that
- y is not the empty string,
- the length of xy is less than or equal to n, and
- for all k ≥ 0, the string xykz is in L.
6. Closure and Decision Properties of Regular Languages
- The regular languages are closed under the following operations:
- union
- intersection
- complement
- reversal
- Kleene star
- homomorphism
- inverse homomorphism
- Decision properties
- Given a regular expression r and a string w, it is decidable
whether r matches w.
- Give a finite automaton A, it is decidable whether L(A) is empty.
- Given two finite automata A and B, it is decidable whether L(A) = L(B).
7. Practice Problems
- Write down deterministic finite automata for the following regular expressions:
(a*b*)*
(aa|bb)*((ab|ba)(aa|bb)*(ab|ba)(aa|bb)*)*
a(ba|a)*
ab(a|b*c)*bb*a
- Construct a deterministic finite automaton that will recognize all strings
of 0's and 1's representing integers that are divisible by 3.
Assume the empty string represents 0.
- Use the McNaughton-Yamada-Thompson algorithm to convert the regular
expression
a(a|b)*a
into a nondeterministic finite automaton.
- Convert the NFA of (3) into a DFA.
- Minimize the number of states in the DFA of (4).
8. Reading Assignment
aho@cs.columbia.edu