COMS W3261 CS Theory
Lecture 5: Properties of Regular Languages - I
1. Closure Properties of Regular Languages
- Given a set, a closure property of the set is an operation that
when applied to members of the set always returns as its answer a member of
that set. For example, the set of integers is closed under addition.
- The set of regular languages is closed under the operations of
- union, intersection, complement, difference
- concatenation, Kleene closure
- reversal
- homomorphism, inverse homomorphism
- We can use finite automata or regular expressions to prove these closure properties
about the set of regular languages.
- For example, given two DFA's D1 and D2 for
regular languages L1 and L2, respectively,
we can construct an ε-NFA for the concatenation
L1L2
by adding epsilon-transitions from the final states of D1 to the
start state of D2, making the start state of D1
the start state of the ε-NFA, and making the set of final states of
D2 the set of final states of the ε-NFA.
- Alternatively, given two regular expressions R1 and
R2 for L1 and L2, respectively,
the regular expression R1R2 generates
L1L2.
- Some closure properties are easier to prove using finite automata and others using
regular expressions.
- Closure properties for regular languages are often useful
in proving that a given language is regular.
- For example, we can show that the set of strings of a's and b's that do
not contain the substring abb is regular by pointing out that
this set is the complement of the language generated by the regular expression
(a+b)*abb(a+b)*.
2. The Pumping Lemma for Regular Languages
- The pumping lemma gives a necessary (but not sufficient) condition
for a language to be regular.
It is useful for proving given languages are not regular, and hence cannot be
recognized by a finite automaton or defined by a regular expression.
- The pumping lemma for regular languages
states that for every nonfinite regular language L,
there exists a constant n that depends on L such that
for all w in L with |w| ≥ n, there exists
a decomposition
of w into xyz such that the following three conditions must hold:
- y ≠ ε,
- |xy| ≤ n, and
- for all k ≥ 0, the string xykz is in L.
- Proof.
- Suppose D is a DFA such that L(D) = L.
Consider a long input string
w = a1a2 … am where
m ≥ n and let
q0q1q2 … qm
be the sequence of states that D goes through in processing the
input string w.
By the pigeonhole principle, there must be at least one repeated
state in this sequence of states.
Let qi be the first repeated state in this sequence so there
is a closest following state
qj, i < j ≤ n, such that
qj = qi.
- We can now break w into three substrings x, y, z such
that w = xyz and
- x = a1a2 … ai
takes D from state q0
to state qi
- y =
ai+1ai+2 … aj
takes D from state qi
to state qj
- z =
aj+1aj+2 … am
takes D from state qj
to state qm
- The three conditions of the pumping lemma now follow:
- y cannot be the empty string since j is strictly greater than i.
- The total length of xy must be less than or equal to n since
qi is the first repeated state and j ≤ n.
- We can repeat the substring y within w zero or more times since it takes
D from state qi
to state qj
and qj = qi.
So D must accept the inputs xz, xyz, xyyz, xyyyz,
and so on.
- One important use of the pumping lemma is to prove some
languages are not regular.
- Example: The language L consisting of all strings of a's and b's
of the form aibi, i ≥ 0,
is not regular.
- The proof will be by contradiction. Assume L is regular.
Then by the pumping lemma there is a constant n associated with L
such that for all w in L with |w| ≥ n,
w can be written as xyz
such that
- y ≠ ε,
- |xy| ≤ n, and
- for all k ≥ 0, the string xykz
is in L.
- Since |xy| ≤ n, xy = am
for some 0 < m ≤ n.
- Setting k = 0, condition (3) of the pumping lemma says xz must also be
in L.
- But xz is of the form apbn,
where p < n.
- This violates the requirement that xz must be in L.
We have now demonstrated a contradiction to the assertion that
L is regular.
- We can capture the essence of this proof with the observation "finite automata
can't count."
3. Practice Problems
- Let FirstHalf(L) =
{ x | xy is in L and |x| = |y| }.
Show that if L is regular, then FirstHalf(L) is also regular.
- Show that the language consisting of all strings of balanced
parentheses is not regular.
- Show that the language consisting of all strings of a's and b's
that read the same forwards as backwards is not regular.
- Prove that the language
L =
{ w | w = aibj
where i is not equal to j }
is not regular.
- [Hard] Let FTLT(L) =
{ xz | xyz is in L and |x| = |y| = |z| }.
(FTLT(L) concatenates
the first-third and last-third of strings in L whose length is a multiple of 3.)
Show that if L is regular, then FTLT(L) is not always regular.
4. Reading
aho@cs.columbia.edu
verma@cs.columbia.edu