Let be a finite alphabet, and let be an arbitrary language. We show that the language consisting of subsequences of strings in is regular. The next section formally defines what we are trying to prove. The material is essentially [1].
We will treat subsequences using orderings. Define the partial order in where
For language , let denote the upward closure, and denote the downward closure, define as
By abuse of notation, we also define and .
The main theorem can now be stated as follows:
For any , the languages and are regular.
To prove Theorem B, we start with a proposition and some lemmas.
If is regular, then so are and .
Let be an DFA that accepts . We construct NFAs by adding transitions to :
A subset is an antichain if its elements are pairwise incomparable. In other words, and for all .
Every antichain of is finite.
For every language , there exist finite such that
We show that Lemma E Lemma F Theorem B. The proof of Lemma E is the most complicated, so we leave it at the end. To conclude this section, we prove Theorem B from Lemma F.
Theorem.
and
are regular for any
.
Let and for finite as per Lemma F. Since and are both regular (Proposition C), so are and .
Let , the proof of Lemma F consists of two parts.
Lemma. There exists a finite
such that
.
Let be the set of minimal elements of :
It follows that is finite (by Lemma E), and .
Lemma. There exists a finite
such that
.
Let , then by definition. Suppose, for contradiction, , i.e. there exists . Since , let such that . However, because is also in , we have that , contradicting that .
Thus . Pick a finite such that , it follows that as wanted.
We are left to show that for a language , the set of minimal elements is finite, or in general, every antichain of is finite under the ordering (for subsequences). We start with a proposition, and prove the lemma by induction on alphabet size.
A subset is a chain iff all elements of are pairwise comparable.
If Lemma E holds, then every infinite subset of contains an infinite chain.
By contradiction, suppose is infinite, but every chain of is finite. Therefore, has infinitely many chains. By Lemma E, only has finitely many maximal elements, so there exists an element that is the maximal element of infinitely many chains. Thus, infinitely many and hence arbitrarily long strings of precede , i.e. are subsequences of : a contradiction.
Finally, we are ready to prove the lemma.
Lemma. Every antichain of
is finite.
By induction on . Note that an alphabet of size is trivial.
Assume the lemma holds for alphabet size , but is an infinite antichain for . There exists some shortest string such that for all ; if not, , and thus . Furthermore, choose such that is of minimum length. Note that .
Let and write
for each . Notice that, if , then each , which contradicts the induction hypothesis.
By the choice of (being the shortest), we have that for all but finitely many , i.e. there exists such that for all , . Without loss of generality, we can throw away the first strings from , and assume it holds for all (which is still infinitely many). Therefore, for each , there exists such that
where for each (e.g. by choosing the shortest , then the shortest , and so on). It is also the case that because otherwise .
We proceed to throw away more strings from . Formally, we construct a decreasing sequence of infinite index sets such that for every and , we have that whenever .
Let . Given , define
If is finite, then for some fixed string , the index set is infinite. If not, contains an infinite chain (by induction hypothesis)
and it suffices to let be an infinite increasing subsequence of .
Lastly, for belonging to , we have that for all as well. So for all , and
contradicting that (containing and ) is an antichain.