Hi,
Doug wrote:
The trick: From recursive equations (easily derived
from the grammar
of REs), I counted how many REs exist up to various limits on token
counts, Then I generated all strings that satisfied those limits,
turned the recognizer loose on them and counted how many it accepted.
Which reminded me of Doug's paper.
Enumerating the strings of regular languages,
J. Functional Programming 14 (2004) 503-518
Haskell code is developed for two ways to list the strings of the
language defined by a regular expression: directly by set operations
and indirectly by converting to and simulating an equivalent
automaton. The exercise illustrates techniques for dealing with
infinite ordered domains and leads to an effective standard form for
nondeterministic finite automata.
PDF preprint:
https://www.cs.dartmouth.edu/~doug/nfa.pdf
It's also nice for the NFA construction with one state per symbol plus
one final state, and no epsilon transitions. Doug writes:
The even-a language (ab*a|b)* is defined by automaton h, with three
start states.
h0 = State 0 ’~’ []
h1 = State 1 ’b’ [h4,h1,h0]
h2 = State 2 ’a’ [h4,h1,h0]
h3 = State 3 ’b’ [h2,h3]
h4 = State 4 ’a’ [h2,h3]
h = [h4,h1,h0]
The symbols replaced by their state numbers gives (43*2|1)*; state 0 is
the sole final state.
--
Cheers, Ralph.