[TUHS] Testing an RE recogniser exhaustively. (Was: A fuzzy awk)

27 May 2024

Hi,
Doug wrote:
...
  The trick: From recursive equations (easily derived
from the grammar
 of REs), I counted how many REs exist up to various limits on token
 counts,  Then I generated all strings that satisfied those limits,
 turned the recognizer loose on them and counted how many it accepted.

Which reminded me of Doug's paper.
    Enumerating the strings of regular languages,
    J. Functional Programming 14 (2004) 503-518
    Haskell code is developed for two ways to list the strings of the
    language defined by a regular expression: directly by set operations
    and indirectly by converting to and simulating an equivalent
    automaton.  The exercise illustrates techniques for dealing with
    infinite ordered domains and leads to an effective standard form for
    nondeterministic finite automata.
    PDF preprint: https://www.cs.dartmouth.edu/~doug/nfa.pdf
It's also nice for the NFA construction with one state per symbol plus
one final state, and no epsilon transitions.  Doug writes:
    The even-a language (ab*a|b)* is defined by automaton h, with three
    start states.
        h0 = State 0 ’~’ []
        h1 = State 1 ’b’ [h4,h1,h0]
        h2 = State 2 ’a’ [h4,h1,h0]
        h3 = State 3 ’b’ [h2,h3]
        h4 = State 4 ’a’ [h2,h3]
        h = [h4,h1,h0]
The symbols replaced by their state numbers gives (43*2|1)*; state 0 is
the sole final state.
--
Cheers, Ralph.

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

2005

2004

2003

2002

2001

2000

1999

1998

1997

1996

1995

1994

1993

1992

1991

1990

[TUHS] Testing an RE recogniser exhaustively. (Was: A fuzzy awk)