Dr. McIlroy,
Much thanks for this! If you don't object, I will add this note to the
repo as it provides insight into the wherefores of the package.
At this point, I must also give credit where credit is due:
* Chet Ramey, who suggested that I ask Russ Cox to take a look at the package,
* Russ Cox, who fixed the major problems and got all the tests to pass,
* Rares Aioanei, who volunteered to tackle fixing things but did not get
to do so before Russ beat him to it.
As implied by the above, the package is now up-to-date and functional!
I hope it's of interest to the broader community.
My own reason for seeking this out is that I have (likely vain) hopes
of one day finding a better regex package to use for gawk. But
regular expressions are interesting in their own right. Russ Cox has
a series of papers on his web site about them that are worth reading.
Finally, thanks again to Dr. McIlroy for humoring me and giving me
his code to play with.
Arnold
Doug McIlroy <doug(a)cs.dartmouth.edu> wrote:
Why would anyone be interested in an old regex package
that never was
a part of any Unix distro?
The driving force was Posix, whose regex spec was quite inscrutable. Could
there be a reference implementation? It was easy to fool every
implementation I could get my hands on, including Gnu's over-the-top
9000-line implementation.
But as I got into it, I got fascinated by regexes per se. In making a
recognizer, there's a tradeoff between contruction time and execution
time. Linear execution can be achieved, but at a potentially exponential
cost in construction time (and space). Backreferencing takes the regex
languages out of the class of regular languages.
Recalling that regular languages are closed under intersection and
negation, I wondered about how to implement new regex operators, &
and -. I came up with a scheme for this optional non-Posix feature that
involved layering continuation-passing over more traditional methods. And
while I was at it, I broke out smaller sublanguages for special treatment
(as does Gnu), all the way down to Knuth-Morris-Pratt for expressions
in which the only operation is catenation.
And finally, having followed the development of C++ from its infancy,
I wanted to try out its new template facility, so there's a bit of
that in the package, too. Arnold has discovered that not only has C++
evolved, but also that without the discipline of -Wall to force clean
code, I was rather cavalier about casting, both explicitly and implicitly.
The only real customer the code ever had was the AST project, which
translated it to C. After the C++ had sat idle for a half-dozen years, I
thought to revive it in Linux, but found it riddled with incompatibilities
with that new environment and gave up. Arnold deserves a citation for
bravery in pushing that through 15 years further on.
Doug