Hi Dan,
It sounds to me like an "optimizer" is needed. There is alreay a compiler
that uses FA's. Is someone else going to create a program
to look for dates without using regular expressions?
Today, I write small-sized RE's. If I write a giant RE, there is nothing
preventing
the owner of RE world to change how they are used. For instance. Compile
your RE
and a subroutine/function is produced that performs the RE search.
RE is a *language*, not necessarily an implementation.
At least that is my understanding.
Ed
On Mon, Mar 6, 2023 at 3:02 PM Dan Cross <crossd(a)gmail.com> wrote:
On Mon, Mar 6, 2023 at 5:02 AM Ed Bradford
<egbegb2(a)gmail.com> wrote:
[snip]
I would like to extend my program to
any date format. That would require
a much bigger RE. I have been led to
believe that a 50Kbyte or 500Kbyte
RE works just as well (if not
as fast) as a 100 byte RE. I think
with parentheses and
pipe-symbols suitably used,
one could match
Monday, March 6, 2023
2023-03-06
Mar 6, 2023
or
...
This reminds me of something that I wanted to bring up.
Perhaps one _could_ define a sufficiently rich regular expression that
one could match a number of date formats. However, I submit that one
_should not_. REs may be sufficiently powerful, but in all likelihood
what you'll end up with is an unreadable mess; it's like people who
abuse `sed` or whatever to execute complex, general purpose programs:
yeah, it's a clever hack, but that doesn't mean you should do it.
Pick the right tool for the job. REs are a powerful tool, but they're
not the right tool for _every_ job, and I'd argue that once you hit a
threshold of complexity that'll be mostly self-evident, it's time to
move on to something else.
As for large vs small REs.... When we start talking about differences
of orders of magnitude in size, we start talking about real
performance implications; in general an NDFA simulation of a regular
expression will have on the order of the length of the RE in states,
so when the length of the RE is half a million symbols, that's
half-a-million states, which practically speaking is a pretty big
number, even though it's bounded is still a pretty big number, and
even on modern CPUs.
I wouldn't want to poke that bear.
- Dan C.