On 3/3/23 6:42 AM, Ralph Corderoy wrote:
I think Grant is after what Russ addresses in sentence
2. :-)
You are mostly correct. The motivation for this thread is very much
wanting to learn "how best to use today's regular expression
implementations". However, there is also a part of me that wants a
little bit of understanding of why that is the case.
Yes, Friedl does show that wonderfully. From long-ago
memory, Friedl
understands enough to have diagrams of NFAs and DFAs clocking through
their inputs, showing the differences in number of states, etc.
It seems like I need to find another copy of Friedl's book. -- My
current copy is boxed up for a move nearly 1k miles away. :-/
Yes, Friedl says an NFA must recursively backtrack.
As Russ says in #3,
it was a ‘widespread belief’. Friedl didn't originate it; I ‘knew’ it
before reading his book. Friedl was at the sharp end of regexps,
needing to process large amounts of text, at Yahoo! IIRC. He
investigated how the programs available behaved; he didn't start at the
theory and come up with a new program best suited to his needs.
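That "an NFA must recursively backtrack" picture can be sketched in a few lines. The toy matcher below is my own illustration, not any real engine's code, and it supports only literals, '.', and a postfix '*':

```python
# Toy recursive-backtracking matcher: literals, '.' (any char), and
# a postfix '*'. On failure it backs up and retries shorter repetitions,
# which is the behavior the "NFA must backtrack" belief generalized from.

def match_here(pattern: str, text: str) -> bool:
    if not pattern:
        return True
    if len(pattern) >= 2 and pattern[1] == '*':
        # Try zero or more repetitions of pattern[0], backtracking on failure.
        i = 0
        while True:
            if match_here(pattern[2:], text[i:]):
                return True
            if i < len(text) and pattern[0] in ('.', text[i]):
                i += 1
            else:
                return False
    if text and pattern[0] in ('.', text[0]):
        return match_here(pattern[1:], text[1:])
    return False

def match(pattern: str, text: str) -> bool:
    # Unanchored search: try every starting offset in the text.
    return any(match_here(pattern, text[i:]) for i in range(len(text) + 1))
```

Feed a pattern like `a*a*a*a*b` a long run of `a`s with no `b` and this strategy retries exponentially many ways to split the input, which is exactly the pathological case Russ writes about.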
It sounds like I'm coming from a similar position of "what is the best
way to process this corpus" more than "what is the underlying theory
behind what I'm wanting to do".
Russ's stuff is great. He refuted that
widespread belief, for one
thing. But Russ isn't trying to teach a programmer how to best use the
regexp engine in sed, grep, egrep, Perl, PCRE, ... whereas Friedl takes
the many pages needed to do this.
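Russ's refutation can be sketched too: instead of recursing, carry the whole set of viable NFA states across the text in one left-to-right pass, so nothing is ever rescanned. This is my own minimal version of that set-of-states simulation, using the same toy syntax (literals, '.', postfix '*'), not Russ's actual code:

```python
# Set-of-states NFA simulation: scan the text once, tracking every state
# the NFA could be in. No backtracking; each character is examined once
# against at most len(tokens) states, so matching time stays linear-ish.

def compile_pattern(pattern: str):
    # Compile to a list of (char, is_starred) tokens.
    tokens, i = [], 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == '*'
        tokens.append((pattern[i], starred))
        i += 2 if starred else 1
    return tokens

def add_state(tokens, states, i):
    # Add state i, plus states reachable by skipping starred tokens
    # (a starred token may match zero times).
    while i not in states:
        states.add(i)
        if i < len(tokens) and tokens[i][1]:
            i += 1
        else:
            break

def fullmatch(pattern: str, text: str) -> bool:
    tokens = compile_pattern(pattern)
    states = set()
    add_state(tokens, states, 0)
    for ch in text:
        next_states = set()
        for i in states:
            if i < len(tokens) and tokens[i][0] in ('.', ch):
                if tokens[i][1]:
                    add_state(tokens, next_states, i)      # may match again
                else:
                    add_state(tokens, next_states, i + 1)  # consumed, move on
        states = next_states
    return len(tokens) in states  # accepting state = past the last token
```

The same `a*a*a*a*b`-style input that blows up a backtracker is handled here in a single pass, which is the heart of Russ's argument.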
:-)
It depends what one wants to learn first.
I'm learning that I'm more of a technician who wants to know how to use
the existing tools to the best of his / their ability, while still
having some interest in the theory behind things.
As Friedl says in the post Russ linked to:
‘As a user, you don't care if it's regular, nonregular, unregular,
irregular, or incontinent. So long as you know what you can expect
from it (something this chapter will show you), you know all you need
to care about.’
Yep. That's the position that I would be in if someone were paying me
to write the REs that I'm writing.
‘For those wishing to learn more about the theory
of regular expressions,
the classic computer-science text is chapter 3 of Aho, Sethi, and
Ullman's Compilers: Principles, Techniques, and Tools (Addison-Wesley,
1986), commonly called “The Dragon Book” due to the cover design.
More specifically, this is the “red dragon”. The “green dragon”
is its predecessor, Aho and Ullman's Principles of Compiler Design.’
This all sounds interesting to me, and like something I might add to my
collection of books. But it also sounds like it will be an uphill read
and a vast learning opportunity.
In addition to the Dragon Book, Hopcroft and Ullman's ‘Introduction to
Automata Theory, Languages, and Computation’ goes further into the
subject. Chapter two covers DFAs, NFAs, and epsilon transitions, and
uses searching text as an example. Chapter three is regular
expressions, four is regular languages. Pushdown automata are
chapter six.
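That "searching text" flavor of example boils down to: build a transition table once, then step through the text one character at a time with no rescanning. The pattern and the deliberately naive table construction below are my own sketch, not the book's:

```python
# Searching text with a DFA: state k means "the last k characters read
# match pattern[:k]". Build the table once, then scan the text in a
# single pass; reaching state len(pattern) means the substring occurred.

def build_dfa(pattern: str):
    m = len(pattern)
    dfa = [dict() for _ in range(m)]
    for state in range(m):
        for ch in set(pattern):
            # Next state: longest prefix of pattern that is a suffix of
            # pattern[:state] + ch (computed naively for clarity).
            s = pattern[:state] + ch
            k = min(len(s), m)
            while k > 0 and s[-k:] != pattern[:k]:
                k -= 1
            dfa[state][ch] = k
    return dfa

def contains(text: str, pattern: str) -> bool:
    dfa = build_dfa(pattern)
    state = 0
    for ch in text:
        state = dfa[state].get(ch, 0)  # characters outside the pattern reset
        if state == len(pattern):
            return True
    return False
```

The scan itself does constant work per character, which is the point of the DFA treatment: all the cleverness is paid for up front in the table.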
Too many books, not enough time to read. :-)
Yep. Even inventorying and keeping track of the books can be time
consuming. -- Thankfully I took some time to do exactly that and have
access to that information on the supercomputer in my pocket.
--
Grant. . . .
unix || die