John Cowan <cowan(a)mercury.ccil.org> wrote:
Dave Horsfall scripsit:
I'm still trying to get my head around how
a program such as "egrep"
which handles complex patterns can be faster than one that doesn't... It
seems to defeat all logic :-)
[...]
Classic grep uses backtracking, which makes it much
slower on problematic
expressions like "a*b" where there is no b in the input. On the other
hand, creating a deterministic automaton has higher setup costs.
Right. The relevant section in the article that started this thread says:
: Al Aho decided to put theory into practice, and implemented full regular
: expressions (including alternation and grouping which were missing from
: grep) and wrote egrep over a weekend. Fgrep, specialised for the case of
: multiple (alternate) literal strings, was written in the same weekend.
: Egrep was about twice as fast as grep for simple character searches but was
: slower for complex search patterns (due to the high cost of building the
: state machine that recognised the patterns).
The "putting theory into practice" refers to compiling the regex to a DFA,
rather than interpreting an NFA.
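To make the DFA idea concrete, here is a hand-built transition table for the
pattern "a*b" from John's example (a sketch in Python for illustration only;
egrep of course constructs such tables automatically from the regex). The
point is that matching consumes exactly one table lookup per input character,
with no backtracking:

```python
# Hand-built DFA for the anchored pattern "a*b".
# States: 0 = start (seen only a's so far), 1 = accept (seen the b).
# Any other character sends us to an implicit dead state (2).
DFA = {
    0: {'a': 0, 'b': 1},
    1: {},  # nothing may follow the b
}
ACCEPT = {1}
DEAD = 2

def dfa_match(text):
    state = 0
    for ch in text:
        state = DFA.get(state, {}).get(ch, DEAD)
        if state == DEAD:
            return False
    return state in ACCEPT

print(dfa_match("aaab"))  # True
print(dfa_match("aaaa"))  # False: never saw the b
```

Building the table up front is the "higher setup cost" John mentions; once
built, the scan is linear in the input no matter how the pattern is shaped.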
Russ Cox has a good summary of the different regex implementation techniques
at
https://swtch.com/~rsc/regexp/regexp1.html
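For contrast with the DFA approach, here is a toy backtracking matcher (a
hypothetical sketch, not classic grep's actual code) that handles just enough
of the notation to run the "a*b" case: literal characters plus a single-character
star. The search wrapper tries the pattern at every offset, grep-style, which
is where the cost piles up on a line full of a's and no b:

```python
def bt_match(pat, txt, i=0, j=0):
    """Backtracking match of pat against txt starting at offset j."""
    if i == len(pat):
        return True  # pattern exhausted: match found
    if i + 1 < len(pat) and pat[i + 1] == '*':
        # Greedily consume pat[i], then give back one character at a time.
        k = j
        while k < len(txt) and txt[k] == pat[i]:
            k += 1
        while k >= j:
            if bt_match(pat, txt, i + 2, k):
                return True
            k -= 1
        return False
    return j < len(txt) and txt[j] == pat[i] and bt_match(pat, txt, i + 1, j + 1)

def bt_search(pat, line):
    """grep-style search: try a match at every starting offset."""
    return any(bt_match(pat, line, 0, j) for j in range(len(line) + 1))

print(bt_search("a*b", "aaab"))  # True
print(bt_search("a*b", "aaaa"))  # False, after trying every offset and split
```

On a b-less line of n a's this does O(n) failed splits at each of n offsets,
roughly O(n^2) work, where the DFA does one pass; that is the asymmetry the
thread is describing.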
This makes me wonder how well-known was the technique of compiling to a
DFA, and whether it was widely implemented before awk, egrep, and lex.
Tony.
--
f.anthony.n.finch <dot(a)dotat.at>
http://dotat.at/
Fair Isle, Faeroes: Southeast 6 to gale 8, veering southwest gale 8 to storm
10, becoming cyclonic later. Very rough, becoming high or very high. Rain or
squally showers. Moderate or poor.