Hi Grant,
> Suppose I have the following two lines:
> aaa aaa
> aaa bbb
> Does the following RE w/ back-reference introduce a big performance
> penalty?
> (aaa|bbb) \1
> As in:
> % echo "aaa aaa" | egrep "(aaa|bbb) \1"
> aaa aaa
You could measure the number of CPU instructions and experiment.
$ echo xyzaaa aaaxyz >f
$ ticks() { LC_ALL=C perf stat -e instructions egrep "$@"; }
$
$ ticks '(aaa|bbb) \1' <f
xyzaaa aaaxyz
Performance counter stats for 'egrep (aaa|bbb) \1':
2790889 instructions:u
0.009146904 seconds time elapsed
0.009178000 seconds user
0.000000000 seconds sys
$
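To see what the back-reference itself costs, one option is to run the same measurement on a pattern with the alternatives spelled out. A rough sketch, assuming GNU grep (POSIX defines back-references only for BREs, so \1 in an ERE is an extension). The helpers matches, same_matches and compare_cost, and the sample lines, are made up for illustration:

```shell
# Hypothetical helpers, not part of any tool.

# Print the lines of file $2 matched by ERE $1 (grep -E is egrep).
matches() { LC_ALL=C grep -E "$1" "$2"; }

# Succeed only if both patterns select exactly the same lines of file $3.
same_matches() {
    [ "$(matches "$1" "$3")" = "$(matches "$2" "$3")" ]
}

# Count instructions for each pattern in turn; needs perf(1).
compare_cost() {
    for re in "$1" "$2"; do
        LC_ALL=C perf stat -e instructions grep -E "$re" "$3" >/dev/null
    done
}

backref='(aaa|bbb) \1'
plain='aaa aaa|bbb bbb'    # assumed equivalent, no back-reference

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf '%s\n' 'xyzaaa aaaxyz' 'aaa bbb' 'bbb bbb' >"$sample"

# Only compare costs once the two patterns are known to agree.
if same_matches "$backref" "$plain" "$sample"; then
    echo "patterns agree"
else
    echo "patterns differ"
fi
```

If they agree, compare_cost "$backref" "$plain" "$sample" prints one perf report per pattern on stderr.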
Bear in mind that egreps differ, and even a single implementation, GNU egrep say, changes from one version to the next.
$ LC_ALL=C perf stat -e instructions egrep '(aaa|bbb) \1' f
xyzaaa aaaxyz
...
2795836 instructions:u
...
$ LC_ALL=C perf stat -e instructions perl -ne '/(aaa|bbb) \1/ and print' f
xyzaaa aaaxyz
...
2563488 instructions:u
...
$ LC_ALL=C perf stat -e instructions sed -nr '/(aaa|bbb) \1/p' f
xyzaaa aaaxyz
...
610213 instructions:u
...
$
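With a one-line input, much of each count is probably program start-up rather than matching, so a larger input may give a fairer picture. A sketch, with repeat_line and the line count of 100000 being arbitrary choices of mine:

```shell
# Hypothetical helper: write COUNT copies of LINE to FILE.
repeat_line() {    # usage: repeat_line COUNT LINE FILE
    i=0
    while [ "$i" -lt "$1" ]; do
        printf '%s\n' "$2"
        i=$((i + 1))
    done >"$3"
}

big=$(mktemp)
trap 'rm -f "$big"' EXIT
repeat_line 100000 'xyzaaa aaaxyz' "$big"
wc -l <"$big"

# With perf available, run the same command on the small file f and on
# the big file; the difference between the two instruction counts,
# divided by the number of extra lines, approximates the per-line cost:
#   LC_ALL=C perf stat -e instructions egrep -c '(aaa|bbb) \1' f
#   LC_ALL=C perf stat -e instructions egrep -c '(aaa|bbb) \1' "$big"
```

The -c just keeps the matched lines off the terminal.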
--
Cheers, Ralph.