On Fri, 3 Mar 2023, Ralph Corderoy wrote:
> You'd said egrep, which is NDFA, but in other engines, alternation order
> can matter, e.g. ‘J’ starts the most months and some months have more
> days than others.
>
>     /^(J(an|u[nl])|Ma[ry]|A(ug|pr)|Oct|Dec|...
I can't help but provide an extract from my antispam log summariser (AWK):
# Yes, I have a warped sense of humour here.
/^[JFMAMJJASOND][aeapauuuecoc][nbrrynlgptvc] [ 0123][0-9] / \
{
    date = sprintf("%4d/%.2d/%.2d",
                   year, months[substr($0, 1, 3)], substr($0, 5, 2))
Etc. The idea is not to validate so much as to grab a line of interest to
me and extract the bits that I want.
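A minimal, self-contained sketch of that grab-and-extract idea, wrapped in a shell function so it can be run as-is. The months table, the year passed in as an argument (syslog-style timestamps carry none) and the function name logdate are assumptions made for illustration; the surrounding code of the real summariser is not shown here.

```sh
# Sketch only: print a sortable date for each syslog-style line on stdin.
# "logdate" and its year argument are hypothetical; the log lines carry no year.
logdate() {
    awk -v year="${1:-2023}" '
    BEGIN {
        # Map month abbreviations to numbers: months["Jan"] = 1, and so on.
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ")
        for (i = 1; i <= n; i++)
            months[name[i]] = i
    }
    # Same loose pattern as the extract: it grabs lines, it does not validate them.
    /^[JFMAMJJASOND][aeapauuuecoc][nbrrynlgptvc] [ 0123][0-9] / {
        printf "%4d/%.2d/%.2d\n",
            year, months[substr($0, 1, 3)], substr($0, 5, 2)
    }'
}
```

For example, `printf 'Mar  3 10:15:02 host sendmail[1234]: made-up line\n' | logdate 2023` prints 2023/03/03. Lines that do not start with something date-shaped are silently skipped.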
In this case I trust the source (the Sendmail log), but of course that is
not always the case...
When doing things like this, you need to ask yourself at least the
following questions:
1) What exactly am I trying to do? This is fairly important :-)
2) Can I trust the data? Bobby Tables, Reflections on Trusting Trust...
3) Etc.
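For question 2, a sketch of what not trusting the data might look like for the date fields. A character-class pattern like the one in the extract happily accepts "Jeb 39", so this version looks the month up in a table and range-checks the day before believing either. The function name checkdate, the exit-status convention and the simple 1 to 31 day range (no per-month lengths) are assumptions for illustration only.

```sh
# Sketch only: report lines whose leading date fields look wrong.
# "checkdate" is a hypothetical name; the day check ignores month lengths.
checkdate() {
    awk '
    BEGIN {
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ")
        for (i = 1; i <= n; i++)
            months[name[i]] = i
    }
    {
        mon = substr($0, 1, 3)
        day = substr($0, 5, 2) + 0      # force numeric; " 3" becomes 3
        if (!(mon in months) || day < 1 || day > 31) {
            printf "line %d: suspect date: %s\n", NR, substr($0, 1, 6)
            bad = 1
        }
    }
    END { exit (bad ? 1 : 0) }'
}
```

Fed a line starting "Jeb 39", it reports that line's number and exits non-zero; clean input produces no output and exits zero.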
And let's not get started on the difference betwixt "trusted" and
"trustworthy" (that distinction keeps security bods awake at night).
-- Dave