On 3/3/23 9:12 AM, Dave Horsfall wrote:
I can't help but provide an extract from my antispam log summariser
(AWK):
# Yes, I have a warped sense of humour here.
/^[JFMAMJJASOND][aeapauuuecoc][nbrrynlgptvc] [ 0123][0-9] / \
{
    date = sprintf("%4d/%.2d/%.2d",
        year, months[substr($0, 1, 3)], substr($0, 5, 2))
Thank you for sharing that, Dave.
Etc. The idea is not to validate so much as to grab a line of interest
to me and extract the bits that I want.
Fair enough.
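
For anyone wanting to try the extract, here is a minimal sketch that wraps it so it runs on its own. The extract does not show how `months` and `year` are set up, so the BEGIN block and the `year` argument below are assumptions, as are the function name `summarise_dates` and the sample log line.

```shell
# Sketch only: months[] and year are not defined in the quoted extract,
# so their setup here is assumed, not Dave's actual code.
summarise_dates() {
    # Classic syslog timestamps omit the year, so it is passed in as $1.
    awk -v year="$1" '
    BEGIN {
        # Assumed setup: map each month abbreviation to its number.
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", names, " ")
        for (i = 1; i <= n; i++)
            months[names[i]] = i
    }
    /^[JFMAMJJASOND][aeapauuuecoc][nbrrynlgptvc] [ 0123][0-9] / {
        date = sprintf("%4d/%.2d/%.2d",
            year, months[substr($0, 1, 3)], substr($0, 5, 2))
        print date
    }'
}

# Hypothetical sample line in the usual syslog layout.
printf '%s\n' 'Mar  3 09:12:01 host sendmail[123]: example' | summarise_dates 2023
```

With that sample line and year, the sketch should print 2023/03/03.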
Using bracket expressions for the three letters is definitely another
idea that I hadn't considered.
But I believe I prefer what I'm going to describe as the more precise
alternation listing out each month: (Jan|Feb|Mar...
Such an alternation is not going to match Jer like the three bracket
expressions will. I also believe that the alternation will be easier to
maintain in the future, especially by someone other than me who has
less experience with REs.
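
The difference between the two approaches can be sketched as follows. The bracket-expression pattern is the one from the extract; the full twelve-month alternation is my own spelling-out of the idea, and the function name `check_months` and the sample prefixes are made up for illustration.

```shell
# Sketch: compare the bracket-expression pattern with an explicit alternation.
check_months() {
    # Hypothetical samples: two real months, then two impostors.
    printf '%s\n' 'Jan  3 09:12:01' 'Dec 25 00:00:00' \
                  'Jer  3 09:12:01' 'Dot 12 10:00:00' |
    awk '
    {
        # Three bracket expressions, one per letter position.
        bracket = ($0 ~ /^[JFMAMJJASOND][aeapauuuecoc][nbrrynlgptvc] [ 0123][0-9] /)
        # Explicit alternation: only the twelve real abbreviations.
        alt = ($0 ~ /^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) [ 0123][0-9] /)
        printf "%s bracket=%d alternation=%d\n", substr($0, 1, 3), bracket, alt
    }'
}

check_months
```

Both patterns accept Jan and Dec, but only the bracket expressions accept Jer and Dot, because each position is checked independently of the other two.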
In this case I trust the source (the Sendmail log), but of course
that is not always the case...
I trust that syslog will produce consistent line beginnings more than I
trust the data that is provided to syslog. But I'd still like to be
able to detect "Jer" or "Dot" if syslog ever tosses its cookies.
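
One way that detection might look, as a rough sketch: negate the strict alternation and report whatever fails it. The function name `flag_bad_prefix`, the "SUSPECT" label, and the sample lines are all hypothetical.

```shell
# Sketch: report any line whose prefix is not a real month plus a day.
flag_bad_prefix() {
    awk '
    # A line that fails the strict alternation is treated as suspect.
    $0 !~ /^(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) [ 0123][0-9] / {
        printf "SUSPECT line %d: %s\n", NR, $0
    }'
}

# Hypothetical input: one normal line, one with a mangled month.
printf '%s\n' 'Jan  3 09:12:01 host ok' 'Jer  3 09:12:01 host odd' | flag_bad_prefix
```

Only the second line is reported here, since "Jer" is not one of the twelve abbreviations.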
When doing things like this, you need to ask yourself at least the
following questions:
1) What exactly am I trying to do? This is fairly important :-)
Filter out log entries that are known to be okay.
2) Can I trust the data? Bobby Tables, Reflections on Trusting
Trust...
Given that I'm effectively negating things and filtering out log entries
that I don't want to see (because they are okay), I'm comfortable with
trusting the data from syslog.
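
The filtering-by-negation idea can be sketched like this: skip anything that matches a known-okay pattern and print whatever is left. The two patterns, the function name `filter_known_ok`, and the sample lines are placeholders for illustration, not my real rules.

```shell
# Sketch: drop entries that match known-okay patterns, show the rest.
filter_known_ok() {
    awk '
    # Hypothetical allow-list: entries already known to be fine.
    /stat=Sent/      { next }
    /session opened/ { next }
    # Anything left over was not recognised, so print it for review.
    { print }'
}

# Hypothetical input: one known-okay entry and one unrecognised entry.
printf '%s\n' \
    'Mar  3 09:12:01 host sendmail[1]: abc: to=user@example.org, stat=Sent' \
    'Mar  3 09:12:02 host daemon[2]: something unexpected' | filter_known_ok
```

A mangled entry that no longer matches an okay pattern simply falls through and gets printed, which is why trusting the input matters less when filtering this way.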
Brown M&Ms come to mind.
3) Etc.
And let's not get started on the difference betwixt "trusted" and
"trustworthy" (that distinction keeps security bods awake at night).
ACK
--
Grant. . . .
unix || die