On 3/3/23 6:47 AM, Dan Cross wrote:
Oh, for sure; to be clear, it was obvious that in the earlier
discussion the original was just part of something larger.
Good. For a moment I thought that you might be thinking it was standalone.
FWIW, this RE seems ok to me; the additional context makes it unlikely
to match something else accidentally.
:-)
It needn't be special. The point is simply that there's some external
knowledge that can be brought to bear to guide the shape of the REs.
ACK
I've heard "domain (specific) knowledge" used to refer to both extremely
specific training in a field and -- as you have -- data that is having
something done to it.
In this case, you know that log lines won't begin with
`___ 123 456:789` or other similar junk.
They darned well had better not.
Kinda. The "machine" in this case is actually an abstraction, like a
Turing machine. The salient point here is that REs map to finite state
machines, and in particular, one need not keep (say) a stack of prior
states when simulating them. Note that even in an NDFA simulation,
where one keeps track of what states one may be in, one doesn't need
to keep track of how one got into those states.
ACK
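To make the NDFA point concrete, here is a minimal sketch in C of simulating one by tracking only the set of states it may currently be in. The automaton is a hypothetical one for the RE `(a|b)*abb`, picked purely for illustration; the names `step` and `nfa_match` are likewise made up for this example. The thing to notice is that the only bookkeeping is one bitmask, and the previous mask is thrown away on every input character.

```c
#include <stdbool.h>

/* Hypothetical NFA for the RE (a|b)*abb, used only as an illustration.
 * States are numbered 0..3; state 3 is the accepting state. */
enum { NSTATES = 4, ACCEPT = 3 };

/* Bitmask of states reachable from state s on input character c. */
static unsigned step(int s, char c)
{
    switch (s) {
    case 0:
        if (c == 'a') return (1u << 0) | (1u << 1); /* stay, or guess "abb" starts here */
        if (c == 'b') return (1u << 0);
        return 0;
    case 1:
        return c == 'b' ? (1u << 2) : 0;
    case 2:
        return c == 'b' ? (1u << 3) : 0;
    default:
        return 0; /* state 3 has no outgoing transitions */
    }
}

/* Whole-string match. Memory use is one bitmask, regardless of input length. */
bool nfa_match(const char *input)
{
    unsigned active = 1u << 0; /* start in state 0 only */

    for (; *input != '\0'; input++) {
        unsigned next = 0;
        for (int s = 0; s < NSTATES; s++)
            if (active & (1u << s))
                next |= step(s, *input);
        active = next; /* the old set is discarded; no path history is kept */
    }
    return (active & (1u << ACCEPT)) != 0;
}
```

With this automaton, `nfa_match("babb")` accepts and `nfa_match("abba")` does not. The size of `active` depends on the number of states in the automaton (that is, on the RE), never on the input.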
Obviously in a real implementation you've got the program counter,
register contents, local variables, etc, all of which consume
"memory" in the conventional sense. But the point is that you don't
need additional memory proportional to anything other than the size
of the RE. A DFA could be implemented entirely with `switch` and
`goto` if one wanted, as opposed to a bunch of mutually recursive
function calls; NDFA simulation is similar, except that you need
some (bounded) additional memory to hold the active set of states.
Contrast this with a pushdown automaton, which can parse a
context-free language, and in which a stack is maintained that can
store additional information relative to the input (for example,
an already seen character). Pushdown automata can, for example,
recognize matched parentheses, while finite state machines cannot.
I think I understand the gist of what you're saying, but I need to
re-read it and think about it a little bit.
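For my own notes, here is a rough sketch in C of the contrast, as I currently understand it. Both functions are hypothetical illustrations: `dfa_match` is a DFA for the made-up RE `ab*c` written with nothing but `switch` and `goto`, and `parens_balanced` checks matched parentheses. With only one kind of bracket, the pushdown automaton's stack degenerates to a depth counter, so that is what the second function uses; the point is that the counter can grow with the input, while the DFA carries no state beyond "which label am I at".

```c
#include <stdbool.h>
#include <stddef.h>

/* DFA for the hypothetical RE ab*c (whole-string match).
 * Each label is a state; the only "memory" is the current position in the code. */
bool dfa_match(const char *p)
{
    /* start state: expect 'a' */
    switch (*p++) {
    case 'a': goto seen_a;
    default:  return false;
    }

seen_a: /* after 'a' and any number of 'b's */
    switch (*p++) {
    case 'b': goto seen_a;
    case 'c': goto seen_c;
    default:  return false;
    }

seen_c: /* accepting state, provided the input ends here */
    return *p == '\0';
}

/* Matched-parentheses check. The depth counter stands in for a stack:
 * it records how many '(' are still open, and it is not bounded by any
 * fixed number of states. Characters other than '(' and ')' are ignored. */
bool parens_balanced(const char *p)
{
    size_t depth = 0;

    for (; *p != '\0'; p++) {
        if (*p == '(') {
            depth++;
        } else if (*p == ')') {
            if (depth == 0)
                return false; /* a ')' with nothing open */
            depth--;
        }
    }
    return depth == 0;
}
```

So `dfa_match("abbc")` accepts with no extra storage at all, while `parens_balanced("(a(b))")` has to remember how deep it is, which is exactly the information a finite state machine has nowhere to put once the nesting exceeds its number of states.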
Anyway, sorry, this is all rather more theoretical than is perhaps
interesting or useful.
Apology returned to sender as unnecessary.
You are providing the requested thought provoking discussion, which is
exactly what I asked for. I feel like I'm going to walk away from this
thread wiser based on the thread's content plus all additional reading
material on top of the thread itself.
Bottom line is, I think your REs are probably fine. `egrep` will
complain at you if they are not, and I wouldn't worry too much about
optimizing them: I'd "stop" whenever you're happy that you've got
something understandable that matches what you want it to match.
Thank you (again) Dan. :-)
--
Grant. . . .
unix || die