I've always been intrigued
with regexes. When I was first exposed to them, I was mystified
and lost in the greediness of matches. Now, I use them regularly,
but still have trouble using them. I think it is because I don't
really understand how they work.
My question for y'all has to do with early unix. I have a copy of
Thompson, K. (1968). Regular expression search algorithm.
Communications of the ACM, 11(6), 419-422. It is interesting as an
example of Thompson's thinking about regexes. In this paper, he
presents a non-backtracking, efficient, algorithm for converting a
regex into an IBM 7094 (whatever that is) program that can be run
against text input that generates matches. It's cool. It got me to
thinking maybe the way to understand the unix regex lies in a
careful investigation into how it is implemented (original
thought, right?). So, here I am again to ask your indulgence as
the latecomer wannabe unix apprentice. My thought is that ed is
where it begins and might be a good starting point, but I'm not
sure - what say y'all?
I also have a copy of the O'Reilly Mastering Regular Expressions
book, but that's not really the kind of thing I'm talking about.
My question is more basic than how to use regexes practically. I
would like to understand them at a parsing level/state change
level (not sure that's the correct way to say it, but I'm really
new to this kind of lingo). When I'm done with my stepping
through the source, I want to be able to reason that this is why
that search matched that text and not this text and why the search
was greedy, or not greedy because of this logic here...
If my question above isn't focused or on topic enough, here's an
alternative set to ruminate on and hopefully discuss:
1. What's the provenance of regex in unix (when did it appear, in
what form, etc)?
2. What are the 'best' implementations throughout unix (keep it
pre 1980s)?
3. What are some of the milestones along the way (major changes,
forks, disagreements)?
4. Where, in the source, or in a paper, would you point someone to
wanting to better understand the mechanics of regex?
Thanks!
Will
--
GPG Fingerprint: 68F4 B3BD 1730 555A 4462 7D45 3EAA 5B6D A982 BAAF