[TUHS] Re: regex early discussions

4 Mar 2024

I've already had a chat with Will, but I wanted to add some other thoughts
to the group as a whole:
   - As was pointed out by others, computer life (and certainly interactive
   computing) does not begin with UNIX (*i.e.*, interactive text editors
   have been around since the beginning of interactive computing).  I'll use
   Thomas Haigh and Paul Ceruzzi's text, "A New History of Modern Computing,"
   which basically pegs that beginning as CTSS.  I don't know what the
   original editor was for CTSS [if someone like Doug or Ken remembers, I'd
   be curious to know].
   - Numerous editors show up on different systems, including STOPGAP on
   the MIT PDP-6, eventually SOS, TECO, EMACS, *etc*., and most have some
   concept of a 'line of text' to distinguish from a 'card image.'
   - Common to all is some way to search or find text and some way to
   replace it - usually on a line of input.
   - One of them is Lampson and Deutsch's "quick editor" or QED for SDS.
   - Language theory was definitely a hot item by the mid-1960s, and lots of
   papers discussing automata and the like appear, including Ken's CACM 1968
   article describing his reg-ex search algorithm implementation for the IBM
   7094 [it should be findable with a search -- send me an email offline, I
   have a copy of a crappy scan, but it is readable].
   - Most editors like SOS, TECO and the like do not have support for
   reg-ex, but do have some way to do sophisticated searching (and
   replacement).
   - Ken wrote an implementation of QED for CTSS and included his search
   algorithm as an integral part of this new implementation.
   - When Ken writes the original UNIX editor, he bases it on the above.
   - UNIX builds up this idea of a pipeline, so building separate tools
   that connect together makes sense and is natural.
   - When Rudd, Doug, Ken, Dennis, *et al* start to develop UNIX - they are
   building a system for *themselves.*
   - One member of the group (Lee McMahon) is using the g/re/p command to
   find things and gets the brilliant idea of a separate tool; grep(1) would
   be born.
   - The most important item here is that said team is a group of
   programmers, so it was logical that the system was useful and easy to
   understand by other programmers.
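The g/re/p idea from the list above (the editor's global command: for every line matching the regular expression, print it) can be sketched as a stand-alone filter. This is only an illustration, assuming Python's `re` syntax rather than ed's; the function name `g_re_p` and the sample lines are made up for the example.

```python
import re


def g_re_p(pattern, lines):
    """Return the lines containing a match for pattern, in the spirit of g/re/p.

    Hypothetical helper: Python's re dialect is used here, which is not
    identical to the regular expressions accepted by ed(1) or grep(1).
    """
    compiled = re.compile(pattern)
    return [line for line in lines if compiled.search(line)]


# Sample "buffer" of text lines (invented for this sketch).
buffer = [
    "ed is the standard editor",
    "grep grew out of ed",
    "pipelines connect tools",
]

# Print every line that contains the string "ed".
for line in g_re_p(r"ed", buffer):
    print(line)
```

Pulling the search out of the editor like this is what lets it sit in a pipeline: it reads lines, writes the matching ones, and knows nothing else about where they came from.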
Will asked how people learned about reg-ex.  The answer, of course, is that
it depends.
But if you took college-level CS courses in the late 60s or the
70s, as Bakul mentioned (I had a similar experience), you were
going to be taught about automata and simple language theory -- likely in
your first data structures and algorithms class, and certainly by the time
you took a compiler course. My memory is that I learned basic automata
theory in the first, but did not see the idea of regular expressions until
compilers [in my case, this is all pre-dragon book].  For all of you later
in the 70s, Aho and Ullman's classic text would have exposed you to it.
FWIW: in her college CS training in the 2000s, my daughter never had to take
a compiler or comparative languages course, but she was taught about reg-ex
in her data structures course.
The key is you were taught a bit about automata theory, but if you really
started to study it, you looked at things like the performance of the
different algorithms.  As Rob says, the key takeaway from learning about
the reg-ex idea is its linear performance.  So, if you were trained in
some of the formal CS ideas, *using reg-ex was not a huge lift*. It was
natural.
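To make the linear-performance point concrete, here is a minimal sketch of the state-set idea from automata theory: track every state the automaton could be in, and advance all of them together on each input character, so the text is read exactly once with no backtracking. This is not a reconstruction of Ken's 7094 implementation; the NFA for (a|b)*abb is hand-built, and the names `nfa_match`, `TRANSITIONS`, `START`, and `ACCEPTING` are hypothetical.

```python
def nfa_match(transitions, start, accepting, text):
    """Simulate an NFA (without epsilon moves) over text, whole-string match.

    The set `current` holds every state the NFA could be in. Each character
    is examined once, so the work is proportional to
    len(text) * (number of states), i.e. linear in the text for a fixed
    pattern.
    """
    current = {start}
    for ch in text:
        following = set()
        for state in current:
            following.update(transitions.get((state, ch), ()))
        current = following
        if not current:
            # No live states left: the match cannot succeed.
            return False
    return bool(current & accepting)


# Hand-built NFA for (a|b)*abb:
#   state 0 loops on 'a' and 'b' and may also guess that the final "abb" starts
#   states 1 and 2 consume the two trailing 'b's; state 3 is accepting
TRANSITIONS = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
ACCEPTING = {3}

for sample in ["abb", "aabb", "babb", "abab", ""]:
    print(repr(sample), nfa_match(TRANSITIONS, START, ACCEPTING, sample))
```

The design choice worth noticing is that the nondeterministic "guess" in state 0 costs nothing extra: both possibilities simply live in the same set, rather than being tried one after the other.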
That said, if you were coming from other systems using things like SOS or
TECO (like me), they also offered search functions, but the expressions were
not written in the same way.  It was a different way to do things, but people
like me quickly realized it was a lot more powerful and could do much more. *"Ah
ha .. cool beans, apply something I already knew about in a way I had not
seen before ... next item ..."*
So there are a few things to realize from this.
   1. Adding things like reg-ex to tools like sed(1) and awk(1) was a
   natural follow-on to things like grep(1) and ed(1).
   2. If you were a CS person, it was not a big deal - just the more
   powerful "UNIX-way" as it were. But...
   3. If you came from another world of computing (say DEC or a PC)  where
   such tools were not exposed in a manner that was easy to build upon *and/or
   you had never been taught much of any core CS theory* [which is where
   Will cut his teeth], reg-ex might be astonishing.
So I think it's not a question of why -- it was just how UNIX did things. It
was a natural way for a programmer to express something.
