Warren has been nice enough to put 8th, 9th and 10th edition on the TUHS “Unix Tree” web page.
There is the following question on each entry web page: “Who wants to write something here?”
Below is my suggested draft text for Eighth Edition. All suggestions for improvement are welcome.
===
Shortly after the release of 7th Edition, the VAX became the base machine for further Unix development. The initial code base was the 32V port, enhanced with selected elements from 4.1BSD, such as support for virtual memory and later the TCP/IP stack. From there the code further evolved: Eighth Edition of Unix was released by Bell Laboratories in February 1985, six years after Seventh Edition.
Key innovations in 8th Edition include ‘streams’ and the ‘file system switch’, which allowed the “everything is a file” approach to be extended to new areas. Three notable applications built on these were the ‘/proc’ file system and new debugger API, a unified approach to networking over Datakit, TCP/IP and phone lines, and a network file system.
Eighth Edition is also at the root of graphical user interfaces on Unix, being the platform used for the development of the ‘Blit’ graphical terminal.
Several of the new ideas from Eighth Edition found their way into the 3rd release of System V, although in a much modified way.
===
Anybody feel like a text/voice chat on the ClassicCmp Discord server in about 13 hours, say 2200 UTC?
#coff and the General voice channel.
I'll pop on for an hour but start whenever you feel like.
Cheers, Warren
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Minor corrections to the material in Paul's text.
This is meant to be a laundry-list of facts, not a
suggested set of words; I'm feeling too prolix this
morning to produce the latter, and figure those on the
list may be interested in the petty details anyway.
The initial user-mode environment was a mix of 32V,
subsequent work within 1127, and imports from 4.1BSD.
I don't know the exact heritage: whether it was 1127's
work with 4.1BSD stuff added or vice-versa.
The kernel was a clean break, however: 4.1xBSD for some
value of x (probably 4.1a but I don't remember which)
with Research changes. By the time of V8, that means:
-- All trace of BSD's original network interfaces removed,
except that select(2) remained in a slightly-different
form.
-- Stream I/O system added; all communication-device
drivers (serial ports, Ethernet, Datakit) changed to
work with streams. Pipes were streams.
-- File system switch added, supporting Killian's /proc
and Weinberger's first-generation (neta) network file
system.
-- Berkeley FFS replaced by Weinberger's bitmapped
file system: essentially the V7 file system except
the free list was a bitmap and the blocksize was 4KiB.
Hacky implementation, depending on a flag bit in the
minor device number; didn't use the file system switch.
Old 512-byte-block file systems had to be supported
partly to ease the changeover, partly because the first
version had a limited bitmap size so file systems larger
than about 120MiB wouldn't work. This limit was removed
later. (In retrospect I'm surprised I didn't then insist
on converting any remaining old-format file systems in
our domain and then removing the old-format code from
the kernel, since user-mode tools--including a user-mode
file server!--could be used to access any old disks
discovered later.)
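To make the "free list was a bitmap" point concrete, here is a minimal Python sketch of a free-block map with one bit per 4 KiB block. It is only an illustration of the general technique: the class, its method names and the bit layout are hypothetical and are not taken from the V8 kernel source.

```python
class BlockBitmap:
    """Free-block map: bit i is 1 when block i is free (hypothetical layout)."""

    BLOCK_SIZE = 4096  # 4 KiB blocks, as in the description above

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.bits = bytearray((nblocks + 7) // 8)
        for blk in range(nblocks):
            self.free(blk)          # a fresh file system starts out all free

    def free(self, blk):
        """Mark a block as free again."""
        self.bits[blk // 8] |= 1 << (blk % 8)

    def is_free(self, blk):
        return bool(self.bits[blk // 8] & (1 << (blk % 8)))

    def alloc(self):
        """Return the lowest-numbered free block, or None if none is left."""
        for blk in range(self.nblocks):
            if self.is_free(blk):
                self.bits[blk // 8] &= ~(1 << (blk % 8)) & 0xFF
                return blk
        return None
```

For scale, 120 MiB of 4 KiB blocks is 30720 blocks, which in this layout needs a 3840-byte bitmap. Whether that has anything to do with the actual early size limit is not something the sketch claims.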
For the purposes of Paul's note it probably suffices
just to say that there was a restart with a 4.1-series
kernel with changes as he describes, except also the
new file system format.
Norman Wilson
Toronto ON
Doug McIlroy:
> The v8 manual was printed in 1985, but the system was
> not "released" in the ordinary sense until a couple of
> years ago. Some v8 features made it out into the world
> via USG; some were described in open literature or
> Usenix presentations, but I believe none were formally
> shipped out of the company.
I'm surprised; I thought copies of the V8 manual existed
when I arrived at the Labs in mid-1984, but the date on
the title page is indeed February 1985.
There was no general release of V8 like those for earlier
Research systems, but there was a quasi-official V8 tape
sent to a handful of universities under a special letter
agreement. I remember working on that with Dennis,
checking that everything compiled and worked properly
in a chroot environment before the tape was written.
I think that happened in the summer of 1985.
I don't remember our doing that work, to make a single
coherent consistency-checked release tape, for any
subsequent system; just one-off caveat-emptor snapshots.
Norman Wilson
Toronto ON
The v8 manual was printed in 1985, but the system was
not "released" in the ordinary sense until a couple of
years ago. Some v8 features made it out into the world
via USG; some were described in open literature or
Usenix presentations, but I believe none were formally
shipped out of the company.
Doug
I ran a search for ‘Datakit’ on the archive of this mailing list and came across the below message from Norman Wilson (Sep 2017). Having spent quite a bit of time recently on figuring out Datakit details and 8th Edition source, I now much better understand what he was saying, or at least I think I do.
It made me take another look at the /dev/pk[0123].c files in the V7 source code. I’d seen it before, but always thought it was UUCP code.
Now I’m wondering. It looks like UUCP packet “protocol g” is maybe much the same as the original (“Chesson”) packet algorithm for Datakit, and if so it would be “dual use”. It would seem that in V7 line discipline ‘0’ was normal tty handling, discipline ‘1’ was PK protocol over serial and line discipline ‘2’ was PK protocol over something with CRC in the driver - whatever that was.
If the above thought is correct, then it shines a light on network buffering in V7: it uses buffer space in blocks of n*32 bytes, carved out from a pool of disk buffers (see pk3.c); it pre-allocates space for one full receive window.
I have not fully figured it out, but at first glance it seems that the PK line discipline was only integrated with the DH-11 driver in the public V7 source. That would make sense in a networking context, as that board offered input buffering / DMA output to reduce the interrupt load. In 1979 Datakit seems to have connected over a DR-11C board, but there is no driver for that in the V7 source tree.
Am I on the right track?
=====
The point of the stream I/O setup with
stackable line disciplines, rather than the old single
line-discipline switch, was specifically to support networking
as well as tty processing.
Serial-device drivers in V7 used a single line-discipline
driver, used variously for canonical-tty handling and for
network protocols. The standard system as used outside
the labs had only one line discipline configured, with
standard tty handling (see usr/sys/conf/c.c). There were
driver source files for what I think were internal-use-only
networks (dev/pk[12].c, perhaps), but I don't think they
were used outside AT&T.
The problem Dennis wanted to solve was that tty handling
and network protocol handling interfered with one another;
you couldn't ask the kernel to do both, because there was
only one line discipline at a time. Hence the stackable
modules. It was possible to duplicate tty handling (probably
by placing calls to the regular tty line discipline's innards)
within the network-protocol code, but that was messy. It also
ran into trouble when people wanted to use the C shell, which
expected its own special `new tty' line discipline, so the
network code would have to know which tty driver to call.
It made more sense to stack the modules instead, so the tty
code was there only if it was needed, and different tty
drivers could exist without the network code knowing or caring.
When I arrived at the Labs in 1984, the streams code was in
use daily by most of us in 1127. The terminals on our desks
were plugged into serial ports on Datakit (like what we call
a terminal server now). I would turn on my terminal in the
morning, tell the prompt which system I wanted to connect to,
and so far as I could tell I had a direct serial connection.
But in the remote host, my shell talked to an instance of the
tty line module, which exchanged data with a Datakit protocol
module, which exchanged data with the low-level Datakit driver.
If I switched to the C shell (I didn't but some did), csh would
pop off the tty module and push on the newtty module, and the
network code was none the wiser.
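The push/pop arrangement described above can be pictured with a small Python model. This is a sketch under loud assumptions: the `Stream` class, the module functions and the two-byte "header" are all invented for illustration and do not correspond to the Research stream code or to any real Datakit framing.

```python
class Stream:
    """A device at the bottom with a stack of processing modules above it."""

    def __init__(self, device_name):
        self.device_name = device_name
        self.modules = []              # last element is nearest the user process

    def push(self, module):
        self.modules.append(module)

    def pop(self):
        return self.modules.pop()

    def read(self, raw):
        """Pass data arriving from the device up through every module."""
        data = raw
        for module in self.modules:
            data = module(data)
        return data


def dk_proto(data):
    # Stand-in for a network protocol module: strip a made-up 2-byte header.
    return data[2:]


def tty(data):
    # Stand-in for ordinary tty processing: map carriage return to newline.
    return data.replace(b"\r", b"\n")


def newtty(data):
    # Stand-in for an alternative tty module: also drops DEL characters.
    return data.replace(b"\x7f", b"").replace(b"\r", b"\n")
```

Swapping `tty` for `newtty` is a `pop()` followed by a `push()`; `dk_proto` stays where it is and never learns which tty module sits above it, which is the property the message describes.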
Later there was a TCP/IP that used the stream mechanism. The
first version was shoehorned in by Robert T Morris, who worked
as a summer intern for us; it was later cleaned up considerably
by Paul Glick. It's more complicated because of all the
multiplexers involved (Ethernet packets split up by protocol
number; IP packets divided by their own protocol number;
TCP packets into sessions), but it worked. I still use it at
home. Its major flaw is that details of the original stream
implementation make it messy to handle windows of more than
4096 bytes; there are also some quirks involving data left in
the pipe when a connection closes, something Dennis's code
doesn't handle well.
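The chain of multiplexers mentioned above (Ethernet by protocol number, IP by its own protocol number, TCP by session) amounts to three lookups in a row. The Python sketch below shows only that shape: the dictionary frame format, the session key and the function name are hypothetical, and nothing here reflects how the stream-based TCP/IP was actually written. The two constants are the standard values for IPv4 and TCP.

```python
ETHERTYPE_IP = 0x0800   # EtherType for IPv4
IPPROTO_TCP = 6         # IP protocol number for TCP


def demux(frame, sessions):
    """Route one already-parsed frame (a dict, made-up format) to a TCP session."""
    if frame["ethertype"] != ETHERTYPE_IP:
        return "not-ip"                 # first multiplexer: Ethernet protocol type
    packet = frame["payload"]
    if packet["proto"] != IPPROTO_TCP:
        return "not-tcp"                # second multiplexer: IP protocol number
    segment = packet["payload"]
    key = (packet["src"], segment["sport"], segment["dport"])
    if key not in sessions:
        return "no-session"             # third multiplexer: TCP session lookup
    sessions[key].append(segment["data"])
    return "delivered"
```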
The much-messier STREAMS that came out of the official System
V people had fixes for some of that, but at the cost of quite
a bit more complexity; it could probably be done rather better.
At one point I wanted to have a go at it, but I've never had
the time, and now I doubt I ever will.
One demonstration of virtue, though: although Datakit was the
workhorse network in Research when I was there (and despite
the common bias against virtual circuits it worked pretty well;
the major drawback was that although the underlying Datakit
fabric could run at multiple megabits per second, we never had
a host interface that could reliably run at even a single megabit),
we did once arrange to run TCP/IP over a Datakit connection.
It was very simple in concept: make a Datakit connection (so the
Datakit protocol module is present); push an IP instance onto
that stream; and off you go.
I did something similar in my home V10 world when quickly writing
my own implementation of PPP from the specs many years ago.
The core of that code is still in use in my home-written PPPoE code.
PPP and PPPoE are all outside the kernel; the user-mode program
reads and writes the serial device (PPP) or an Ethernet instance
that returns just the desired protocol types (PPPoE), does the
PPP processing, and reads and writes IP packets to a (full-duplex
stream) pipe on the other end of which is pushed the IP module.
All this is very different from the socket(2) way of thinking,
and it has its vices, but it also has its virtues.
Norman Wilson
Toronto ON
Here is a question for the old hands from the Labs: I’m trying to get the timeline of some development steps right.
The two main things are: when did the 4.1 merge take place, and when were ‘streams’ added?
Going by file dates, the surviving 8th edition source appears to be from 1985. I can see that a lot of files in /usr/include did not change after Jan 1982 (e.g. nlist.h). This suggests that early in 1982 the merge between 4.1 code and 32V code took place, to create the foundation for further development (“proto 8th edition”, so to speak).
Similarly, there are a dozen or so files in the kernel that all have a file date of November 1982. The most interesting one of these is “dtline.c”, a character mode Datakit driver: it uses ‘streams’. This suggests that there was a further code merge late in 1982 and implies that ‘streams’ were developed prior to that date.
From the S/F-Unix papers it seems that ‘streams’ did not exist in 1981, at least they are not mentioned in an otherwise comprehensive set of papers. On the other hand, the S/F-Unix work was done in the Exploratory group, not the Research group: maybe it was inappropriate to mention.
All in all, my hypotheses would be that:
- the 32V/4.1 merge took place early in 1982
- ‘streams’ were developed in 1982 on 32V (maybe also V7) systems
- a further merge took place late in 1982 that combined the new base with latest developments
Does that sound correct, or was it all different?
Related is the question when the "file system switch" was added. It must have been later than 1981 and before 1985, but I have not been able to pinpoint it further.
Paul
Andrew Hume (andrew(a)humeweb.com) has had trouble posting this, and asked me
to try. Reply directly to Andrew, not to me.
============================
I have the following manuals available:
3 Eighth Edition Unix manuals (2 shrink-wrapped, one not (but still good
condition))
Unix programmers manual, Release 3.0 (Dolotta et al, 1980)
Sixth Edition programmers manual (Bell Labs cardboard cover)
Sixth Edition Documents manual (Bell Labs cardboard cover)
Seventh Edition programmers manual Volume 2a, Jan 1979. (actually documents
such as make, lint, troff etc)
Documents for UNIX, Volume 2 (Dolotta et al, 1981) sections E and F (make,
lex, security etc)
All the above are in pretty good condition, given they are bound in
cardboard covers and are 40ish years old.
I’d prefer to give them to someone archival, but otherwise, first come,
first served.
Andrew Hume
Anybody feel up for a bit of an archaeology challenge? Warner Losh is
currently poking through a bunch of bits but not having much luck decoding
them correctly. I've put a copy here: https://minnie.tuhs.org/Y5/Challenge/
If you can help, I'd suggest report major findings here, and we can use
the #TUHS channel in the ClassicCmp Discord server for chat.
Here's what Warner has found out so far:
It's quite interesting, but in a
format I've so far not been able to decode more than with emacs.
However, there's all kinds of wonderful here. This looks like it was a
dump from a VMS (or maybe similar DEC OS) ANSI tape. There's 4 datasets
of 2.5MB each. The first one appears to be a V5 tree of some sort (at
least it matches the V5 sources in places I can spot check in
Dennis_v5). The second block looks v6ish or maybe pwbish, but no kernel
sources. I don't think it's a continuation of the v5 stuff from the
first dataset. The third dataset is all binaries, as far as I can tell
so far, but things like mv and passwd. The 4th dataset appears to be
the dump of a VENIX-11 system, complete with source.
The 3rd dataset appears to be a Venix system. At least it has venix and
venix.old in what looks like the root directory. Still trying to sort
out extracting files from these datasets. v7fs hates them, but I'm
almost positive that's what they are.
Cheers, Warren
Crazy longshot post, part 27 in an infinite series
Are there any Xenix-11 images (boot tapes or disk images) around? My
googling skillz aren't mad enough to find this.
I've seen the Xenix 86 image in the archive that was copied from pce's
image warehouse which is cool and the generation of code I'm looking for,
but is for 8086 machines...
Warner
Another book from the same era--quite good--is A Unix Primer
by Ann Nichols Lomuto and Nico Lomuto, copyright 1983.
Before the title page appears an interesting endorsement:
"Prentice-Hall Software Series, Brian Kernighan, advisor"
Doug
Prologue to TPC. Bob Morris did a visiting-researcher stint at
AT&T, where he became aware of infelicitous software architecture
proposed for ESS 5. He thought Research could do it better. Ken,
Joe, and Lee bit. Lee's architecture was indeed novel: every
device in the system, right down to each touch-tone button, was
modeled as a process. Only after the clean model was working
were some processes--notably the buttons--jammed together to
cinch in the process table.
The team got the switch working in a matter of months--in time
to demonstrate it to Indian Hill before ESS was irrevocably
set in stone. ESS architecture was indeed rethought, taking
some ideas from TPC.
TPC was named after "TPC, The Phone Company" in the 1967 film,
"The President's Analyst".
Doug
About a year ago the Research telephone switch came up on this list.
Rob Pike wrote:
"But the PBX story is correct. To demonstrate how message passing was a good
model for a switching system, in particular to make a point to the
switching systems division of Bell Labs/AT&T, Ken and Joe bought a
commercial PBX and swapped out its processor for a PDP-11/23 (I think), and
programmed it up. It was just before I arrived there but I was given the
impression it had the desired strategic influence on Indian Hill.
The feature we all loved it for was that instead of ringing the phone in
the Unix room when you got a call, it would announce your name through the
voice synthesizer: "Phone call for Ken." "Phone call for Joe". One rapidly
stopped even hearing the announcement if it didn't end with your name.”
I’ve been having an off list discussion with Bill Marshall and this PBX was influential in another way as well.
First of all, Bill can confirm that it was indeed an 11/23; the same racks were used for Datakit switches. He also remembered that the software for this PDP-11 went by the nickname of “TPC” - for Tiny Phone Company. Lee McMahon was on the team writing TPC.
The first software for the Datakit switch was written by Greg Chesson and was called “CMC” (for ‘Common Control’). There are still some references to CMC in the 8th Edition source code.
This first software was later replaced by new code designed by Lee McMahon that was modelled after TPC. This new code was named “TDK”. This, too, can be seen in the 8th Edition source. The TDK protocols for building and releasing a Datakit virtual circuit appear to have been in use into the 1990’s.
https://fingolfin.org/blog/20200327/stdio-abi.html
An interesting look at the history of stdio and subsequent ABI choices.
Cheers, Warren
Thanks, everyone, for pointing out the "accident of history".
This is a little morality tale about clever programming.
Joe Ossanna would never have tokenized \s constructs that
way if the CAT typesetter had been capable of a wider
range of type sizes. Brian seems to have kept it for
backward compatibility. But groff was perfectly willing
to break backward compatibility in tokenizing command
names. Why not in \s?
I encourage Branden to go whole hog and let
\s123DEWEY WINS produce a banner headline.
Doug
> From: Warren Toomey
> Quite good
Yes.
The second part of:
"Sixth Edition sources are much more widely available than earlier versions,
thanks largely to the Lions book"
is I believe incorrect, though; I reckon it's because V6 was so widely
distributed, both inside and outside Bell. Many more copies -> higher
probability of retention.
Also, from 'Source code listing for the Lions' Commentary in PDF and PostScript':
"in 1988 I discovered an old 9-track tape being discarded of a PDP11
backup. It was hard to determine what it was running, but it did have an
intact /usr/src/ tree of which most of the files were timestamped 1979, even
at that time it seemed ancient. So it was either 7th edition or a derivative
like PWB, which I believe it was."
Do you still have the tape, or its contents?
Noel
> From: Paul Ruizendaal
> The paper is from late 1981. ... When did FIFO's become a
> standard Unix feature?
Err, V4? :-) At least, that's when pipes arrived (I think - we don't have V4
sources, but there are indications that's when they appeared), and a pipe is a
FIFO. RAND ports just allowed (effectively) a pipe to have a name in the file
system.
The implementation of both is pretty straight-forward. A pipe is just a file
which has a maximum length, after which the writer is blocked. A port is
just a pipe (it uses the pipe code) whose inode appears in the file system.
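The two sentences above translate almost directly into code. What follows is a toy Python model, not the V6 or BBN implementation: the 4096-byte limit is an assumed value, the `namespace` dictionary merely stands in for the file system, and instead of really putting the writer to sleep `write` returns how many bytes it accepted (0 meaning the writer would block).

```python
class Pipe:
    """Bounded FIFO byte buffer; a full pipe makes the writer wait."""

    def __init__(self, max_len=4096):
        self.max_len = max_len       # assumed limit, not a value from the source
        self.buf = bytearray()

    def write(self, data):
        """Accept as much as fits and return the count (0: writer would block)."""
        room = self.max_len - len(self.buf)
        accepted = data[:room]
        self.buf += accepted
        return len(accepted)

    def read(self, n):
        """Remove and return up to n bytes from the front of the pipe."""
        out = bytes(self.buf[:n])
        del self.buf[:n]
        return out


namespace = {}                       # stands in for the file system


def make_port(path, max_len=4096):
    """A 'port' is the same pipe object, reachable by name."""
    namespace[path] = Pipe(max_len)
    return namespace[path]
```

The only thing `make_port` adds is the name lookup, which mirrors the remark that a port reuses the pipe code and differs only in having an inode visible in the file system.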
> From: Clem Cole
> I think the code is on one of the 'USENIX' tapes in Warren's archives.
Doc is here:
https://minnie.tuhs.org//cgi-bin/utree.pl?file=BBN-V6/doc/ipc
and sources for all that are here:
https://minnie.tuhs.org//cgi-bin/utree.pl?file=BBN-V6/dmr
https://minnie.tuhs.org//cgi-bin/utree.pl?file=BBN-V6/ken
(port.c is in 'dmr', not 'ken' where it should be).
Noel
Greetings,
I was reviewing a manual page update and came across an ambiguous answer to
a question that came up, so I thought I'd ask here.
execl and execv first appear in our extant unix man pages in V4. The v3 and
v2 man pages don't have this listed at all. Case closed, right? It appeared
in V4.
However, the Dennis_v1/unix72 tree has execl.s and execv.s in them. Diving
back into the history on Warren's github account,
https://github.com/DoctorWkt/unix-jun72/tree/master/src says they come from:
"The files in lib/ come from the libc.sa file which is on the
last1120c.tar.gz
tap(I) tape image, also at the same URL, and form the C library for the
above compiler."
and
"from a working C compiler for 2nd Edition UNIX."
which suggests that it may have been in V2 or maybe even V1. Its
first use in Unix appears to be V5, but the extant pre-v5 code is so
fragmentary it's hard to know for sure. It's not mentioned in section
II or section III of v1, v2 or v3.
Does anybody know for sure, or can provide more insight into the
last1120c.tar.gz file to help disambiguate?
Warner
When I left BTL in 1983, I made a tar tape. A number of years later I
translated the tape into a file. Only recently have I wandered through it.
I don't know how many people remember Ron Hardin in the Columbus BTL
location. He was one of the smartest guys I ever met. There are a lot of Ron
Hardin stories. One of his creations (as far as I know he authored it) was
a program to create Memorandums For File -- technical memorandums. My tar
tape scooped up festoon. To this day it compiles and runs happily on
Windows 10. It was written in 1978 or thereabouts. Here is an example
output:
bin$ festoon.exe
.TL
No Worthynesses
.AU "C. C. Festoon" CCF Headquarters 1584734291
.AS
A restriction had been being amicated by a convenience at the inclusion.
.AE
.MT "MEMORANDUM FOR COAT LOCKER"
.hy 1
On this occasion,
no team responsibilities could have polyesced a renewed emphasis.
A friction had penated an activation.
At the present moment in time,
an undue number of good progresses being collected together with the
populations were being proportionately fideated by
the fact that there was a data stream which was transenniesced by an
issuance being joined together with these team responsibilities,
because natural basises have been veriating a partitioning.
The supplementary work should be conclusively quinquepolyated by a well
defined interfacing.
A sophisticatedness by a schedule is operated by a nature in conflict with
a correspondence under some serious discussions.
It is within the realm of possibility that the effectiveness had
vicfacesced a schedule,
but there was not a necessary background information which is being
testesced by a strong interest,
and a statistical accuracy was tempoesced by the preparation.
It should be noted that a joint partnership very repeatedly aidioated this
publication of a centralized organization.
Due to the fact that there is a simplification which simply enniesced a
process,
a new technology is fluxesced from monorogatities.
It is of the utmost importance that an insurance could be putated by an
assumption.
A major advance centered about a deficiency octocessates an important
outcome.
.P
An effectation would extramicroate to the situation.
A complete revision gravated a direction.
Inasmuch as there was not a potential usefulness that cedeates by the
timely delivery,
a consideration centered around a technique was monofortated by an
integration:
.BL
.LI
There is a not unclear meterdom which had risiesced an occasion.
.LE
.P
A clamstress of this enclosedness is cludescing the hemidormity.
.P
To arrive at an approximation,
a large quantity had been chromated by a strong feeling.
Moreover,
that idea sharing was lusated by a current proposal.
Anytime that the final outcomes had been very firmly unpathesced by not
unphilaible reasonable compromises,
no serious concerns might be being sacrated by internal establishments for
the basic objectives in back of a full utilization.
.P
As a consequence of the fact that a total effect might vacate an easily
situational beneficial assistance,
the apparent provisioning being effectuated by a continuing difference can
have protenesced a realization of an underlying purpose.
A different doubtful important outcome is cludated by a capkin.
A rationale had fortated attachments.
Moreover,
this assumption had nilcoresced the continuing study.
.P
.H 1 "An Easily Added Basic Assumption Being Joined Together With A Concept Stage"
There is not an impediment which neoated a restriction,
therefore.
A couple utilizations could morsate a great similarity at considerable
difficulties,
but an input is primescing the concept activities,
and a growing importance was hemicisesced by that beneficial assistance.
In the same connection,
these extremenesses are rather usefully ultralucesced by directions.
.SG
.NS 0
C. R. Glitch
S. A. Hobble
R. S. Limn
M. Shayegan
.NE
Ed Bradford, Ph.D. Physics, retired from IBM
BTL 1976-1983
--
Advice is judged by results, not by intentions.
Cicero
> From: Dagobert Michelsen
> the excellent book "Gödel, Escher, Bach: An Eternal Golden Braid"
> from Douglas R. Hofstadter which also gives a nice introduction into
> logic and philosophy.
IIRC, the focus of the book is how systems made out of simple components can
exhibit complex behaviours; in particular, how information-processing systems
can come to develop self-awareness.
> From: Chet Ramey
> One of the best books I read in high school.
A book on a very similar topic to GEB, which was _extremely_ important in
developing my understanding of how the universe came to be, is "Recursive
Universe", by William Poundstone, which I recommend very highly to everyone
here. It's still in print, which is really good, because it's not as well
known as it should (IMO) be. It uses an analogy with Conway's Life to explain
how the large-scale structure of the universe can develop from a random
initial state. Buy it now!
Noel
Sorry to flog this topic, but the two examples below are an
unfair comparison. What happened to the multiplications in the
second? And two of the [enter]s in the first are unnecessary.
Ironically three of the four operations in the second are
actually reverse Polish! If you had to utter sqrt first,
as we do on paper and in speech, things wouldn't look so great.
[a] [enter]
[a] [enter]
[multiply]
[b] [enter]
[b] [enter]
[multiply]
[add]
[square root]

[a]
[square]
[plus]
[b]
[square]
[square root]
Doug
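For anyone who wants to count keystrokes in the comparison above, a stack evaluator is only a few lines of Python. This is a minimal sketch: the token names `sq` and `sqrt` are made up here, and numbers are pushed directly, so the [enter] key is not modelled at all.

```python
import math

BINARY = {"+": lambda x, y: x + y, "*": lambda x, y: x * y}
UNARY = {"sq": lambda x: x * x, "sqrt": math.sqrt}


def rpn(tokens):
    """Evaluate a list of reverse Polish tokens and return the top of stack."""
    stack = []
    for tok in tokens:
        if tok in BINARY:
            y = stack.pop()
            x = stack.pop()
            stack.append(BINARY[tok](x, y))
        elif tok in UNARY:
            stack.append(UNARY[tok](stack.pop()))
        else:
            stack.append(float(tok))    # anything else is treated as a number
    return stack.pop()
```

With a = 3 and b = 4, `rpn("3 3 * 4 4 * + sqrt".split())` and `rpn("3 sq 4 sq + sqrt".split())` both give 5.0. The second form is shorter only because it has a square key, which is the point about the comparison being unfair.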
Once in a while a new program really surprises me. Reminiscing a while
ago, I came up with a list of eye-opening Unix gems. Only a couple of
these programs are indispensable or much used. What singles them out is
their originality. I cannot imagine myself inventing any of them.
What programs have struck you similarly?
PDP-7 Unix
The simplicity and power of the system caused me to turn away from big
iron to a tiny machine. It offered the essence of the hierarchical
file system, separate shell, and user-level process control that Multics
had yet to deliver after hundreds of man-years' effort. Unix's lacks
(e.g. record structure in the file system) were as enlightening and
liberating as its novelties (e.g. shell redirection operators).
dc
The math library for Bob Morris's variable-precision desk calculator
used backward error analysis to determine the precision necessary at
each step to attain the user-specified precision of the result. In
my software-components talk at the 1968 NATO conference on software
engineering, I posited measurement-standard routines, which could deliver
results of any desired precision, but did not know how to design one. dc
still has the only such routines I know of.
typo
Typo ordered the words of a text by their similarity to the rest of the
text. Typographic errors like "hte" tended to the front (dissimilar) end
of the list. Bob Morris proudly said it would work as well on Urdu as it
did on English. Although typo didn't help with phonetic misspellings,
it was a godsend for amateur typists, and got plenty of use until the
advent of a much less interesting, but more precise, dictionary-based
spelling checker.
Typo was as surprising inside as it was outside. Its similarity
measure was based on trigram frequencies, which it counted in a 26x26x26
array. The small memory, which had barely room enough for 1-byte counters,
spurred a scheme for squeezing large numbers into small counters. To
avoid overflow, counters were updated probabilistically to maintain an
estimate of the logarithm of the count.
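The probabilistic update can be sketched in a few lines of Python. This is the textbook base-2 form of approximate counting, offered as an assumption about the general idea rather than a reconstruction of what typo actually did: the counter holds roughly the logarithm of the true count and is bumped less and less often as it grows.

```python
import random


def increment(c, rng):
    """Bump the small counter c with probability 2**-c, else leave it alone."""
    if rng.random() < 2.0 ** -c:
        return c + 1
    return c


def estimate(c):
    """Estimated number of events seen for counter value c."""
    return 2 ** c - 1


# Usage: count 10000 events in a counter that stays far below one byte's range.
rng = random.Random(1)      # seeded only to make runs repeatable
c = 0
for _ in range(10000):
    c = increment(c, rng)
```

After the loop `c` is a small number in the low teens or thereabouts, and `estimate(c)` is a noisy guess at 10000. The trade is precision for space: a one-byte counter run this way cannot realistically overflow.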
eqn
With the advent of phototypesetting, it became possible, but hideously
tedious, to output classical math notation. Lorinda Cherry set out to
devise a higher-level description language and was soon joined by Brian
Kernighan. Their brilliant stroke was to adapt oral tradition into written
expression, so eqn was remarkably easy to learn. The first of its kind,
eqn has barely been improved upon since.
struct
Brenda Baker undertook her Fortran-to-Ratfor converter against the advice
of her department head--me. I thought it would likely produce an ad hoc
reordering of the original, freed of statement numbers, but otherwise no
more readable than a properly indented Fortran program. Brenda proved
me wrong. She discovered that every Fortran program has a canonically
structured form. Programmers preferred the canonicalized form to what
they had originally written.
pascal
The syntax diagnostics from the compiler made by Sue Graham's group at
Berkeley were the most helpful I have ever seen--and they were generated
automatically. At a syntax error the compiler would suggest a token that
could be inserted that would allow parsing to proceed further. No attempt
was made to explain what was wrong. The compiler taught me Pascal in
an evening, with no manual at hand.
parts
Hidden inside WWB (writer's workbench), Lorinda Cherry's Parts annotated
English text with parts of speech, based on only a smidgen of English
vocabulary, orthography, and grammar. From Parts markup, WWB inferred
stylometrics such as the prevalence of adjectives, subordinate clauses,
and compound sentences. The Today show picked up on WWB and interviewed
Lorinda about it in the first TV exposure of anything Unix.
egrep
Al Aho expected his deterministic regular-expression recognizer would beat
Ken's classic nondeterministic recognizer. Unfortunately, for single-shot
use on complex regular expressions, Ken's could finish while egrep was
still busy building a deterministic automaton. To finally gain the prize,
Al sidestepped the curse of the automaton's exponentially big state table
by inventing a way to build on the fly only the table entries that are
actually visited during recognition.
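The on-the-fly idea can be illustrated with a small Python sketch. It is not Al Aho's code, and the NFA representation (a dictionary without epsilon moves) is a simplification chosen to keep the example short: each DFA state is a set of NFA states, and a transition is computed only the first time the input actually needs it.

```python
class LazyDFA:
    """Subset construction done on demand instead of up front."""

    def __init__(self, nfa, start, accepting):
        self.nfa = nfa                    # {state: {char: set_of_states}}
        self.start = frozenset([start])
        self.accepting = accepting        # set of accepting NFA states
        self.table = {}                   # (dfa_state, char) -> dfa_state

    def step(self, dstate, ch):
        key = (dstate, ch)
        if key not in self.table:         # first visit: compute and remember
            nxt = set()
            for s in dstate:
                nxt |= self.nfa.get(s, {}).get(ch, set())
            self.table[key] = frozenset(nxt)
        return self.table[key]

    def matches(self, text):
        dstate = self.start
        for ch in text:
            dstate = self.step(dstate, ch)
        return bool(dstate & self.accepting)


# NFA for (a|b)*abb, written out by hand for the example.
NFA = {0: {"a": {0, 1}, "b": {0}}, 1: {"b": {2}}, 2: {"b": {3}}}
dfa = LazyDFA(NFA, start=0, accepting={3})
dfa.matches("abb")
```

After matching "abb" the table holds just the three transitions that input visited, not the full set the expression could generate. For expressions whose complete DFA is exponentially large, that difference is the whole point.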
crabs
Luca Cardelli's charming meta-program for the Blit window system released
crabs that wandered around in empty screen space nibbling away at the
ever more ragged edges of active windows.
Some common threads
Theory, though invisible on the surface, played a crucial role in the
majority of these programs: typo, dc, struct, pascal, egrep. In fact
much of their surprise lay in the novelty of the application of theory.
Originators of nearly half the list--pascal, struct, parts, eqn--were
women, well beyond women's demographic share of computer science.
Doug McIlroy
March, 2020
Tomasz Rola writes on Thu, 19 Mar 2020 21:01:20 +0100 about awk:
>> One task I would be afraid to use awk for, is html processing. Most of
>> html sources I look at nowadays seems discouraging. Extracting
>> anything of value from the mess requires something more potent, I
>> think.
If you want to tackle raw HTML from arbitrary sources, then I agree with
you: most HTML on the Web is not grammar conformant, there are
numerous vendor extensions, and the HTML is hideously idiosyncratic
and irregularly formatted.
The solution that I adopted 25 years ago was to write a grammar
recognizing, but violation lenient, prettyprinter for HTML. It has
served well and I use it many times daily for my work in the BibNet
Project and TeX User Group bibliography archives, now approaching 1.55
million entries. The latest public release is available here:
http://www.math.utah.edu/pub/sgml/
I notice that the last version there is 1.01; I'll get that updated in
a couple of days to the latest 1.03 [subject to delays due to major
work dislocations due to the virus]. The code should install anywhere
in the Unix family without problems: I build and validate it on more
than 300 O/Ses in our test farm.
With standardized HTML, applying awk is easy, and I have more than 450
awk programs, and 380,000 lines of code, that process publisher
metadata to produce rough BibTeX entries that numerous other tools,
and some manual editing, turn into clean data for free access on the
Web.
For some journals, I run a single command of fewer than 15 characters
to download Web pages for journal issues for which I do not yet have
data, and then a single journal-specific command with no arguments
that runs a large shell script with a long pipeline that outputs
relatively clean BibTeX that then normally takes me only a couple of
minutes to visually validate in an editor session. The major work
there is bracing of proper nouns in titles that my software did not
already handle, thereby preventing downcasing of those words in the
many bibliography styles that do so.
I'm on journal announcement lists for many publishers, so I often have
new data released to the Web just 5 to 10 minutes after receiving
e-mail about new issues.
The above-mentioned archives are at
http://www.math.utah.edu/pub/bibnet
http://www.math.utah.edu/pub/tex/bib
http://www.math.utah.edu/pub/tex/bib/index-table.html
http://www.math.utah.edu/pub/tex/bib/idx
http://www.math.utah.edu/pub/tex/bib/toc
They are mirrored at Universität Karlsruhe, Oak Ridge National
Laboratory, Sandia National Laboratory, and elsewhere.
Like Al Aho, Doug McIlroy, and Arnold Robbins, I'm a huge fan of awk;
I believe that I was the first to port it to PDP-10 TOPS-20 and VAX
VMS in the mid-1980s, and it is one of the first mandatory tools that
I install on any new computer.
-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- University of Utah FAX: +1 801 581 4148 -
- Department of Mathematics, 110 LCB Internet e-mail: beebe(a)math.utah.edu -
- 155 S 1400 E RM 233 beebe(a)acm.org beebe(a)computer.org -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------
i never got on with rpn, even though i am the kind of person who should - i have a little bit of dyslexia, perhaps that's why.
when i moved to plan9 i found it included hoc from K&P, just used that for the odd bits of maths i need.
-Steve