> Arguably ancient PDP-10 operating systems like ITS, WAITS, and TENEX were
> somewhat "open" and "free", but it's not a clear-cut case.
The open source movement was a revival of the old days of SHARE and other
user groups.
SAP, the SHARE assembly program for the IBM 704, was freely available--with
source code--to all members of the SHARE user group. I am not aware of any
restrictions on redistribution.
Other more specialized programs were also freely available through SHARE. In
particular, Fortran formatted IO was adopted directly from a SHARE program
written by Roy Nutt (who also wrote SAP and helped write Fortran I).
Bell Labs freely distributed the BESYS operating system for the IBM 704.
At the time (1958) no operating system was available from IBM.
IBM provided source code for the Fortran II compiler. In the
fashion of the time, I spent a memorable all-night session with
that code at hand, finding and fixing a bizarre bug (a computed GOTO
bombed if the number of branches was 74 mod 75) with a bizarre cause
(the code changed the index-register field in certain instructions on the
fly--inconsistently). And there was no operating system to help, because
BESYS swapped itself out to make room for the compiler.
Doug
This somewhat stale note was sent some time ago, but was ignored
because it was sent from an unregistered email address.
> And if the Unix patriarchs were perhaps mistaken about how useful
> "head" might be and whether or not it should have been considered
> verboten.
Point well taken.
I don't know which of head(1) and sed(1) came first. They appeared in
different places at more or less the same time. We in Research
declined to adopt head because we already knew the idiom "sed 10q".
However, one shouldn't have to do related operations in unrelated ways.
We finally admitted head in v10.
Head was independently invented by Mike Lesk. It was Lesk's
program that was deemed superfluous.
Head might not have been written if tail didn't exist. But, unlike head,
tail strayed from the tao of "do one thing well". Tail -r and tail -f are
as cringeworthy as cat -v.
-f is a strange feature that effectively turns a regular file into a pipe
with memory by polling for new data. A clean, general alternative
might be to provide an open(2) mode that makes reads at the current
file end block if some process has the file open for writing.
-r is weird because it enables backwards reading, but only as
limited by count. Better would be a program, say revfile, that simply
reads backwards by lines. Then tail p has an elegant implementation:
revfile p | head | revfile
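
As an illustration only: the sketch below is my reconstruction, not a
program that ever shipped. It slurps its whole input into memory, which
lets it read a pipe as well as a named file and so serve at both ends
of the pipeline above.

    /* revfile: print the lines of its input in reverse order.
     * Sketch only: buffering everything in core is wasteful for
     * big files; a real tool would read a seekable file backwards
     * in blocks. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        FILE *f = stdin;
        char *buf = NULL;
        size_t size = 0, cap = 0, n;
        long i, end;

        if (argc > 1 && (f = fopen(argv[1], "r")) == NULL) {
            perror(argv[1]);
            return 1;
        }
        do {                        /* slurp all input into buf */
            if (size == cap &&
                (buf = realloc(buf, cap = cap ? 2 * cap : 8192)) == NULL)
                return 1;
            n = fread(buf + size, 1, cap - size, f);
            size += n;
        } while (n > 0);

        end = size;                 /* one past the end of the pending line */
        for (i = (long)size - 1; i >= 0; i--)
            if (buf[i] == '\n' && i < (long)size - 1) {
                fwrite(buf + i + 1, 1, end - (i + 1), stdout);
                end = i + 1;
            }
        fwrite(buf, 1, end, stdout); /* the first line comes out last */
        return 0;
    }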
Doug
>> -f is a strange feature that effectively turns a regular file into a pipe
>> with memory by polling for new data. A clean, general alternative
>> might be to provide an open(2) mode that makes reads at the current
>> file end block if some process has the file open for writing.
> OTOH, this would mean adding more functionality (read: complexity)
> into the kernel, and there has always been a general desire to avoid
> pushing <stuff> into the kernel when it can be done in userspace. Do
> you really think using a blocking read(2) is somehow superior
> to using select(2) to wait for new data to be appended to the file?
I'm showing my age. tail -f antedated select(2) and was implemented
by alternately sleeping and reading. select(2) indeed overcomes that
clumsiness.
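
For concreteness, here is a minimal sketch of that sleep-and-read
scheme; it reconstructs the shape of the loop and is not the original
source.

    /* Polling tail -f, as done before select(2): seek to the end,
     * then alternately drain new data and sleep. */
    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>

    int main(int argc, char *argv[])
    {
        char buf[8192];
        ssize_t n;
        int fd;

        if (argc != 2 || (fd = open(argv[1], O_RDONLY)) < 0) {
            fprintf(stderr, "usage: %s file\n", argv[0]);
            return 1;
        }
        lseek(fd, 0L, SEEK_END);    /* start at the current end of file */
        for (;;) {
            while ((n = read(fd, buf, sizeof buf)) > 0)
                write(1, buf, n);   /* copy whatever has appeared */
            sleep(1);               /* nothing new: sleep, then poll again */
        }
    }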
> I'll note, with amusement, that -r is one option which is *NOT* in the
> GNU version of tail. I see it in FreeBSD, but this looks like a
> BSD'ism.
-r came from Bell Labs. This reinforces the point that the ancients
had their imperfections.
Doug
> Date: Thu, 15 Jul 2021 10:28:04 -0400
> From: "Theodore Y. Ts'o"
> Subject: Re: [TUHS] head/sed/tail (was The Unix shell: a 50-year view)
>
> On Wed, Jul 14, 2021 at 10:38:06PM -0400, Douglas McIlroy wrote:
>> Head might not have been written if tail didn't exist. But, unlike head,
>> tail strayed from the tao of "do one thing well". Tail -r and tail -f are
>> as cringeworthy as cat -v.
>>
>> -f is a strange feature that effectively turns a regular file into a pipe
>> with memory by polling for new data. A clean, general alternative
>> might be to provide an open(2) mode that makes reads at the current
>> file end block if some process has the file open for writing.
>
> OTOH, this would mean adding more functionality (read: complexity)
> into the kernel, and there has always been a general desire to avoid
> pushing <stuff> into the kernel when it can be done in userspace. Do
> you really think using a blocking read(2) is somehow superior
> to using select(2) to wait for new data to be appended to the file?
>
> And even if we did this using a new open(2) mode, are you saying we
> should have a separate executable in /bin which would then be
> identical to cat, except that it uses a different open(2) mode?
Yes, it would put more complexity into the kernel, but maybe it is the conceptually elegant choice.
Consider the behaviour of read(2) on a classic pipe or a socket: it blocks while some writer still holds the object open, and reports end-of-file only once the last writer is gone. The behaviour Doug proposes would bring reads on a regular file in line with that. Hence, maybe it should not be a mode, but the standard behaviour.
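
To make the comparison concrete, here is a small self-contained
demonstration of ordinary pipe semantics under read(2); the timing and
the data are illustrative only.

    /* read(2) on a pipe blocks while any writer still holds the
     * other end, and returns 0 (end of file) only once the last
     * writer has closed it.  Doug's proposed open(2) mode would
     * give reads at the end of a regular file the same behaviour. */
    #include <unistd.h>

    int main(void)
    {
        int fd[2];
        char buf[64];
        ssize_t n;

        pipe(fd);
        if (fork() == 0) {          /* child: the lone writer */
            close(fd[0]);
            sleep(2);               /* the parent's read blocks meanwhile */
            write(fd[1], "late data\n", 10);
            close(fd[1]);           /* only now does the reader see EOF */
            _exit(0);
        }
        close(fd[1]);               /* parent drops its own write end */
        while ((n = read(fd[0], buf, sizeof buf)) > 0)
            write(1, buf, n);       /* blocks until the data arrives */
        return 0;                   /* n == 0: every writer is gone */
    }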
I often think that around 1981 the Unix community missed an opportunity to really think through how networking should integrate with the foundations of Unix. It seems to me that at that time there was an opportunity to merge files, pipes and sockets into a coherent, simple framework. If the 8th edition file-system switch had already been introduced in V6 or V7, maybe this would have happened.
On the other hand, the installed base was probably already too large in 1981 to still make breaking changes to core concepts. V7 may have been the last chance saloon for that.
Paul
Below is a response from two of the authors with my response to it
inline. Not very impressed. Hopefully they'll get a clue and up
their game. In any case, enough time spent on it.
Jon
Michael Greenberg writes:
>
> HotOS isn't exactly a conventional venue---you might notice that many
> if not most HotOS papers don't fit the outline you've given.
I'm aware of that, and have participated in many unconventional venues
myself. I wasn't saying that papers must fit that outline, but I do
believe that they should contain that information. There's a big
difference between a discussion transcript and a paper; I believe that
papers, especially those published under the auspices of a prestigious
organization such as the ACM, should adhere to a higher standard.
> I'd definitely appreciate detailed feedback on any semantic errors
> we've made!
Unfortunately I can't help you here; that was feedback from
a reader who doesn't want to spend any more time on this.
> Your summary covers much of what we imagined!
>
> As I understand it, the primary goals of the paper were to (a) help
> other academics think about the shell as a viable area for research, (b)
> highlight some work we're doing on JIT compilation, (c) make the case
> for JIT approaches to the shell in general (as it's well adapted to its
> dynamism), and (d) explore various axes on which the shell could be
> improved. It seems like we've done a good job communicating (b) to you,
> but perhaps not the rest. Sorry again to disappoint.
I certainly hope that you understand the primary goals of your own paper.
Point-by-point:
(a) While this is a valid point, I don't understand why the paper didn't
just state it in a straightforward manner. There are several common
forms. One is to list issues in the introduction while explaining
which one(s) will be addressed in the paper. Another is in the
conclusion where authors list work still to be done.
(b) At least for me this goal was not accomplished because there were no
examples. Figure 1 by itself is insufficient given that the code
used to generate the "result" is not shown. It would have been much
more illuminating had the paper not only shown that code but also the
optimized result. Professionals don't blithely accept magic data.
(c) The paper failed to make this case to me for several reasons.
As I understand it, the paper is somewhat about applying JIT
compilation techniques to interconnected processes. While most
shells include syntax to support the construction of such, it's
really independent of the shell. For completeness, I have a vague
memory of shell implementations for non-multitasking systems that
sequentially ran pipelined programs passing intermediate results
via temporary files. The single "result" reminds me of something
that I saw at a science fair when my child was in middle school;
I looked at one team's results and asked "What makes you think that
a sample size of one is significant?" The lack of any benchmarks
or other study results that justified the work also undermined the
case. It reads more like the authors had something that they wanted
to play with rather than doing serious research. The paper does not
examine the percentage of shell scripts that actually benefit from
JIT compilation; for all the reader may know it's such a small number
that hand-optimizing just those scripts might be a better solution.
I suppose that the paper fits into the apparently modern philosophy
of expecting tools to fix up poorly written code so that programmers
don't have to understand what they're doing.
(d) In my opinion the paper didn't do this at all. There was no
    analysis of "the shell" showing weaknesses and an explanation
    of why one particular path was taken. And there was no discussion
    of what people are doing with shells that causes whatever problems
    you were addressing, or of whether some up-front sanity checking
    might ameliorate those problems. Again, being a geezer I'm reminded of past events
that repeat themselves. I joined a start-up company decades ago
that was going to speed up circuit simulation 100x by plugging
custom-designed floating-point processing boards into a multiprocessor
machine. I managed to beat that 100x just by cleverly crafting the
database and memory management code. The fact that the company founder
never verified his idea led to a big waste of resources. But, he did
manage to raise venture capital which is akin to getting DARPA funds.
Nikos Vasilakis writes:
>
> To add to Michael's points, HotOS' "position" papers are often
> intended as provocative, call-to-arms statements targeting primarily
> the research community (academic and industrial research). Our key
> position, which we possibly failed to communicate, is "Hey researchers
> everywhere, let's do more research on the shell! (and here's why)".
While provocative name-calling and false statements seem to have become
the political norm in America, I don't think that they're appropriate in
a professional context.
In my experience a call-to-arms isn't productive unless those called
understand the nature of the call. I'm reminded of something that happened
many decades ago; my partner asked me to proof a paper that she was writing
for her master's degree. I read it over with a confused look and asked her
what she was trying to say. She responded, and I told her to write that
down and stop trying to sound like someone else. Turned her into a much
better writer. So if the paper wanted to say "Hey researchers ..." it
should have done so instead of being obtuse.
To continue on this point and Michael's (a) above, I don't see a lot of
value in proclaiming that research can be done. I think that a more
fruitful approach is to cover what has been done, what you're doing,
and what you see but aren't doing.
> For our more conventional research papers related to the shell, which
> might cover your concerns about semantics, correctness, and
> performance, please see next. These three papers also provide important
> context around the HotOS paper you read:
> ...
Tracking down your other work was key to understanding this paper. It's
my opinion that my having to do so is illustrative of the problems with
the paper.
> Thank you for taking the time to read our paper and comment on it.
> Could you please share the emails of everyone mentioned at the end of
> your email? We are preparing a longer report on a recent shell
> roundtable, and would love to get everyone's feedback!
While I appreciate the offer to send the longer report, it would only be
of interest if it were substantially more professional. There is no interest
in reviewing work that is not clearly presented, has not been proofread
and edited, includes unsubstantiated, pejorative, or meaningless statements,
or includes incorrect examples or statistically irrelevant results. Likewise,
there is no interest if the homework hasn't been done to put the
report in context with regard to prior art and other current work.
Jon Steinhart <jsacm(a)fourwinds.com>
Warner Losh <imp(a)bsdimp.com>
John Cowan <cowan(a)ccil.org>
Larry McVoy <lm(a)mcvoy.com>
John Dow <jmd(a)nelefa.org>
Andy Kosela <akosela(a)andykosela.com>
Clem Cole <clemc(a)ccc.com>
Steve Bourne does not want to give out his address
On the subject of tac (concatenate and print files in reverse), I can
report that the tool was written by my late friend Jay Lepreau in the
Department of Computer Science (now, School of Computing) at the
University of Utah. The file src/tac.c in the GNU coreutils
distribution contains a copyright for 1988-2020.
I searched my TOPS-20 PDP-10 archives, and found no source code for
tac, but I did find an older TOPS-20 executable in Jay's personal
directory with a file date of 17-Mar-1987. There isn't much else in
that directory, so I suspect that he just copied over a needed tool
from his Department of Computer Science TOPS-20 system to ours in the
College of Science.
----------------------------------------
P.S. Jay was the first to get Steve Johnson's Portable C Compiler,
pcc, to run on the 36-bit PDP-10, and once we had pcc, we began the
move from writing utilities in Pascal and PDP-10 assembly language to
doing them in C. The oldest C file for pcc in our PDP-10 archives is
dated 17-Mar-1981, with other pcc files dated to mid-1983, and final
compiler executables dated 12-May-1986. Four system header files are
dated as late as 4-Oct-1986, presumably patched after the compiler was
built.
Later, Kok Chen and Ken Harrenstien's kcc provided another C compiler
that added support for byte datatypes, where a byte could be anything
from 1 to 36 bits. The oldest distribution of kcc in our archives is
labeled "Fifth formal distribution snapshot" and dated 20-Apr-1988.
My info-kcc mailing list archives date from the list beginning, with
an initial post from Ken dated 27-Jul-1986 announcing the availability
of kcc at sri-nic.arpa.
By mid-1987, we had a dozen Sun workstations and an NFS fileserver; they
marked the beginning of our move to a Unix workstation environment,
away from large, expensive, and electricity-gulping PDP-10 and VAX
mainframes.
By the summer of 1991, those mainframes were retired. I recall
speaking to a used-equipment vendor about our VAX 8600, which cost
about US$450K (discounted academic pricing) in 1986, and was told that
its value was depreciating about 20% per month. Although many of us
missed TOPS-20 features, I don't think anyone was sad to say goodbye
to VMS. We always felt that the VMS developers worked in isolation
from the PDP-10 folks, and thus learned nothing from them.
-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- University of Utah FAX: +1 801 581 4148 -
- Department of Mathematics, 110 LCB Internet e-mail: beebe(a)math.utah.edu -
- 155 S 1400 E RM 233 beebe(a)acm.org beebe(a)computer.org -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------
Some comments from someone (me) who tends to be pickier than
most about cramming programs together and endless sets of
options:
I, too, had always thought sed was older than head. I stand
corrected. I have a long-standing habit of typing sed 10q but
don't spend much time fussing about head.
When I arrived at Bell Labs in late summer 1984, tail -f was
in /usr/bin and in the manual; readslow was only in /usr/bin.
readslow was like tail -f, except it either printed the entire
file first or (option -e) started at the end of the file.
I was told readslow had come first, and had been invented in a
hurry because people wanted to watch in real time the moves
logged by one of Belle's chess matches. Vague memory says it
was written by pjw; the name and the code style seem consistent
with that.
Personally I feel like tail -r and tail -f both fit reasonably
well within what tail does, since both have to do with the
bottom of the file, though -r's implementation does make for
a special extra code path in tail, so maybe a separate program
is better. What I think is a bigger deal is that I have
frequently missed tail -r on Linux systems, and somehow hadn't
spotted tac; thanks to whoever here (was it Ted?) pointed it
out first!
On the other hand, adding data-processing functions to cat has
never made sense to me. It seems to originate from a mistaken
notion that cat's focus is printing data on terminals, rather
than concatenating data from different places. Here is a test:
if cat -v and cat -n and all that make sense, why shouldn't
cat also subsume tr and pr and even grep? What makes converting
control characters and numbering lines so different from swapping
case and adding page headers? I don't see the distinction, and
so I think vis(1) (in later Research) makes more sense than cat -v
and nl(1) (in Linux for a long time) more sense than cat -n.
(I'd also happily argue that given nl, pr shouldn't number lines.
That a program was in V6 or V7 doesn't make it perfect.)
And all those special options to wc that amounted to doing
arithmetic on the output were always just silly. I'm glad
they were retracted.
On the other other hand, why didn't I know about tac? Because
there are so damn many programs in /usr/bin these days. When
I started with UNIX ca. 1980, the manual (even the BSD version)
was still short enough that one could sit down and read it through,
section by section, and keep track of what one had read, and
remember what all the different tools did. That hasn't been
true for decades. This could be an argument for adding to
existing programs (which many people already know about) rather
than adding new programs (which many people will never notice).
The real problem is that the system is just too damn big. On
an Ubuntu 18.04 system I run, ls /usr/bin | wc -l shows 3242
entries. How much of that is redundant? How much is rarely or
never used? Nobody knows, and I suspect few even try to find
out. And because nobody knows, few are brave enough to throw
things away, or even trim out bits of existing things.
One day in the late 1980s, I helped out with an Introduction
to UNIX talk at a DECUS symposium. One of the attendees noticed
the `total' line in the output of ls, and asked: why is that there?
Doesn't that contradict the principles of tools' output you've
just been talking about? I thought about it, and said yes,
you're right, that's a bit of old history and shouldn't be
there any more. When I got home to New Jersey, I took the
`total' line out of Research ls.
Good luck doing anything like that today.
Norman Wilson
Toronto ON
What was the first machine to run rogue? I understand that it was written
by Glenn Wichman and Michael Toy at UC Santa Cruz ca. 1980, using the
`curses` library (Ken Arnold's original, not Mary Ann's rewrite). I've seen
at least one place that indicates it first ran on 6th Edition, but that
doesn't sound right to me. The first reference I can find in BSD is in 2.79
("rogue.doc"), which also appears to be the first release to ship curses.
Anyone have any info? Thanks!
- Dan C.
In this week's BSDNow.tv podcast, available at
https://www.bsdnow.tv/409
there is a story about a new conference paper on the Unix shell. The
paper is available at
Unix shell programming: the next 50 years
HotOS '21: Workshop on Hot Topics in Operating Systems, Ann
Arbor, Michigan, 1--3 June 2021
https://doi.org/10.1145/3458336.3465294
The tone is overall negative, though they do say nice things about
Doug McIlroy and Steve Johnson, and they offer ideas about
improvements.
List readers will have their own views of the paper. My own is that,
despite its dark corners, the Bourne shell has served us
extraordinarily well, and I have been writing in it daily for decades
without being particularly bothered by the many issues raised by the
paper's authors. Having dealt with so-called command shells on
numerous other operating systems, I find that the Unix shells rarely
get in my way.
-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- University of Utah FAX: +1 801 581 4148 -
- Department of Mathematics, 110 LCB Internet e-mail: beebe(a)math.utah.edu -
- 155 S 1400 E RM 233 beebe(a)acm.org beebe(a)computer.org -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------