I've wondered about the cat flag war myself, and have a theory. Might
as well air it here since the real McCoy (and McIlroy) are available to
shoot it down. :)
I'm sure the following attempt at knot-slashing is not novel, but people
relentlessly return to this issue as if the presence of _flags_ is the
problem. (Plan 9 fans recite this point ritually, like a mantra.)
I say it isn't.
At 2024-05-14T17:10:38+1000, Rob Pike wrote:
> I agree with your (as usual) perceptive analysis. Only stopping by to
> point out that I took the buffering out of cat. I didn't have your
> perspicacity on why it should happen, just a desire to remove all the
> damn flags. When I was done, cat.c was 35 lines long. Do a read, do a
> write, continue until EOF. Guess what? That's all you need if you want
> to cat files.
> Sad to say Bell Labs's cat door was hard to open and most of the world
> still has a cat with flags. And buffers.
I think this dispute is a proxy fight between two communities, or more
precisely two views of what cat(1), and other elementary Unix commands,
primarily exist to achieve. In my opinion both perspectives are valid,
and it's better to consider what each perspective wants than mandate
that either is superior.
Viewpoint 1: Perspective from Pike's Peak
Elementary Unix commands should be elementary. Unix is a kernel.
Programs that do simple things with system calls should remain simple.
This practice makes the system (the kernel interface) easier to learn,
and to motivate and justify to others. Programs therefore test the
simplicity and utility of, and can reveal flaws in, the set of
primitives that the kernel exposes. This is valuable stuff for a
research organization. "Research" was right there in the CSRC's name.
Viewpoint 2: "I Just Want to Serve 5 Terabytes"[1]
cat(1)'s man page does not advertise the traits in the foregoing
viewpoint as objectives, and never did.[2] Its avowed purpose was to
copy, without interruption or separation, 1..n files from storage to an
output channel or stream (which might be redirected).
I don't need to convince anyone that this is a worthwhile application.
But when we think about the many possible ways--and destinations--a
person might have in mind for that I/O channel, we have to face the
necessity of buffering or performance goes through the floor.
It is 1978. Some VMS or, ugh, CP/M advocate from those piddly little
toy machines will come along. "Ha ha," they will say, "our OS is way
faster than the storied Unix even at the simple task of dumping files".
Nowhere[citation needed] outside of C tutorials is cat implemented as
    int c;
    while ((c = getchar()) != EOF) putchar(c);
or its read()/write() system call equivalent.
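To put a rough number on the cost, here is a minimal sketch (not taken
from any historical cat) of the read()/write() loop with the chunk size
as a parameter, counting read(2) calls along the way. The name
copy_counting and the 4096-byte cap are assumptions made for
illustration; a chunk of 1 is the byte-at-a-time case. It assumes a
POSIX system and does not retry on EINTR.

```c
#include <unistd.h>

/* Copy everything from descriptor `in` to descriptor `out`, asking
 * read(2) for at most `chunk` bytes at a time (capped at sizeof buf).
 * Returns the number of read(2) calls that returned data, or -1 on
 * error.  copy_counting() is a hypothetical helper. */
long copy_counting(int in, int out, size_t chunk)
{
    char buf[4096];
    long reads = 0;
    ssize_t n;

    if (chunk == 0 || chunk > sizeof buf)
        chunk = sizeof buf;

    while ((n = read(in, buf, chunk)) > 0) {
        ssize_t off = 0;

        reads++;
        while (off < n) {       /* write(2) may accept fewer bytes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
    }
    return n < 0 ? -1 : reads;
}
```

Fed 1024 bytes that are all available up front, a chunk of 1 costs 1024
data-returning read(2) calls (and at least as many writes), while a
chunk of 512 costs 2.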
The output channel might be across a network in a distributed computing
environment. Nobody wants to work with one byte at a time in that
situation. Ethernet's minimum frame size is 64 bytes. No one wants
that kind of overhead.
While composing this mail, I had a look at an early, pre-C version of
cat, spelling error in the only comment line and all.
https://minnie.tuhs.org/cgi-bin/utree.pl?file=V2/cmd/cat.s
putc:
        movb    r0,(r2)+
        cmp     r2,$obuf+512.
        blo     1f
        mov     $1,r0
        sys     write; obuf; 512.
        mov     $obuf,r2
Well, look at that. Buffering. The author of this tool of course knew
the kernel well, including the size of its internal disk buffers (on the
assumption that I/O would mainly be happening to and from disks).
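For readers who don't speak PDP-11 assembly, the excerpt can be read
roughly as the following C. This is a loose paraphrase, not a
translation anyone at the Labs wrote: the names buffered_putc and optr
are made up, the file descriptor is a parameter here (the assembly
hard-codes 1), and the return value is an addition for testing. The
excerpt does not show how a partly filled buffer gets flushed at exit,
so the sketch leaves that out too.

```c
#include <unistd.h>

#define OBUFSZ 512              /* the 512. in the assembly */

static char obuf[OBUFSZ];
static char *optr = obuf;       /* plays the role of r2 */

/* Store one byte; once the buffer is full, hand all 512 bytes to
 * write(2) and start over.  Returns 0 if the byte was only buffered,
 * 1 if this call flushed the buffer, -1 if the write failed or was
 * short.  Hypothetical name; the original routine is just "putc". */
int buffered_putc(int fd, int c)
{
    *optr++ = (char)c;                  /* movb r0,(r2)+ */
    if (optr < obuf + OBUFSZ)           /* cmp r2,$obuf+512. ; blo 1f */
        return 0;
    optr = obuf;                        /* mov $obuf,r2 */
    /* mov $1,r0 ; sys write; obuf; 512. */
    return write(fd, obuf, OBUFSZ) == OBUFSZ ? 1 : -1;
}
```

The first 511 calls touch only memory; the 512th makes the single
system call.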
But that's a "leaky abstraction", or a "layering violation". (That'll
be two tickets to the eternal fires of Brogrammer Hell, thanks.) Once
you sweep away the break room buzzwords we understand that cat is
presuming things that it should not (the size of the kernel's buffers,
and the nature of devices serving as source and sink).
And this, as we all know, is one of the reasons the standard I/O library
came into existence. Mike Lesk, I surmise, understood that the
"applications programmer" having knowledge of kernel internals was in
general neither necessary nor desirable.
What _should_ have happened, IMAO, as stdio.h came into existence and
the commercialization and USG/PWB-ification of Unix became truly
inevitable, is that Viewpoint 1 should have been salvaged for the
benefit of continuing operating systems research and kernel development.
But!
We should have kept cat(1), and let it grow as many flags as practical
use demanded--_except_ for `-u`--and at the _same time_ developed a new
kcat(1) command that really was just a thin wrapper around system calls.
Then you'd be a lot closer to measuring what the kernel was really
doing, what you were paying for it, and you could still boast of your
elegance in OS textbooks.
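As a sketch of what such a command might have looked like in modern
POSIX C (purely hypothetical; no kcat ever existed, and the 512-byte
buffer and the function names are assumptions): no flags, no stdio, just
open(2), read(2), write(2), and close(2). A short write is reported as
an error rather than retried, to keep the wrapper thin.

```c
#include <fcntl.h>
#include <unistd.h>

/* Read from fd and write to standard output until EOF.
 * Returns 0 on success, -1 on a read error or a failed/short write. */
static int kcat_fd(int fd)
{
    char buf[512];              /* size is an arbitrary assumption */
    ssize_t n;

    while ((n = read(fd, buf, sizeof buf)) > 0)
        if (write(1, buf, (size_t)n) != n)
            return -1;
    return n < 0 ? -1 : 0;
}

/* Hypothetical kcat driver: every argument is a file to copy; with no
 * arguments, copy standard input.  Returns an exit status (0 or 1). */
int kcat(int argc, char **argv)
{
    int status = 0;
    int i;

    if (argc < 2)
        return kcat_fd(0) < 0 ? 1 : 0;

    for (i = 1; i < argc; i++) {
        int fd = open(argv[i], O_RDONLY);

        if (fd < 0) {           /* unreadable file: note it, move on */
            status = 1;
            continue;
        }
        if (kcat_fd(fd) < 0)
            status = 1;
        close(fd);
    }
    return status;
}
```

A real command would simply call kcat(argc, argv) from main() and
return the result; it is split out here so the logic can be exercised
on its own.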
I concede that the name "kcat" would have been twice the length a
certain prominent user of the Unix kernel would have tolerated. Maybe
"kc" would have been better. The remaining 61 alphanumeric sigils that
might follow the 'k' would have been reserved for other exercises of the
kernel interface. If your kernel is sufficiently lean,[3] 62 cases
exercising it ought to be enough for anybody.
Regards,
Branden
[1] https://news.ycombinator.com/item?id=29082014
[2] https://minnie.tuhs.org/cgi-bin/utree.pl?file=V1/man/man1/cat.1
[3] https://dl.acm.org/doi/10.1145/224056.224075