On Tue, May 14, 2024 at 7:10 AM G. Branden Robinson
<g.branden.robinson(a)gmail.com> wrote:
[snip]
> Viewpoint 1: Perspective from Pike's Peak
Clever.
> Elementary Unix commands should be elementary. Unix is a kernel.
> Programs that do simple things with system calls should remain simple.
> This practice makes the system (the kernel interface) easier to learn,
> and to motivate and justify to others. Programs therefore test the
> simplicity and utility of, and can reveal flaws in, the set of
> primitives that the kernel exposes. This is valuable stuff for a
> research organization. "Research" was right there in the CSRC's name.
I believe this at once makes a more complex argument than was
proffered, and at the same time misses the context in which Unix was
created.
> Viewpoint 2: "I Just Want to Serve 5 Terabytes"[1]
>
> cat(1)'s man page did not advertise the traits in the foregoing
> viewpoint as objectives, and never did.[2] Its avowed purpose was to
> copy, without interruption or separation, 1..n files from storage to an
> output channel or stream (which might be redirected).
>
> I don't need to convince you that this is a worthwhile application.
> But when we think about the many possible ways--and destinations--a
> person might have in mind for that I/O channel, we have to face the
> necessity of buffering, or performance goes through the floor.
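
(To put rough numbers on the buffering point, since they're easy to
work out: copying a gigabyte one byte at a time costs about 2^30
read(2)/write(2) pairs--over two billion system calls--while an 8 KiB
buffer cuts that to 2^17 pairs, roughly a quarter million calls in
total. Per-call overhead, not data movement, dominates the unbuffered
case. My numbers, purely for illustration.)
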
> It is 1978. Some VMS
I don't know about that; VMS IO is notably slower than Unix IO by
default. Unlike VMS, Unix uses the buffer cache to serialize access to
the underlying storage device(s). Ironically, caching here is a major
win, not just for speed, but to make it relatively easy to reason
about the state of a block, since that state is removed from the
minutiae of the underlying storage device and instead handled in the
bio layer. Treating the block cache as a fixed-size pool yields a
relatively simple state machine for synchronizing between the
in-memory and on-disk representations of data.
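
To make the fixed-size pool concrete, here's a sketch in the spirit of
the classic getblk()/bread()/bwrite() arrangement (my toy names and
flags, not the historical code, and with no locking or sleep/wakeup):

    /* Minimal sketch of a fixed-size buffer cache.  Illustrative
     * only; single-threaded, no locking, no LRU. */
    #include <string.h>

    #define NBUF   16
    #define BSIZE  512

    enum { B_VALID = 1, B_DIRTY = 2 };   /* the whole state machine */

    struct buf {
        int  flags;
        int  blkno;                      /* which disk block */
        char data[BSIZE];
    };

    static struct buf cache[NBUF];

    /* Stand-ins for the real device driver (the "bio layer"). */
    static void disk_read(int blkno, char *d)  { memset(d, 0, BSIZE); (void)blkno; }
    static void disk_write(int blkno, char *d) { (void)blkno; (void)d; }

    /* Find the block in the cache, or recycle a buffer for it. */
    static struct buf *getblk(int blkno)
    {
        for (int i = 0; i < NBUF; i++)
            if ((cache[i].flags & B_VALID) && cache[i].blkno == blkno)
                return &cache[i];                /* cache hit */
        for (int i = 0; i < NBUF; i++) {
            if (!(cache[i].flags & B_DIRTY)) {   /* recycle a clean buffer */
                cache[i].blkno = blkno;
                cache[i].flags = 0;              /* contents not yet valid */
                return &cache[i];
            }
        }
        /* All buffers dirty: a real kernel would flush one and sleep. */
        disk_write(cache[0].blkno, cache[0].data);
        cache[0].flags = 0;
        cache[0].blkno = blkno;
        return &cache[0];
    }

    /* Read through the cache: only goes to disk on a miss. */
    struct buf *bread(int blkno)
    {
        struct buf *b = getblk(blkno);
        if (!(b->flags & B_VALID)) {
            disk_read(b->blkno, b->data);
            b->flags |= B_VALID;
        }
        return b;
    }

    /* Mark dirty now; write back later (delayed write). */
    void bwrite(struct buf *b) { b->flags |= B_VALID | B_DIRTY; }

    int main(void)
    {
        struct buf *b = bread(7);   /* miss: goes to "disk" */
        bwrite(b);                  /* cached dirty; flushed on recycle */
        b = bread(7);               /* hit: no disk traffic */
        return 0;
    }

Note how little state there is to reason about: each buffer is just
(blkno, valid, dirty), independent of whatever the device underneath
is doing.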
[snip]
> And this, as we all know, is one of the reasons the standard I/O library
> came into existence. Mike Lesk, I surmise, understood that the
> "applications programmer" having knowledge of kernel internals was in
> general neither necessary nor desirable.
I'm not sure about that. I suspect that the justification _may_ have
been more along the lines of noting that many programs implemented
their own, largely similar buffering strategies, and that it was
preferable to centralize those into a single library, and also noting
that building some kinds of programs was inconvenient using raw system
calls. For instance, something like `gets` is handy, but is _annoying_
to write using just read(2). It can obviously be done, but if I don't
have to, I'd prefer not to.
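
To illustrate the annoyance (my sketch, nothing from Lesk): a line
reader built on nothing but read(2), with fgets(3)-style semantics so
EOF stays detectable. One system call per byte; buffering this is
exactly the code everyone kept duplicating before stdio:

    #include <string.h>
    #include <unistd.h>

    /* Read one line from fd into buf (at most n-1 bytes), keeping the
     * newline, NUL-terminated.  Returns bytes stored, 0 at EOF with
     * nothing read, -1 on error. */
    ssize_t raw_gets(int fd, char *buf, size_t n)
    {
        size_t i = 0;
        while (i + 1 < n) {
            char c;
            ssize_t r = read(fd, &c, 1);   /* a system call per byte! */
            if (r < 0)
                return -1;
            if (r == 0)
                break;                     /* EOF */
            buf[i++] = c;
            if (c == '\n')
                break;                     /* end of line */
        }
        buf[i] = '\0';
        return (ssize_t)i;
    }

    int main(void)
    {
        char line[256];
        while (raw_gets(0, line, sizeof line) > 0)
            write(1, line, strlen(line));  /* cat, a line at a time */
        return 0;
    }
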
[snip]
> We should have kept cat(1), and let it grow as many flags as practical
> use demanded--_except_ for `-u`--and at the _same time_ developed a new
> kcat(1) command that really was just a thin wrapper around system calls.
> Then you'd be a lot closer to measuring what the kernel was really
> doing, what you were paying for it, and you could still boast of your
> elegance in OS textbooks.
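
For what it's worth, such a kcat(1) is maybe twenty lines; here's my
reading of the proposal as a sketch (hypothetical program, error
handling pared to the bone). What you'd measure with it is (nearly)
pure kernel: syscall overhead plus the buffer cache.

    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        char buf[8192];

        if (argc == 1) {                   /* no arguments: copy stdin */
            for (ssize_t n; (n = read(0, buf, sizeof buf)) > 0; )
                write(1, buf, (size_t)n);
            return 0;
        }
        for (int i = 1; i < argc; i++) {
            int fd = (argv[i][0] == '-' && argv[i][1] == '\0')
                         ? 0 : open(argv[i], O_RDONLY);
            if (fd < 0)
                return 1;
            for (ssize_t n; (n = read(fd, buf, sizeof buf)) > 0; )
                if (write(1, buf, (size_t)n) != n)
                    return 1;
            if (fd != 0)
                close(fd);
        }
        return 0;
    }
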
[snip]
Here's where I think this misses the mark: it focuses too much on the
idea that simple programs exist to serve as tests for, and exemplars
of, the kernel system call interface--but what evidence do you have
for that? A simpler explanation is that simple programs are easier to
write, easier to read, and easier to reason about, test, and examine
for correctness. Unix amplified this with Doug's "garden hoses of
data" idea and the advent of pipes; here, it was found that small,
simple programs could be combined in often surprising, unanticipated
ways.

Unix built up a philosophy about _how_ to write programs that was
rooted in the problems that were interesting when Unix was first
created. Something we often forget is that research systems are built
to address problems that are interesting _to the researchers who build
them_. This context can shape a system, and we see that with Unix: a
highly synchronous system call interface, because overly elaborate
async interfaces were hard to program; a simple file abstraction that
was easy to use (open/creat/read/write/close/seek/stat) because files
on other contemporary systems were baroque things that were difficult
to use; a simple primitive for the creation of processes because,
again, on other systems processes were very heavy, complicated things
that were difficult to use. Unix took problems related to IO and
processes and made them easy. By the 80s, these were pretty well
understood, so focus shifted to other things (languages, networking,
etc).
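
To make that concrete, here's roughly what `ls | wc` costs in those
primitives: one pipe(2), two fork(2)s, two execs, and a couple of
waits (a sketch, not how any real shell does it):

    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int p[2];
        pipe(p);                      /* p[0] = read end, p[1] = write end */

        if (fork() == 0) {            /* child 1: ls, stdout -> pipe */
            dup2(p[1], 1);
            close(p[0]); close(p[1]);
            execlp("ls", "ls", (char *)0);
            _exit(127);
        }
        if (fork() == 0) {            /* child 2: wc, stdin <- pipe */
            dup2(p[0], 0);
            close(p[0]); close(p[1]);
            execlp("wc", "wc", (char *)0);
            _exit(127);
        }
        close(p[0]); close(p[1]);     /* parent must close both ends */
        while (wait(0) > 0)
            ;
        return 0;
    }
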
Unix is one of those rare beasts that escaped the lab and made it out
there in the wild. It became the workhorse that begot a whole two or
three generations of commercial work; it's unsurprising that when the
web explosion happened, Unix became the basis for it: it was there, it
was familiar, and by then it wasn't a research project anymore, but a
basis for serious commercial work. That it has retained the original
system call interface is almost incidental; perhaps that fits with
your broccoli-man analogy.
- Dan C.