So fork() is a significant nuisance. How about the far more ubiquitous
problem of IO buffering?
On Sun, May 12, 2024 at 12:34:20PM -0700, Adam Thornton wrote:
> But it does come down to the same argument as
>
https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.…
The Microsoft manifesto says that fork() is an evil hack. One of the cited
evils is that one must remember to flush output buffers before forking, for
fear it will be emitted twice. But buffering is the culprit, not the
victim. Output buffers must be flushed for many other reasons: to avoid
deadlock; to force prompt delivery of urgent output; to keep output from
being lost in case of a subsequent failure. Input buffers can also steal
data by reading ahead into stuff that should go to another consumer. In all
these cases buffering can break compositionality. Yet the manifesto blames
an instance of the hazard on fork()!
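For concreteness, a minimal C sketch of that particular hazard (my
illustration, not the manifesto's): a line left in the stdio buffer at fork
time gets flushed by both parent and child.

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* On a terminal stdout is line-buffered, so the newline flushes this
     * line at once.  On a pipe or a file it is block-buffered, so the
     * text is still sitting in the user-space buffer when we fork. */
    printf("written before the fork\n");

    /* fflush(stdout);  -- uncommenting this removes the duplication */

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid > 0)
        wait(NULL);

    /* Parent and child each inherit a private copy of the unflushed
     * buffer; both copies are flushed at exit.  Try "./a.out" (printed
     * once) versus "./a.out | cat" (printed twice). */
    return 0;
}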
To assure compositionality, one must flush output buffers at every possible
point where an unknown downstream consumer might correctly act on the
received data with observable results. And input buffering must never
ingest data that the program will not eventually use. These are tough
criteria to meet in general without sacrificing buffering.
The advent of pipes vividly exposed the non-compositionality of output
buffering. Interactive pipelines froze because users could not provide the
input that would force buffered output to be flushed until they had seen
that very output. This phenomenon motivated cat -u, and stdio's convention of
line buffering for stdout. The premier example of input buffering eating
other programs' data was mitigated by "here documents" in the Bourne shell.
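A sketch (illustrative only) of the defensive flush that the pipeline-freeze
problem forces on any program that may find stdout attached to a pipe:
without the explicit fflush, the prompt can be stranded in the buffer while
the program blocks waiting for a reply.

#include <stdio.h>

int main(void)
{
    char line[256];

    /* Line buffering saves us on a terminal, but downstream of a pipe
     * stdout is block-buffered, so the partial line below would sit in
     * the buffer -- the freeze described above -- without the flush. */
    fputs("enter a value: ", stdout);
    fflush(stdout);

    if (fgets(line, sizeof line, stdin) != NULL)
        printf("got: %s", line);
    return 0;
}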
Such precautions (cat -u, line buffering, here documents) are mere fig
leaves that conceal important special cases.
The underlying evil of buffered IO still lurks. The justification is that
it's necessary to match the characteristics of IO devices and to minimize
system-call overhead. The former necessity requires the attention of
hardware designers, but the latter is in the hands of programmers. What can
be done to mitigate the pain of border-crossing into the kernel? L4 and its
ilk have taken a whack. An even more radical approach might flow from the
"whitepaper" at www.codevalley.com.
In any event, the abolition of buffering is a grand challenge.
Doug
On Sat, May 11, 2024 at 2:35 PM Theodore Ts'o <tytso(a)mit.edu> wrote:
>
> I bet most of the young'uns would not be trying to do this as a shell
> script, but using the Cloud SDK with perl or python or Go, which is
> *way* more bloaty than using /bin/sh.
>
> So while some of us old farts might be bemoaning the death of the Unix
> philosophy, perhaps part of the reality is that the Unix philosophy
> was ideal for a simpler time, but might not be as good a fit
> today
I'm finding myself in agreement. I might well do this with jq, but as you
point out, you're using the jq DSL pretty extensively to pull out the
fields. On the other hand, I don't think that's very different than piping
stuff through awk, and I don't think anyone feels like _that_ would be
cheating. And jq -L is pretty much equivalent to awk -F, which is how I
would do this in practice, rather than trying to inline the whole jq bit.
But it does come down to the same argument as
https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.…
And it is true that while fork() is a great model for single-threaded
pipeline-looking tasks, it's not really what you want for an interactive
multithreaded application on your phone's GUI.
Oddly, I'd have a slightly different reason for reaching for Python (which
is probably how I'd do this anyway), and that's the batteries-included
bit. If I write in Python, I've got the gcloud api available as a Python
module, and I've got a JSON parser also available as a Python module (but I
bet all the JSON unmarshalling is already handled in the gcloud library),
and I don't have to context-switch to the same degree that I would if I
were stringing it together in the shell. Instead of "make an HTTP request
to get JSON text back, then parse that with repeated calls to jq", I'd just
get an object back from the instance fetch request, pick out the fields I
wanted, and I'd be done.
I'm afraid only old farts write anything in Perl anymore. The kids just
mutter "OK, Boomer" when you try to tell them how much better CPAN was than
PyPI. And it sure feels like all the cool kids have abandoned Go for Rust,
although Go would be a perfectly reasonable choice for this task as well
(and would look a lot like Python: get an object back, pick off the useful
fields).
Adam
There was nothing unique about the size or the object code of Dennis's C
compiler. In the 1960s, Digitek had a thriving business of making Fortran
compilers for all manner of machines. To optimize space usage, the
compilers' internal memory model comprised variable-size movable tables,
called "rolls". To exploit this non-native architecture, the compilers
themselves were interpreted, although they generated native code. Bob
McClure tells me he used one on an SDS910 that had 8K 16-bit words.
Dennis was one-up on Digitek in having a self-maintaining compiler. Thus,
when he implemented an optimization, the source would grow, but the
compiler binary might even shrink thanks to self-application.
Doug
nl(1) uses the notable character sequences “\:\:\:”, “\:\:”, and “\:” to delimit header, body, and trailer sections within its input.
I wondered if anyone was able to shed light on the reason those were adopted as the defaults?
I would have expected perhaps something compatible with *roff (like, .\" something).
FreeBSD claims nl first appeared in System III (although it previously claimed SVR2), but I haven’t dug into the implementation any further.
Thanks in advance,
d
While the idea of small tools that do one job well is the core tenet of
what I think of as the UNIX philosophy, this goes a bit beyond UNIX, so I
have moved this discussion to COFF and BCCing TUHS for now.
The key is that not all "bloat" is the same (really)—or maybe one person's
bloat is another person's preference. That said, NIH leads to pure bloat
with little to recommend it, while multiple offerings are a choice. Maybe
the difference between the two is simply one person's view over another's.
On Fri, May 10, 2024 at 6:08 AM Rob Pike <robpike(a)gmail.com> wrote:
> Didn't recognize the command, looked it up. Sigh.
>
Like Rob -- this was a new one for me, too.
I looked, and it is on the SYS3 tape; see:
https://www.tuhs.org/cgi-bin/utree.pl?file=SysIII/usr/src/man/man1/nl.1
> pr -tn <file>
>
> seems sufficient for me, but then that raises the question of your
> question.
>
Agreed, that has been burned into the ROMs in my fingers since the
mid-1970s 😀
BTW: SYS3 has pr(1) with both switches too (more in a minute)
> I've been developing a theory about how the existence of something leads
> to things being added to it that you didn't need at all and only thought of
> when the original thing was created.
>
That is a good point, and I generally agree with you.
> Bloat by example, if you will. I suspect it will not be a popular theory,
> however accurately it may describe the technological world.
>
Of course, sometimes the new features >>are<< easier (more natural *for
some people*). And herein lies the core problem. The bloat is often
repetitive, and I suggest that it is often implemented in the wrong place -
and usually for the wrong reasons.
Bloat comes about because somebody thinks they need some feature and
probably doesn't understand that it is already there or how they can use
it. But if they did know about it, and their tool were set up to exploit
it, they would not need to reinvent it. GUI-based tools are notorious for this
failure. Everyone seems to have a built-in (unique) editor, or a private
way to set up configuration options et al. But ... that walled garden is
comfortable for many users and >>can be<< useful sometimes.
Long ago, UNIX programmers learned that looking for $EDITOR in the
environment was way better than creating one. Configuration was ASCII
text, stored in /etc for system-wide settings and in dot files in the home
directory for users. But it also means the >>output<< of each tool needs to
be usable by each other (*i.e.*, docx or xlsx files are a no-no).
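As a tiny illustration of that convention (my sketch -- the fallback editor
and the file name are made up, and a real tool would also cope with an
$EDITOR value that carries arguments):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    /* Defer to whatever editor the user already chose, rather than
     * embedding one in the tool. */
    const char *ed = getenv("EDITOR");
    if (ed == NULL || *ed == '\0')
        ed = "vi";          /* assumed fallback, not a standard */

    execlp(ed, ed, "notes.txt", (char *)NULL);
    perror(ed);             /* reached only if the exec fails */
    return 1;
}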
For example, for many things on my Mac, I do use the GUI-based tools --
there is no doubt they are better integrated with the core Mac system >>for
some tasks.<< But only if I obey a set of rules Apple decrees. For
instance, this email reader is easier much of the time than MH (or the HM
front end, for that matter), which I used for probably 25-30 years. But on
my Mac, I always have 4 or 5 iterm2(1) windows open running zsh(1) these
days. And,
much of my typing (and everything I do as a programmer) is done in the shell
(including a simple text editor, not an 'IDE'). People who love IDEs swear
by them -- I'm just not impressed; there is nothing they do for me that
makes things easier, and I would have to learn yet another scheme.
That said, sadly, Apple is forcing me to learn yet another debugger since
none of the traditional UNIX-based ones still work on the M1-based systems.
But at least LLDB is in the same key as sdb/dbx/gdb *et al*., so it is a
PITA but not a huge thing; in the end, LLDB is still based on the UNIX idea
of a single, well-designed, task-specific tool that does one job and can
work with the others.
FWIW: I was recently a tad gob-smacked by the core idea of UNIX and its
tools, which I have taken for granted since the 1970s.
It turns out that I've been helping some of the PiDP-10 users (all of the
PiDPs are cool, BTW). Before I saw UNIX, I was paid to program a PDP-10. In
fact, my first UNIX job was helping move programs from the 10 to the UNIX.
Thus ... I had been thinking that a little PDP-10 hacking shouldn't be too
hard and would dust off some of that old knowledge. Some of it has, of
course, come back. But daily I am discovering that small things that are so
natural with a few simple tools can be hard on those systems.
I am realizing (rediscovering) that "build it into my tool" was the norm in
those days. So instead of a pr(1) command, there was a tool that produced
output for the line printer. You gave it a file, and it was its job to
figure out what to do with it, so it had its own set of features (switches) -
so "bloat" is that each tool (like many current GUI tools) had private ways
of doing things. If the maker of tool X decided to support some idea, they
would build it in themselves, just as the maker of tool Y had. The problem,
of course, was that tools X and Y each had to 'know about' every type of
file (in IBM terms, use its "access method"). Yes, the engineers at DEC, in
their wisdom, tried to "standardize" those access methods/switches/features
>>if you implemented them<< -- but they are not there in every tool.
This leads me back to the question Rob raises. Years ago, I got into an
argument with Dave Cutler RE: UNIX *vs.* VMS. Dave's #1 complaint about
UNIX in those days was that it was not "standardized." Every program was
different, and more to Dave's point, there was no attempt to make switches
or errors the same (getopt(3) had been introduced but was not being used by
most applications). He hated that tar/tp used "keys" and tools like cpio
used switches. Dave hated that I/O was so simple - in his world all user
programs should use his RMS access method of course [1]. VMS, TOPS, *etc.*,
tried to maintain a system-wide error scheme, and users could look things
like errors up in a system DB by error number, *etc*. Simply put, VMS is
very "top-down."
My point with Dave was that by being "bottom-up," the best ideas in UNIX
were able to rise. And yes, it did mean some rough edges and repeated
implementations of the same idea. But UNIX offered a choice, and while Rob
and I find "pr -tn" perfectly acceptable, thank you, clearly someone
else desired the features that nl provides. The folks who put together
System 3 offered both solutions and let the user choose.
This, of course, comes across as bloat, but maybe that type of bloat is not so bad?
My own thinking is this - get things down to the basics and the simplest
primitives and then build back up. It's okay to offer choices, as long as
the foundation is simple and clean. To me, bloat becomes an issue when you
do the same thing over and over again, particularly because you cannot
utilize what is there already. The worst example is NIH - which happens way
more than it should.
I think the kind of bloat that GUI tools and TOPS et al. created forces
recreation, not reuse. But offering choice, even at the expense of multiple
tools that do the same things, strikes me as reasonable - probably a good
thing.
1.] BTW: One of my favorite DEC stories WRT to VMS engineering has to do
with the RMS I/O system. Supporting C using VMS was a bit of a PITA.
Eventually, the VMS engineers added Stream I/O - which simplified the C
runtime, but it was also made available for all technical languages.
Fairly soon after it was released, the DEC Marketing folks discovered
almost all new programs, regardless of language, had started to use Stream
I/O and many older programs were being rewritten by customers to use it. In
fact, inside of DEC itself, the languages group eventually rewrote things
like the FTN runtime to use streams, making it much smaller/easier to
maintain. My line in the old days: "It's not so bad that every I/O call
has to offer 1000 options; it's that Dave has to check each one on every
I/O." It's a classic example of how you can easily build RMS I/O out of stream-based
I/O, but the other way around is much harder. My point here is to *use
the right primitives*. RMS may have made it easier to build RDB, but it
impeded everything else.
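A toy illustration of that direction argument, under the assumption of
fixed-length records (nothing like real RMS): record access layers
trivially on top of a byte stream with plain stdio, while presenting an
arbitrary byte stream through a records-only interface has no such
one-liner.

#include <stdio.h>

/* Read record number recno of size reclen from a byte-stream file:
 * seek to recno * reclen and read reclen bytes.  Returns 0 on success. */
int read_record(FILE *fp, long recno, size_t reclen, void *buf)
{
    if (fseek(fp, recno * (long)reclen, SEEK_SET) != 0)
        return -1;
    return fread(buf, 1, reclen, fp) == reclen ? 0 : -1;
}

/* The mirror image: write record number recno in place. */
int write_record(FILE *fp, long recno, size_t reclen, const void *buf)
{
    if (fseek(fp, recno * (long)reclen, SEEK_SET) != 0)
        return -1;
    return fwrite(buf, 1, reclen, fp) == reclen ? 0 : -1;
}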
> On Wed, 8 May 2024 14:12:15 -0400, Clem Cole <clemc(a)ccc.com> wrote:
>
> FWIW: The DEC Mod-II and Mod-III
> were new implementations from DEC WRL or SRC (I forget). They targeted
> Alpha and, I think, maybe Vax. I'd have to ask someone like Larry Stewart or Jeff
> Mogul who might know/remember, but I thought that the front end to the DEC
> MOD2 compiler might have been partly based on Wirth's but rewritten, and by
> the time of the MOD3 FE it was a new one originally written using the previous
> MOD2 compiler -- but I don't remember that detail.
Michael Powell at DEC WRL wrote a Modula-2 compiler that generated VAX code. Here’s an extract from announcement.d accompanying a 1992 release of the compiler from gatekeeper.dec.com:
The compiler was designed and built by Michael L. Powell, and originally
released in 1984. Joel McCormack sped the compiler up, fixed lots of bugs, and
swiped/wrote a User's Manual. Len Lattanzi ported the compiler to the MIPS.
Later, Paul Rovner and others at DEC SRC designed Modula-2+ (a language extension with exceptions, threads, garbage collection, and runtime type dispatch). The Modula-2+ compiler was originally based on Powell’s compiler. Modula-2+ ran on the VAX.
Here’s a DEC SRC research report on Modula-2+:
http://www.bitsavers.org/pdf/dec/tech_reports/SRC-RR-3.pdf
Modula-3 was designed at DEC SRC and Olivetti Labs. It had a portable implementation (using the GCC back end) and ran on a number of machines including Alpha.
Paul
Sorry for the dual-list post; I don’t know who monitors COFF, the proper place for this.
There may be a good timeline of the early decades of Computer Science and its evolution at universities in some countries, but I’m missing it.
Doug McIlroy lived through all this, I hope he can fill in important gaps in my little timeline.
It seems from the 1967 letter that defining the field was part of the zeitgeist leading up to the NATO conference.
1949 ACM founded
1958 First ‘freshman’ computer course in USA, Perlis @ CMU
1960 IBM 1400 - affordable & ‘reliable’ transistorised computers arrived
1965 MIT / Bell / General Electric begin Multics project.
CMU establishes Computer Sciences Dept.
1967 “What is Computer Science” letter by Newell, Perlis, Simon
1968 “Software Crisis” and 1st NATO Conference
1969 Bell Labs withdraws from Multics
1970 GE sells its computer business, including Multics, to Honeywell
1970 PDP-11/20 released
1974 Unix issue of CACM
=========
The arrival of transistorised computers - cheaper, more reliable, smaller & faster - was a trigger for the accelerated uptake of computers.
The IBM 1400-series was offered for sale in 1960, becoming the first (large?) computer to sell 10,000 units - a marker of both effective marketing & sales and attractive pricing.
The 360-series, IBM’s “bet the company” machine, was in full development when the 1400 was released.
=========
Attached is a text file, a reformatted version of a 1967 letter to ’Science’ by Allen Newell, Alan J. Perlis, and Herbert A. Simon:
"What is computer science?”
<https://www.cs.cmu.edu/~choset/whatiscs.html>
=========
A 1978 masters thesis on Early Australian Computers (back to the 1950s, mainly 1960s) cites a 17 June 1960 CSIRO report estimating
1,000 computers in the US and 100 in the UK, with no estimate mentioned for Western Europe.
The thesis has a long discussion of what to count as a (digital) ‘computer’ -
sources used different definitions, resulting in very different numbers,
making it difficult to reconcile early estimates, especially across continents & countries.
Reverse estimating to 1960 from the “10,000” NATO estimate of 1968, with a 1- or 2-year doubling time,
gives a range of 200-1,000, including the “100” in the UK.
Licklider and later directors of ARPA’s IPTO threw millions into Computing research in the 1960’s, funding research and University groups directly.
[ UCB had many projects/groups funded, including the CSRG creating BSD & TCP/IP stack & tools ]
Obviously there was more to the “Both sides of the Atlantic” argument of E.W. Dijkstra and Alan Kay - funding and numbers of installations were very different.
The USA had a substantially larger installed base of computers, even per person,
and with more university graduates trained in programming, a higher take-up in the private sector, not just the public sector and defence, was possible.
=========
<https://www.acm.org/about-acm/acm-history>
In September 1949, a constitution was instituted by membership approval.
————
<https://web.archive.org/web/20160317070519/https://www.cs.cmu.edu/link/inst…>
In 1958, Perlis began teaching the first freshman-level computer programming course in the United States at Carnegie Tech.
In 1965, Carnegie Tech established its Computer Science Department with a $5 million grant from the R.K. Mellon Foundation. Perlis was the first department head.
=========
From the 1968 NATO report [pg 9 of pdf ]
<http://homepages.cs.ncl.ac.uk/brian.randell/NATO/nato1968.PDF>
Helms:
In Europe alone there are about 10,000 installed computers — this number is increasing at a rate of anywhere from 25 per cent to 50 per cent per year.
The quality of software provided for these computers will soon affect more than a quarter of a million analysts and programmers.
d’Agapeyeff:
In 1958 a European general purpose computer manufacturer often had less than 50 software programmers,
now they probably number 1,000-2,000 people; what will be needed in 1978?
_Yet this growth rate was viewed with more alarm than pride._ (comment)
=========
--
Steve Jenkin, IT Systems and Design
0412 786 915 (+61 412 786 915)
PO Box 38, Kippax ACT 2615, AUSTRALIA
mailto:sjenkin@canb.auug.org.au http://members.tip.net.au/~sjenkin