Over the last few months, I've spent a little time curating John Walker's Unix clone and software stack, including an emulator to run it:
https://gitlab.com/marinchip
After creating a basic tool chain (edit, asm, link and a simple executive), John set out to find a compiler. Among the first programs were a port of the META 3 compiler-generator (similar to TMG on early Unix) and a port of Brinch Hansen’s Pascal compiler. META was used to create a compiler that generated threaded code. He found neither compiler good enough for his goals and settled on writing his Unix-like OS in assembler. As the 9900 architecture withered after 1980, this sealed the fate of this OS early on -- had he found a good compiler, the code might have competed alongside Coherent, Idris, and Minix during the 80’s.
This made me realise once more how unique the Ritchie C compiler was. In my view its uniqueness combines three aspects:
1. The C language itself
2. The ability to run natively on small hardware (even an LSI-11 system)
3. Generating code with modest overhead versus handwritten assembler (say 30%)
As has been observed before, working at a higher abstraction level makes it easier to work on algorithms and on refactoring, often earning back the efficiency loss. John Walker's work may be a case in point: I estimate that his hand-coded kernel is 10% larger than an equivalent V6 Unix kernel (as compiled for the 9900 architecture).
There are three papers on DMR’s website about the history of the compiler and a compare-and-contrast with other compilers of the era:
https://www.bell-labs.com/usr/dmr/www/primevalC.html
https://www.bell-labs.com/usr/dmr/www/chist.html
https://www.bell-labs.com/usr/dmr/www/hopl.html
It seems to me that these papers rather understate the importance of generating good quality code. As far as I can tell, BCPL and BLISS came close, but were too large to run on a PDP-11 and only existed as cross-compilers. PL/M was a cross-compiler and generated poorer code. Pascal on small machines compiled to a virtual machine. As far as I can tell, during most of the 70s there was no other compiler that generated good quality code and ran natively on a small (i.e. PDP-11 class) machine.
As far as I can tell the uniqueness was mostly in the “c1” phase of the compiler. The front-end code of the “c0” phase seems to use more or less similar techniques as many contemporary compilers. The “c1” phase seems to have been unique in that it managed to do register allocation and instruction selection with a pattern matcher and associated code tables squeezed into a small address space. On a small machine, other native compilers of the era typically proceeded to generate threaded code, code for a virtual machine or poor quality native code that evaluated expressions using stack operations rather than registers.
I am not sure why DMR's approach was not more widely used in the 1970’s. The algorithms he used do not seem to be new and appear to have their roots in other (larger) compilers of the 1960’s. The basic design seems to have been in place from the very first iterations of his compiler in 1972 (see V2 tree on TUHS) and he does not mention these algorithms as being special or innovative in his later papers.
Any observations / opinions on why DMR’s approach was not more widely used in the 1970’s?
>>> malloc(0) isn't undefined behaviour but implementation defined.
>>
>> In modern C there is no difference between those two concepts.
> Can you explain more about your view
There certainly is a difference, but in this case the practical
implications are the same: avoid malloc(0). malloc(0) lies at the high end
of a range of severity of concerns about implementation-definedness. At the
low end are things like the size of ints, which only affects applications
that may confront very large numbers. In the middle is the default
signedness of chars, which generally may be mitigated by explicit type
declarations.
For the size of ints, C offers guardrails like INT_MAX. There is no test to
discern what an error return from malloc(0) means.
Is there any other C construct that implementation-definedness renders
useless?
Doug
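One way to follow the "avoid malloc(0)" advice is sketched below in C. The wrapper name alloc_nonzero is hypothetical (it is not from the message above); the idea is only that a request is never allowed to reach malloc with size zero, so a null return can mean nothing but failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper (an assumption, not from the post): never hand
 * malloc a zero size, so a null return can only mean failure. */
void *alloc_nonzero(size_t n)
{
    return malloc(n ? n : 1);
}

/* Usage: an "empty" buffer is still a valid, freeable, non-null pointer. */
void empty_buffer_demo(void)
{
    char *buf = alloc_nonzero(0);
    assert(buf != NULL);   /* fails only if memory is truly exhausted */
    free(buf);
}
```

The cost is one wasted byte per empty request, which seems a fair trade for an unambiguous error return.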
> Who created the "cat" command and did they have the
> word "catenate" or "concatenate" in their heads?
Ken Thompson wrote "cat" for the PDP-7, with "concatenate" in
mind. The cat(1) page in the v1 manual is titled, "concatenate (or
print) files". Only later did someone in Research--I don't know
who--remark on the existence of the shorter synonym. It was
deliberately adopted in v7, perhaps because it better mirrored
the command name.
But brevity is the defensible argument for "catenate", while
familiarity boosts "concatenate". It still takes some conscious
effort for me to use the former. However, I sense sinister
vibes in "concatenate", driven by the phrase "concatenation
of events", which often is used to explain misfortune.
Doug
> C's refusal to specify dynamic memory allocation in the language runtime
> (as opposed to, eventually, the standard library)
This complaint overlooks one tenet of C: every operation in what you
call "language runtime" takes O(1) time. Dynamic memory allocation
is not such an operation.
Your hobbyhorse awakened one of mine.
malloc was in v7, before the C standard was written. The standard
spinelessly buckled to allow malloc(0) to return 0, as some
implementations gratuitously did. I can't imagine that any program
ever actually wanted the feature. Now it's one more undefined
behavior that lurks in thousands of programs.
There are two arguments for malloc(0). Most importantly, it caters for
a limiting case for aggregates generated at runtime--an instance of
Kernighan's Law, "Do nothing gracefully". It also provides a way to
create a distinctive pointer to impart some meta-information, e.g.
"TBD" or "end of subgroup", distinct from the null pointer, which
merely denotes absence.
Doug
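The second argument, a distinctive pointer carrying meta-information, might look like the C sketch below. All names here (make_marker, classify, enum slot_kind) are hypothetical and chosen for illustration. Because malloc(0) may return a null pointer on some implementations, the sketch asks for one byte instead; that substitution is an assumption of the sketch, not part of the message above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum slot_kind { SLOT_ABSENT, SLOT_TBD, SLOT_END, SLOT_DATA };

/* Hypothetical helper: returns a unique pointer used only for comparison,
 * never dereferenced. One byte is requested because malloc(0) may
 * legally return a null pointer. */
void *make_marker(void)
{
    return malloc(1);
}

/* Classify a slot: null means absence, the markers carry meta-information,
 * anything else is ordinary data. */
enum slot_kind classify(const void *slot, const void *tbd, const void *end)
{
    if (slot == NULL) return SLOT_ABSENT;
    if (slot == tbd)  return SLOT_TBD;
    if (slot == end)  return SLOT_END;
    return SLOT_DATA;
}

/* Usage: two live markers are distinct from each other and from null. */
void marker_demo(void)
{
    void *tbd = make_marker();
    void *end = make_marker();
    assert(tbd != NULL && end != NULL && tbd != end);
    assert(classify(tbd, tbd, end) == SLOT_TBD);
    assert(classify(NULL, tbd, end) == SLOT_ABSENT);
    free(tbd);
    free(end);
}
```

The markers stay distinct only while both are allocated, so they should be freed only when nothing compares against them any more.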
All, I got this e-mail and thought many of you would appreciate the link.
Cheers, Warren
----- Forwarded message from Poul-Henning Kamp -----
I stumbled over this:
https://www.telecomarchive.com/lettermemo.html
is the TUHS crew aware of that resource ?
----- End forwarded message -----
I'm wondering if there are places where people who were in the Unix
Room wrote about the origins and evolution of what people (at least
used to(*)) refer to as "Unix Philosophy", and since some are in THIS
(TUHS) room, what they might have to say about it.
How much was in reaction to the complexity of Multics, and how much
was simply a response to the limited address spaces of
available and affordable hardware?
Eric S. Raymond wrote in "The Art of Unix Programming" quoting
Doug McIlroy and Rob Pike:
http://www.catb.org/esr/writings/taoup/html/ch01s06.html
And I wonder if they care to comment on it?
I have trouble taking ESR as authoritative, as, it seems to me that
Research Unix was more a product of the "Cathedral" (or at least a
contained community) than the "Bazaar" (at least the modern bazaar,
where everyone needs to leave a new feature graffito on the town
walls), and ESR
A side question for Rob Pike, is the "Not only is UNIX dead, it's
starting to smell really bad." quote accurate? Was it in reaction to
BSD, GNU, or all of the above?
(*) I say "used to", because, for the most part, minimalism seems to
have left the building. I can't look at modern GNU utilities, and
many, if not most, open source packages and not think they've gone WAY past
classic Unix minimalism, especially since I remember hearing that Bell
Research had happily stripped excess features (removal of "cat -s"
sticks in my mind) from latter-day Research Unix, and because Stallman
is said to have coined the term "New Jersey" style as a synonym for
what Richard P. Gabriel called "Worse is Better", which seems an
attack on minimalism (nothing less than "the right thing" is acceptable).
Worse is.... readings:
https://dreamsongs.com/WorseIsBetter.html
https://dreamsongs.com/RiseOfWorseIsBetter.html
https://dreamsongs.com/Files/IsWorseReallyBetter.pdf
https://dreamsongs.com/Files/worse-is-worse.pdf
Anti-flamage disclaimers:
Inclusion of links above does not imply any agreement on my part! My
apologies in advance for any offense, misquote, or misunderstanding on
my part.
> From: Rik Farrow <rik(a)rikfarrow.com>
> Was the brevity typical of Unix command names a function of the tiny
> disk and memory available? Or more a function of having a Teletype 33
> for input?
I'm not sure the answer was ever written down (e.g. in a memo); we will
probably have to rely on memory - and memories that far back are fairly
thin on the ground by now. Perhaps Mr. McIlroy (or Mr. Thompson, if we're
_really_ lucky) will humor us? :-)
I have the impression that some of the names are _possibly_ inherited from
Multics (which the early Unicians all used before Unix existed) - but maybe
not. The command to list a directory, on Multics, is 'ls' (but see below) -
but the Multics command to remove a file is 'del' (not 'rm'); and change working
directory is 'cwd'. So maybe 'ls' is just chance?
Multics had a 'feature' where a segment (file) could have additional names (to
the main name), and this is used to add short aliases to many commands, so the
'base name' for the directory list command is 'list'; 'ls' is a short
alias. A list of Multics commands (with short forms) is available here:
https://www.multicians.org/multics-commands.html
I'm not sure how early that alias mechanism came in, though; my copy of
"Introduction to Multics" (February, 1974) doesn't have short names (or, at
least, it doesn't use them).
It won't have anything to do with disk and memory. Having used a Teletype, I
can say it takes noticeably longer to type in a longer name! It's also more
effort. I would expect those are the reasons for the short names.
Noel
> I wonder what happened to the amazing library at Murray Hill.
Last I knew, the Bell Labs archives were intact under supervision of a
professional archivist. Formally speaking, the archives and the library
were distinct entities. The library, which was open to self service 24
hours a day, declined rapidly after the bean counters decreed that it
should henceforth support itself on rental fees. Departments immediately
turned to buying books rather than borrowing them. It's very likely that
this was bad for the Labs' bottom line, but the cost (both monetary and
intellectual) was not visible as a budgetary line item.
The 24-hour library contributed to one of Ken's programming feats. Spurred
by a lunchtime remark that it would be nice to have a unit-conversion
program, Ken announced units(1) the next morning. Right from the start, the
program knew more than 200 units, thanks to a book Ken grabbed from the
library in the middle of the night.
Doug
> That CSTR number 1 is nicely formatted, is that troff?
The archive's CSTR 1 is ersatz. It's a 1973 journal article obtained from
JSTOR. I imagine the manuscript was largely copied from the CSTR, but the
printed paper certainly differs in meta-content and in layout, to say nothing
of font. Having gone through the usual route of journal submission and
revision, the body text is probably not word-for-word identical to the CSTR
either.
Doug
Clem Cole:
Interesting -- 'Jason' had always been a Pascal hacker when the strip was
first created. As I recall, Berkeley Breathed had Wendell (his hacker
character) comment on that during the time of Pascal/C Wars.
====
But Jason later was revealed to be wearing Unix underpants:
https://www.gocomics.com/foxtrot/2002/02/25
Norman Wilson
Toronto ON