TUHS September 2024

tuhs@tuhs.org

66 participants
36 discussions

by Paul Ruizendaal

Over the last few months, I've spent a little time curating John Walker's Unix clone and software stack, including an emulator to run it:

https://gitlab.com/marinchip

After creating a basic tool chain (edit, asm, link and a simple executive), John set out to find a compiler. Among the first programs were a port of the META 3 compiler-generator (similar to TMG on early Unix) and a port of Brinch Hansen’s Pascal compiler. META was used to create a compiler that generated threaded code. He found neither compiler good enough for his goals and settled on writing his Unix-like OS in assembler. As the 9900 architecture withered after 1980, this sealed the fate of this OS early on -- had he found a good compiler, the code might have competed alongside Coherent, Idris, and Minix during the 80’s.

This made me realise once more how unique the Ritchie C compiler was. In my view its uniqueness combines three aspects:

1. The C language itself
2. The ability to run natively on small hardware (even an LSI-11 system)
3. Generating code with modest overhead versus handwritten assembler (say 30%)

As has been observed before, working at a higher abstraction level makes it easier to work on algorithms and on refactoring, often earning back the efficiency loss. John Walker's work may be a case in point: I estimate that his hand-coded kernel is 10% larger than an equivalent V6 Unix kernel (as compiled for the 9900 architecture).

There are three papers on DMR’s website about the history of the compiler and a compare-and-contrast with other compilers of the era:

https://www.bell-labs.com/usr/dmr/www/primevalC.html
https://www.bell-labs.com/usr/dmr/www/chist.html
https://www.bell-labs.com/usr/dmr/www/hopl.html

It seems to me that these papers rather understate the importance of generating good quality code. As far as I can tell, BCPL and BLISS came close, but were too large to run on a PDP-11 and only existed as cross-compilers. PL/M was a cross-compiler and generated poorer code. Pascal on small machines compiled to a virtual machine. As far as I can tell, during most of the 70s there was no other compiler that generated good quality code and ran natively on a small (i.e. PDP-11 class) machine.

As far as I can tell the uniqueness was mostly in the “c1” phase of the compiler. The front-end code of the “c0” phase seems to use more or less the same techniques as many contemporary compilers. The “c1” phase seems to have been unique in that it managed to do register allocation and instruction selection with a pattern matcher and associated code tables squeezed into a small address space. On a small machine, other native compilers of the era typically generated threaded code, code for a virtual machine, or poor quality native code that evaluated expressions using stack operations rather than registers.

I am not sure why DMR's approach was not more widely used in the 1970’s. The algorithms he used do not seem to be new and appear to have their roots in other (larger) compilers of the 1960’s. The basic design seems to have been in place from the very first iterations of his compiler in 1972 (see V2 tree on TUHS) and he does not mention these algorithms as being special or innovative in his later papers.

Any observations / opinions on why DMR’s approach was not more widely used in the 1970’s?
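To make the contrast between stack-based and register-based expression code concrete, here is a minimal sketch in C. It is not a reconstruction of the "c1" phase: there is no pattern matcher and there are no code tables, the emitted mnemonics are PDP-11-flavoured pseudo-assembly, and the names node, gen_stack and gen_reg are invented for illustration. It also assumes there are always enough registers, which a real compiler cannot do.

```c
#include <assert.h>
#include <stdio.h>

/* Expression tree node. op is 0 for a leaf (a named variable),
 * otherwise one of '+', '-', '*'. All names here are illustrative. */
struct node {
    char op;
    const char *name;              /* used by leaves only */
    const struct node *left;
    const struct node *right;
};

/* Map an operator character to a pseudo-assembly mnemonic. */
static const char *mnemonic(char op)
{
    switch (op) {
    case '+': return "add";
    case '-': return "sub";
    default:  return "mul";
    }
}

/* Stack strategy: push every operand, then combine the top two stack
 * entries at each operator. Returns the number of lines emitted. */
int gen_stack(const struct node *n, FILE *out)
{
    int count;

    assert(n != NULL);
    if (n->op == 0) {
        fprintf(out, "\tmov  %s,-(sp)\n", n->name);
        return 1;
    }
    count = gen_stack(n->left, out);
    count += gen_stack(n->right, out);
    fprintf(out, "\t%s  (sp)+,(sp)\n", mnemonic(n->op));
    return count + 1;
}

/* Register strategy: compute the subtree into register r, using r+1 and
 * up as scratch. A leaf on the right is used directly as a memory operand.
 * Assumes enough registers exist; a real compiler must handle spills.
 * Returns the number of lines emitted. */
int gen_reg(const struct node *n, int r, FILE *out)
{
    int count;

    assert(n != NULL);
    if (n->op == 0) {
        fprintf(out, "\tmov  %s,r%d\n", n->name, r);
        return 1;
    }
    count = gen_reg(n->left, r, out);
    if (n->right->op == 0) {
        fprintf(out, "\t%s  %s,r%d\n", mnemonic(n->op), n->right->name, r);
    } else {
        count += gen_reg(n->right, r + 1, out);
        fprintf(out, "\t%s  r%d,r%d\n", mnemonic(n->op), r + 1, r);
    }
    return count + 1;
}
```

For the tree (a+b)*(c-d), gen_stack emits seven lines (four pushes and three operators) while gen_reg emits five, because operands that are already in memory never make a detour through the stack.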

6 months

Re: Minimum Array Sizes in 16 bit C (was Maximum)

by Douglas McIlroy

>>> malloc(0) isn't undefined behaviour but implementation defined.
>>
>> In modern C there is no difference between those two concepts.

> Can you explain more about your view

There certainly is a difference, but in this case the practical implications are the same: avoid malloc(0).

malloc(0) lies at the high end of a range of severity of concerns about implementation-definedness. At the low end are things like the size of ints, which only affects applications that may confront very large numbers. In the middle is the default signedness of chars, which generally may be mitigated by explicit type declarations.

For the size of ints, C offers guardrails like INT_MAX. There is no test to discern what an error return from malloc(0) means.

Is there any other C construct that implementation-definedness renders useless?

Doug
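One common way to follow the "avoid malloc(0)" advice is to never pass zero to malloc at all, so that a null return can only mean failure. The sketch below is an illustration, not something proposed in the thread; alloc_nonzero and alloc_array are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n bytes, rounding a zero-byte request up
 * to one byte. A null return then always means the allocation failed,
 * whatever the implementation does for malloc(0). */
void *alloc_nonzero(size_t n)
{
    return malloc(n ? n : 1);
}

/* Hypothetical helper: allocate room for count elements of elem_size
 * bytes each. SIZE_MAX plays the guardrail role that INT_MAX plays for
 * ints: the multiplication is checked before it can wrap around. */
void *alloc_array(size_t count, size_t elem_size)
{
    assert(elem_size != 0);          /* caller error, not a runtime case */
    if (count > SIZE_MAX / elem_size)
        return NULL;                 /* size would overflow: report failure */
    return alloc_nonzero(count * elem_size);
}
```

With these helpers, alloc_array(0, sizeof(int)) yields a valid pointer that can be passed to free, and the caller needs only one check, p == NULL, to detect an error.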

7 months, 1 week

Re: On computerese

by Douglas McIlroy

> Who created the "cat" command and did they have the
> word "catenate" or "concatenate" in their heads?

Ken Thompson wrote "cat" for the PDP-7, with "concatenate" in mind. The cat(1) page in the v1 manual is titled, "concatenate (or print) files". Only later did someone in Research--I don't know who--remark on the existence of the shorter synonym. It was deliberately adopted in v7, perhaps because it better mirrored the command name.

But brevity is the defensible argument for "catenate", while familiarity boosts "concatenate". It still takes some conscious effort for me to use the former. However, I sense sinister vibes in "concatenate", driven by the phrase "concatenation of events", which often is used to explain misfortune.

Doug

8 months, 2 weeks

Re: Minimum Array Sizes in 16 bit C (was Maximum)

by Douglas McIlroy

> C's refusal to specify dynamic memory allocation in the language runtime
> (as opposed to, eventually, the standard library)

This complaint overlooks one tenet of C: every operation in what you call "language runtime" takes O(1) time. Dynamic memory allocation is not such an operation.

Your hobbyhorse awakened one of mine.

malloc was in v7, before the C standard was written. The standard spinelessly buckled to allow malloc(0) to return 0, as some implementations gratuitously did. I can't imagine that any program ever actually wanted the feature. Now it's one more undefined behavior that lurks in thousands of programs.

There are two arguments for malloc(0). Most importantly, it caters for a limiting case for aggregates generated at runtime--an instance of Kernighan's Law, "Do nothing gracefully". It also provides a way to create a distinctive pointer to impart some meta-information, e.g. "TBD" or "end of subgroup", distinct from the null pointer, which merely denotes absence.

Doug
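The "distinctive pointer" use can be sketched as follows. Because malloc(0) may legally return null, this sketch asks for one byte instead, which guarantees that each marker is non-null and distinct from every other live object. The names markers_init, classify, tbd_marker and end_marker are hypothetical and chosen only to mirror the "TBD" and "end of subgroup" examples above.

```c
#include <assert.h>
#include <stdlib.h>

enum kind { ABSENT, TBD, END_OF_SUBGROUP, ORDINARY };

/* Marker objects: only their addresses matter, never their contents. */
static void *tbd_marker;
static void *end_marker;

/* Create the two markers. Returns 0 on success, -1 on failure.
 * One byte is requested rather than zero so that the result is a
 * unique non-null pointer on every conforming implementation. */
int markers_init(void)
{
    tbd_marker = malloc(1);
    end_marker = malloc(1);
    if (tbd_marker == NULL || end_marker == NULL) {
        free(tbd_marker);            /* free(NULL) is a no-op */
        free(end_marker);
        tbd_marker = end_marker = NULL;
        return -1;
    }
    return 0;
}

/* Tell the markers apart from the null pointer and from real data. */
enum kind classify(const void *p)
{
    assert(tbd_marker != NULL && end_marker != NULL);  /* init was called */
    if (p == NULL)
        return ABSENT;
    if (p == tbd_marker)
        return TBD;
    if (p == end_marker)
        return END_OF_SUBGROUP;
    return ORDINARY;
}
```

A list of pointers can then carry three kinds of "no data here" entries, null, TBD and end of subgroup, without any extra tag field.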

8 months, 3 weeks

Fwd: Trove of CSTR's

by Warren Toomey

All, I got this e-mail and thought many of you would appreciate the link.

Cheers, Warren

----- Forwarded message from Poul-Henning Kamp -----

I stumbled over this:

https://www.telecomarchive.com/lettermemo.html

is the TUHS crew aware of that resource ?

----- End forwarded message -----

9 months

Origins of "Unix Philosophy"

by Phil Budne

I'm wondering if there are places where people who were in the Unix Room wrote about the origins and evolution of what people (at least used to(*)) refer to as "Unix Philosophy", and since some are in THIS (TUHS) room, what they might have to say about it.

How much was in reaction to the complexity of Multics, and how much was simply a response to the limited address spaces of available and affordable hardware?

Eric S. Raymond wrote in "The Art of Unix Programming" quoting Doug McIlroy and Rob Pike:

http://www.catb.org/esr/writings/taoup/html/ch01s06.html

And I wonder if they care to comment on it?

I have trouble taking ESR as authoritative, as it seems to me that Research Unix was more a product of the "Cathedral" (or at least a contained community) than the "Bazaar" (at least the modern bazaar, where everyone needs to leave a new feature graffito on the town walls), and ESR

A side question for Rob Pike: is the "Not only is UNIX dead, it's starting to smell really bad." quote accurate? Was it in reaction to BSD, GNU, or all of the above?

(*) I say "used to", because, for the most part, minimalism seems to have left the building. I can't look at modern GNU utilities, and many, if not most, open source packages and not think they've gone WAY past classic Unix minimalism, especially since I remember hearing that Bell Research had happily stripped excess features (removal of "cat -s" sticks in my mind) from latter-day Research Unix, and because Stallman is said to have coined the term "New Jersey" style as a synonym for what Richard P. Gabriel called "Worse is Better", which seems an attack on minimalism (nothing less than "the right thing" is acceptable).

Worse is.... readings:

https://dreamsongs.com/WorseIsBetter.html
https://dreamsongs.com/RiseOfWorseIsBetter.html
https://dreamsongs.com/Files/IsWorseReallyBetter.pdf
https://dreamsongs.com/Files/worse-is-worse.pdf

Anti-flamage disclaimers: Inclusion of links above does not imply any agreement on my part! My apologies in advance for any offense, misquote, or misunderstanding on my part.

9 months

Re: On computerese

by jnc＠mercury.lcs.mit.edu

> From: Rik Farrow <rik(a)rikfarrow.com>

> Was the brevity typical of Unix command names a function of the tiny
> disk and memory available? Or more a function of having a Teletype 33
> for input?

I'm not sure the answer was ever written down (e.g. in a memo); we will probably have to rely on memory - and memories that far back are fairly thin on the ground by now. Perhaps Mr. McIlroy (or Mr. Thompson, if we're _really_ lucky) will humor us? :-)

I have the impression that some of the names are _possibly_ inherited from Multics (which the early Unicians all used before Unix existed) - but maybe not. The command to list a directory, on Multics, is 'ls' (but see below) - but the Multics command to remove a file is 'del' (not 'rm'); and change working directory is 'cwd'. So maybe 'ls' is just chance?

Multics had a 'feature' where a segment (file) could have additional names (to the main name), and this is used to add short aliases to many commands, so the 'base name' for the directory list command is 'list'; 'ls' is a short alias. A list of Multics commands (with short forms) is available here:

https://www.multicians.org/multics-commands.html

I'm not sure how early that alias mechanism came in, though; my copy of "Introduction to Multics" (February, 1974) doesn't have short names (or, at least, it doesn't use them).

It won't have anything to do with disk and memory. Having used a Teletype, I can say it would take noticeably longer to type in a longer name! It's also more effort. I would expect those are the reasons for the short names.

Noel

9 months

Re: Fwd: Trove of CSTR's

by Douglas McIlroy

> I wonder what happened to the amazing library at Murray Hill.

Last I knew, the Bell Labs archives were intact under supervision of a professional archivist. Formally speaking, the archives and the library were distinct entities. The library, which was open to self service 24 hours a day, declined rapidly after the bean counters decreed that it should henceforth support itself on rental fees. Departments immediately turned to buying books rather than borrowing them. It's very likely that this was bad for the Labs' bottom line, but the cost (both monetary and intellectual) was not visible as a budgetary line item.

The 24-hour library contributed to one of Ken's programming feats. Spurred by a lunchtime remark that it would be nice to have a unit-conversion program, Ken announced units(1) the next morning. Right from the start, the program knew more than 200 units, thanks to a book Ken grabbed from the library in the middle of the night.

Doug

9 months

Re: Fwd: Trove of CSTR's

by Douglas McIlroy

> That CSTR number 1 is nicely formatted, is that troff?

The archive's CSTR 1 is ersatz. It's a 1973 journal article obtained from JSTOR. I imagine the manuscript was largely copied from the CSTR, but the printed paper certainly differs in meta-content and in layout, to say nothing of font. Having gone through the usual route of journal submission and revision, the body text is probably not word-for-word identical to the CSTR either.

Doug

9 months

Re: Classic FoxTrot cartoon

by norman＠oclsc.org

Clem Cole:

Interesting -- 'Jason' had always been a Pascal hacker when the strip was first created. As I recall, Berkeley Breathed had Wendell (his hacker character) comment on that during the time of Pascal/C Wars.

====

But Jason later was revealed to be wearing Unix underpants:

https://www.gocomics.com/foxtrot/2002/02/25

Norman Wilson
Toronto ON

9 months
