Over the last few months, I've spent a little time curating John Walker's Unix
clone and software stack, including an emulator to run it:
https://gitlab.com/marinchip
After creating a basic tool chain (edit, asm, link and a simple executive), John set out
to find a compiler. Among the first programs were a port of the META 3 compiler-generator
(similar to TMG on early Unix) and a port of Brinch Hansen’s Pascal compiler. META was used
to create a compiler that generated threaded code. He found neither compiler good enough
for his goals and settled on writing his Unix-like OS in assembler. As the 9900
architecture withered after 1980, this sealed the fate of this OS early on -- had he found
a good compiler, the code might have competed alongside Coherent, Idris, and Minix during
the 1980s.
This made me realise once more how unique the Ritchie C compiler was. In my view its
uniqueness combines three aspects:
1. The C language itself
2. The ability to run natively on small hardware (even an LSI-11 system)
3. Generating code with modest overhead (say 30%) versus handwritten assembler
As has been observed before, working at a higher abstraction level makes it easier to work
on algorithms and on refactoring, often earning back the efficiency loss. John Walker's
work may be a case in point: I estimate that his hand-coded kernel is 10% larger than an
equivalent V6 Unix kernel (as compiled for the 9900 architecture).
There are three papers on DMR’s website about the history of the compiler and a
compare-and-contrast with other compilers of the era:
https://www.bell-labs.com/usr/dmr/www/primevalC.html
https://www.bell-labs.com/usr/dmr/www/chist.html
https://www.bell-labs.com/usr/dmr/www/hopl.html
It seems to me that these papers rather understate the importance of generating good
quality code. As far as I can tell, BCPL and BLISS came close, but were too large to run
on a PDP-11 and only existed as cross-compilers. PL/M was a cross-compiler and generated
poorer code. Pascal on small machines compiled to a virtual machine. To my knowledge,
during most of the 1970s there was no other compiler that generated good-quality code and
ran natively on a small (i.e. PDP-11 class) machine.
As best I can tell, the uniqueness was mostly in the “c1” phase of the compiler. The
front-end code of the “c0” phase appears to use much the same techniques as many
contemporary compilers. The “c1” phase seems to have been unique in that it managed to do
register allocation and instruction selection with a pattern matcher and associated code
tables squeezed into a small address space. On a small machine, other native compilers of
the era typically generated threaded code, code for a virtual machine, or poor-quality
native code that evaluated expressions using stack operations rather than registers.
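
To make the contrast concrete, here is a small sketch (in modern C, and emphatically not
DMR's actual code or tables) of the table-driven idea: an expression tree is matched
against a handful of operator/shape templates, so that a directly addressable leaf can be
folded into a single "op register,memory" instruction instead of being pushed and popped
through a stack. The pseudo-assembly, the trivial register allocator and the template set
are all invented for illustration.

#include <stdio.h>

enum kind { LEAF, ADD, MUL };

struct node {
    enum kind op;
    const char *name;      /* leaf only: variable name */
    struct node *l, *r;    /* interior only: operands */
};

/* One template per (operator, right-operand-shape) pair.  A real c1-style
 * table would also key on constants, indirection, commutativity and so on,
 * and carry many more templates per operator. */
struct tmpl {
    enum kind op;
    int rleaf;             /* 1: right operand is a directly addressable leaf */
    const char *code;      /* invented two-address pseudo-assembly */
};

static const struct tmpl table[] = {
    { ADD, 1, "add  r%d,%s\n"  },  /* register op memory */
    { MUL, 1, "mul  r%d,%s\n"  },
    { ADD, 0, "add  r%d,r%d\n" },  /* register op register */
    { MUL, 0, "mul  r%d,r%d\n" },
};

static int nextreg;        /* toy "allocator": never reuses a register */

/* Walk the tree, emit code, return the register holding the result. */
static int gen(const struct node *n)
{
    if (n->op == LEAF) {
        int r = nextreg++;
        printf("mov  r%d,%s\n", r, n->name);
        return r;
    }
    int rl = gen(n->l);
    int rleaf = (n->r->op == LEAF);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i].op == n->op && table[i].rleaf == rleaf) {
            if (rleaf)
                printf(table[i].code, rl, n->r->name);
            else
                printf(table[i].code, rl, gen(n->r));
            return rl;
        }
    }
    return rl;             /* every shape above is covered by the table */
}

int main(void)
{
    /* a + b*c */
    struct node a = { LEAF, "a", 0, 0 };
    struct node b = { LEAF, "b", 0, 0 };
    struct node c = { LEAF, "c", 0, 0 };
    struct node m = { MUL,  0,  &b, &c };
    struct node s = { ADD,  0,  &a, &m };
    gen(&s);
    return 0;
}

For "a + b*c" this prints four instructions (mov/mov/mul/add) instead of the longer
push/pop sequence a stack-style generator would emit. The point of the analogy is that
the selection knowledge lives in compact data tables rather than in code, which is
presumably part of what let the real c1 fit the whole scheme into a small address space.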
I am not sure why DMR's approach was not more widely used in the 1970s. The
algorithms he used do not seem to have been new and appear to have their roots in other (larger)
compilers of the 1960s. The basic design seems to have been in place from the very first
iterations of his compiler in 1972 (see V2 tree on TUHS) and he does not mention these
algorithms as being special or innovative in his later papers.
Any observations / opinions on why DMR’s approach was not more widely used in the 1970s?