Over the last few months, I've spent a little time curating John Walker's Unix clone and software stack, including an emulator to run it:
https://gitlab.com/marinchip
After creating a basic tool chain (edit, asm, link and a simple executive), John set out to find a compiler. Among the first programs were a port of the META 3 compiler-generator (similar to TMG on early Unix) and a port of Brinch Hansen’s Pascal compiler. META was used to create a compiler that generated threaded code. He found neither compiler good enough for his goals and settled on writing his Unix-like OS in assembler. As the 9900 architecture withered after 1980, this sealed the fate of this OS early on -- had he found a good compiler, the code might have competed alongside Coherent, Idris, and Minix during the 80’s.
This made me realise once more how unique the Ritchie C compiler was. In my view its uniqueness combines three aspects:
1. The C language itself
2. The ability to run natively on small hardware (even an LSI-11 system)
3. Generating code with modest overhead versus handwritten assembler (say 30%)
As has been observed before, working at a higher abstraction level makes it easier to work on algorithms and on refactoring, often earning back the efficiency loss. John Walker's work may be a case in point: I estimate that his hand-coded kernel is 10% larger than an equivalent V6 Unix kernel (as compiled for the 9900 architecture).
There are three papers on DMR’s website about the history of the compiler and a compare-and-contrast with other compilers of the era:
https://www.bell-labs.com/usr/dmr/www/primevalC.html
https://www.bell-labs.com/usr/dmr/www/chist.html
https://www.bell-labs.com/usr/dmr/www/hopl.html
It seems to me that these papers rather understate the importance of generating good quality code. As far as I can tell, BCPL and BLISS came close, but were too large to run on a PDP-11 and only existed as cross-compilers. PL/M was a cross-compiler and generated poorer code. Pascal on small machines compiled to a virtual machine. As far as I can tell, during most of the 70s there was no other compiler that generated good quality code and ran natively on a small (i.e. PDP-11 class) machine.
As far as I can tell, the uniqueness was mostly in the “c1” phase of the compiler. The front-end code of the “c0” phase seems to use techniques more or less similar to those of many contemporary compilers. The “c1” phase seems to have been unique in that it managed to do register allocation and instruction selection with a pattern matcher and associated code tables squeezed into a small address space. On a small machine, other native compilers of the era typically proceeded to generate threaded code, code for a virtual machine or poor quality native code that evaluated expressions using stack operations rather than registers.
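To make the contrast concrete, here is a toy sketch in C. It is not DMR's algorithm or his table format: the node type, the rows of `table` and the template letters (V, R, S) are all invented for illustration, registers are handed out naively with no spilling, and the output is only PDP-11-flavoured assembler. It shows a naive generator that pushes every operand on the stack next to a generator that picks an instruction template from a small table, based on the operator and the shape of its right operand.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum kind { LEAF, ADD, SUB };

struct node {
    enum kind kind;
    const char *name;                 /* variable name, LEAF only */
    const struct node *left, *right;  /* operands, ADD/SUB only */
};

/* One row of the (hypothetical) code table. In a template, 'V' is the
   right operand's variable name, 'R' the result register and 'S' the
   scratch register that holds an already computed right operand. */
struct pattern {
    enum kind op;
    int right_is_leaf;
    const char *tmpl;
};

static const struct pattern table[] = {
    { ADD, 1, "add V,R" },
    { ADD, 0, "add S,R" },
    { SUB, 1, "sub V,R" },
    { SUB, 0, "sub S,R" },
};

struct out { char text[512]; size_t len; int lines; };

/* Append a string to the output, silently dropping it if it does not fit. */
static void put(struct out *o, const char *s)
{
    size_t n = strlen(s);
    if (o->len + n < sizeof o->text) {
        memcpy(o->text + o->len, s, n + 1);
        o->len += n;
    }
}

static void put_reg(struct out *o, int r)
{
    char buf[16];
    snprintf(buf, sizeof buf, "r%d", r);
    put(o, buf);
}

/* Expand one template into one output line. */
static void expand(struct out *o, const char *tmpl, const char *var, int reg)
{
    for (const char *p = tmpl; *p; p++) {
        if (*p == 'V') put(o, var);
        else if (*p == 'R') put_reg(o, reg);
        else if (*p == 'S') put_reg(o, reg + 1);
        else { char c[2] = { *p, '\0' }; put(o, c); }
    }
    put(o, "\n");
    o->lines++;
}

/* Naive generator: every operand goes through the stack and the
   result is left on top of the stack. */
void gen_stack(struct out *o, const struct node *n)
{
    assert(n != NULL);
    if (n->kind == LEAF) {
        expand(o, "mov V,-(sp)", n->name, 0);
        return;
    }
    gen_stack(o, n->left);
    gen_stack(o, n->right);
    expand(o, n->kind == ADD ? "add (sp)+,(sp)" : "sub (sp)+,(sp)", "", 0);
}

/* Table-driven generator: leaves the value of n in register reg. */
void gen_reg(struct out *o, const struct node *n, int reg)
{
    assert(n != NULL);
    if (n->kind == LEAF) {
        expand(o, "mov V,R", n->name, reg);
        return;
    }
    int leaf = (n->right->kind == LEAF);
    gen_reg(o, n->left, reg);
    if (!leaf)
        gen_reg(o, n->right, reg + 1);  /* right operand into scratch */
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (table[i].op == n->kind && table[i].right_is_leaf == leaf) {
            expand(o, table[i].tmpl, leaf ? n->right->name : "", reg);
            return;
        }
    }
}
```

For the expression a + (b - c), gen_stack emits five instructions that all touch the stack, while gen_reg emits four: mov a,r0; mov b,r1; sub c,r1; add r1,r0. The saving comes from the table row that folds a leaf operand directly into the arithmetic instruction.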
I am not sure why DMR's approach was not more widely used in the 1970’s. The algorithms he used do not seem to be new and appear to have their roots in other (larger) compilers of the 1960’s. The basic design seems to have been in place from the very first iterations of his compiler in 1972 (see V2 tree on TUHS) and he does not mention these algorithms as being special or innovative in his later papers.
Any observations / opinions on why DMR’s approach was not more widely used in the 1970’s?
On Sun, Oct 20, 2024 at 01:23:23AM -0400, Dan Plassche wrote:
>
> On Sat, 19 Oct 2024, Jonathan Gray wrote:
>
> > PWB was an early external distribution with troff.
> >
> > Documents for the PWB/UNIX Time-Sharing System
> > https://datamuseum.dk/wiki/Bits:30007124
> > https://bitsavers.org/pdf/att/unix/PWB_UNIX/
> >
> > NROFF/TROFF User's Manual
> > October 11, 1976
> > datamuseum.dk, pp 325-357
> > bitsavers, pp 217-249
> >
> > Addendum to the NROFF/TROFF User's Manual
> > May 1977
> > datamuseum.dk, p 358
> > bitsavers, p 250
> >
> > fonts described in:
> > Administrative Advice for PWB/UNIX
> > 23. PHOTOTYPESETTING EQUIPMENT AND SUPPLIES
> > datamuseum.dk, p 647
>
> Thank you Jonathan. I was previously not sure where to place the
> PWB documentation in the timeline but a clearer picture is
> emerging.
>
> Based on the v6 "NROFF User's Manual" revised in 1974 and
> published in 1975, I can now see that the PWB documentation with
> the "NROFF/TROFF User's Manual" from 1976-77 has most of the
> content that later appears in v7. The major change immediately
> beforehand was the rewrite of troff into C.[1] Some clear
> differences are the combination of nroff and troff manpages and
> the addition of troff specific features like the special fonts
> into the user's manual.
>
> [1]. Apparently in 1976:
> https://www.tuhs.org/Archive/Distributions/USDL/unix_program_description-tr…
"It was rewritten in C around 1975"
Kernighan in CSTR 97, A Typesetter-independent TROFF
I've seen references to
"Documents for Use with the Phototypesetter (Version 7)"
which was likely distributed with the licensed phototypesetter tape in 1977.
What may have been the manual distributed with that tape is also close to v7.
https://www.tuhs.org/cgi-bin/utree.pl?file=Interdata732/usr/source/troff/doc
https://www.tuhs.org/Archive/Distributions/Other/Interdata/
tuhs Applications/Spencer_Tapes/unsw3.tar.gz
usr/source/formatters/troff/doc/
>>> malloc(0) isn't undefined behaviour but implementation defined.
>>
>> In modern C there is no difference between those two concepts.
> Can you explain more about your view
There certainly is a difference, but in this case the practical
implications are the same: avoid malloc(0). malloc(0) lies at the high end
of a range of severity of concerns about implementation-definedness. At the
low end are things like the size of ints, which only affects applications
that may confront very large numbers. In the middle is the default
signedness of chars, which generally may be mitigated by explicit type
declarations.
For the size of ints, C offers guardrails like INT_MAX. There is no test to
discern what an error return from malloc(0) means.
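A minimal sketch of those three cases in C. The function names are hypothetical, not from any library, and the malloc wrapper is just one common workaround: it never passes 0 to malloc, so a null return can only mean failure.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Low end: the size of int varies, but INT_MAX and INT_MIN let a
   program test for overflow before it happens.
   Returns 1 and stores a + b in *sum if it fits, otherwise 0. */
int add_checked(int a, int b, int *sum)
{
    assert(sum != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *sum = a + b;
    return 1;
}

/* Middle: plain char may be signed or unsigned, but an explicit
   unsigned char gives the same 0..UCHAR_MAX value everywhere. */
int byte_value(char c)
{
    return (unsigned char)c;
}

/* High end: malloc(0) may return a null pointer or a unique pointer,
   and nothing lets the caller tell which convention is in force.
   Asking for at least one byte sidesteps the question. */
void *alloc_nonzero(size_t n)
{
    return malloc(n != 0 ? n : 1);
}
```

With alloc_nonzero, a caller can treat NULL as out of memory without knowing how the local malloc handles a zero size; the cost is at most one wasted byte per empty allocation.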
Is there any other C construct that implementation-definedness renders
useless?
Doug
Hi All.
For anyone who's interested, my QED archive at
https://github.com/arnoldrobbins/qed-archive has been updated. Changes
were provided by Sean Jensen.
The usenix-80-caltech subdirectory is now more complete and the README.md
points at Sean's updated QED port which now works with Unicode.
I thank him.
Arnold
So with all that has happened with the Internet Archive lately, I
do find myself a bit concerned regarding the UNIX materials that
I know to only exist there. Selfishly, this includes my own
uploads here: https://archive.org/details/@segaloco
I was curious if anyone has any suggestions on places beyond just
IA and TUHS where I could see about getting this stuff mirrored?
Unfortunately my stuff runs afoul of bitsavers's DPI requirements, and
that's the only other place that immediately comes
to mind where these materials would find a home. Any thoughts?
Warren, I know you had mentioned a "write only" archive you
maintain regarding materials that need to be mothballed until legal
understandings are reached, would you be comfortable with my
contributing any of my materials the Caldera license does not apply
to there?
- Matt G.
Hi,
A scan of the printed UNIX Version 6 documents set has been online
at the link below since last week. The set consists of documents
accompanying the manual pages in the programmer's manual (similar
to volume 2 in v7).
https://www.computerhistory.org/collections/catalog/102659317
The [nt]roff user manual, tmg compiler-compiler, and m6 macro
processor memos were previously missing from the distributions
in TUHS and later efforts to re-create the documentation.
I have been working on finding this documentation as part of
researching roff history. Still interested in earlier copies of
the internal memoranda from Ossanna that served as the NROFF
User's Manual since v3, the TROFF User's Manual after v5, and
TROFF Made Trivial starting around v4. Based on the manpage
histories, the documentation was revised for v4, 5, and 6.
Best,
Dan Plassche
> Who created the "cat" command and did they have the
> word "catenate" or "concatenate" in their heads?
Ken Thompson wrote "cat" for the PDP-7, with "concatenate" in
mind. The cat(1) page in the v1 manual is titled, "concatenate (or
print) files". Only later did someone in Research--I don't know
who--remark on the existence of the shorter synonym. It was
deliberately adopted in v7, perhaps because it better mirrored
the command name.
But brevity is the defensible argument for "catenate", while
familiarity boosts "concatenate". It still takes some conscious
effort for me to use the former. However, I sense sinister
vibes in "concatenate", driven by the phrase "concatenation
of events", which often is used to explain misfortune.
Doug
I always forget that TUHS can't handle pictures. Perhaps Warren will let my
post through, but in any case here's a link to the mail, reformatted but
otherwise intact, with photos, on Mastodon.
https://hachyderm.io/@robpike/113322062117546253
To pique your interest, here's the first paragraph.
*In August 1981 we had a persistent problem with the RP06 on our PDP-11/70
crashing disks. It even crashed once while the DEC repairman was standing
next to it trying to figure out why the previous pack had died. We
collected a few dead packs, and they were forming a pile. Lillian, never
one to miss an opportunity, suggested building a mobile.*
-rob
Hello, all.
In 2002, Caldera released Ancient Unix code under Caldera
license:
<https://www.tuhs.org/Archive/Caldera-license.pdf>
based on the four-clause BSD license:
<https://spdx.org/licenses/BSD-4-Clause.html>
Consequently, it was used by derived projects, such as
Traditional Vi:
<https://ex-vi.sourceforge.net/>
This project having been abandoned and orphaned since 2005, I
wanted to host it on GNU Savannah and there to breathe some
life into it. Unfortunately, the 4-clause BSD license is
incompatible with GPL:
<https://www.gnu.org/licenses/license-list.html#OriginalBSD>
The incompatibility is due entirely to the infamous third
clause about advertising. Three years prior to Caldera's
release of old Unix code, the University of California,
Berkeley, removed this clause, producing the GNU-compatible
modified BSD License:
<https://opensource.org/license/BSD-3-clause>
They published a notice to that effect on their FTP:
<ftp://ftp.cs.berkeley.edu/pub/4bsd/README.Impt.License.Change>
Although it has been taken down[1], copies exist all over
the internet, e.g.:
<https://raw.githubusercontent.com/abbrev/punix/refs/heads/master/README.Imp…>
That said, is there a chance that the copyright holder of
Ancient Unix will agree to release a similar note regarding
everything released under Caldera license? If there is, whom
shall I contact about it? It will benefit everybody using
Ancient Unix code.
____________________
1. Why the murrain of FTP servers all over the world?
Hi folks,
A few months ago I reported on my efforts to reconstruct London &
Reiser's paper on their port of Unix to the VAX-11/780.[1]
I formerly characterized this as "the UNIX/32V port", but since London
& Reiser's paper predates the release of Seventh Edition Unix by about
six months, UNIX/32V came _after_ Seventh Edition by about the same
number of months, and the pace of Unix development was particularly
ferocious in this period[2], I felt that my identification of London &
Reiser's work with UNIX/32V may have been hasty. I also may have
overinterpreted Dennis Ritchie's words on the subject.
"Tom London and John Reiser, working from the 7th Edition and the
Interdata 8/32 system, generated a VAX 11/780 version of the system,
which, in its distribution format, would be called 32V."
That phrase "in its distribution format" could cover a variety of
changes, some of which perhaps did not match London & Reiser's
intentions or views expressed in their paper. More conservative
implications seemed prudent.
I'd thus like to present what I consider to be my "final" draft, subject
of course to feedback from these mailing lists. I'm pleased to report
that I've addressed all of the XXX points I identified in the source of
my first draft, points where I felt groff mm could be enhanced to aid
the rendering of historical documents like this. I consequently expect
groff 1.24's mm package to support several new features prompted
specifically by this work. Quoting the forthcoming NEWS file...
* The m (mm) macro package now supports a user-definable hook macro
`AFX`, which if defined is called by `AF` in lieu of the latter's
normal operation. Applications include customization of letterhead.
* The m (mm) macro package now supports a user-definable hook macro
`RPX`, which if defined is called by `RP` to format the reference
list caption string `Rp` instead of the default formatting.
* The m (mm) macro package now supports an `Aumt` string to suppress
the appearance of positional arguments to the `AU` macro in the
document heading used by memorandum types 0-3 and 6. By default, all
such arguments appear, except the second (author initials). For
example, a value of "3 4" more accurately reproduces London &
Reiser's 1978 paper describing the porting of Unix to the VAX-11/780.
* The m (mm) macro package now supports an `Rpfmt` string specifying
the `LB` macro arguments that the package uses to format the items in
a reference list.
* The m (mm) macro package no longer superscripts _and_ brackets a
reference mark (the `Rf` string). Instead, the new `Rfstyle`
register controls its formatting. The default, 0, selects bracketing
in nroff mode and superscripting in troff mode. Set `Rfstyle` to 3
in a document to obtain groff mm's previous mark formatting behavior.
[I might still update or revert the changed default; I want to
research the behavior of historical mm implementations.]
The "32vscan.pdf" document from which I prepared this reconstruction is
available at Dennis Ritchie's memorial home page.[3] I have attached
the reconstructed mm source document and two PDFs, rendered with groff
1.23.0 (the current stable release), and groff Git HEAD (exercising the
new features listed above).[4]
I offer the caveat that these cannot be pixel-perfect recreations
because (1) I have no information about the precise paper dimensions or
margins London & Reiser used[5]; (2) the fonts employed in rendering the
documents are not identical, metrically or otherwise; and (3) AT&T and
GNU troffs use different hyphenation systems and therefore sometimes
break words differently. These factors all impact the placement of line
and page breaks, and these are avowedly and clearly distinguishable.
There are furthermore a few discrepancies that I decided weren't worth
the trouble at this time to reconcile, like selective encroachment of
cover sheet material beyond the page margins. None affect the utility
of the document (in my opinion).
With that large disclaimer in place, I welcome feedback on the quality
of the reproduction.
Finally, I reiterate my encouragement that the document be _read_. In
my opinion, the final two sections "Commands" and "Software portability"
are well worth consideration in hindsight. To the extent that we
continue to boast, sometimes glibly, of C as a "portable assembly
language", be it in its current ISO C23 incarnation; as ANSI C89, the
last revision blessed by Ritchie; or in the form used when London and
Reiser wrote, their experiences and recommendations laid out a program
of better delivering on that promise.
Regards,
Branden
[1] https://www.tuhs.org/pipermail/tuhs/2024-June/030041.html
[2] 1980, for example, saw releases of 3BSD, System III, PWB/UNIX 2.0,
and 4BSD.
[3] https://www.bell-labs.com/usr/dmr/www/portpapers.html
[4] You may notice a difference in the sizes of the two PDFs, surprising
in light of their shared source document. This is thanks to a new
feature forthcoming in Deri James's gropdf(1) output driver: font
subsetting.
[5] ...or where, if anywhere, the authors "cheated" the margins
temporarily, for instance with `ll` or `pl` requests. Even with mm
macro package sources available, such things would be invisible to
the reconstructor.