Hi all, I've just received a set of MP3 recordings from Bob Kridle. He says:
These are recordings of Ken Thompson doing a read-through of an
early UNIX kernel code listing with a group of grad students at
UC Berkeley while he was a visiting prof. there.
The date is roughly 1975. I've put the recordings here along with his
e-mails about the recordings:
https://www.tuhs.org/Archive/Recordings/1975_Unix_Code_Walkthru/
I've only just listened to the first few minutes of each. The quality
is fine, but I might spend some time reducing the noise, bringing up
the quiet parts and removing a few clicks and pops.
If anybody else has more details of these recordings, please let us know!
Cheers, Warren
Someone I know is seeking the original version of an internal Bell Labs
memo from 1974 titled "Webster's Second on the Head of a Pin" by Morris and
Thompson. The topic appears to be related to improving the speed of lookups
or search. It's cited in a few papers as "Unpublished Technical Memo, Bell
Laboratories, Murray Hill, NJ 1974." All I can find online is citations.
Any leads appreciated!
--
Royce
All, Yufeng Gao has done more amazing work at extracting binaries,
source code and text documents from the DECtapes that Dennis Ritchie
provided for the Unix Archive:
https://www.tuhs.org/Archive/Applications/Dennis_Tapes/
His latest e-mail is below. I've temporarily placed his attachments here:
https://minnie.tuhs.org/wktcloud/index.php/s/aWkck2Ljay6c5sB
He needs some help with formatting old *roff documents. If someone could offer
him help, that would be great. His e-mail address is yufeng.gao AT uq.edu.au
Cheers, Warren
----- Forwarded message from Yufeng Gao -----
Date: Tue, 31 Dec 2024
Subject: RE: UNIX DECtapes from dmr
Hi Warren,
Happy New Year! Here's another update. I found more UNIX bins on
another tape ('ken-sky'). They appear to be between V3 and V4. I have
attached them as "ken_sky_bins.tar". I have also attached an updated
tarball of the V2/V3 bins recovered from the 'e-pi' tape (with a few
names corrected), see "identified_v2v3_bins_r2.tar".
So far, the rough timeline of UNIX binaries (RTM hereinafter refers to
the exact version of the OS described by the preserved manuals) is as
follows:
Sys: V1 RTM <= unix-study-src < s1/s2 < V2 RTM < V3 RTM < nsys < V4 RTM
Bin: V1 RTM < s1/s2 < epi-V2 < epi-V3 < ken-sky-bins < V4 RTM
There is a possibility that the V2 bins from the 'e-pi' tape belong to
V2 RTM, as they're all PDP-11/20 bins with V2 headers. In contrast,
most of the bins from the s1/s2 tapes are V1 bins. Some of them are
identical to those from the 's2' tape, and if the timestamps from the
's2' tape can be trusted, they're from May/June 1972.
The V3 bins from the 'e-pi' tape are most likely from late 1972 or
early 1973, but no later than Feb 1973, as they've been overwritten by
files from Feb 1973. This suggests they're from a V3 beta, supported by
the fact that some features described in the V3 manual are missing. The
files were laid out in perfect alphabetical order on the tape.
The bins from the 'ken-sky' tape fall somewhere between V3 RTM and V4
RTM. The directory structure and other elements match the V3 manual, as
do the syscalls (e.g., the arguments for kill(2) differ between V3 and
V4, and these bins use the V3 arguments). The features, however, are
closer to V4. For example, nm(1) had already been rewritten in C and
matches the V4 manual's description. The assembler also matches the V4
manual in terms of the number of temp files, and the C compiler refers
to the assembler as 'nas.' The assembler is located physically between
files starting with "n" and "o," and the files around it follow a weak
alphabetical order, so it is logical to assume that it was named "nas".
It is a bit difficult to version these binaries, especially without any
timestamps. The lines between versions for early UNIX are blurry, and
modern software versioning terms like "beta" and "RTM" don't really
apply well. If these binaries are to be preserved (which I hope they
will be, even though the kernels are long gone), I'd put the V2 bins
from 'e-pi' under V2, the V3 bins from 'e-pi' under V3, and the bins
from 'ken-sky' under V4 (I'd argue that nsys also falls under V4, as
the biggest change between V3 and V4 was the kernel being rewritten in C).
There are other overwritten files on the tapes, and I will address them
later. There are quite a few patents, papers, and memos in *roff
format, but I'm not entirely sure what to do with them. Among those, I
have picked out some V4 distribution documents and attached them as a
ZIP folder :-). If you know of ways to generate PDFs from these ancient
*roff files accurately, please lend a hand - I'm struggling to get
accurate results from groff.
Sincerely,
Yufeng
----- End forwarded message -----
In the last months, I've spent a little time on curating John Walker's Unix clone and software stack, including an emulator to run it:
https://gitlab.com/marinchip
After creating a basic tool chain (edit, asm, link and a simple executive), John set out to find a compiler. Among the first programs were a port of the META 3 compiler-generator (similar to TMG on early Unix) and a port of Brinch Hansen’s Pascal compiler. META was used to create a compiler that generated threaded code. He found neither compiler good enough for his goals and settled on writing his Unix-like OS in assembler. As the 9900 architecture withered after 1980, this sealed the fate of this OS early on -- had he found a good compiler, the code might have competed alongside Coherent, Idris, and Minix during the 80’s.
This made me realise once more how unique the Ritchie C compiler was. In my view its uniqueness combines three aspects:
1. The C language itself
2. The ability to run natively on small hardware (even an LSI-11 system)
3. Generating code with modest overhead versus handwritten assembler (say 30%)
As has been observed before, working at a higher abstraction level makes it easier to work on algorithms and on refactoring, often earning back the efficiency loss. John Walker's work may be a case in point: I estimate that his hand-coded kernel is 10% larger than an equivalent V6 Unix kernel (as compiled for the 9900 architecture).
There are three papers on DMR’s website about the history of the compiler and a compare-and-contrast with other compilers of the era:
https://www.bell-labs.com/usr/dmr/www/primevalC.html
https://www.bell-labs.com/usr/dmr/www/chist.html
https://www.bell-labs.com/usr/dmr/www/hopl.html
It seems to me that these papers rather understate the importance of generating good quality code. As far as I can tell, BCPL and BLISS came close, but were too large to run on a PDP-11 and only existed as cross-compilers. PL/M was a cross-compiler and generated poorer code. Pascal on small machines compiled to a virtual machine. As far as I can tell, during most of the 70s there was no other compiler that generated good quality code and ran natively on a small (i.e. PDP-11 class) machine.
As far as I can tell the uniqueness was mostly in the “c1” phase of the compiler. The front-end code of the “c0” phase seems to use techniques broadly similar to those of many contemporary compilers. The “c1” phase seems to have been unique in that it managed to do register allocation and instruction selection with a pattern matcher and associated code tables squeezed into a small address space. On a small machine, other native compilers of the era typically proceeded to generate threaded code, code for a virtual machine or poor-quality native code that evaluated expressions using stack operations rather than registers.
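To make the contrast between stack-style and register-style expression code concrete, here is a toy sketch. It is not DMR's algorithm and does not use his code tables; the expression format, the function names and the assembler-like mnemonics are all made up for illustration.

```python
# Toy illustration only: NOT the c1 algorithm. Expressions are nested
# tuples (op, left, right) with variable names as leaves; the mnemonics
# are generic and hypothetical, not a real instruction set.
OPS = {"+": "add", "*": "mul"}

def stack_code(e):
    """Naive code: push every operand, operate on the top of the stack."""
    if isinstance(e, str):
        return [f"push {e}"]
    op, left, right = e
    return stack_code(left) + stack_code(right) + [OPS[op]]

def reg_code(e, reg=0):
    """Compute e into r<reg>, using higher-numbered registers as scratch."""
    if isinstance(e, str):
        return [f"mov {e},r{reg}"]
    op, left, right = e
    code = reg_code(left, reg)
    if isinstance(right, str):
        # A simple operand can be used directly from memory.
        code.append(f"{OPS[op]} {right},r{reg}")
    else:
        code += reg_code(right, reg + 1)
        code.append(f"{OPS[op]} r{reg + 1},r{reg}")
    return code

expr = ("+", "a", ("*", "b", "c"))   # a + b*c
print(stack_code(expr))
print(reg_code(expr))
```

The register version keeps intermediate results out of memory and folds simple operands into the arithmetic instruction. A real compiler also has to cope with running out of registers, which this sketch ignores.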
I am not sure why DMR's approach was not more widely used in the 1970’s. The algorithms he used do not seem to be new and appear to have their roots in other (larger) compilers of the 1960’s. The basic design seems to have been in place from the very first iterations of his compiler in 1972 (see V2 tree on TUHS) and he does not mention these algorithms as being special or innovative in his later papers.
Any observations / opinions on why DMR’s approach was not more widely used in the 1970’s?
As I mentioned in another post, I'm writing an invited paper for an
upcoming issue of IEEE Transactions on Software Engineering that will be a
50-year retrospective of my original 1975 SCCS paper
(mrochkind.com/aup/talks/SCCS-Slideshow.pdf). Can some people here review a
couple of paragraphs for accuracy?
*Decentralized Version Control (DVCS)*
*While VCSs like CVS and Subversion were centralized and had
pre-commit merging, a further advance was towards decentralization, with
post-commit merging. Probably the first DVCS was Sun WorkShop TeamWare,
created by Larry McVoy and announced in 1992 [sun]. It was implemented as a
layer on top of SCCS. McVoy later commercialized a successor system called
BitKeeper [Bitkeeper], which was layered on a re-implementation of SCCS,
which he called BitSCCS. TeamWare and BitKeeper took advantage of the
interleaved delta algorithm, also known as a weave, to implement an
efficient way to represent merged deltas by reference, instead of
reproducing code inside the repository. This is a lot more complicated to
do with reverse deltas, introduced by RCS.*
*In 2005 Linus Torvalds, creator of Linux [linux], invented the DVCS Git
[git] for Linux development, and since then Git has become widely used and
has supplanted BitKeeper.*
[more about DVCS follows]
I don't want to add more detail that would make these paragraphs any
longer, but I do want them to be accurate. Thanks!
Marc Rochkind
--
*My new email address is mrochkind(a)gmail.com <mrochkind(a)gmail.com>*
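For readers unfamiliar with the interleaved delta (weave) mentioned in the draft paragraph above, here is a minimal sketch of the idea. It is a simplification under stated assumptions: the tuple encoding and the name `extract` are hypothetical, and real SCCS and BitKeeper files carry more structure (checksums, delta tables, include/exclude lists).

```python
# Simplified weave: every line ever present is stored once, bracketed by
# control records naming the delta (serial number) that inserted or
# deleted it. The encoding below is hypothetical:
#   ("I", n) start of text inserted by delta n
#   ("D", n) start of text deleted by delta n
#   ("E", n) end of the block opened by delta n
#   ("T", s) a line of text
WEAVE = [
    ("I", 1), ("T", "a"),
    ("D", 2), ("T", "b"), ("E", 2),      # delta 2 deletes "b" ...
    ("I", 2), ("T", "B"), ("E", 2),      # ... and inserts "B"
    ("T", "c"),
    ("I", 3), ("T", "d"), ("E", 3),      # delta 3 (a branch off 1) adds "d"
    ("E", 1),
]

def extract(weave, applied):
    """Return the lines of the version built from the set of applied deltas."""
    active = []                          # currently open (kind, serial) blocks
    out = []
    for kind, val in weave:
        if kind in ("I", "D"):
            active.append((kind, val))
        elif kind == "E":
            # Close the most recently opened block with this serial.
            for i in range(len(active) - 1, -1, -1):
                if active[i][1] == val:
                    del active[i]
                    break
        else:
            inserted = all(s in applied for k, s in active if k == "I")
            deleted = any(s in applied for k, s in active if k == "D")
            if inserted and not deleted:
                out.append(val)
    return out

print(extract(WEAVE, {1}))
print(extract(WEAVE, {1, 2}))
print(extract(WEAVE, {1, 2, 3}))        # merge of deltas 2 and 3
```

In this sketch the merged version is just the set {1, 2, 3}: no text is copied, and any version comes out in a single pass over the file.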
Rob Pike:
According to the Unix room fortunes file, the actual quote is
SCCS: the source-code motel -- your code checks in but it never checks out. Ken Thompson
====
As a Unix-room-culture aside: I believe this quote was what
inspired Andrew Hume to call his backup system the File Motel.
Norman Wilson
Toronto ON
>> Does anyone know whether there are implementations of mmap that
>> do transparent file sharing? It seems to me that should be possible by
>> making the buffer cache share pages with mmapping processes.
> These days they all do. The POSIX rationale says:
> ... When multiple processes map the same memory object, they can
> share access to the underlying data.
Notice the weasel word "can". It is not guaranteed that they will do so
automatically without delay. Apparently each process may have a physically
distinct copy of the data, not shared access to a single location.
The Linux man page mmap(2), for example, makes it very clear that mmap
has a cache-coherence problem, at least in that system. The existence
of msync(2) is a visible symptom of the problem.
[Weasel words of my own: I have not read the POSIX definition of mmap.]
Doug
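The point about msync(2) can be tried out with a small sketch using Python's standard mmap module on a Unix-like system. The function name `demo` is made up. Showing the write through a second mapping and through read() is what typical modern systems do after the flush; the sketch does not prove what POSIX guarantees.

```python
import mmap
import os
import tempfile

def demo():
    """Write through one shared mapping, then read via another and via pread()."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * mmap.PAGESIZE)        # give the file one page
        a = mmap.mmap(fd, mmap.PAGESIZE, flags=mmap.MAP_SHARED)
        b = mmap.mmap(fd, mmap.PAGESIZE, flags=mmap.MAP_SHARED)
        a[0:5] = b"hello"                          # store through mapping a
        a.flush()                                  # msync(): the explicit request
        via_map = bytes(b[0:5])                    # view through mapping b
        via_read = os.pread(fd, 5, 0)              # view through ordinary file I/O
        a.close()
        b.close()
        return via_map, via_read
    finally:
        os.close(fd)
        os.unlink(path)

print(demo())
```

Dropping the flush() call will very likely give the same result on a system with a unified page cache, but then it is the implementation providing the coherence, not the explicit request.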
On Mon, 16 Dec 2024, Konstantin Belousov wrote:
> On Mon, Dec 16, 2024 at 02:08:43PM -0500, John Levine wrote:
>> PS: I can believe there are some versions of linux that screwed up disk cache
>> coherency, but that just means they don't properly implement the spec, not for
>> the first time. I mean, it's not *that* hard to make all the maps point to the
>> same physical page frame, even on a machine like POWER with reverse page maps.
>
> This is not enough. There are (were ?) architectures, typically with the
> virtually addressed caches, which require all mappings of the same page
> to be suitably aligned, at least. ...
>
> If addresses of different mappings are not aligned, caches were not coherent.
I think we're in "so don't do that" territory. mmap() normally lets the
system pick the memory address to map so it can pick something suitably
aligned. You can pass the MAP_FIXED flag to tell it to map at a
particular address, but it can return EINVAL if the address doesn't work.
The POSIX description says "The use of MAP_FIXED is discouraged, as it may
prevent an implementation from making the most effective use of
resources."
It's not always trivial to make this work. On systems with reverse maps,
a physical page can only be mapped to one virtual address at a time, so
for shared pages it has to mark all of the aliases nonresident and on a
fault remap the page into the map of the process that is running. But
it's not rocket science, either.
R's,
John
> "John Levine" <johnl(a)taugh.com> wrote:
>> M4 was written in the 1970s by Kernighan and Ritchie in C ...
> In private mail, BWK told me that it was DMR who wrote m4. He
> then reimplemented it in Ratfor for "Software Tools".
> Arnold
The book says colorfully, "... [and] we are grateful to him for
letting us steal it."
Doug
> well after Unix had fledged, its developers at CSRC found it necessary
> and/or desirable to borrow back a Multics concept: they named it mmap().
As far as I know no Research version of Unix ever had mmap.
Multics had a segmented universal memory. A process incorporated
segments into its address space. The universal memory was normally
addressed via a hierarchical segment-name directory. With enhancement
to provide for multisegment "files", the directory could serve as a file
system and file I/O became data transfer between segments.
Unix originally imitated the Multics file system, but not the universal
memory. mmap(2) weakly imitates universal memory by allowing a process
to nominally incorporate a portion of a file into the process address space
at page-level granularity. However, an update is guaranteed to be visible
to the file and other processes only upon specific request.
Does anyone know whether there are implementations of mmap that
do transparent file sharing? It seems to me that should be possible by
making the buffer cache share pages with mmapping processes.
Doug