Hi.
The paper on compressing the dictionary was interesting. In the days
of 20 meg disks, compressing a ~ 2.5 meg file down to ~ .5 meg is
a big savings.
Was the compressed dictionary put into use? I could imagine that
spell(1) at least would have needed some library routines to return
a stream of words from it.
Just wondering. Thanks,
Arnold
I am curious about the apposite Bergson quote (intelligence...tools) found
at the beginning of the Foreword mentioned in the subject. Specifically, I am
wondering if the quote was originally discovered in _Creative Evolution_
as a part of a course or through private reading.
I am interested in the diffusion of Continental ideas concerning technology
in English-speaking countries during the 20th century.
Dr Rick Hayes
durtal(a)sdf.org
SDF Public Access UNIX System - https://sdf.org
All,
Yufeng Gao has done more amazing work extracting binaries,
source code and text documents from the DECtapes that Dennis Ritchie
provided for the Unix Archive:
https://www.tuhs.org/Archive/Applications/Dennis_Tapes/
His latest e-mail is below. I've temporarily placed his attachments here:
https://minnie.tuhs.org/wktcloud/index.php/s/aWkck2Ljay6c5sB
He needs some help with formatting old *roff documents. If someone could offer
him help, that would be great. His e-mail address is yufeng.gao AT uq.edu.au
Cheers, Warren
----- Forwarded message from Yufeng Gao -----
Date: Tue, 31 Dec 2024
Subject: RE: UNIX DECtapes from dmr
Hi Warren,
Happy New Year! Here's another update. I found more UNIX bins on
another tape ('ken-sky'). They appear to be between V3 and V4. I have
attached them as "ken_sky_bins.tar". I have also attached an updated
tarball of the V2/V3 bins recovered from the 'e-pi' tape (with a few
names corrected), see "identified_v2v3_bins_r2.tar".
So far, the rough timeline of UNIX binaries (RTM hereinafter refers to
the exact version of the OS described by the preserved manuals) is as
follows:
Sys: V1 RTM <= unix-study-src < s1/s2 < V2 RTM < V3 RTM < nsys < V4 RTM
Bin: V1 RTM < s1/s2 < epi-V2 < epi-V3 < ken-sky-bins < V4 RTM
There is a possibility that the V2 bins from the 'e-pi' tape belong to
V2 RTM, as they're all PDP-11/20 bins with V2 headers. In contrast,
most of the bins from the s1/s2 tapes are V1 bins. Some of them are
identical to those from the 's2' tape, and if the timestamps from the
's2' tape can be trusted, they're from May/June 1972.
The V3 bins from the 'e-pi' tape are most likely from late 1972 or
early 1973, but no later than Feb 1973, as they've been overwritten by
files from Feb 1973. This suggests they're from a V3 beta, supported by
the fact that some features described in the V3 manual are missing. The
files were laid out in perfect alphabetical order on the tape.
The bins from the 'ken-sky' tape fall somewhere between V3 RTM and V4
RTM. The directory structure and other elements match the V3 manual, as
do the syscalls (e.g., the arguments for kill(2) differ between V3 and
V4, and these bins use the V3 arguments). The features, however, are
closer to V4. For example, nm(1) had already been rewritten in C and
matches the V4 manual's description. The assembler also matches the V4
manual in terms of the number of temp files, and the C compiler refers
to the assembler as 'nas.' The assembler is located physically between
files starting with "n" and "o," and the files around it follow a weak
alphabetical order, so it is logical to assume that it was named "nas".
It is a bit difficult to version these binaries, especially without any
timestamps. The lines between versions for early UNIX are blurry, and
modern software versioning terms like "beta" and "RTM" don't really
apply well. If these binaries are to be preserved (which I hope they
will be, even though the kernels are long gone), I'd put the V2 bins
from 'e-pi' under V2, the V3 bins from 'e-pi' under V3, and the bins
from 'ken-sky' under V4 (I'd argue that nsys also falls under V4, as
the biggest change between V3 and V4 was the kernel being rewritten in C).
There are other overwritten files on the tapes, and I will address them
later. There are quite a few patents, papers, and memos in *roff
format, but I'm not entirely sure what to do with them. Among those, I
have picked out some V4 distribution documents and attached them as a
ZIP folder :-). If you know of ways to generate PDFs from these ancient
*roff files accurately, please lend a hand - I'm struggling to get
accurate results from groff.
Sincerely,
Yufeng
----- End forwarded message -----
In recent months, I've spent some time curating John Walker's Unix clone and software stack, including an emulator to run it:
https://gitlab.com/marinchip
After creating a basic tool chain (edit, asm, link and a simple executive), John set out to find a compiler. Among the first programs were a port of the META 3 compiler-generator (similar to TMG on early Unix) and a port of Brinch Hansen’s Pascal compiler. META was used to create a compiler that generated threaded code. He found neither compiler good enough for his goals and settled on writing his Unix-like OS in assembler. As the 9900 architecture withered after 1980, that choice sealed the OS's fate early on -- had he found a good compiler, the code might have competed alongside Coherent, Idris, and Minix during the 80s.
This made me realise once more how unique the Ritchie C compiler was. In my view its uniqueness combines three aspects:
1. The C language itself
2. The ability to run natively on small hardware (even an LSI-11 system)
3. Generating code with modest overhead versus handwritten assembler (say 30%)
As has been observed before, working at a higher abstraction level makes it easier to work on algorithms and on refactoring, often earning back the efficiency loss. John Walker's work may be a case in point: I estimate that his hand-coded kernel is 10% larger than an equivalent V6 Unix kernel (as compiled for the 9900 architecture).
There are three papers on DMR’s website about the history of the compiler and a compare-and-contrast with other compilers of the era:
https://www.bell-labs.com/usr/dmr/www/primevalC.html
https://www.bell-labs.com/usr/dmr/www/chist.html
https://www.bell-labs.com/usr/dmr/www/hopl.html
It seems to me that these papers rather understate the importance of generating good quality code. As far as I can tell, BCPL and BLISS came close, but were too large to run on a PDP-11 and only existed as cross-compilers. PL/M was a cross-compiler and generated poorer code. Pascal on small machines compiled to a virtual machine. As far as I can tell, during most of the 70s there was no other compiler that generated good quality code and ran natively on a small (i.e. PDP-11 class) machine.
As far as I can tell, the uniqueness was mostly in the “c1” phase of the compiler. The front end (the “c0” phase) seems to use techniques broadly similar to those of many contemporary compilers. The “c1” phase seems to have been unique in that it managed to do register allocation and instruction selection with a pattern matcher and associated code tables squeezed into a small address space. On a small machine, other native compilers of the era typically generated threaded code, code for a virtual machine, or poor-quality native code that evaluated expressions using stack operations rather than registers.
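Purely to illustrate what "instruction selection with a pattern matcher and code tables" means, here is a toy sketch in C. Everything in it is an assumption for illustration: the node kinds, the four-entry table, the PDP-11-style mnemonics and the naive "next free register" allocation (no spilling). It is not the layout or content of Ritchie's c1 tables, which are far richer.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical expression tree: leaves are variables or constants,
   interior nodes are binary operators. */
enum kind { VAR, CON, ADD, SUB };

struct node {
    enum kind k;
    const char *name;      /* used by VAR */
    int value;             /* used by CON */
    struct node *l, *r;    /* used by ADD and SUB */
};

/* Toy code table, searched top to bottom: the first matching entry wins,
   so the special case "right operand is the constant 1" comes first. */
struct pattern {
    enum kind op;
    int right_is_one;      /* 1: entry only matches when right operand is constant 1 */
    const char *mnemonic;
};

static const struct pattern table[] = {
    { ADD, 1, "inc" },
    { SUB, 1, "dec" },
    { ADD, 0, "add" },
    { SUB, 0, "sub" },
};

static int is_leaf(const struct node *n) { return n->k == VAR || n->k == CON; }

/* Find the first table entry matching an operator node. */
static const struct pattern *match(const struct node *n)
{
    int one = n->r->k == CON && n->r->value == 1;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i].op == n->k && (!table[i].right_is_one || one))
            return &table[i];
    return NULL;
}

/* Write a leaf as a source operand: "x" for a variable, "$5" for a constant. */
static void leaf_operand(const struct node *n, char *buf, size_t sz)
{
    if (n->k == VAR)
        snprintf(buf, sz, "%s", n->name);
    else
        snprintf(buf, sz, "$%d", n->value);
}

/* Append code that leaves the value of n in register r<reg>. */
static void gen(const struct node *n, int reg, char *out, size_t sz)
{
    char src[32];
    size_t len;

    if (is_leaf(n)) {
        leaf_operand(n, src, sizeof src);
        len = strlen(out);
        snprintf(out + len, sz - len, "mov %s,r%d\n", src, reg);
        return;
    }

    const struct pattern *p = match(n);
    gen(n->l, reg, out, sz);                 /* left operand into the target register */
    len = strlen(out);
    if (p->right_is_one) {                   /* one-operand form; right side never evaluated */
        snprintf(out + len, sz - len, "%s r%d\n", p->mnemonic, reg);
        return;
    }
    if (is_leaf(n->r)) {
        leaf_operand(n->r, src, sizeof src); /* use the leaf directly as the source */
    } else {
        gen(n->r, reg + 1, out, sz);         /* complex right side: next free register */
        snprintf(src, sizeof src, "r%d", reg + 1);
    }
    len = strlen(out);
    snprintf(out + len, sz - len, "%s %s,r%d\n", p->mnemonic, src, reg);
}

/* Generate code for expression n into out; the result ends up in r0. */
void codegen(const struct node *n, char *out, size_t sz)
{
    out[0] = '\0';
    gen(n, 0, out, sz);
}
```

For `(a + 1) - b` this emits `mov a,r0`, `inc r0`, `sub b,r0`; for `a - (b + c)` it evaluates the right subtree into r1 first. The point of the table is that adding a cheaper special case means adding an entry, not changing the tree walk.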
I am not sure why DMR's approach was not more widely used in the 1970s. The algorithms he used do not seem to be new and appear to have their roots in other (larger) compilers of the 1960s. The basic design seems to have been in place from the very first iterations of his compiler in 1972 (see the V2 tree on TUHS), and he does not mention these algorithms as being special or innovative in his later papers.
Any observations / opinions on why DMR’s approach was not more widely used in the 1970s?
As I mentioned in another post, I'm writing an invited paper for an
upcoming issue of IEEE Transactions on Software Engineering that will be a
50-year retrospective of my original 1975 SCCS paper
(mrochkind.com/aup/talks/SCCS-Slideshow.pdf). Can some people here review a
couple of paragraphs for accuracy?
*Decentralized Version Control (DVCS)*
*While VCSs like CVS and Subversion were centralized and had
pre-commit merging, a further advance was towards decentralization, with
post-commit merging. Probably the first DVCS was Sun WorkShop TeamWare,
created by Larry McVoy and announced in 1992 [sun]. It was implemented as a
layer on top of SCCS. McVoy later commercialized a successor system called
BitKeeper [Bitkeeper], which was layered on a re-implementation of SCCS
that he called BitSCCS. TeamWare and BitKeeper took advantage of the
interleaved delta algorithm, also known as a weave, to implement an
efficient way to represent merged deltas by reference, instead of
reproducing code inside the repository. This is a lot more complicated to
do with reverse deltas, introduced by RCS.*
*In 2005 Linus Torvalds, creator of Linux [linux], invented the DVCS Git
[git] for Linux development, and since then Git has become widely used and
has supplanted BitKeeper.*
[more about DVCS follows]
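To make the weave idea concrete, here is a minimal C sketch under a simplifying assumption of strictly linear history (deltas numbered 1, 2, 3, ...). Every line that ever existed is stored once, tagged with the delta that inserted it and the delta that deleted it, so any version can be pulled out in one pass. The struct and function names are hypothetical; a real SCCS s-file encodes the same information as nested ^AI/^AD/^AE control records and decides membership by a delta's ancestry rather than by comparing serial numbers.

```c
#include <stddef.h>

/* Simplified weave entry (assumption: linear history, no branches). */
struct weave_line {
    int inserted_by;       /* delta that added this line */
    int deleted_by;        /* delta that removed it, 0 = never deleted */
    const char *text;
};

/* Collect the lines of `version` into out[] and return how many there are.
   A line is present if it was inserted at or before `version` and was not
   deleted at or before it. The same weave serves every version. */
size_t weave_extract(const struct weave_line *w, size_t n, int version,
                     const char **out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int present = w[i].inserted_by <= version &&
                      (w[i].deleted_by == 0 || w[i].deleted_by > version);
        if (present)
            out[count++] = w[i].text;
    }
    return count;
}
```

With lines tagged (1,0) "a", (1,2) "b", (2,0) "c", version 1 yields a, b and version 2 yields a, c. Once branches and merges enter, the `<= version` test becomes roughly "is this delta in the set of deltas included in the version", which is what lets a merge refer to existing lines instead of copying them. With RCS-style reverse deltas, by contrast, an old version is rebuilt by applying edit scripts backward from the newest text.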
I don't want to add more detail that would make these paragraphs any
longer, but I do want them to be accurate. Thanks!
Marc Rochkind
--
*My new email address is mrochkind(a)gmail.com*
Rob Pike:
According to the Unix room fortunes file, the actual quote is
SCCS: the source-code motel -- your code checks in but it never checks out. Ken Thompson
====
As a Unix-room-culture aside: I believe this quote was what
inspired Andrew Hume to call his backup system the File Motel.
Norman Wilson
Toronto ON
>> Does anyone know whether there are implementations of mmap that
>> do transparent file sharing? It seems to me that should be possible by
>> making the buffer cache share pages with mmapping processes.
> These days they all do. The POSIX rationale says:
> ... When multiple processes map the same memory object, they can
> share access to the underlying data.
Notice the weasel word "can". It is not guaranteed that they will do so
automatically without delay. Apparently each process may have a physically
distinct copy of the data, not shared access to a single location.
The Linux man page mmap(2), for example, makes it very clear that mmap
has a cache-coherence problem, at least in that system. The existence
of msync(2) is a visible symptom of the problem.
[Weasel words of my own: I have not read the POSIX definition of mmap.]
Doug
On Mon, 16 Dec 2024, Konstantin Belousov wrote:
> On Mon, Dec 16, 2024 at 02:08:43PM -0500, John Levine wrote:
>> PS: I can believe there are some versions of linux that screwed up disk cache
>> coherency, but that just means they don't properly implement the spec, not for
>> the first time. I mean, it's not *that* hard to make all the maps point to the
>> same physical page frame, even on a machine like POWER with reverse page maps.
>
> This is not enough. There are (were ?) architectures, typically with the
> virtually addressed caches, which require all mappings of the same page
> to be suitably aligned, at least. ...
>
> If addresses of different mappings are not aligned, caches were not coherent.
I think we're in "so don't do that" territory. mmap() normally lets the
system pick the memory address to map so it can pick something suitably
aligned. You can pass the MAP_FIXED flag to tell it to map at a
particular address, but it can return EINVAL if the address doesn't work.
The POSIX description says "The use of MAP_FIXED is discouraged, as it may
prevent an implementation from making the most effective use of
resources."
It's not always trivial to make this work. On systems with reverse maps,
a physical page can only be mapped to one virtual address at a time, so
for shared pages the system has to mark all of the aliases nonresident and,
on a fault, remap the page into the map of the running process. But
it's not rocket science, either.
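A minimal C sketch of the well-behaved case, assuming a POSIX system: a temporary file is mapped MAP_SHARED with a NULL address hint (no MAP_FIXED, so the kernel is free to pick a suitably aligned address), a string is stored through the mapping, msync() is called, and the bytes are read back with pread(). The function name and temp-file path are hypothetical. On systems with a unified buffer cache the read would most likely see the data even without msync(); calling it keeps the sketch on the conservative side of the POSIX "can" discussed above.

```c
#include <stdlib.h>     /* mkstemp */
#include <string.h>     /* memcpy, memcmp */
#include <sys/mman.h>   /* mmap, msync, munmap */
#include <sys/types.h>  /* off_t, ssize_t */
#include <unistd.h>     /* ftruncate, pread, sysconf, unlink, close */

/* Hypothetical helper: returns 1 if bytes stored through a MAP_SHARED
   mapping are seen by pread() on the same file, 0 if not, -1 on error. */
int mmap_write_visible_to_read(void)
{
    char path[] = "/tmp/mmapdemoXXXXXX";    /* assumed writable temp directory */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                           /* file goes away when fd is closed */

    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }

    /* NULL hint, no MAP_FIXED: the system chooses the address. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    static const char msg[] = "hello";
    memcpy(p, msg, sizeof msg);             /* store through the mapping */

    int ok = -1;
    if (msync(p, len, MS_SYNC) == 0) {      /* push the change to the file */
        char buf[sizeof msg];
        ssize_t n = pread(fd, buf, sizeof buf, 0);
        ok = (n == (ssize_t)sizeof buf && memcmp(buf, msg, sizeof msg) == 0);
    }
    munmap(p, len);
    close(fd);
    return ok;
}
```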
R's,
John
> "John Levine" <johnl(a)taugh.com> wrote:
>> M4 was written in the 1970s by Kernighan and Ritchie in C ...
> In private mail, BWK told me that it was DMR who wrote m4. He
> then reimplemented it in Ratfor for "Software Tools".
> Arnold
The book says colorfully, "... [and] we are grateful to him for
letting us steal it."
Doug