TUHS December 2024

tuhs@tuhs.org

54 participants
30 discussions

by Warren Toomey

Hi all, I've just received a set of MP3 recordings from Bob Kridle. He says:

  These are recordings of Ken Thompson doing a read through of an early UNIX kernel code listing with a group of grad students at UC Berkeley while he was a visiting prof. there. The date is roughly 1975.

I've put the recordings here along with his e-mails about the recordings:

https://www.tuhs.org/Archive/Recordings/1975_Unix_Code_Walkthru/

I've only just listened to the first few minutes of each. The quality is fine, but I might spend some time reducing the noise, bringing up the quiet parts and removing a few clicks and pops.

If anybody else has more details of these recordings, please let us know!

Cheers, Warren

4 months, 3 weeks

"Webster's Second on the Head of a Pin"?

by Royce Williams

Someone I know is seeking the original version of an internal Bell Labs memo from 1974 titled "Webster's Second on the Head of a Pin" by Morris and Thompson. The topic appears to be related to improving the speed of lookups or search. It's cited in a few papers as "Unpublished Technical Memo, Bell Laboratories, Murray Hill, NJ 1974." All I can find online is citations. Any leads appreciated! -- Royce

4 months, 4 weeks

Fwd: dmr DECtape Analysis, part N

by Warren Toomey

All, Yufeng Gao has done more amazing work at extracting binaries, source code and text documents from the DECtapes that Dennis Ritchie provided for the Unix Archive:

https://www.tuhs.org/Archive/Applications/Dennis_Tapes/

His latest e-mail is below. I've temporarily placed his attachments here:

https://minnie.tuhs.org/wktcloud/index.php/s/aWkck2Ljay6c5sB

He needs some help with formatting old *roff documents. If someone could offer him help, that would be great. His e-mail address is yufeng.gao AT uq.edu.au

Cheers, Warren

----- Forwarded message from Yufeng Gao -----
Date: Tue, 31 Dec 2024
Subject: RE: UNIX DECtapes from dmr

Hi Warren,

Happy New Year! Here's another update.

I found more UNIX bins on another tape ('ken-sky'). They appear to be between V3 and V4. I have attached them as "ken_sky_bins.tar". I have also attached an updated tarball of the V2/V3 bins recovered from the 'e-pi' tape (with a few names corrected), see "identified_v2v3_bins_r2.tar".

So far, the rough timeline of UNIX binaries (RTM hereinafter refers to the exact version of the OS described by the preserved manuals) is as follows:

  Sys: V1 RTM <= unix-study-src < s1/s2 < V2 RTM < V3 RTM < nsys < V4 RTM
  Bin: V1 RTM < s1/s2 < epi-V2 < epi-V3 < ken-sky-bins < V4 RTM

There is a possibility that the V2 bins from the 'e-pi' tape belong to V2 RTM, as they're all PDP-11/20 bins with V2 headers. In contrast, most of the bins from the s1/s2 tapes are V1 bins. Some of them are identical to those from the 's2' tape, and if the timestamps from the 's2' tape can be trusted, they're from May/June 1972.

The V3 bins from the 'e-pi' tape are most likely from late 1972 or early 1973, but no later than Feb 1973, as they've been overwritten by files from Feb 1973. This suggests they're from a V3 beta, supported by the fact that some features described in the V3 manual are missing. The files were laid out in perfect alphabetical order on the tape.

The bins from the 'ken-sky' tape fall somewhere between V3 RTM and V4 RTM. The directory structure and other elements match the V3 manual, as do the syscalls (e.g., the arguments for kill(2) differ between V3 and V4, and these bins use the V3 arguments). The features, however, are closer to V4. For example, nm(1) had already been rewritten in C and matches the V4 manual's description. The assembler also matches the V4 manual in terms of the number of temp files, and the C compiler refers to the assembler as 'nas'. The assembler is located physically between files starting with "n" and "o", and the files around it follow a weak alphabetical order, so it is logical to assume that it was named "nas".

It is a bit difficult to version these binaries, especially without any timestamps. The lines between versions for early UNIX are blurry, and modern software versioning terms like "beta" and "RTM" don't really apply well. If these binaries are to be preserved (which I hope they will be, even though the kernels are long gone), I'd put the V2 bins from 'e-pi' under V2, the V3 bins from 'e-pi' under V3, and the bins from 'ken-sky' under V4 (I'd argue that nsys also falls under V4, as the biggest change between V3 and V4 was the kernel being rewritten in C).

There are other overwritten files on the tapes, and I will address them later. There are quite a few patents, papers, and memos in *roff format, but I'm not entirely sure what to do with them. Among those, I have picked out some V4 distribution documents and attached them as a ZIP folder :-). If you know of ways to generate PDFs from these ancient *roff files accurately, please lend a hand - I'm struggling to get accurate results from groff.

Sincerely, Yufeng

----- End forwarded message -----

7 months, 1 week

On the uniqueness of DMR's C compiler

by Paul Ruizendaal

Over the last few months, I've spent a little time on curating John Walker's Unix clone and software stack, including an emulator to run it:

https://gitlab.com/marinchip

After creating a basic tool chain (edit, asm, link and a simple executive), John set out to find a compiler. Among the first programs were a port of the META 3 compiler-generator (similar to TMG on early Unix) and a port of Brinch Hansen’s Pascal compiler. META was used to create a compiler that generated threaded code. He found neither compiler good enough for his goals and settled on writing his Unix-like OS in assembler. As the 9900 architecture withered after 1980, this sealed the fate of this OS early on -- had he found a good compiler, the code might have competed alongside Coherent, Idris, and Minix during the 80’s.

This made me realise once more how unique the Ritchie C compiler was. In my view its uniqueness combines three aspects:

1. The C language itself
2. The ability to run natively on small hardware (even an LSI-11 system)
3. Generating code with modest overhead versus handwritten assembler (say 30%)

As has been observed before, working at a higher abstraction level makes it easier to work on algorithms and on refactoring, often earning back the efficiency loss. John Walker's work may be a case in point: I estimate that his hand-coded kernel is 10% larger than an equivalent V6 Unix kernel (as compiled for the 9900 architecture).

There are three papers on DMR’s website about the history of the compiler and a compare-and-contrast with other compilers of the era:

https://www.bell-labs.com/usr/dmr/www/primevalC.html
https://www.bell-labs.com/usr/dmr/www/chist.html
https://www.bell-labs.com/usr/dmr/www/hopl.html

It seems to me that these papers rather understate the importance of generating good quality code. As far as I can tell, BCPL and BLISS came close, but were too large to run on a PDP-11 and only existed as cross-compilers. PL/M was a cross-compiler and generated poorer code. Pascal on small machines compiled to a virtual machine. As far as I can tell, during most of the 70s there was no other compiler that generated good quality code and ran natively on a small (i.e. PDP-11 class) machine.

As far as I can tell the uniqueness was mostly in the “c1” phase of the compiler. The front-end code of the “c0” phase seems to use more or less the same techniques as many contemporary compilers. The “c1” phase seems to have been unique in that it managed to do register allocation and instruction selection with a pattern matcher and associated code tables squeezed into a small address space. On a small machine, other native compilers of the era typically proceeded to generate threaded code, code for a virtual machine, or poor quality native code that evaluated expressions using stack operations rather than registers.

I am not sure why DMR's approach was not more widely used in the 1970’s. The algorithms he used do not seem to be new and appear to have their roots in other (larger) compilers of the 1960’s. The basic design seems to have been in place from the very first iterations of his compiler in 1972 (see V2 tree on TUHS) and he does not mention these algorithms as being special or innovative in his later papers.

Any observations / opinions on why DMR’s approach was not more widely used in the 1970’s?
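
To make the contrast between stack-style and register-style expression code concrete, here is a toy sketch in Python. It is not DMR's c1 algorithm and has no pattern matcher or code tables. The tuple-based expression trees, the pseudo-assembly mnemonics, and the function names `gen_stack` and `gen_reg` are all assumptions made up for illustration, and the register version does no spilling when it runs out of registers.

```python
def gen_stack(node, out):
    """Emit stack-machine style code: every operand is pushed, every operator pops two."""
    if isinstance(node, str):
        out.append(f"push {node}")
    else:
        op, left, right = node
        gen_stack(left, out)
        gen_stack(right, out)
        out.append(op)  # pops two operands, pushes the result
    return out


def gen_reg(node, reg, out):
    """Emit two-operand register code, leaving the result in register `reg`."""
    if isinstance(node, str):
        out.append(f"mov {node},r{reg}")
        return out
    op, left, right = node
    gen_reg(left, reg, out)
    if isinstance(right, str):
        # A simple operand can be used directly from memory.
        out.append(f"{op} {right},r{reg}")
    else:
        # A compound operand is evaluated into the next free register first.
        gen_reg(right, reg + 1, out)
        out.append(f"{op} r{reg + 1},r{reg}")
    return out


# (a * b) + (c * d), written as a nested tuple (hypothetical representation)
expr = ("add", ("mul", "a", "b"), ("mul", "c", "d"))
stack_code = gen_stack(expr, [])
reg_code = gen_reg(expr, 0, [])
```

For this expression the stack version emits seven pseudo-instructions and the register version five, which gives a rough feel for where the overhead of stack-evaluated native code comes from.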

7 months, 1 week

SCCS, TeamWare, BitKeeper, and Git

by Marc Rochkind

As I mentioned in another post, I'm writing an invited paper for an upcoming issue of IEEE Transactions on Software Engineering that will be a 50-year retrospective of my original 1975 SCCS paper (mrochkind.com/aup/talks/SCCS-Slideshow.pdf).

Can some people here review a couple of paragraphs for accuracy?

*Decentralized Version Control (DVCS)*

*While VCSs like CVS and Subversion were centralized and had pre-commit merging, a further advance was towards decentralization, with post-commit merging. Probably the first DVCS was Sun WorkShop TeamWare, created by Larry McVoy and announced in 1992 [sun]. It was implemented as a layer on top of SCCS. McVoy later commercialized a successor system called BitKeeper [Bitkeeper], which was layered on a re-implementation of SCCS, which he called BitSCCS. TeamWare and BitKeeper took advantage of the interleaved delta algorithm, also known as a weave, to implement an efficient way to represent merged deltas by reference, instead of reproducing code inside the repository. This is a lot more complicated to do with reverse deltas, introduced by RCS.*

*In 2005 Linus Torvalds, creator of Linux [linux], invented the DVCS Git [git] for Linux development, and since then Git has become widely used and has supplanted BitKeeper.*

[more about DVCS follows]

I don't want to add more detail that would make these paragraphs any longer, but I do want them to be accurate.

Thanks!

Marc Rochkind

-- 
*My new email address is mrochkind(a)gmail.com*
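
For readers unfamiliar with the interleaved delta (weave) idea mentioned in the draft, the following is a minimal Python sketch under simplifying assumptions. The record kinds are loosely modeled on SCCS's insert, delete and end control records, but the tuple layout, the `extract` function and the sample data are hypothetical, and the visibility rule here is a simplification that ignores the finer points of real SCCS files. It is not TeamWare's or BitKeeper's code.

```python
def extract(weave, applied):
    """Return the text lines visible when exactly the deltas in `applied` are in effect."""
    open_blocks = []  # currently open ("I" or "D", serial) blocks, outermost first
    lines = []
    for kind, value in weave:
        if kind in ("I", "D"):
            open_blocks.append((kind, value))
        elif kind == "E":
            # Close the most recently opened block that belongs to this delta.
            for i in range(len(open_blocks) - 1, -1, -1):
                if open_blocks[i][1] == value:
                    del open_blocks[i]
                    break
        else:  # "T": an ordinary line of text
            inserted = all(s in applied for k, s in open_blocks if k == "I")
            deleted = any(s in applied for k, s in open_blocks if k == "D")
            if inserted and not deleted:
                lines.append(value)
    return lines


# One file body holding every version: delta 1 is the original,
# delta 2 replaces "beta" with "BETA", delta 3 appends "delta".
WEAVE = [
    ("I", 1),
    ("T", "alpha"),
    ("D", 2), ("T", "beta"), ("E", 2),
    ("I", 2), ("T", "BETA"), ("E", 2),
    ("T", "gamma"),
    ("I", 3), ("T", "delta"), ("E", 3),
    ("E", 1),
]

branch_a = extract(WEAVE, {1, 2})
branch_b = extract(WEAVE, {1, 3})
merged = extract(WEAVE, {1, 2, 3})
```

In this toy model, deltas 2 and 3 behave like two branches off delta 1, and the merged view is obtained just by naming both deltas in the applied set. No text is copied, which is the "by reference" property the draft paragraph describes.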

7 months, 3 weeks

Re: SCCS roach motel

by norman＠oclsc.org

Rob Pike:

  According to the Unix room fortunes file, the actual quote is

    SCCS: the source-code motel -- your code checks in but it never checks out.
    Ken Thompson

====

As a Unix-room-culture aside: I believe this quote was what inspired Andrew Hume to call his backup system the File Motel.

Norman Wilson
Toronto ON

7 months, 3 weeks

Re: mmap, was SCCS, TeamWare, BitKeeper, and Git

by Douglas McIlroy

>> Does anyone know whether there are implementations of mmap that
>> do transparent file sharing? It seems to me that should be possible by
>> making the buffer cache share pages with mmapping processes.

> These days they all do. The POSIX rationale says:
> ... When multiple processes map the same memory object, they can
> share access to the underlying data.

Notice the weasel word "can". It is not guaranteed that they will do so automatically without delay. Apparently each process may have a physically distinct copy of the data, not shared access to a single location.

The Linux man page mmap(2), for example, makes it very clear that mmap has a cache-coherence problem, at least in that system. The existence of msync(2) is a visible symptom of the problem.

[Weasel words of my own: I have not read the POSIX definition of mmap.]

Doug

7 months, 3 weeks

Re: mmap, was SCCS, TeamWare, BitKeeper, and Git

by John R Levine

On Mon, 16 Dec 2024, Konstantin Belousov wrote:
> On Mon, Dec 16, 2024 at 02:08:43PM -0500, John Levine wrote:
>> PS: I can believe there are some versions of linux that screwed up disk cache
>> coherency, but that just means they don't properly implement the spec, not for
>> the first time. I mean, it's not *that* hard to make all the maps point to the
>> same physical page frame, even on a machine like POWER with reverse page maps.
>
> This is not enough. There are (were ?) architectures, typically with the
> virtually addressed caches, which require all mappings of the same page
> to be suitably aligned, at least. ...
>
> If addresses of different mappings are not aligned, caches were not coherent.

I think we're in "so don't do that" territory. mmap() normally lets the system pick the memory address to map so it can pick something suitably aligned. You can pass the MAP_FIXED flag to tell it to map at a particular address, but it can return EINVAL if the address doesn't work. The POSIX description says "The use of MAP_FIXED is discouraged, as it may prevent an implementation from making the most effective use of resources."

It's not always trivial to make this work. On systems with reverse maps, a physical page can only be mapped to one virtual address at a time, so for shared pages it has to mark all of the aliases nonresident and on a fault remap the page into the map of the process that is running. But it's not rocket science, either.

R's, John

7 months, 3 weeks

Re: M<some number> macros, wasRe: SCCS

by Douglas McIlroy

> "John Levine" <johnl(a)taugh.com> wrote:
>> M4 was written in the 1970s by Kernighan and Ritchie in C ...
> In private mail, BWK told me that it was DMR who wrote m4. He then
> reimplemented it in Ratfor for "Software Tools".
> Arnold

The book says colorfully, "... [and] we are grateful to him for letting us steal it."

Doug

7 months, 3 weeks

Re: SCCS, TeamWare, BitKeeper, and Git

by Douglas McIlroy

> well after Unix had fledged, its developers at CSRC found it necessary
> and/or desirable to borrow back a Multics concept: they named it mmap().

As far as I know no Research version of Unix ever had mmap.

Multics had a segmented universal memory. A process incorporated segments into its address space. The universal memory was normally addressed via a hierarchical segment-name directory. With enhancement to provide for multisegment "files", the directory could serve as a file system and file I/O became data transfer between segments.

Unix originally imitated the Multics file system, but not the universal memory. mmap(2) weakly imitates universal memory by allowing a process to nominally incorporate a portion of a file into the process address space at page-level granularity. However, an update is guaranteed to be visible to the file and other processes only upon specific request.

Does anyone know whether there are implementations of mmap that do transparent file sharing? It seems to me that should be possible by making the buffer cache share pages with mmapping processes.

Doug
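
The "specific request" mentioned above corresponds to msync(2). The following small Python sketch stores into a file through a shared mapping and then issues that request before reading the file back with an ordinary read. The helper names `write_through_mapping` and `read_with_read_call` and the temporary file are made up for the example. On many current systems the update would be visible even without the flush, but the flush is the step that the interface actually promises.

```python
import mmap
import os
import tempfile


def write_through_mapping(path, payload):
    """Overwrite the start of `path` via a shared mapping, then flush it."""
    with open(path, "r+b") as f:
        # ACCESS_WRITE gives a write-through (shared) mapping of the whole file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE) as m:
            m[:len(payload)] = payload  # store into the mapped pages
            m.flush()                   # the explicit request: msync(2) on Unix


def read_with_read_call(path, n):
    """Read the first `n` bytes through the ordinary file I/O path."""
    with open(path, "rb") as f:
        return f.read(n)


# Create a one-page scratch file, update it through the mapping, read it back.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"." * 4096)
os.close(fd)

write_through_mapping(demo_path, b"hello")
seen = read_with_read_call(demo_path, 5)
os.remove(demo_path)
```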

7 months, 3 weeks
