TUHS list (Rob Pike, Aug 2019):
<quote>
I think it was slightly later. I joined mid-1980 and VAXes to replace the
11/70 were being discussed but had not arrived. We needed to convert a lab
into a VAX machine room and decide between BSD and Reiser, all of which
happened in the second half of 1980.
Reiser Unix got demand paging a little later, and it was spectacularly
fast. I remember being gobsmacked when I saw a demo in early 1981.
Dead ends everywhere.
<unquote>
I think I have figured out why 32V R3 was so fast (assuming my current understanding of
how it must have worked is correct).
Its VM subsystem tags each memory frame with its disk mirror location, be it in swap or in
the file system. A page can be found quickly, as frames are hashed on device and block
number. This holds both for pages in the working set and for pages on the 2nd chance list.
Effectively, most of core is a disk cache.
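To make that concrete, here is a minimal sketch in C of such a frame table. All
identifiers (frame_lookup, FHASH, etc.) are invented for illustration; the actual 32V R3
names and layout are unknown to me:

    #include <stddef.h>

    #define NHASH 64                      /* number of hash buckets */

    struct frame {
        int           f_dev;              /* device holding the disk copy */
        long          f_blkno;            /* block number of the disk copy */
        int           f_use;              /* reference count */
        struct frame *f_hash;             /* next frame on this hash chain */
    };

    static struct frame *fhash[NHASH];    /* buckets keyed on (dev, block) */

    #define FHASH(dev, blk)  (((unsigned)((dev) + (blk))) % NHASH)

    /* Find the frame mirroring (dev, blkno); NULL if not resident.
       Works whether the frame is in a working set or on the 2nd
       chance list, as both leave it on its hash chain. */
    struct frame *
    frame_lookup(int dev, long blkno)
    {
        struct frame *f;

        for (f = fhash[FHASH(dev, blkno)]; f != NULL; f = f->f_hash)
            if (f->f_dev == dev && f->f_blkno == blkno)
                return f;
        return NULL;
    }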
In a unified buffer design, the buffer code would first look for an existing buffer header
for the requested disk block, as in V7. If none is found, it would check the page frame
list for that block; on a hit, it would connect the frame to an empty buffer header,
increment the frame's use count and move it to the working set. If the block is not found
there either, it would be loaded from disk as usual. When a buffer is released, the
frame's use count would be decremented; at zero, the page frame would be put back on the
2nd chance list and the buffer header marked empty. With this approach, up to 4MB of the
disk could be cached in RAM (8,192 VAX pages of 512 bytes each).
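A sketch of that lookup order, assuming the design above is right: bread, getblk and
brelse are the traditional V7 names, but the frame-handling calls are invented here.

    struct frame {                         /* abbreviated from the sketch above */
        int f_use;                         /* reference count */
    };

    struct buf {
        struct frame *b_frame;             /* page frame backing this buffer */
    };

    extern struct buf   *getblk(int dev, long blkno);   /* find or allocate a header */
    extern struct frame *frame_lookup(int dev, long blkno);
    extern void          frame_to_working_set(struct frame *);
    extern void          frame_to_second_chance(struct frame *);
    extern void          disk_read(struct buf *);       /* allocate frame, read block */

    struct buf *
    bread(int dev, long blkno)
    {
        struct buf   *bp = getblk(dev, blkno);
        struct frame *f;

        if (bp->b_frame != NULL)           /* 1: valid buffer exists, as in V7 */
            return bp;

        f = frame_lookup(dev, blkno);      /* 2: block still resident as a page? */
        if (f != NULL) {
            f->f_use++;                    /* connect frame to the empty header */
            frame_to_working_set(f);       /* reclaim from the 2nd chance list */
            bp->b_frame = f;
            return bp;
        }

        disk_read(bp);                     /* 3: true miss, load from disk as usual */
        return bp;
    }

    void
    brelse(struct buf *bp)
    {
        struct frame *f = bp->b_frame;

        if (f != NULL && --f->f_use == 0)
            frame_to_second_chance(f);     /* contents stay cached in core */
        bp->b_frame = NULL;                /* mark the header empty */
    }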
Early in 1981, most binaries and files were a few dozen KB in size. All of the shell,
editor, compiler tool chain, library files, intermediate files, etc. would have fitted in
RAM at once. In a developer-focused demo, once memory was primed, the system would
effectively run from RAM, barely hitting the disk, even with tens of concurrent logins.
Something like “ls -l /bin” would also have been much faster on its second run.
This puts a comment from JFR in a clearer context:
<quote>
Strict LRU on 8,192 pages, plus Copy-On-Write, made the second reference to a page
"blindingly fast".
<unquote>
Until now I had read this in the context of the paging algorithm, where it is hard to
understand (is LRU really that much better than NRU?). In the context of a unified buffer
caching disk pages, it makes a lot more sense. Even the CoW part fits: as the (clean) data
segments of executables would still be in core, programs could start without reloading
from disk, with CoW creating private copies as needed.
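The write-fault path this implies might look roughly as follows; again a hypothetical
sketch with invented names, assuming a simple per-frame reference count:

    #include <string.h>

    #define PGSIZE 512                     /* VAX hardware page size */

    struct frame {                         /* abbreviated from the sketch above */
        int f_use;                         /* reference count */
    };

    struct pte {
        struct frame *p_frame;             /* backing page frame */
        int           p_writable;          /* write permission */
    };

    extern struct frame *frame_alloc(void);
    extern char         *frame_addr(struct frame *);

    /* Write fault on a page mapped read-only because its frame is shared,
       e.g. the cached data segment of an executable that just started. */
    void
    cow_fault(struct pte *pte)
    {
        struct frame *old = pte->p_frame;
        struct frame *new;

        if (old->f_use == 1) {             /* sole user left: writing is safe */
            pte->p_writable = 1;
            return;
        }
        new = frame_alloc();               /* private copy for this process */
        new->f_use = 1;
        memcpy(frame_addr(new), frame_addr(old), PGSIZE);
        old->f_use--;                      /* shared original stays cached */
        pte->p_frame = new;
        pte->p_writable = 1;
    }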
===
The interesting question now is: if this buffer unification was so impressive, why was it
abandoned in SVr2-vax? I can think of 3 reasons:
1. Maybe there was a subtle bug that was hard to diagnose. “Research” opting for the
BSD memory system “as it did not want to do the maintenance” suggests that there may have
been lingering issues.
1b. A variation of this: JFR mentioned that his implementation of unified buffers broke
conceptual layering. USG do not strike me as purists, but maybe they thought the code was
too messy to maintain.
2. Maybe there was an unintended semantic issue (e.g. you can lock a buffer, but not an
mmap’ed page).
3. Maybe it was hard to come up with a good sync() policy, making database work risky (and
system crashes more devastating to the file system).
JFR mentioned that he did the design and implementation for 32V R3 in about 3 months, with
3 more months for bug fixing and polishing. That is not a lot of time for such a big and
complex kernel mod (for its time).