I think I have figured out why 32V R3 was so fast (assuming my current understanding of how 32V R3 must have worked is correct).

Its VM subsystem tags each memory frame with its disk mirror location, be it in swap or in the file system. A page can quickly be found as they are hashed on device and block no. This is true both for pages in the working set and pages on the 2nd chance list. Effectively, most core is a disk cache.

In a unified buffer design, the buffer code would first look for an existing buffer header for the requested disk block, as in V7. If not found, it would check the page frame list for that block and if found it would connect the frame to an empty buffer header, increment its use count and move it to the working set. If not found there either, it would be loaded from disk as per usual. When a buffer is released, the frame use count would be decremented and if zero the page frame would be put back on the 2nd chance list and the buffer header would be marked empty. With this approach, up to 4MB of the disk could be cached in RAM.

Early in 1981 most binaries and files were a few dozen KB in size. All of the shell, editor, compiler tool chain, library files, intermediate files, etc. would have fitted in RAM all at once. In a developer focused demo and once memory was primed, the system would effectively run from RAM, barely hitting the disk, even with tens of concurrent logins. Also something like “ls -l /bin” would have been much faster on its second run.

So far I read this in context of the paging algorithm, and then it is hard to understand (is LRU really that much better than NRU?). In the context of a unified buffer and disk pages, it makes a lot more sense. Even the CoW part: as the (clean) data segment of executables would still be in core, they could start without reloading from disk - CoW would create copies as needed.

The interesting question now is: if this buffer unification was so impressive, why was it abandoned in SVr2-vax? I can think of 3 reasons:

1. Maybe there was a subtle bug that was hard to diagnose. “Research" opting for the BSD memory system “as it did not want to do the maintenance” suggests that there may have been lingering issues.

1b. A variation of this: JFR mentioned that his implementation of unified buffers broke conceptual layering. USG do not strike me as purists, but maybe they thought the code was too messy to maintain.

3. Maybe it was hard to come up with a good sync() policy, making database work risky (and system crashes more devastating to the file system).

JFR mentioned that he did the design and implementation for 32V R3 in about 3 months, with 3 more months for bug fixing and polishing. That is not a lot of time for such a big and complex kernel mod (for its time).