The demand paging code for SysVR2 was written by Keith A. Kelleman and Steven J. Buroff,
and in contemporary conference talks they said that they wanted to combine the best parts
of demand-paged 32V and BSD. They may have additional memories that could help build a
better understanding of the final version of 32V.
Does anybody have contact details for these two gentlemen?
I’ve managed to contact Keith Kelleman and he had some interesting remarks. The paging
code in SVR2 was all new code, with a focus on the 3B dual-processor machines. It does not
derive at the code level from 32V, and in fact he does not recall working with the 32V
paging code. This kills the hope that the SVR2 code holds clues about the 32V code. Keith
did suggest that I try to contact Tom Raleigh, who might have worked with the later 32V
code base.
Anybody with suggestions for locating him?
===
Besides functionality, the people who remember paged 32V all recall it being very fast. I
wonder what made it fast.
First to consider is “faster than what?”. Maybe Rob P. has a specific memory, but I would
assume faster than 4BSD: if the comparison were with the “scatter loading, partial
swapping” version of 32V, people would have expected the better performance and would not
remember it as impressive 40 years later. Possibly the comparison is with the 8th Edition,
which would have used the 4BSD paging code by then.
If the comparison is with 4BSD, then the CoW feature in paged 32V would have been mostly
matched by the vfork mechanism in 4BSD: it covers some 80% of the use and leaves the code
paths simpler. If the comparison is with the 8th Edition, this may be the difference that
people remember.
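For illustration, here is a minimal C sketch of the pattern that vfork() short-circuits: a
child that execs (or exits) right away, so the parent’s address space never needs to be
copied, with or without CoW. This uses only the standard BSD/POSIX interface; it is not
code from the 32V or 4BSD sources.

    /*
     * Minimal sketch of the pattern vfork() optimizes: a child that execs
     * immediately.  Since the child never touches the parent's data, no
     * address-space copy (copy-on-write or otherwise) is needed; 4BSD's
     * vfork() lends the parent's memory to the child until the exec.
     * Error handling trimmed for brevity.
     */
    #include <sys/wait.h>
    #include <unistd.h>

    int run(const char *path)
    {
        pid_t pid = vfork();
        if (pid == 0) {
            /* Child: after vfork() only exec or _exit are safe. */
            execl(path, path, (char *)0);
            _exit(127);                 /* exec failed */
        }
        int status;
        waitpid(pid, &status, 0);       /* parent resumes once the child execs */
        return status;
    }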
The next possibility is that paged 32V had a better page-out algorithm. Joy/Babaoglu
mention that the cost of the clock process is noticeable. Maybe paged 32V used a VMS-like
FIFO/second-chance algorithm that did not need a separate kernel process/thread. Arguably
this alone is not enough to explain a convincing speed difference.
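As a hedged sketch of what such a scheme could look like when run inline at page-fault
time rather than from a separate clock daemon: the frame table, reference flag and hand
pointer below are all invented for illustration. This is the textbook second-chance
algorithm, not code from VMS or from paged 32V.

    #include <stdbool.h>
    #include <stddef.h>

    #define NFRAMES 1024

    struct frame {
        bool in_use;
        bool referenced;    /* maintained in software; the VAX has no HW reference bit */
        int  vpage;         /* virtual page currently held in this frame */
    };

    static struct frame frames[NFRAMES];
    static size_t hand;     /* FIFO position, advanced as we scan */

    /*
     * Pick a victim frame in FIFO order, but give any referenced frame a
     * "second chance": clear its flag and move on.  Called from the fault
     * path when a free frame is needed, so no separate pageout process runs.
     */
    static size_t choose_victim(void)
    {
        for (;;) {
            struct frame *f = &frames[hand];
            size_t candidate = hand;
            hand = (hand + 1) % NFRAMES;

            if (!f->in_use)
                return candidate;       /* free frame, use it directly */
            if (f->referenced) {
                f->referenced = false;  /* second chance */
                continue;
            }
            return candidate;           /* old and unreferenced: evict */
        }
    }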
It is also possible that JFR found a more clever way to do LRU approximation. He remembers
that his code used ‘strict LRU’, but not the algorithm. On Tenex (his conceptual
reference) that was natural to do, because the MMU hardware maintains a table with 4 words
of 36 bits of statistical data for each frame. With the VAX hardware it is a challenge.
Considering his mathematical prowess, it is plausible that JFR found an efficient way. A
slightly better page hit rate gives a significant speed improvement.
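Purely to illustrate what is possible on reference-bit-less hardware like the VAX, here is
a generic “aging” sketch: pages are deliberately invalidated so that the next touch causes
a cheap reclaim fault, the fault is recorded as a reference, and an 8-bit counter per
frame decays that history over time. Whether JFR’s strict-LRU code did anything like this
is exactly the open question.

    #include <stdint.h>
    #include <stddef.h>

    #define NFRAMES 1024

    static uint8_t age[NFRAMES];        /* larger value = referenced more recently */
    static uint8_t referenced[NFRAMES]; /* set when a reclaim fault hits the frame */

    /*
     * Called on a soft ("reclaim") fault: the page is still resident but was
     * marked invalid on purpose, so the fault acts as a reference notification.
     */
    void note_reference(size_t frame)
    {
        referenced[frame] = 1;
        /* ...revalidate the PTE here... */
    }

    /*
     * Called periodically: shift each age right and fold in the sampled
     * reference bit, giving an exponentially decaying reference history.
     */
    void age_frames(void)
    {
        for (size_t f = 0; f < NFRAMES; f++) {
            age[f] = (uint8_t)((age[f] >> 1) | (uint8_t)(referenced[f] << 7));
            referenced[f] = 0;
            /* ...re-invalidate the PTE so the next touch faults again... */
        }
    }

    /* The frame with the smallest age is the best LRU approximation. */
    size_t choose_victim(void)
    {
        size_t best = 0;
        for (size_t f = 1; f < NFRAMES; f++)
            if (age[f] < age[best])
                best = f;
        return best;
    }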
All speculation of course: only finding the source will truly tell.