Sat Aug 31 06:21:47 AEST 2019
John Reiser did do his own paging system for UNIX
32/V.
I heard about it from friends at Bell Labs ca. 1982-83,
when I was still running systems for physicists at Caltech.
It sounded very interesting, and I would love to have had
my hands on it--page cache unified with buffer cache,
copy-on-write from the start.
[...]
It is in any case a shame that jfr's code never saw the light
of day. I really hope someone can find it on an old tape
somewhere and we can get it into the archive, if only because
I'd love to look at it.
Norman Wilson
Toronto ON
Norman,
I am getting more optimistic that much of this version of 32V can be ‘triangulated’ from
the surviving sources. For convenience I call this version “32V r3” (with the first
swapping version being "32V r1" and the scatter loading version being "32V
r2").
I’ve been reading my way through the surviving 32/V r2, SysIII-Vax, SVr1-Vax and SVr2-Vax
sources. There seems to be a lot of continuous progression in these versions. From a
communication with Keith Kelleman I thought that VM in SVr2 was unrelated, but that
appears to have been a misunderstanding. In short, it seems that the basic paging
algorithms and data structures in SVr2 (other than the region abstraction) come from 32V
r3.
The strongest clue for the source link is in the SVr2 “page.h” file. In it, the union
describing a page table entry matches with JFR’s description and is otherwise (partially)
unused in SVr2. There are other, similar clues in the other source trees.
To explain more clearly, have a look at this code section:
https://github.com/ryanwoodsmall/oldsysv/blob/master/sysvr2-vax/src/uts/vax…
In this union, the 2nd struct “pgd” is never used, nor is the bit pg_disk in the “pgm”
struct ever used. It matches with JFR’s recollection:
<quote>
My internal design strategy was to use the hardware page table entry
(4 bytes per page, with "page not present" being one bit which freed
the other 31 bits for use by software) as the anchor to track everything
about a page. This resulted in a complicated union of bitfield structs,
which became a headache. When other departments took over the work,
the first thing they did was use 8 bytes per page, which meant shuffling
between the software and the hardware descriptors: its own mess.
<unquote>
In the pte as given, a pte can be in two states: (i) the pg_disk bit is reset in which
case it is in “pgm” format - this format is recognized by the mmu hardware; (ii) the
pg_disk bit is set in which case it is in “pgd” format. When a page is paged in, the disk
form of the pte was I think saved in the page frame descriptor (the “pfdat" table)
and the pte converted to the memory form. Upon page-out the reverse happened. In the SVr2
version of this code the “pgd” form is abandoned and replaced by the separate disk
descriptors (see
https://github.com/ryanwoodsmall/oldsysv/blob/master/sysvr2-vax/src/uts/vax…)
The “pgd” structure is a bit puzzling. There is a 19 bit device block number, which is
less than the 24 bits allowed by the filesystem code and also less than the 20 bits that
would fit in the pte. Maybe this is because the RP04/RP05 disks of the early 80’s worked
with 19 bits. I am not sure what the “pg_iord” field is about. The comment “inode index”
may be misleading. My hypothesis that it is a short form of the device code, for instance
an index into the mount table; magic values could have been used to identify the swap
device, “demand zero”, etc.
It seems probable to me that the paging algorithm in SVr2 was derived from the 32/V r3
algorithm. John's recollection:
<quote>
Our machine started with 512KB of RAM, but within a few months was upgraded
to 4 MB with boards that used a new generation of RAM chips.
The hardware page size was 512 bytes, which is quite small. Strict LRU
on 8,192 pages, plus Copy-On-Write, made the second reference to a page
"blindingly fast".
</unquote>
In a follow up mail conversation with JFR we (I?) arrived at the conclusion that literal
“strict LRU” is not likely on VAX hardware, but that an algorithm that maintains a small
working set combined with a large second chance list amounts to about the same. It seems
to me that this description also applies to the surviving SVr2 implementation: every 4
seconds sweep through the page tables of all processes and pages that were not referenced
are moved to the free/2nd chance list.
To implement this it seems likely to me that 32V r3 used a structure similar to this:
https://github.com/ryanwoodsmall/oldsysv/blob/master/sysvr2-vax/src/uts/vax…
Such a structure is also in line with viewing core as a cache for disk, similar to TENEX
that JFR had in mind.
The big change from SysIII to SVr1 in kernel page table management is the introduction of
the “sptmap”
(
https://chiselapp.com/user/pnr/repository/paging/file?name=os/machdep.c&…)
In 32V r2 and SysIII the user page tables are swapped in and out of kernel space along
with the _u area. This makes it impractical to do the working set sweep every few seconds.
The “sptmap” innovation effectively creates a global page table space that fits with the
needs of the working set sweep. In SVr1 it seems to have no real benefit, and it seems
likely to me that it came from 32V r3.
In general it seems plausible to me that SVr1 derives from 32V r3, but was regressed to
SysIII code where necessary to keep the code PDP-11 compatible. Another clue for this is
in the buffer code of SVr1:
https://chiselapp.com/user/pnr/repository/paging/file?name=os/main.c&ci…
Here, the disk buffers are allocated as virtual memory pages, not as an array. This too is
otherwise unused in SVr1, but makes perfect sense in the context of 32V r3.
So, in summary, it would seem to me that the 32V r3 virtual memory system:
- used sptmap code similar to SVr1/r2 to manage page tables in the kernel
- used the pte structure from SVr2 and a pfdat table similar to SVr2 to manage mappings
- used page stealer and fault code similar to SVr2
Phrased the other way around: SVr2-vax seems to use the 32V r3 virtual memory code with
the region abstraction added on top, and the unified buffer removed.
At the moment I have not found clear clues for the unified buffer algorithms or mmap
implementation. Perhaps careful reading of the IPC shared memory code in SVr1 will yield
some.
To be continued …