On Sat, Nov 17, 2018 at 03:09:08PM -0500, Noel Chiappa wrote:
> I looked at how Dave Clark was doing it on Multics, and I was green with envy.
> He added, debugged and improved his code _on the running main campus system_,
> sharing the machine with dozens of real users! Try doing that on UNIX
> (although nowadays it's getting there, with loadable kernel stuff - but this
> was in the 70's)!
One of the things that made Multics amazing is that you could replace
a shared library while other processes were using it --- and without
anything crashing. To achieve this, each shared data structure carries
a tag field identifying the object type, plus a length field. So if a
library defines an expanded version of the data structure, the new
fields are tacked onto the end of the data structure and the length
field is bumped. Older callers of the library might pass in a version
of the data structure with the original length field; hence, fields
can't be accessed without first checking the structure tag and length.
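To make the idea concrete, here's a rough sketch of the convention in C
(this is not actual Multics or e2fsprogs code, and the names are made up):

#include <stddef.h>

#define FOO_TAG 0x464f4f31              /* hypothetical type tag */

struct foo {
        unsigned int tag;               /* identifies the object type */
        unsigned int len;               /* size the caller was built with */
        int          original_field;
        int          new_field;         /* added in a later library version */
};

/* Library entry point: only touches fields the caller knows about. */
int foo_process(struct foo *f)
{
        if (f->tag != FOO_TAG)
                return -1;              /* wrong object type */
        f->original_field = 42;
        if (f->len >= offsetof(struct foo, new_field) + sizeof(f->new_field))
                f->new_field = 17;      /* extension field is present */
        return 0;
}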
I stole this idea and used it in Kerberos v5 and Linux's userspace
ext2/3/4 utilities, where we use an error table code --- another
Multics concept --- as the structure tag. So in the error_table file,
we might have:
ec EXT2_ET_MAGIC_BADBLOCKS_ITERATE,
"Wrong magic number for badblocks_iterate structure"
And then in each function that uses that structure, there'd be
something like this:
EXT2_CHECK_MAGIC(iter, EXT2_ET_MAGIC_BADBLOCKS_ITERATE);
Where EXT2_CHECK_MAGIC is defined as:
#define EXT2_CHECK_MAGIC(struct, code) \
if ((struct)->magic != (code)) return (code)
(All MIT KerberosV5 and libext2fs structures have a 32-bit unsigned
magic field as the first 4 bytes of the structure.)
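To show how the pieces fit together, here's a rough sketch of the
pattern (hypothetical struct and function names, and a placeholder
value for the error code, which in real life is generated from the
error table by compile_et):

#define EXT2_ET_MAGIC_BADBLOCKS_ITERATE 0x0badb10cL   /* placeholder value */

#define EXT2_CHECK_MAGIC(struct, code) \
        if ((struct)->magic != (code)) return (code)

struct badblocks_iterate {
        unsigned int magic;             /* always the first 4 bytes */
        int pos;                        /* ...rest of the iteration state... */
};

/* The constructor stamps the magic field... */
void badblocks_iterate_init(struct badblocks_iterate *iter)
{
        iter->magic = EXT2_ET_MAGIC_BADBLOCKS_ITERATE;
        iter->pos = 0;
}

/* ...and every entry point validates it before trusting the pointer. */
long badblocks_iterate_next(struct badblocks_iterate *iter)
{
        EXT2_CHECK_MAGIC(iter, EXT2_ET_MAGIC_BADBLOCKS_ITERATE);
        return iter->pos++;             /* placeholder for the real work */
}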
This technique also proved useful when I needed to add support for
64-bit block numbers: I could use the structure magic numbers to
disambiguate which version of the object we were using. Hence, unlike
some shared libraries where the magic number has been incremented to
indicate an ABI break every few months, e2fsprogs has not had an ABI
break in over ten years.
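To sketch how the magic numbers can carry such a transition (the names,
layouts, and magic values below are made up, not the actual libext2fs
structures):

#include <stdint.h>

#define MAGIC_BITMAP32  0x0001          /* placeholder magic values */
#define MAGIC_BITMAP64  0x0002

struct bitmap32 { unsigned int magic; uint32_t start, end; };
struct bitmap64 { unsigned int magic; uint64_t start, end; };

/* One entry point serves callers built against either version of the
 * library; the magic field tells us which layout we were handed. */
uint64_t bitmap_end(void *generic)
{
        struct bitmap32 *b32 = generic;

        if (b32->magic == MAGIC_BITMAP32)
                return b32->end;
        if (b32->magic == MAGIC_BITMAP64)
                return ((struct bitmap64 *) generic)->end;
        return 0;                       /* unknown object: refuse to guess */
}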
This also made it a bit easier to find use-after-free bugs in an era
before valgrind/purify, by the simple expedient of zeroing the magic
field when deallocating an object.
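Continuing the hypothetical badblocks_iterate sketch from above, the
teardown path would look something like this:

#include <stdlib.h>

void badblocks_iterate_free(struct badblocks_iterate *iter)
{
        if (!iter)
                return;
        iter->magic = 0;        /* any later use now trips EXT2_CHECK_MAGIC */
        free(iter);             /* assuming the object was heap-allocated */
}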
> The security wasn't good, because Multics didn't have set-uid (so that
> only Dave's code would have had access to that state database) - when
> they later productized the code, they used Multics rings to make it
> secure.
So that's a bit misleading. Setuid isn't really a good analogue for
protection rings. The proper analogue is user mode versus kernel mode
in the Unix world. (Where user mode is, roughly speaking, Multics
ring 4, and kernel mode is Multics ring 0 --- the Honeywell hardware
had support for 8 rings, but rings above 4 have so little access that
using them isn't terribly practical for general purpose programs.
Processes running at rings 5 and higher wouldn't have access to most
of what we in the Unix world would call "the standard POSIX system
calls".)
Code running in one ring can call into a more privileged ring only via
"gates", which are the Unix equivalent of a system call. Hence, a
Multics program running at Ring 4 could create its own gates that
would provide an extremely limited set of system services to programs
running at Ring 5. Those programs wouldn't have access to the normal
system calls, only to the specific functions exported through the
Ring 4 gates. This is sort of like Capsicum, but it's more powerful --- and
it was designed decades before FreeBSD's Capsicum.
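For readers who haven't run into Capsicum: here's a minimal FreeBSD
sketch of the comparison being drawn. After cap_enter(), the process
loses access to global namespaces (such as opening files by path) and
can only operate on descriptors it already holds, loosely analogous to
a Ring 5 program that can reach the system only through the gates it
was handed.

#include <sys/capsicum.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
        char buf[64];
        int fd = open("/etc/motd", O_RDONLY);   /* acquired before entering */

        if (fd < 0)
                return 1;
        if (cap_enter() != 0) {                 /* enter capability mode */
                perror("cap_enter");
                return 1;
        }
        /* Still allowed: I/O on descriptors we already hold. */
        (void) read(fd, buf, sizeof(buf));
        /* No longer allowed: opening new paths fails with ECAPMODE. */
        if (open("/etc/passwd", O_RDONLY) < 0)
                perror("open after cap_enter");
        return 0;
}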
> The nice thing was that to call up some subsystem to perform some
> service for you, you didn't have to do IPC and then a process switch -
> it was a _subroutine call_, in the CPU's hardware.
Well, when you call a system call, you don't do a process switch
either. So when Ring 4 code calls a Ring 0 service, you can think of
it as a system call. It might not have been any slower than a normal
function call, but remember, this is a CISC system. So another way of
saying things is that normal function calls weren't any faster than a
privilege transition via a system call!
> The 386-Pentium actually had support for many segments, but I gather
> they are in the process of deleting it in the latest machines because
> nobody's using it. Which is a pity, because when done correctly (which
> it was - Intel hired Paul Karger to architect it) it's just what you
> need for a truly secure system (which Multics also had) - but that's
> another long message.
One unfortunate thing about the 386 VM is that a segment plus offset
gets translated to a 32-bit global virtual address, which is then
translated to a physical address via a single page table. With
Multics, each segment had its own page table which translated the
segment+offset to a physical address.
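A toy model of the difference (not real MMU code, and 4K pages are
assumed just for the illustration):

#include <stdint.h>

/* 386-style: every segment folds into one shared 32-bit linear space,
 * and a single page table hierarchy then maps that flat space. */
uint32_t x86_linear(uint32_t seg_base, uint32_t offset)
{
        return seg_base + offset;
}

/* Multics-style: the segment number selects a per-segment page table,
 * so segments never have to be packed into one linear address space. */
struct segment { uint64_t *page_table; };

uint64_t multics_phys(struct segment *segs, unsigned segno, uint32_t offset)
{
        uint64_t frame = segs[segno].page_table[offset >> 12];
        return frame | (offset & 0xfff);
}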
With only 32 bits of virtual address space on the 386, it's not at all
clear that aggressive use of segments a la Multics would have worked
terribly well, due to the internal fragmentation of that 32-bit
address space. I've talked to some Multicians at MIT who might
quibble with the claim that the 386's design was "done correctly".
In any case, since no one really used segments on 32-bit x86, segment
support ended up getting mostly dropped in 64-bit mode. (The FS and
GS segments still kinda work, mostly to keep Windows happy. The CS,
DS, ES, and SS segments are basically no-ops in the 64-bit x86 world.)
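About the only place the surviving FS segment is visible from userspace
on Linux x86-64 is thread-local storage; a __thread variable in an
ordinary executable typically compiles down to an %fs-relative access.
For example:

__thread int per_thread_counter;

int bump(void)
{
        return ++per_thread_counter;    /* emitted as an %fs-relative access */
}

And that's roughly all that remains of the segmentation machinery.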
Which is too bad. I suspect that with a 64-bit address space,
designing an OS with a Multics-style segmentation architecture might
have been possible.
(But see Rob Pike's "Systems Software Research is Irrelevant" rant for
the argument that even if it was *possible* it was very unlikely to have
happened, so for AMD and Intel to have neutered segmentation in the
x86-64 architecture might have been a well justified decision.)
Cheers,
- Ted