On 2/9/21, Theodore Ts'o <tytso(a)mit.edu> wrote:
Everything can be implemented in terms of a Turing machine tape, so
I'm sure that's true. Whether or not it would be *performant* and
*secure* in the face of application level bugs might be a different
story, though.
seL4 basically provides no primitives other than send and receive, and
UX/RT will just map read()/write()-family APIs onto send and receive
(the IPC transport layer won't be trivial, but it will be simpler than
those under most other microkernel OSes). Basically everything else
will be implemented on top of the read()/write() APIs provided by the
transport layer (memory mapping will sort of bypass it, but all user
memory will be handled as memory-mapped files, even that which is
anonymous on other systems). In order to better map onto seL4 IPC
semantics, variants of read()/write() that operate on message
registers and a shared buffer will be provided, but these will be
compatible with each other and with the traditional versions (messages
will be copied on read when a different variant was used to write
them).
Basically, if there were something that couldn't be implemented
efficiently and securely on top of a combination of read()/write() and
shared memory, that would mean that it couldn't be securely
implemented on top of IPC, and I'm not sure that there is anything
like that.
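To make the "copied on read when a different variant was used" rule concrete, here is a toy, single-process sketch. Everything in it (struct channel, ch_write(), ch_write_shared(), ch_read(), ch_read_shared()) is a hypothetical name invented for illustration; it is not the UX/RT or seL4 API, and the "transport" is just a local array standing in for message registers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SLOT_SIZE 64

enum variant { VIA_COPY, VIA_SHARED };

/* Toy one-message channel standing in for an IPC endpoint (hypothetical). */
struct channel {
    enum variant written_with;    /* variant the writer used */
    size_t len;                   /* length of the pending message */
    char transport[SLOT_SIZE];    /* payload carried by a copying write */
    char shared[SLOT_SIZE];       /* buffer visible to both sides */
    unsigned copies_on_read;      /* reads that needed an extra copy */
};

/* Traditional-style write: the payload is copied out of the caller's buffer. */
int ch_write(struct channel *ch, const void *buf, size_t len)
{
    assert(ch != NULL && buf != NULL);
    if (len > SLOT_SIZE)
        return -1;
    memcpy(ch->transport, buf, len);
    ch->written_with = VIA_COPY;
    ch->len = len;
    return 0;
}

/* Shared-buffer write: the caller has already filled ch->shared in place. */
int ch_write_shared(struct channel *ch, size_t len)
{
    assert(ch != NULL);
    if (len > SLOT_SIZE)
        return -1;
    ch->written_with = VIA_SHARED;
    ch->len = len;
    return 0;
}

/* Shared-buffer read: returns a pointer into ch->shared and copies only
 * when the writer used the other variant. */
const char *ch_read_shared(struct channel *ch, size_t *len)
{
    assert(ch != NULL && len != NULL);
    if (ch->written_with == VIA_COPY) {
        memcpy(ch->shared, ch->transport, ch->len);
        ch->copies_on_read++;
    }
    *len = ch->len;
    return ch->shared;
}

/* Traditional-style read into a caller-provided buffer; works regardless of
 * which variant wrote the message. Returns the number of bytes copied. */
long ch_read(struct channel *ch, void *buf, size_t cap)
{
    assert(ch != NULL && buf != NULL);
    const char *src = ch->written_with == VIA_COPY ? ch->transport : ch->shared;
    size_t n = ch->len < cap ? ch->len : cap;
    memcpy(buf, src, n);
    return (long)n;
}
```

In this sketch a shared-buffer reader pays for a copy only when the writer used the copying variant, while the traditional read works against either kind of message.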
In fact, some of the terrible semantics of the POSIX interfaces exist
only because there were traditional Unix vendors on the standards
committee insisting on semantics that *could* be implemented using a
user-mode library on top of normal file APIs (I'm looking at you,
fcntl locking semantics, where a close of *any* file descriptor, even
an fd cloned via dup(2) or fork(2), will release the lock). So yes,
POSIX fcntl(2) locking *can* be implemented in terms of normal file
APIs.... AND IT WAS A TERRIBLE IDEA. (For more details, see [1].)
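For readers who haven't run into this, the behaviour can be reproduced with a few lines of C. This is only a sketch: the helper names locked_by_other() and lock_then_close_other_fd() are made up for the example, and the probe runs in a forked child because F_GETLK never reports the caller's own locks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* From a child process, ask whether a whole-file write lock on `path` would
 * conflict with a lock held by another process (here: the parent).
 * Returns 1 if locked, 0 if not, -1 on error. */
int locked_by_other(const char *path)
{
    assert(path != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct flock probe = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        int fd = open(path, O_RDWR);
        if (fd < 0 || fcntl(fd, F_GETLK, &probe) < 0)
            _exit(2);
        _exit(probe.l_type == F_UNLCK ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 2 ? -1 : WEXITSTATUS(status);
}

/* Take a write lock through fd1, then close a *different* descriptor (fd2)
 * that refers to the same file. *before and *after report whether the lock
 * was visible to other processes before and after that close. */
int lock_then_close_other_fd(const char *path, int *before, int *after)
{
    struct flock lk = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
    int fd1 = open(path, O_RDWR | O_CREAT, 0600);
    int fd2 = fd1 < 0 ? -1 : open(path, O_RDONLY);
    int ok = fd2 >= 0 && fcntl(fd1, F_SETLK, &lk) == 0;

    if (ok) {
        *before = locked_by_other(path);
        close(fd2);               /* never used for locking */
        fd2 = -1;
        *after = locked_by_other(path);
    }
    if (fd2 >= 0)
        close(fd2);
    if (fd1 >= 0) {
        close(fd1);
        unlink(path);             /* remove the scratch file */
    }
    return ok ? 0 : -1;
}
```

On a system with POSIX record-locking semantics, *before should come back as 1 and *after as 0, even though fd1, the descriptor that took the lock, is still open at that point.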
UX/RT's file locking will be implemented with RPCs to the process
server just like open()/close() and the like (which will use
read()/write()-family APIs underneath; the initial RPC connection to
the process server will be permanently open but it will be possible to
create new connections to manipulate the environment of child
processes before starting them, so that fork() doesn't have to be a
primitive anymore).
AFAIK, little actually depends on those rather broken "close one FD
and release all locks on that file" semantics, so UX/RT will implement
more sane locking semantics by default. There will be a flag to revert
to the traditional semantics (probably just implemented at the library
level) in case anything actually depends on them.
I could go on about other spectacularly bad ideas enshrined in POSIX,
such as telldir(3) and seekdir(3), which date all the way back to the
assumption that directories should only be implemented in terms of
linear linked lists with O(n) lookup performance, but I don't whine
about that as feature bloat imposed by external standards, but just
the cost of doing business. (Or at least, of relevance.)
The directory contents that normal user programs actually see on UX/RT
will be in a standardized format managed by the VFS (since support for
a limited form of union mounts will be built in).
I'm not sure what you're referring to; if you mean the *at(2) system
calls, the reason they exist in Linux is not !@#!? Windows file
streams support; they are needed to provide secure and performant
user-mode file servers for things like Samba. Trying to implement a
user-space file server using only the V7 Unix primitives will cause
you to have some really horrible Time of Check vs Time of Use (TOCTOU)
security gaps; you can narrow the TOCTOU races at a terrible cost in
performance, but removing them entirely is almost impossible.
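As a rough illustration of why a pinned directory descriptor helps, the sketch below renames a directory between "check" and "use" and compares openat(2) against re-resolving the path. open_beneath() and rename_race_demo() are hypothetical helper names for this example only; a real file server such as Samba does considerably more than this.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open a single path component relative to an already-opened directory.
 * Names containing '/' and the name ".." are refused, and O_NOFOLLOW
 * refuses a symlink in the final component. */
int open_beneath(int dirfd, const char *name)
{
    assert(name != NULL);
    if (strchr(name, '/') != NULL || strcmp(name, "..") == 0)
        return -1;
    return openat(dirfd, name, O_RDONLY | O_NOFOLLOW);
}

/* Create dir/data, pin `dir` with a descriptor, rename it to `moved`, then
 * try both styles of open. *via_dirfd and *via_path are set to 1 if that
 * style still reached the file, 0 otherwise. Returns 0 on success. */
int rename_race_demo(const char *dir, const char *moved,
                     int *via_dirfd, int *via_path)
{
    char path[256];
    int n = snprintf(path, sizeof path, "%s/data", dir);
    if (n < 0 || (size_t)n >= sizeof path || mkdir(dir, 0700) < 0)
        return -1;

    int dirfd = open(dir, O_RDONLY | O_DIRECTORY);
    int fd = dirfd < 0 ? -1 : openat(dirfd, "data", O_WRONLY | O_CREAT, 0600);
    if (fd >= 0)
        close(fd);

    /* The rename stands in for whatever changes the namespace mid-request. */
    int ok = fd >= 0 && rename(dir, moved) == 0;
    if (ok) {
        fd = open_beneath(dirfd, "data");   /* same directory object */
        *via_dirfd = fd >= 0;
        if (fd >= 0)
            close(fd);
        fd = open(path, O_RDONLY);          /* path is resolved again */
        *via_path = fd >= 0;
        if (fd >= 0)
            close(fd);
    }

    if (dirfd >= 0) {
        unlinkat(dirfd, "data", 0);
        close(dirfd);
    }
    rmdir(ok ? moved : dir);
    return ok ? 0 : -1;
}
```

The descriptor keeps referring to the directory that was originally checked, whereas the path-based open depends on whatever the name resolves to at the moment of the call.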
I'm talking about implementing local filesystems in regular processes
(rather than requiring them to be in the kernel) like in QNX or Plan
9, not about network filesystems (although of course network
filesystem clients can be implemented on top of such an API). Linux
has support for them through FUSE, but AFAIK it has performance issues
and isn't very well integrated, so it isn't used all that much.
When it comes to normal server processes, UX/RT will mostly depend on
checking security on open() rather than on read()/write()-family APIs,
which will limit the risk of TOCTTOU vulnerabilities. Where security
does have to be checked on reads or writes (such as the ones
underlying the RPC implementing open() itself), the data will be
copied before checking. Using the traditional read()/write() instead
of the new zero-copy equivalents should usually be good enough AFAIK,
since they copy to a caller-provided buffer.
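A minimal sketch of the "copy before checking" rule, assuming a made-up request layout (struct open_request and both function names are hypothetical, not part of any UX/RT interface): the server snapshots the request out of client-writable memory, then validates and uses only the snapshot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAME 32

/* Hypothetical request layout; imagine it lives in memory that the client
 * can still modify while the server is working on it. */
struct open_request {
    size_t name_len;
    char name[MAX_NAME];
};

/* Validation: non-empty, bounded length, and no path separators. */
int request_ok(const struct open_request *r)
{
    return r->name_len > 0 && r->name_len < MAX_NAME &&
           memchr(r->name, '/', r->name_len) == NULL;
}

/* Copy first, then check and use only the private copy. Later changes to
 * *shared cannot affect what was validated. Returns 0 if accepted. */
int accept_request(const struct open_request *shared,
                   struct open_request *priv)
{
    assert(shared != NULL && priv != NULL);
    memcpy(priv, shared, sizeof *priv);   /* snapshot before any check */
    if (!request_ok(priv))
        return -1;
    priv->name[priv->name_len] = '\0';    /* in bounds: name_len < MAX_NAME */
    return 0;
}
```

Checking the shared copy and then reading it again would reopen the race; here the client can rewrite the shared buffer right after the memcpy() without changing what the server acts on.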
That's because most people aren't going to port or rewrite application
software for some random OS, whether it is a research OS or someone's
new "simple, clean, reimplementation". And most users do expect to
have a working web browser.... and text editor..., and their favorite
games, whether it's nethack or spacewars, etc., etc., etc.
I'm very well aware of that. UX/RT will implement most Linux APIs
(either in libraries, servers, or combinations of the two) and will
have a Linux binary compatibility layer. The only major
incompatibilities are likely to be with stuff that manages sessions
and logins (since UX/RT will natively have a mostly process-oriented
security model, with no way to fully revert to traditional Unix
security outside of running programs in fakeroot containers).