On 8/1/21, Theodore Ts'o <tytso(a)mit.edu> wrote:
> I've seen this argument a number of times, but what's never been clear
> to me is what *would* the "normal APIs" be which would allow a parent
> to set up the child's state? How would that be accomplished? Lots of
> new system calls? Magic files in /proc/<pid>/XXX which get
> manipulated somehow? (How, exactly, does one affect the child's
> memory map via magic read/write calls to /proc/<pid>/XXX.... How
> about environment variables, etc.)

My OS will be microkernel-based and even the RPC channel to the VFS
itself will be a file (with some special semantics). read(), write()
and seek() will bypass the VFS entirely and call the kernel to
directly communicate with the destination process. The call to create
an empty process will return a new RPC channel and there will be an
API to temporarily switch to an alternate channel so that VFS calls
occur in the child's context instead of the parent's.
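
A rough sketch of the channel switch described above, written as a toy in-memory model in Python rather than against any real kernel interface. Every name here (Channel, create_empty_process, use_channel, vfs_open) is hypothetical and invented for illustration; the real API may look quite different.

```python
from contextlib import contextmanager


class Channel:
    """Toy stand-in for an RPC channel to the VFS (hypothetical)."""

    def __init__(self, owner):
        self.owner = owner
        self.calls = []  # record of VFS calls issued through this channel


# Channel that VFS calls currently go through; starts as the parent's own.
_current = Channel("parent")


def vfs_open(path):
    """Pretend VFS call: records the call and reports whose context it ran in."""
    _current.calls.append(("open", path))
    return _current.owner


def create_empty_process(name):
    """Hypothetical: create an empty process and return its RPC channel."""
    return Channel(name)


@contextmanager
def use_channel(channel):
    """Temporarily route VFS calls through `channel`, then switch back."""
    global _current
    saved = _current
    _current = channel
    try:
        yield channel
    finally:
        _current = saved  # restored even if the body raises


child = create_empty_process("child")
with use_channel(child):
    in_child = vfs_open("/dev/console")  # issued in the child's context
in_parent = vfs_open("/etc/motd")        # back in the parent's context
```

Scoping the switch to a block means the parent cannot forget to switch back, which seems like the main hazard of a "temporarily switch" call.
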
All process memory, even the heap and stack, will be implemented as
memory-mapped files in a per-process filesystem under /proc/<pid>.
This will be a special "shadowfs" that allows creating files that
shadow ranges of other files (either on disk or in memory).
Environment variables will also be exposed in /proc of course.
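
To make the "shadow ranges of other files" idea concrete, here is a small Python model of the bookkeeping only. It is a sketch under assumptions: ShadowFile, shadow() and read() are hypothetical names, plain byte strings stand in for the backing files, and unmapped bytes are assumed to read as zero.

```python
class ShadowFile:
    """Toy model of a file whose contents shadow ranges of other byte sources."""

    def __init__(self):
        # Each extent is (start, length, source, source_offset).
        self.extents = []

    def shadow(self, start, source, source_offset, length):
        """Make [start, start+length) mirror source[source_offset:...]."""
        if source_offset + length > len(source):
            raise ValueError("shadowed range extends past end of source")
        self.extents.append((start, length, source, source_offset))

    def read(self, offset, size):
        """Read `size` bytes at `offset`; unmapped bytes come back as zero."""
        out = bytearray(size)
        for start, length, source, src_off in self.extents:
            lo = max(offset, start)
            hi = min(offset + size, start + length)
            if lo < hi:  # this extent overlaps the requested window
                s = src_off + (lo - start)
                out[lo - offset:hi - offset] = source[s:s + (hi - lo)]
        return bytes(out)


# Stand-in for an executable on disk: 6-byte header, 8 bytes of code, 4 of data.
image = b"HEADERcodecodeDATA"

mem = ShadowFile()
mem.shadow(0x1000, image, 6, 8)  # map the code bytes at address 0x1000

whole = mem.read(0x1000, 8)
middle = mem.read(0x1002, 4)
edge = mem.read(0x0FFE, 4)       # straddles the start of the mapping
```
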

> And what are the access rights by which a process gets to reach out
> and touch another process's environment? Is it allowed only for
> child processes? And is it only allowed before the child starts
> running? What if the child process is going to be running a setuid or
> setgid executable?

Any process that has permissions to access the RPC channel file and
memory mapping shadow files in /proc/<pid> will be able to manipulate
the state. The RPC channel will cease to function after the child has
been started. setuid and setgid executables will not be supported at
all (there will instead be a role-based access control system layered
on top of a per-process file permission list, which will allow
privilege escalation on exec in certain situations defined by
configuration).

> The phrase "all process state will have a file-based interface" sounds
> good on paper, but I think it remains to be seen how well a "echo XXX
> > /proc/<pid>/magic-file" API would actually work. The devil is
> really in the details....

Even though everything will use a file-based implementation
underneath, there will be a utility library layered on top of it so
that user code doesn't have to contain lots of
open()-read()/write()-close() boilerplate.
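
As an illustration of what such a utility layer could look like, here is a minimal Python sketch. It is not the planned library: write_state and read_state are hypothetical names, the "cwd" file is an invented example of a state file, and a temporary directory stands in for /proc/<pid> so the code runs on any ordinary system.

```python
import os
import tempfile


def write_state(root, name, value):
    """Set one piece of process state by writing its file (hypothetical layout)."""
    with open(os.path.join(root, name), "wb") as f:
        f.write(value)


def read_state(root, name):
    """Read one piece of process state back as bytes."""
    with open(os.path.join(root, name), "rb") as f:
        return f.read()


# A temporary directory stands in for the child's /proc/<pid> directory.
fake_proc = tempfile.mkdtemp()

write_state(fake_proc, "cwd", b"/home/user")
cwd = read_state(fake_proc, "cwd")
```

Callers see one function call per state change; the open/write/close sequence stays inside the helper.
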