On 8/3/21, arnold(a)skeeve.com <arnold(a)skeeve.com> wrote:
> I haven't caught up yet in this thread. Apologies if this has been
> discussed already.
> The Plan 9 folks blazed this trail over 30 years ago with rfork, where
> you specify what bits you wish to duplicate. I don't remember details
> anymore, but I think it was pretty elegant. IIRC Around that time Rob Pike
> said "Threads are the lack of an idea", meaning, if you think you need
> threads, you haven't thought about the problem hard enough. (Apologies
> to Rob if I am misremembering and/or misrepresenting.)
I've never really been a fan of the rfork()/clone() model, or at least
the Linux implementation of it that requires ugly library-level hacks
to share state between threads that the kernel doesn't support
sharing. Also, I don't really care for the large number of flags
required.
Up until now, for the OS I'm working on, I was just planning on
following the traditional threading model, in which most state is
associated with processes and only execution state is per-thread. Now
I'm thinking I should reduce the state associated with a process to
just the PID, PPID, PGID, containing cgroup, command line, and list of
threads. All other state would be contained in various types of
context objects that are not tied to a particular process or thread
(other than being destroyed when no more threads are associated with
them). This would include:
Filesystem namespace
File descriptors
Address space
Security context (file permission list, UID, GID)
Signal handlers
Scheduling context
Each of these object types would be completely separate from the
others, allowing full control over which state is shared and which is
private. I'm using seL4 as a microkernel, and it already works like
this (it has no real concept of processes, only threads that are each
associated with an address space, a capability space, and a scheduling
context) so it's a good match for it.
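To make the context-object idea a bit more concrete, here is a rough
sketch in Python. It is only an illustration, not seL4 code and not the
actual design: the names Context, Thread, Process, fresh_contexts, and
spawn, as well as the kind strings, are all hypothetical. It shows
threads holding one reference per context type, with each context being
destroyed once its last thread detaches.

```python
from dataclasses import dataclass, field

# Hypothetical names for the six context types listed above.
KINDS = ("fs_namespace", "fd_table", "address_space",
         "security", "signal_handlers", "scheduling")


class Context:
    """A shareable context object, destroyed when no thread references it."""

    def __init__(self, kind):
        self.kind = kind
        self.threads = set()
        self.destroyed = False

    def attach(self, thread):
        self.threads.add(thread)

    def detach(self, thread):
        self.threads.discard(thread)
        if not self.threads:
            self.destroyed = True  # stand-in for releasing real resources


class Thread:
    """Execution state plus one reference per context kind."""

    def __init__(self, name, contexts):
        self.name = name
        self.contexts = dict(contexts)
        for ctx in self.contexts.values():
            ctx.attach(self)

    def exit(self):
        for ctx in self.contexts.values():
            ctx.detach(self)
        self.contexts = {}


@dataclass
class Process:
    """Only the bookkeeping that would remain per-process."""
    pid: int
    ppid: int
    pgid: int
    cgroup: str
    cmdline: list
    threads: list = field(default_factory=list)


def fresh_contexts():
    """Create one new, unshared context of every kind."""
    return {kind: Context(kind) for kind in KINDS}


def spawn(parent, name, private=()):
    """Share every context with the parent except the kinds in `private`."""
    contexts = dict(parent.contexts)
    for kind in private:
        contexts[kind] = Context(kind)
    return Thread(name, contexts)


main = Thread("main", fresh_contexts())
worker = spawn(main, "worker", private=("fd_table",))
proc = Process(pid=1, ppid=0, pgid=1, cgroup="/", cmdline=["init"],
               threads=[main, worker])
```

Here worker shares main's address space and everything else except the
file descriptor table. There are no flags involved, only a choice of
which objects to hand to the new thread.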
exec() would still replace all threads within a process as on
traditional Unix, unless the exec is performed within a child process
that hasn't yet been started. Sending a signal to an entire process
would send it to every signal group within the process (similarly, it
would be possible to send a signal to an entire cgroup; basically,
processes will really just be a special kind of cgroup in this model).
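The signal fan-out can be sketched the same way. This assumes that a
"signal group" means the set of threads sharing one signal-handler
context, which is my reading rather than something spelled out above.
SignalGroup, Process, signal_process, and signal_cgroup are
hypothetical names, and delivering only once to a group that appears
several times is likewise an assumption.

```python
import signal


class SignalGroup:
    """Threads sharing one signal-handler context (assumed meaning)."""

    def __init__(self, name):
        self.name = name
        self.pending = []

    def deliver(self, signum):
        self.pending.append(signum)


class Process:
    def __init__(self, pid, thread_groups):
        self.pid = pid
        # One entry per thread: the signal group that thread belongs to.
        self.thread_groups = list(thread_groups)


def distinct_groups(groups):
    """Drop repeats by identity so each group is signalled only once."""
    seen = []
    for group in groups:
        if all(group is not other for other in seen):
            seen.append(group)
    return seen


def signal_process(proc, signum):
    """Send signum to every signal group within one process."""
    for group in distinct_groups(proc.thread_groups):
        group.deliver(signum)


def signal_cgroup(procs, signum):
    """Same fan-out, over every process in a cgroup."""
    every = [g for proc in procs for g in proc.thread_groups]
    for group in distinct_groups(every):
        group.deliver(signum)


a = SignalGroup("a")
b = SignalGroup("b")
p = Process(100, [a, a, b])   # three threads, two signal groups
signal_process(p, signal.SIGTERM)
```

Both functions are the same loop over different sets of threads, which
is the sense in which a process behaves like a special kind of cgroup.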