On Thu, Dec 14, 2023 at 7:07 PM Noel Chiappa <jnc(a)mercury.lcs.mit.edu> wrote:
>> Now I'd probably call them kernel threads as they don't have a separate
>> address space.
> Makes sense. One query about stacks, and blocking, there. Do kernel threads,
> in general, have per-thread stacks, so that they can block (and later resume
> exactly where they were when they blocked)?
>
> That was the thing that, I think, made kernel processes really attractive as
> a kernel structuring tool; you get code like this (from V6):
>
>     swap(rp->p_addr, a, rp->p_size, B_READ);
>     mfree(swapmap, (rp->p_size+7)/8, rp->p_addr);
>
> The call to swap() blocks until the I/O operation is complete, whereupon that
> call returns, and away one goes. Very clean and simple code.
Assuming we're talking about Unix, yes, each process has two stacks:
one for userspace, one in the kernel.
The way I've always thought about it, every process has two parts: the
userspace part, and a matching thread in the kernel. When Unix is
running, it is always running in the context of _some_ process (modulo
early boot, before any processes have been created, of course).
Furthermore, when the process is running in user mode, the kernel
stack is empty. When a process traps into the kernel, it's running on
the kernel stack for the corresponding kthread.
Processes may enter the kernel in one of two ways: directly, by
invoking a system call, or indirectly, by taking an interrupt. In the
latter case, the kernel simply runs the interrupt handler within the
context of whatever process happened to be running when the interrupt
occurred. In both cases, one usually says that the process is either
"running in userspace" (ie, normal execution of whatever program is
running in the process) or "running in the kernel" (that is, the
kernel is executing in the context of that process).
Note that this affects behavior around blocking operations.
Traditionally, Unix device drivers had a notion of an "upper half" and
a "lower half." The upper half is the code that is invoked on behalf
of a process requesting services from the kernel via some system call;
the lower half is the code that runs in response to an interrupt for
the corresponding device. Since it's impossible in general to know
what process is running when an interrupt fires, it was important not
to perform operations that would cause the current process to be
unscheduled in an interrupt handler; hence the old adage, "don't sleep
in the bottom half of a device driver" (where sleep here means sleep
as in "sleep and wakeup", a la a condition variable, not "sleep for
some amount of time"): you would block some random process, which may
never be woken up again!
An interesting aside here is signals. We think of them as an
asynchronous mechanism for interrupting a process, but their delivery
must be coordinated by the kernel; in particular, if I send a signal
to a process that is running in userspace, it (typically) won't be
delivered right away; rather, it will be delivered the next time the
process is scheduled to run, as the process must enter the kernel
before delivery can be effected. Signal delivery is a synthetic event,
unlike the delivery of a hardware interrupt, and the upcall happens in
userspace.
> Use of a kernel process probably makes the BSD pageout daemon code fairly
> straightforward, too (well, as straightforward as anything done by Berzerkly
> was :-).
>
> Interestingly, other early systems don't seem to have thought of this
> structuring technique. I assumed that Multics used a similar technique to
> write 'dirty' pages out, to maintain a free list. However, when I looked in
> the Multics Storage System Program Logic Manual:
>
> http://www.bitsavers.org/pdf/honeywell/large_systems/multics/AN61A_storageS…
>
> Multics just writes dirty pages as part of the page fault code: "This
> starting of writes is performed by the subroutine claim_mod_core in
> page_fault. This subroutine is invoked at the end of every page fault." (pg.
> 8-36, pg. 166 of the PDF.) (Which also increases the real-time delay to
> complete dealing with a page fault.)
Note that this says, "starting of writes." Presumably, the writes
themselves were asynchronous; this just initiates the operations. It
certainly adds latency to the page fault handler, but not as much as
waiting for the operations to complete!
> It makes sense to have a kernel process do this; having the page fault code
> do it just makes that code more complicated. (The code in V6 to swap
> processes in and out is beautifully simple.) But it's apparently only obvious
> in retrospect (like many brilliant ideas :-).
I can kinda sorta see a method in the madness of the Multics approach.
If you think that page faults are relatively rare, and initiating IO
is relatively cheap but still more expensive than executing "normal"
instructions, then it makes some sense that you might want to amortize
the cost of that by piggybacking one on the other. Of course, that's
just speculation and I don't really have a sense for how well that
worked out in Multics (which I have played around with and read about,
but still seems largely mysterious to me). In the Unix model, you've
got scheduling latency to deal with before the pageout daemon runs; of
course, that all happened as part of a context switch, and in early
Unix there was no demand paging (and so I suppose page faults were
considered fatal).
That said, using threads as an organizational metaphor for structured
concurrency in the kernel is wonderful compared to many of the
alternatives (hand-coded state machines, for example).
- Dan C.