On Thu, Jun 4, 2020 at 12:51 PM Larry McVoy <lm@mcvoy.com> wrote:
On Thu, Jun 04, 2020 at 08:19:58AM -0600, Warner Losh wrote:
> The kicker is that all of the kernel is callback driven. The
> upper half queues the request and then sleeps until the lower half signals
> it to wakeup. And that signal is often just a wakeup done from the
> completion routine in the original request. All of that would be useful in
> userland for high volume activity, none of it is exposed...

Yeah, I've often wondered why this stuff wasn't exposed.  We already have
signal handlers, seems like that maps. 

Was it Rob who said that signals were really just for SIGKILL? Here, signals would be press-ganged into service as a general IPC mechanism. In fairness, they've mutated that way, but they didn't start out that way. While I obviously wasn't there, the strong impression I get is that by the time people were seriously thinking about async IO in Unix, the die had already been cast, for better or worse.
 
I tried to get the NFS guys at Sun to rethink the biod junk and do it like
UFS does, where it queues something and gets a callback.  I strongly suspect
that two processes, one to queue, one to handle callbacks, would be more
efficient and actually faster than the biod nonsense.

That's one of the arguments I lost unfortunately.

Warner, exposing that stuff in FreeBSD is not really that hard, I suspect.
Might be a fun project for a young kernel hacker with some old dude like
you or me or someone, watching over it and thinking about the API.

I'm actually going to disagree with you here, Larry. While I think a basic mechanism wouldn't be THAT hard to implement, it wouldn't compose nicely with the existing primitives. I suspect the edge cases would be really thorny, particularly without a real AST abstraction. For instance, what happens if you initiate an async IO operation, then block on a `read`? Where does the callback happen? On the same thread? The real challenge isn't providing the operation, it's integrating it into the existing model.
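
To make the question concrete, here is a rough sketch of what the existing signal model does in this situation. It is Python on a POSIX system rather than kernel code. SIGALRM stands in for an async-IO completion notification, and the names `demo` and `on_completion` are made up for illustration. The thread is blocked in a read when the "completion" arrives; the read is interrupted, the handler runs on that same thread, and then the read is retried (Python retries on EINTR automatically).

```python
import os
import signal

def demo():
    """Show where a signal-style completion callback runs relative to a blocked read."""
    events = []
    r, w = os.pipe()

    def on_completion(signum, frame):
        # Hypothetical completion callback, delivered as a signal.
        # It runs on the thread that is currently blocked in os.read() below.
        events.append("callback")
        os.write(w, b"x")  # the "completed IO" makes data available

    old = signal.signal(signal.SIGALRM, on_completion)
    try:
        signal.setitimer(signal.ITIMER_REAL, 0.05)  # "completion" arrives in ~50 ms
        # Blocks: the pipe is empty. The signal interrupts the system call,
        # the handler runs, and the read is then retried and succeeds.
        data = os.read(r, 1)
        events.append("read returned")
    finally:
        signal.signal(signal.SIGALRM, old)
        os.close(r)
        os.close(w)
    return events, data
```

Here the callback effectively runs in the middle of the blocked read, on the same thread. That happens to be harmless in this toy, but it is exactly the sort of edge case that gets thorny once the callback wants to touch state the interrupted code was using.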

As a counter-point to the idea that it's completely unruly, in Akaros this was solved in the C library: all IO operations were fundamentally asynchronous, but the C library provided blocking read(), write(), etc. by building those from the async primitives. It worked well, but Akaros had something akin to an AST environment and fine-grained scheduling decisions were made in userspace: in Akaros the unit of processor allocation is a CPU core, not a thread, and support exists for determining the status of all cores allocated to a process. There are edge cases (you can't roll your own mutex, for example, and the basic threading library does a lot of heavy lifting for you, making it challenging to integrate into the runtime of a language that doesn't use the same ABI), but by and large it worked. It was also provided by a kernel that was a pretty radical departure from a Unix-like kernel.
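
As a very loose illustration of the layering (this is not Akaros code, and none of these names come from Akaros), here is a sketch in Python of a blocking read built on top of an asynchronous one. A thread pool stands in for the kernel's async IO engine, and `async_read` and `blocking_read` are hypothetical names. The blocking call submits the request and then sleeps until the completion callback wakes it.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for an asynchronous IO engine (the "lower half").
_engine = ThreadPoolExecutor(max_workers=2)

def async_read(fd, nbytes, on_complete):
    """Start a read and return immediately.

    on_complete(data, error) is called later on an engine thread.
    """
    def work():
        try:
            on_complete(os.read(fd, nbytes), None)
        except OSError as exc:
            on_complete(None, exc)
    _engine.submit(work)

def blocking_read(fd, nbytes):
    """Blocking read() built from the async primitive: submit, then sleep."""
    done = threading.Event()
    outcome = {}

    def on_complete(data, error):
        outcome["data"] = data
        outcome["error"] = error
        done.set()  # the wakeup issued from the completion routine

    async_read(fd, nbytes, on_complete)
    done.wait()  # the caller sleeps until the request completes
    if outcome["error"] is not None:
        raise outcome["error"]
    return outcome["data"]

def demo():
    r, w = os.pipe()
    os.write(w, b"hello")
    try:
        return blocking_read(r, 5)
    finally:
        os.close(r)
        os.close(w)
```

The sketch dodges the hard part: the completion callback runs on a separate engine thread, so the question of where the callback runs relative to the blocked caller never comes up. Answering that question properly is what the AST-like environment and userspace scheduling were for.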

        - Dan C.