On Thu, Sep 16, 2021 at 8:34 PM Theodore Ts'o <tytso@mit.edu> wrote:
On Thu, Sep 16, 2021 at 03:27:17PM -0400, Dan Cross wrote:
> >
> > I'm really not convinced trying to build distributed computing into
> > the OS ala Plan 9 is viable.
>
> It seems like plan9 itself is an existence proof that this is possible.
> What it did not present was an existence proof of its scalability and it
> wasn't successful commercially. It probably bears mentioning that that
> wasn't really the point of plan9, though; it was a research system.

I should have been more clear.  I'm not really convinced that
building distributed computing into the OS ala Plan 9 is viable from
the perspective of commercial success.  Of course, Plan 9 did it; but
it did it as a research system.

The problem is that if a particular company is convinced that they
want to use Yellow Pages as their directory service --- or maybe X.509
certificates as their authentication system, or maybe Apollo RPC is
the only RPC system for a particularly opinionated site administrator
--- and these prior biases disagree with the choices made by a
particular OS that had distributed computing services built in as a
core part of its functionality, that might be a reason for a
particular customer *not* to deploy a particular distributed OS.

Ah, I take your meaning. Yes, I can see that being a problem. But we've had similar problems before: "we only buy IBM", or, "does it integrate into our VAXcluster?" Put another way, _every_ system has opinions about how to do things. I suppose the distinction you're making is that we can paper over so many of those by building abstractions on top of the "node" OS. But the node OS is already forcing a shape onto our solutions. Folks working on the Go runtime have told me painful stories about detecting blocking system calls using timers and signals: wouldn't it be easier if the system provided real asynchronous abstractions? But the system call model in Unix/Linux/plan9 etc. is highly synchronous. If `open` takes a while for whatever reason (say, it blocks reading directory entries while looking up a name), there's no async IO interface for that, hence the shenanigans. But that's what the local node gives me; c'est la vie.

Of course, this doesn't matter if you don't care if anyone uses it
after the paper(s) about said OS has been published.

I suspect most researchers don't expect the actual research artifacts to make it directly into products, but hope that the ideas will have some impact. Interestingly, Unix seems to have been an exception: the artifact itself made it into industry.

> Plan 9, as just one example, asked a lot of questions about the issues you
> mentioned above 30 years ago. They came up with _a_ set of answers; that
> set did evolve over time as things progressed. That doesn't mean that those
> questions were resolved definitively, just that there was a group of
> researchers who came up with an approach to them that worked for that group.

There's nothing stopping researchers from creating other research OS's
that try to answer that question.

True, but they aren't. I suspect there are a number of confounding factors at play here; certainly, the breadth and size of the standards they have to implement is an issue, but so is lack of documentation. No one is seriously looking at new system architectures, though.
 
However, creating an entire new
local node OS from scratch is challenging[1], and if you then
have to recreate new versions of Kerberos, an LDAP directory server,
etc., so that all of these functions can be tightly integrated into a
single distributed OS ala Plan 9, that seems like a huge amount of
work, requiring a lot of graduate students to pull off.

[1] http://doc.cat-v.org/bell_labs/utah2000/   (Page 14, Standards)

Yup. That is the presentation I meant when I mentioned Rob Pike lamenting the situation 20 years ago in the previous message and earlier in the thread.

An interesting thing here is that we assume we have to redo _all_ of that, though. A lot of the software out there is just code that does something interesting, but actually touches the system in a pretty small way. gVisor is an interesting example of this: it provides something that looks an awful lot like Linux to an application, and a lot of stuff can run under it, but the number of system calls _it_ in turn makes to the underlying system is much smaller.

> What's changed is that we now take for granted that Linux is there, and
> we've stopped asking questions about anything outside of that model.

It's unclear to me that Linux is blamed as the reason why researchers
have stopped asking questions outside of that model.  Why should Linux
have this effect when the presence of Unix didn't?

a) There's a lot more Linux in the world than there ever was Unix.
b) There are more computers now than there were when Unix was popular.
c) Computers are significantly more complex now than they were when Unix was written.

But to be clear, I don't think this trend started with Linux; I get the impression that by the 1980s, a lot of research focused on a Unix-like model to the exclusion of other architectures. The PDP-10 was basically dead by 1981, and we haven't seen a system like TOPS-20 since the 70s.

Or is the argument that it's Linux's fault that Plan 9 has apparently
failed to compete with it in the marketplace of ideas?

It's hard to make that argument when Linux borrowed so many of plan9's ideas: /proc, per-process namespaces, etc.
 
And arguably,
Plan 9 failed to make headway against Unix (and OSF/DCE, and Sun NFS,
etc.) in the early to mid 90's, which is well before Linux became
popular, so that argument doesn't really make sense, either.

That wasn't the argument. There are a number of reasons why plan9 failed to achieve commercial success relative to Unix; most of them have little to do with technology. In many ways, AT&T strangled the baby by holding it too tightly to its chest, fearful of losing control the way they "lost" control of Unix (ironically, something that allowed Unix to flourish and become wildly successful). Incompatibility with the rest of the world was likely an issue, but inaccessibility and overly restrictive licensing in the early 90s practically made it a foregone conclusion.

Also, it's a little bit of an aside, but I think we often undercount the impact of individual preference on systems. In so many ways, Linux succeeded because, simply put, people liked working on Linux more than they liked working on other systems. You've mentioned yourself that it was more fun to hack on Linux without having to appease some of the big personalities in the BSD world.

        - Dan C.