ATC/OSDI'21 joint keynote: It's Time for Operating Systems to Rediscover Hardware (Timothy Roscoe)

List overview All Threads
Download

newer

older

Thompson trojan put into practice

RegExp decision for meta...

Dan Cross

1 Sep 2021 1 Sep '21

9:58 p.m.

One of the things I really appreciate about participating in this community and studying Unix history (and the history of other systems) is that it gives one firm intellectual ground from which to evaluate where one is going: without understanding where one is and where one has been, it's difficult to assert that one isn't going sideways or completely backwards. Maybe either of those outcomes is appropriate at times (paradigms shift; we make mistakes; etc) but generally we want to be moving mostly forward. The danger when immersing ourselves in history, where we must consider and appreciate the set of problems that created the evolutionary paths leading to the systems we are studying, is that our thinking can become calcified in assuming that those systems continue to meet the needs of the problems of today. It is therefore always important to reevaluate our base assumptions in light of either disconfirming evidence or (in our specific case) changing environments. To that end, I found Timothy Roscoe's (ETH) joint keynote address at ATC/OSDI'21 particularly compelling. He argues that what we consider the "operating system" is only controlling a fraction of a modern computer these days, and that in many ways our models for what we consider "the computer" are outdated and incomplete, resulting in systems that are artificially constrained, insecure, and with separate components that do not consider each other and therefore frequently conflict. Further, hardware is ossifying around the need to present a system interface that can be controlled by something like Linux (used as a proxy more generally for a Unix-like operating system), simultaneously broadening the divide and making it ever more entrenched. Another theme in the presentation is that, to the limited extent the broader systems research community is actually approaching OS topics at all, it is focusing almost exclusively on Linux in lieu of new, novel systems; where non-Linux systems are featured (something like 3 accepted papers between SOSP and OSDI in the last two years out of $n$), the described systems are largely Linux-like. Here the presentation reminded me of Rob Pike's "Systems Software Research is Irrelevant" talk (slides of which are available in various places, though I know of no recording of that talk). Roscoe's challenge is that all of this should be seen as both a challenge and an opportunity for new research into operating systems specifically: what would it look like to take a holistic approach towards the hardware when architecting a new system to drive all this hardware? We have new tools that can make this tractable, so why don't we do it? Part of it is bias, but part of it is that we've lost sight of the larger picture. My own question is, have we become entrenched in the world of systems that are "good enough"? Things he does NOT mention are system interfaces to userspace software; he doesn't seem to have any quibbles with, say, the Linux system call interface, the process model, etc. He's mostly talking about taking into account the hardware. Also, in fairness, his highlighting a "small" portion of the system and saying, "that's what the OS drives!" sort of reminds me of the US voter maps that show vast tracts of largely unpopulated land colored a certain shade as having voted for a particular candidate, without normalizing for population (land doesn't vote, people do, though in the US there is a relationship between how these things impact the overall election for, say, the presidency). I'm curious about other peoples' thoughts on the talk and the overall topic? https://www.youtube.com/watch?v=36myc8wQhLo - Dan C.

Attachments:

attachment.html (text/html — 4.0 KB)

Show replies by date

Tony Finch

2 Sep 2 Sep

8:42 a.m.

Dan Cross <crossd(a)gmail.com> wrote:

...

I'm curious about other peoples' thoughts on the talk and the overall topic?

I saw this talk yesterday and I twote some off-the-cuff thoughts (https://twitter.com/fanf/status/1433004863818960898) I wanted to hear more about the problem of closed firmware (Roscoe's research platform uses OpenBMC). I hope the gradual rise of open firmware will help to fix the problem of platform management controllers treating the OS as the enemy. The talk focuses on Linux because ~everything runs Linux, but Linux was designed for a platform that was defined by Intel and Microsoft, and the disfunctional split that Roscoe points out is exactly the split in design responsibilities between Intel and Microsoft. In the Arm world the split has been reproduced, but with Arm and the SoC manufacturers instead of Intel and the PC clones, and Linux instead of Windows. There’s also Conway’s law, “you ship your org chart”, and in this case Roscoe is talking about the organisation of the computer industry as a whole. So if someone comes up with a better OS architecture, is the implied org chart also successful under capitalism? (end paste) I suppose the historical perspective would be to ask if the way that OS and driver software was developed in the past in vertically-integratewd companies can provide insight into the hardware complexity of today's systems... Tony. -- f.anthony.n.finch <dot(a)dotat.at> https://dotat.at/ Bailey: South or southwest, becoming variable, 2 to 4. Slight. Showers. Good.

John Cowan

3 Sep 3 Sep

12:19 a.m.

On Thu, Sep 2, 2021 at 5:01 AM Tony Finch <dot(a)dotat.at> wrote:

...

Linux was designed for a platform that was defined by Intel and Microsoft, and the disfunctional split that Roscoe points out is exactly the split in design responsibilities between Intel and Microsoft.

As the saying has it: "What Andy giveth, Bill taketh away."

Douglas McIlroy

3:24 a.m.

I set out to write a reply, then found that Marshall had said it all, better..Alas, the crucial central principle of Plan 9 got ignored, while its ancillary contributions were absorbed into Linux, making Linux fatter but still oriented to a bygone milieu. Another entrant in the distributable computing arena is Jay Misra's "orchestrator", Orc, in which practice goes hand-in-hand with theory: https://www.cs.utexas.edu/users/misra/OrcBook.pdf. Doug On Thu, Sep 2, 2021 at 8:19 PM John Cowan <cowan(a)ccil.org> wrote:

...

On Thu, Sep 2, 2021 at 5:01 AM Tony Finch <dot(a)dotat.at> wrote:

As the saying has it: "What Andy giveth, Bill taketh away."

Theodore Ts'o

1:21 p.m.

On Thu, Sep 02, 2021 at 11:24:37PM -0400, Douglas McIlroy wrote:

...

I'm really not convinced trying to build distributed computing into the OS ala Plan 9 is viable. The moment the OS has to span multiple TCB's (Trusted Computing Bases), you have to make some very opinionated decisions on a number of issues for which we do not have consensus after decades of trial and error: * What kind of directory service do you use? x.500/LDAP? Yellow Pages? Project Athena's Hesiod? * What kind of distributed authentication do you use? Kerboers? Trust on first use authentication ala ssh? .rhosts style "trust the network" style authentication? * What kind of distributed authorization service do you use? Unix-style numeric user-id/group-id's? X.500 Distinguished Names in ACL's? Windows-style Security ID's? * Do you assume that all of the machines in your distributed computation system belong to the same administrative domain? What if individuals owning their own workstations want to have system administrator privs on their system? Or is your distributed OS a niche system which only works when you have clusters of machines that are all centrally and administratively owned? * What scale should the distributed system work at? 10's of machines in a cluster? 100's of machines? 1000's of machines? Tens of thousands of machines? Distributed systems that work well on football-sized data centers may not work that well when you only have a few racks in colo facility. The "I forgot how to count that low" challenge is a real one.... There have been many, many proposals in the distributed computing arena which all try to answer these questions differently. Solaris had an answer with Yellow Pages, NFS, etc. OSF/DCE had an answer involving Kerberos, DCE/RPC, DCE/DFS, etc. More recently we have Docker's Swarm and Kubernetes, etc. None have achieved dominance, and that should tell us something. The advantage of trying push all of these questions into the OS is that you can try to provide the illusion that there is no difference between local and remote resources. But that either means that you have a toy (sorry, "research") system which ignores all of the ways in which remote computation which extends to a different node that may or may not be up, which may or may not have belong to a different administration domain, which may or may not have an adversary on the network between you and the remote node, etc. OR, you have to make access to local resources just as painful as access to remote resources. Furthermore, since supporting access remote resources is going to have more overhead, the illusion that access to local and remote resources can be the same can't be comfortably sustained in any case. When you add to that the complexities of building an OS that tries to do a really good job supporting local resources --- see all of the observations in Rob Pike's Systems Software Research is Dead slides about why this is hard --- it seems to me the solution of trying to build a hard dividing line between the Local OS and Distributed Computation infrastructure is the right one. There is a huge difference between creating a local OS that can live on a single developer's machine in their house --- and a distributed OS which requires setting up a directory server, and an authentication server, and a secure distributed time server, etc., before you set up the first useful node that can actually run user workloads. You can try to do both under a single source tree, but it's going to result in a huge amount of bloat, and a huge amount of maintenance burden to keep it all working. By keeping the local node OS and the distributed computation system separate, it can help control complexity, and that's a big part of computer science, isn't it? - Ted

Tony Finch

8 Sep 8 Sep

11:14 a.m.

Theodore Ts'o <tytso(a)mit.edu> wrote:

...

There have been many, many proposals in the distributed computing arena which all try to answer these questions differently. Solaris had an answer with Yellow Pages, NFS, etc. OSF/DCE had an answer involving Kerberos, DCE/RPC, DCE/DFS, etc. More recently we have Docker's Swarm and Kubernetes, etc. None have achieved dominance, and that should tell us something.

I think there are two different kinds of distributed computing there. Distributed authentication and administration is dominated by Microsoft Active Directory (LDAP, Kerberos, DNS, SMB, ...) which I think can reasonably be regarded as part of Windows (even if many Windows machines aren't part of an AD). That kind of distributed system doesn't try to help you stop caring that there are lots of computers. Whereas Kubernetes and Docker Swarm do automatic lifecycle management for distributed workloads. I have not yet had the pleasure (?) of working with them but I get the impression that it's difficult to set up their access control to stop giving everything root on everything else. They try much harder to make a cluster work as a single system. Tony. -- f.anthony.n.finch <dot(a)dotat.at> https://dotat.at/ St Davids Head to Great Orme Head, including St Georges Channel: Easterly or southeasterly 2 to 4, occasionally 5 at first in north, becoming variable 2 to 4 later. Smooth or slight. Showers, perhaps thundery, fog patches later. Moderate or good, occasionally very poor later.

Dan Cross

16 Sep 16 Sep

7:27 p.m.

On Fri, Sep 3, 2021 at 9:23 AM Theodore Ts'o <tytso(a)mit.edu> wrote:

...

On Thu, Sep 02, 2021 at 11:24:37PM -0400, Douglas McIlroy wrote:

I'm really not convinced trying to build distributed computing into the OS ala Plan 9 is viable.

It seems like plan9 itself is an existence proof that this is possible. What it did not present was an existence proof of its scalability and it wasn't successful commercially. It probably bears mentioning that that wasn't really the point of plan9, though; it was a research system. I'll try to address the plan9 specific bits below. The moment the OS has to span multiple

...

TCB's (Trusted Computing Bases), you have to make some very opinionated decisions on a number of issues for which we do not have consensus after decades of trial and error:

Interestingly, plan9 did make opinionated decisions about all of the things mentioned below. Largely, those decisions worked out pretty well. * What kind of directory service do you use? x.500/LDAP? Yellow Pages?

...

Project Athena's Hesiod?

In the plan9 world, the directory services were provided by the filesystem and a user-level library that consumed databases of information resident in the filesystem itself (ndb(6) -- https://9p.io/magic/man2html/6/ndb) It also provided DNS services for interacting with the larger world. The connection server was an idea that was ahead of it's time (ndb(8) -- https://9p.io/magic/man2html/8/ndb) Also, https://9p.io/sys/doc/net/net.html. * What kind of distributed authentication do you use? Kerboers?

...

Trust on first use authentication ala ssh? .rhosts style "trust the network" style authentication?

Plan 9 specifically used a Kerberos-like model, but did not use the Kerberos protocol. I say "Kerberos-like" in that there was a trusted agent on the network that provided authentication services using a protocol based on shared secrets. * What kind of distributed authorization service do you use? Unix-style

...

numeric user-id/group-id's? X.500 Distinguished Names in ACL's? Windows-style Security ID's?

User and group names were simple strings. There were no numeric UIDs associated with processes, though the original file server had a user<->integer mapping for directory entries. * Do you assume that all of the machines in your distributed

...

computation system belong to the same administrative domain?

Not necessarily, no. The model is one of resource sharing, rather than remote access. You usually pull resources into your local namespace, and those can come from anywhere you have access to take them from. They may come from a different "administrative domain". For example, towards the end of the BTL reign, there was a file server at Bell Labs that some folks had accounts on and that one could "import" into one's namespace. That was the main distribution point for the larger community. What if individuals owning their own workstations want to have

...

system administrator privs on their system?

When a user logs into a Plan 9 terminal, they become the "host owner" of that terminal. The host owner is distinguished only in owning the hardware resources of the hosting machine; they have no other special privileges, nor is there an administrative user like `root`. It's slightly unusual, though not unheard of, for a terminal to have a local disk; the disk device is implicitly owned by the host owner. If the user puts a filesystem on that device (say, to cache a dataset locally or something), that's on them, though host owner status doesn't really give any special privileges over the permissions on the files on that filesystem, modulo them going through the raw disk device, of course. That is, the uid==0 implies you bypass all access permission checking is gone in Plan 9. CPU servers have a mechanism where a remote user can start a process on the server that becomes owned by the calling user; this is similar to remote login, except that the user brings their environment with them; the model is more of importing the CPU server's computational resources into the local environment than, say, ssh'ing into a machine. Or is your

...

distributed OS a niche system which only works when you have clusters of machines that are all centrally and administratively owned?

I'm not sure how to parse that; I mean, arguably networks of Unix machines associated with any given organization are "centrally owned and administered"? To get access to any given plan9 network, someone would have to create an account on the file and auth servers, but the system was also installable on a standalone machine with a local filesystem. If folks wanted to connect in from other types of systems, there were mechanisms for doing so: `ssh` and `telnet` servers were distributed and could be used, though the experience for an interactive user was pretty anemic. It was more typical to use a program called `drawterm` that runs as an application on e.g. a Mac or Unix machine and emulates enough of a Plan 9 terminal kernel that a user can effectively `cpu` to a plan9 CPU server. Once logged in via drawterm, one can run an environment including a window system and all the graphical stuff from there. Perhaps the aforementioned Bell Labs file server example clarifies things a bit? * What scale should the distributed system work at? 10's of machines

...

in a cluster? 100's of machines? 1000's of machines? Tens of thousands of machines?

This is, I think, where one gets to the crux of the problem. Plan 9 worked _really_ well for small clusters of machines (10s) and well enough for larger clusters (up to 100s or 1000s). Distributed systems that work

...

well on football-sized data centers may not work that well when you only have a few racks in colo facility. The "I forgot how to count that low" challenge is a real one...

And how. Plan 9 _was_ eventually ported to football-field sized machines (the BlueGene port for DoE was on that scale); Ron can be able to speak to that in more detail, if he is so inclined. In fairness, I do think that required significant effort and it was, of course, highly specialized to HPC applications. My subjective impression was that any given plan9 network would break down at the scale of single-digit thousands of machines and, perhaps, tens of thousands of users. Growing beyond that for general use would probably require some pretty fundamental changes; for example, 9P (the file protocol) includes a client-chosen "FID" in transactions related to any open file, so that file servers must keep track of client state to associate fids to actual files, whether those file refer to disk-resident stable storage or software synthesized "files" for other things (IPC end points; process memory; whatever). There have been many, many proposals in the distributed computing

...

arena which all try to answer these questions differently. Solaris had an answer with Yellow Pages, NFS, etc. OSF/DCE had an answer involving Kerberos, DCE/RPC, DCE/DFS, etc. More recently we have Docker's Swarm and Kubernetes, etc. None have achieved dominance, and that should tell us something. The advantage of trying push all of these questions into the OS is that you can try to provide the illusion that there is no difference between local and remote resources.

Is that the case, or is it that system designers try to provide uniform access to different classes of resources? Unix treats socket descriptors very similar to file descriptors very similar to pipes; why shouldn't named resources be handled in similar ways? But that either means that you

...

have a toy (sorry, "research") system which ignores all of the ways in which remote computation which extends to a different node that may or may not be up, which may or may not have belong to a different administration domain, which may or may not have an adversary on the network between you and the remote node, etc. OR, you have to make access to local resources just as painful as access to remote resources. Furthermore, since supporting access remote resources is going to have more overhead, the illusion that access to local and remote resources can be the same can't be comfortably sustained in any case.

...or some other way, which we'll never know about because no one thinks to ask the question, "how could we do this differently?" I think that's the crux of Mothy's point. Plan 9, as just one example, asked a lot of questions about the issues you mentioned above 30 years ago. They came up with _a_ set of answers; that set did evolve over time as things progressed. That doesn't mean that those questions were resolved definitively, just that there was a group of researchers who came up with an approach to them that worked for that group. What's changed is that we now take for granted that Linux is there, and we've stopped asking questions about anything outside of that model. When you add to that the complexities of building an OS that tries to

...

do a really good job supporting local resources --- see all of the observations in Rob Pike's Systems Software Research is Dead slides about why this is hard --- it seems to me the solution of trying to build a hard dividing line between the Local OS and Distributed Computation infrastructure is the right one. There is a huge difference between creating a local OS that can live on a single developer's machine in their house --- and a distributed OS which requires setting up a directory server, and an authentication server, and a secure distributed time server, etc., before you set up the first useful node that can actually run user workloads. You can try to do both under a single source tree, but it's going to result in a huge amount of bloat, and a huge amount of maintenance burden to keep it all working.

I agree with the first part of this paragraph, but then we're talking about researchers, not necessarily unaffiliated open-source developers. Hopefully researchers have some organizational and infrastructure support! By keeping the local node OS and the distributed computation system

...

separate, it can help control complexity, and that's a big part of computer science, isn't it?

I don't know. It seems like this whole idea of distributed systems built on networks of loosely coupled, miniature timesharing systems has led to enormous amounts of inescapable complexity. I'll bet the Kubernetes by itself is larger than all of plan9. - Dan C.

Theodore Ts'o

17 Sep 17 Sep

12:34 a.m.

On Thu, Sep 16, 2021 at 03:27:17PM -0400, Dan Cross wrote:

...

I'm really not convinced trying to build distributed computing into the OS ala Plan 9 is viable.

I should have been more clear. I'm not realliy convinced that building distributed computing into the OS ala Plan 9 is viable from the perspective of commercial success. Of course, Plan 9 did it; but it did it as a research system. The problem is that if a particular company is convinced that they want to use Yellow Pages as their directory service --- or maybe X.509 certificates as their authentication system, or maybe Apollo RPC is the only RPC system for a particularly opinionated site administrator --- and these prior biases disagree with the choices made by a particular OS that had distributed computing services built in as a core part of its functionality, that might be a reason for a particular customer *not* to deploy a particular distributed OS. Of course, this doesn't matter if you don't care if anyone uses it after the paper(s) about said OS has been published.

...

Plan 9, as just one example, asked a lot of questions about the issues you mentioned above 30 years ago. They came up with _a_ set of answers; that set did evolve over time as things progressed. That doesn't mean that those questions were resolved definitively, just that there was a group of researchers who came up with an approach to them that worked for that group.

There's nothing stopping researchers from creating other research OS's that try to answer that question. However, creating an entire new local node OS from scratch is challenging[1], and then if you then have to recreate new versions of Kerberos, an LDAP directory server, etc., so they all of these functions can be tightly integrated into a single distributed OS ala Plan 9, that seems to be a huge amount of work, requiring a lot of graduate students to pull off. [1] http://doc.cat-v.org/bell_labs/utah2000/ (Page 14, Standards)

...

What's changed is that we now take for granted that Linux is there, and we've stopped asking questions about anything outside of that model.

It's unclear to me that Linux is blamed as the reason why researchers have stopped asking questions outside of that model. Why should Linux have this effect when the presence of Unix didn't? Or is the argument that it's Linux's fault that Plan 9 has apparently failed to compete with it in the marketplace of ideas? And arguably, Plan 9 failed to make headway against Unix (and OSF/DCE, and Sun NFS, etc.) in the early to mid 90's, which is well before Linux's became popular, so that argument doesn't really make sense, either. - Ted

Larry McVoy

12:44 a.m.

On Thu, Sep 16, 2021 at 08:34:52PM -0400, Theodore Ts'o wrote:

...

What's changed is that we now take for granted that Linux is there, and we've stopped asking questions about anything outside of that model.

It's unclear to me that Linux is blamed as the reason why researchers have stopped asking questions outside of that model. Why should Linux have this effect when the presence of Unix didn't?

Linux runs on _everything_. From your phone to the top 500 HPC clusters. Unix, for all that it did, never had that level of success. I credit Unix for a lot, including showing Linux what it should be, but Linux took that model and ran with it. Plan 9 is very cool but I am channeling my inner Clem, Plan 9 didn't meet Clem's law. It was never compelling enough to make the masses love it. Linux was good enough. We can argue about if that is a good thing or not, I've watched Linux become more complex and seen docker et al react to that.

Bakul Shah

5:07 p.m.

On Sep 16, 2021, at 5:44 PM, Larry McVoy <lm(a)mcvoy.com> wrote:

...

On Thu, Sep 16, 2021 at 08:34:52PM -0400, Theodore Ts'o wrote:

What's changed is that we now take for granted that Linux is there, and we've stopped asking questions about anything outside of that model.

It's unclear to me that Linux is blamed as the reason why researchers have stopped asking questions outside of that model. Why should Linux have this effect when the presence of Unix didn't?

Things might have been different if Plan9 was open sourced in the same time frame as 386BSD and early Linux. Back then Linux was not good enough. plan9 was not all that different from unix and simpler to understand and use.

...

We can argue about if that is a good thing or not, I've watched Linux become more complex and seen docker et al react to that.

As Marshall Conover (& later I) said, containers & docker filled a genuine need. Not much to do with the complexity of Linux. Plan9 could've provided a much simpler and cleaner platform than linux but that was not to be. [plan has a couple of global names spaces that might have needed changing. Specifically the process id and '#' driver spaces] -- Bakul

Dan Cross

1:33 a.m.

On Thu, Sep 16, 2021 at 8:34 PM Theodore Ts'o <tytso(a)mit.edu> wrote:

...

On Thu, Sep 16, 2021 at 03:27:17PM -0400, Dan Cross wrote:

I'm really not convinced trying to build distributed computing into the OS ala Plan 9 is viable.

Ah, I take your meaning. Yes, I can see that being a problem. But we've had similar problems before: "we only buy IBM", or, "does it integrate into our VAXcluster?" Put another way, _every_ system has opinions about how to do things. I suppose the distinction you're making is that we can paper over so many of those by building abstractions on top of the "node" OS. But the node OS is already forcing a shape onto our solutions. Folks working on the Go runtime have told me painful stories about detection of blocking system calls using timers and signals: wouldn't it be easier if the system provided real asynchronous abstractions? But the system call model in Unix/Linux/plan9 etc is highly synchronous. If `open` takes a while for whatever reason (say, blocking on reading directory entries looking up a name?) there's no async IO interface for that, hence shenanigans. But that's what the local node gives me; c'est la vie. Of course, this doesn't matter if you don't care if anyone uses it

...

after the paper(s) about said OS has been published.

I suspect most researchers don't expect the actual research artifacts to make it directly into products, but that the ideas will hopefully have some impact. Interestingly, Unix seems to have been an exception to this in that Unix itself did make it into industry.

...

Plan 9, as just one example, asked a lot of questions about the issues you

mentioned above 30 years ago. They came up with _a_ set of answers; that set did evolve over time as things progressed. That doesn't mean that

those

questions were resolved definitively, just that there was a group of researchers who came up with an approach to them that worked for that

group. There's nothing stopping researchers from creating other research OS's that try to answer that question.

True, but they aren't. I suspect there are a number of confounding factors at play here; certainly, the breadth and size of the standards they have to implement is an issue, but so is lack of documentation. No one is seriously looking at new system architectures, though.

...

However, creating an entire new local node OS from scratch is challenging[1], and then if you then have to recreate new versions of Kerberos, an LDAP directory server, etc., so they all of these functions can be tightly integrated into a single distributed OS ala Plan 9, that seems to be a huge amount of work, requiring a lot of graduate students to pull off. [1] http://doc.cat-v.org/bell_labs/utah2000/ (Page 14, Standards)

Yup. That is the presentation I meant when I mentioned Rob Pike lamenting the situation 20 years ago in the previous message and earlier in the thread. An interesting thing here is that we assume that we have to redo _all_ of that, though. A lot of the software out there is just code that does something interesting, but actually touches the system in a pretty small way. gvisor is an interesting example of this; it provides something that looks an awful lot like Linux to an application, and a lot of stuff can run under it. But the number of system calls _it_ in turn makes to the underlying system is much smaller.

...

What's changed is that we now take for granted that Linux is there, and

we've stopped asking questions about anything outside of that model.

It's unclear to me that Linux is blamed as the reason why researchers have stopped asking questions outside of that model. Why should Linux have this effect when the presence of Unix didn't?

a) There's a lot more Linux in the world than there ever was Unix. b) There are more computers now than there were when Unix was popular. c) computers are significantly more complex now than they were when Unix was written. But to be clear, I don't think this trend started with Linux; I get the impression that by the 1980s, a lot of research focused on a Unix-like model to the exclusion of other architectures. The PDP-10 was basically dead by 1981, and we haven't seen a system like TOPS-20 since the 70s. Or is the argument that it's Linux's fault that Plan 9 has apparently

...

failed to compete with it in the marketplace of ideas?

It's hard to make that argument when Linux borrowed so many of plan9's ideas: /proc, per-process namespaces, etc.

...

And arguably, Plan 9 failed to make headway against Unix (and OSF/DCE, and Sun NFS, etc.) in the early to mid 90's, which is well before Linux's became popular, so that argument doesn't really make sense, either.

That wasn't the argument. There are a number of reasons why plan9 failed to achieve commercial success relative to Unix; most of them have little to do with technology. In many ways, AT&T strangled the baby by holding it too tightly to its chest, fearful of losing control the way they "lost" control of Unix (ironically, something that allowed Unix to flourish and become wildly successful). Incompatibility with the rest of the world was likely an issue, but inaccessibility and overly restrictive licensing in the early 90s practically made it a foregone conclusion. Also, it's a little bit of an aside, but I think we often undercount the impact of individual preference on systems. In so many ways, Linux succeeded because, simply put, people liked working on Linux more than they liked working on other systems. You've mentioned yourself that it was more fun to hack on Linux without having to appease some of the big personalities in the BSD world. - Dan C.

Kevin Bowling

2 Sep 2 Sep

3:41 p.m.

On Wed, Sep 1, 2021 at 3:00 PM Dan Cross <crossd(a)gmail.com> wrote:

...

One thing I've realized as the unit of computing becomes more and more abundant (one off HW->mainframes->minis->micros->servers->VMs->containers) the OS increasingly becomes less visible and other software components become more important. It's an implementation detail like a language runtime and software developers are increasingly ill equipped to work at this layer. Public cloud/*aaS is a major blow to interesting general purpose OS work in commercial computing since businesses increasingly outsource more and more of their workloads. The embedded (which includes phones/Fuschia, accelerator firmware/payload, RTOS etc) and academic (i.e. Cambridge CHERI) world may have to sustain OS research for the foreseeable future. There is plenty of systems work going on but it takes place in different ways, userspace systems are completely viable and do not require switching to microkernels. Intel's DPDK/SPDK as one ecosystem, Kubernetes as another - there is a ton of rich systems work in this ecosystem with eBPF/XDP etc, and I used to dismiss it but it is no longer possible to do so rationally. I would go as far as saying Kubernetes is _the_ datacenter OS and has subsumed Linux itself as the primary system abstraction for the next while.. even Microsoft has a native implementation on Server 2022. It looks different and smells different, but being able to program compute/storage/network fabric with one abstraction is the holy grail of cluster computing and interestingly it lets you swap the lower layer implementations out with less risk but also less fanfare. Regards, Kevin

Marshall Conover

8:12 p.m.

Kevin, I think that's a great framing of why this talk actually seemed inverted in its focus for me, and a good identification of why the presenter might see OS development stalling out and ossifying around Linux. I come from the opposite side of the presenter here: my frustration as a backend dev and user has been that modern OSs still think presenting an abstraction over my resources means making it easy to use one single machine (or, as the presenter brings up, a subset of the machine). Instead, my resources are spread out among many machines and a number of remote web services, for which I'd like to have one seamless interface - both for development and use. From an OS perspective, Plan 9 and its daughter systems have come the closest I've seen to addressing this by intentionally thinking about the problem and creating an API system for representing resources that reaches across networks, and a mutable namespace for using and manipulating those APIs. Despite pulling other ideas from 9, the importance of having an opinion on the distributed nature of modern computing seems to have been missed by prominent operating systems today. As a result, their development has been relegated to what they do: be a platform for things that actually provide an abstraction for my resources. And userspace systems have filled the demand for abstracting distributed resource usage to demonstrable business success, if questionable architectural success (as in, they can still be a confusing pain in the buns and require excess work sometimes). As a dev, the systems that have come the closest to presenting one unified abstraction over my resources are the meta-services offered by Google, MS and Amazon such as Azure and AWS. I think the distributed nature of things today is also potentially why the focus of the conference is on distributed systems now, as lamented by the presenter. Granted that I'm not the sharpest bulb in the drawer, but I can't think of a way an OS taking more direct control of the internal hardware of an individual computer would impact me beyond the security issues mentioned in the talk. However, I can think of a number of ways an OS being opinionated about working with networked machines would greatly improve my situation. Boy, it would be great to just spin up a cluster of machines, install one OS on all of them, and treat them as one resource. That's the dream the k8s mentality promises, and MS and Amazon are already walking towards being this sort of one-stop shop: "want cluster computing? Press a button to spin up a cluster with ECS, and store your containers in ECR. Want to run a program or twelve somewhere on the cluster? Just tell us which one and how many. Worried about storage? Just tell us what size storage it needs. We've got you covered!" None of it is perfect, but it shows that there's heavy demand for a system where users don't have to think about how to architect and maintain arbitrary groupings of their resources as necessitated by how OSs think of their job now, and instead just want to feel as if they're writing and running programs on one big 'thing'. So I think the ossification around Linux mentioned in the talk might be that unless operating systems start doing something more than being a host for the tools that actually provide an abstraction over all my resources, there's no real reason to make them do anything else. If you're not making it easier to use my resources than k8s or Azure, why would I want you? Cheers, Marshall On Thu, Sep 2, 2021 at 11:42 AM Kevin Bowling <kevin.bowling(a)kev009.com> wrote:

...

On Wed, Sep 1, 2021 at 3:00 PM Dan Cross <crossd(a)gmail.com> wrote:

Warner Losh

3 Sep 3 Sep

3:56 p.m.

On Wed, Sep 1, 2021 at 4:00 PM Dan Cross <crossd(a)gmail.com> wrote:

...

I'm curious about other peoples' thoughts on the talk and the overall topic?

My comment is that the mental map that he presents has always been a lie. At least it's been a lie from a very early time. Even in Unibus/Qbus days, the add-in cards had some kind of processor on it from an early time. Several of the VAX boards had 68000 or similar CPUs that managed memory. Even the simpler MFM boards had buffer memory that needed to be managed before the DMA/PIO pulled it out of the card. There's always been an element of different address spaces with different degrees of visibility into those address spaces. What has changed is all of these things are now on the SoC die so you have good visibility (well, as good as the docs) into these things. The number of different things has increased, and the for cross domain knowledge has increased. The simplistic world view was even inaccurate at the start.... Warner

Adam Thornton

5:10 p.m.

...

On Wed, Sep 1, 2021 at 4:00 PM Dan Cross <crossd(a)gmail.com> wrote:

I'm curious about other peoples' thoughts on the talk and the overall topic?

Larry McVoy

5:28 p.m.

I am exactly as Adam described, still thinking like it is a PDP-11. Such an understandable machine. For me, out of order execution kind of blew up my brain, that's when I stopped doing serious kernel work, I just couldn't get to a mental model of how you reasoned about that. Though I was talking to someone about it, maybe Clem, recently and came to the conclusion that it is fine, we already sort of had this mess with pipelines. So maybe it is fine, but out of order bugs my brain. On Fri, Sep 03, 2021 at 10:10:57AM -0700, Adam Thornton wrote:

...

Much of the problem, I think, is that: 1) an idealized PDP-11 (I absolutely take Warner's point that that idealization never really existed) is a sufficiently simple model that a Bear Of Little Brain, such as myself, can reason about what's going to happen in response to a particular sequence of instructions, and get fairly proficient in instructing the machine to do so in a non-geological timeframe. 2) a modern CPU? Let alone SoC? Fuggedaboutit unless you're way, way smarter than I am. (I mean, I do realize that this particular venue has a lot of those people in it...but, really, those are people with extraordinary minds.) There are enough people in the world capable of doing 1 and not 2 that we can write software that usually mostly kinda works and often gets stuff done before collapsing in a puddle of nasty-smelling goo. There aren't many people at all capable of 2, and as the complexity of systems increases, that number shrinks. In short, this ends up being the same argument that comes around every so often, "why are you people still pretending that the computer is a PDP-11 when it clearly isn't?" Because, as with the keys and the streetlight, that's what we have available to us. Only a grossly oversimplified model fits into our heads. Adam On Fri, Sep 3, 2021 at 8:57 AM Warner Losh <imp(a)bsdimp.com> wrote: > > > On Wed, Sep 1, 2021 at 4:00 PM Dan Cross <crossd(a)gmail.com> wrote: > >> I'm curious about other peoples' thoughts on the talk and the overall >> topic? >> > > My comment is that the mental map that he presents has always been a lie. > At least it's been a lie from a very early time. > > Even in Unibus/Qbus days, the add-in cards had some kind of processor > on it from an early time. Several of the VAX boards had 68000 or similar > CPUs that managed memory. Even the simpler MFM boards had buffer > memory that needed to be managed before the DMA/PIO pulled it out > of the card. There's always been an element of different address spaces > with different degrees of visibility into those address spaces. > > What has changed is all of these things are now on the SoC die so > you have good visibility (well, as good as the docs) into these things. > The number of different things has increased, and the for cross domain > knowledge has increased. > > The simplistic world view was even inaccurate at the start.... > > Warner >

-- --- Larry McVoy lm at mcvoy.com http://www.mcvoy.com/lm

John Floren

5:42 p.m.

When I took Computer Architecture, "reasoning" about out-of-order execution involved 30-page worksheets where we could track the state of the Tomasulo algorithm through each iteration. It was ludicrously slow work, and wouldn't be a lot of fun even if you had a computerized tool to help step through things instead. If you're talking about a modern Intel CPU where your compiler emits CISC instructions which are actually implemented in RISC instructions in the microcode, which in turn get rewritten and reordered internally by the CPU... it's hard to fault programmers for thinking at the level of the instruction set that's presented to them, even if it looks like a PDP-11. The above should not be read as an endorsement of the CPU status quo, of course :) john ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Friday, September 3rd, 2021 at 10:28 AM, Larry McVoy <lm(a)mcvoy.com> wrote:

...

Much of the problem, I think, is that: 1. an idealized PDP-11 (I absolutely take Warner's point that that idealization never really existed) is a sufficiently simple model that a Bear Of Little Brain, such as myself, can reason about what's going to happen in response to a particular sequence of instructions, and get fairly proficient in instructing the machine to do so in a non-geological timeframe. 2. a modern CPU? Let alone SoC? Fuggedaboutit unless you're way, way smarter than I am. (I mean, I do realize that this particular venue has a lot of those people in it...but, really, those are people with extraordinary minds.) There are enough people in the world capable of doing 1 and not 2 that we can write software that usually mostly kinda works and often gets stuff done before collapsing in a puddle of nasty-smelling goo. There aren't many people at all capable of 2, and as the complexity of systems increases, that number shrinks. In short, this ends up being the same argument that comes around every so often, "why are you people still pretending that the computer is a PDP-11 when it clearly isn't?" Because, as with the keys and the streetlight, that's what we have available to us. Only a grossly oversimplified model fits into our heads. Adam On Fri, Sep 3, 2021 at 8:57 AM Warner Losh imp(a)bsdimp.com wrote: > On Wed, Sep 1, 2021 at 4:00 PM Dan Cross crossd(a)gmail.com wrote: > > > I'm curious about other peoples' thoughts on the talk and the overall > > > > topic? > > My comment is that the mental map that he presents has always been a lie. > > At least it's been a lie from a very early time. > > Even in Unibus/Qbus days, the add-in cards had some kind of processor > > on it from an early time. Several of the VAX boards had 68000 or similar > > CPUs that managed memory. Even the simpler MFM boards had buffer > > memory that needed to be managed before the DMA/PIO pulled it out > > of the card. There's always been an element of different address spaces > > with different degrees of visibility into those address spaces. > > What has changed is all of these things are now on the SoC die so > > you have good visibility (well, as good as the docs) into these things. > > The number of different things has increased, and the for cross domain > > knowledge has increased. > > The simplistic world view was even inaccurate at the start.... > > Warner

Larry McVoy lm at mcvoy.com http://www.mcvoy.com/lm

Lawrence Stewart

7:02 p.m.

I don’t really think out-of-order in hardware causes trouble for programmers that wasn’t already there when you use -O3. Compilers will already promote memory to registers and do interprocedural optimization and reorder memory references. You have to sprinkle asm volatile("" ::: "memory"); around like pixie dust to make sure the compiler does things in the order you want, nevermind the hardware. x86 has wildly complex microarchitecture, but the model for a single thread is perfectly sensible. It seems to work like it was in-order. OOO isn’t changing that model of execution at all. I mean you care about it when performance tuning, but not for correctness. Other architectures, ARM, IBM, Alpha, are far worse in this respect. The real problems are when you have multicore in a shared memory space and you have to read all the fine print in the memory ordering chapter and understand the subtle differences between LFENCE and SFENCE and MFENCE. I get that, but I also think shared memory is a failed experiment and we should have gone with distributed memory and clusters and message passing. It is possible for mortals to program those. Regular folks can make 1000 rank MPI programs work, where almost noone other than Maurice Herlihy or like that can reason about locks and threads and shared data structures. My wish for a better world is to integrate messaging into the architecture rather than have an I/O device model for communications. It is criminal that machine to machine comms are still stuck at 800 nanoseconds or so latency. It takes 200 instructions or so to send a message under the best circumstances and a similar number to receive it, plus bus, adapter, wire, and switch time. -L

Clem Cole

7:11 p.m.

Larry - it's the compiler (code generator) folks that I really feel bad for. They had to deal with the realities of the ISA in many ways more than we do and for them, it's getting worse and worse. BTW - there was a misstatement in a previous message. A current CISC system like the INTEL*64 is not implemented as a RISC µcode, nor are current more RISCy machines like the SPARX and Alpha much less the StrongARM and its followers. What they are internally are *data flow machines *which is why you getting a mixing of instruction ordering, scoreboarding, and all sorts of complexities that blows our mind. At least at the OS we have been used to doing things in parallel, exceptions and interrupts occurring and we have reasoned our ways through things. Butler Lampson and Leslie Lamport gave a parallel calculus to help verify things (although Butler once observed at an old SOSP talk that the problem with parallel is what does 'single step the processor mean anymore.' ). So the idea while the processor is not a PDP-10 or PDP-11 much less a 360/91 or a CDC-6600, we build a model in our heads that does simplify the machine(s) as much as possible. We ensure at least that is correct and then, build up more complexity from there. To me, the problem is that we too often do a poor job of what should be the simple stuff and we continue to make it too complicated. Not to pick on any one group/code base, but Jon's recent observation about the Linux kernel FS interface is a prime point. It's not the processor that was made complex, it's the SW wanting to be all things to all people. To me what Unix started and succeed at its time, and clearly Plan9 was attempted in its time (but failed commercially) was to mask if not toss out as much of the complexity of the HW and get to a couple of simple and common ideas and all programs could agree. Going back to the idea of the bear of the 'slittle brain and try to expose the simplest way to computation. Two of the best Unix talks/papers ever, Rob's "cat -v is a bad idea" and Tom's "All Chips that Fit" has morphed into "I have 64-bits of address space I can link anything into my framework" and "what I power and cool in my current process technology" [a SoC is not different that the board level products that some of us lived]. I recently read a suggestion that the best way to teach begging students to be "good programmers" was to "introduce them to as many frameworks as possible and teach as little theory as they need." I nearly lost my dinner. Is this what programming has come to? Framework/Access Methods/Smart Objects .... To be fair, my own employer is betting on DPC++ and believing OneAPI as the one ring to rule them all. There is a lot to be said of "small is beautiful." How did we get from Sixth Edition UNIX with K&R1 to today? One transistor and one line a code at a time. ᐧ On Fri, Sep 3, 2021 at 1:29 PM Larry McVoy <lm(a)mcvoy.com> wrote:

...

fairly

proficient in instructing the machine to do so in a non-geological timeframe. 2) a modern CPU? Let alone SoC? Fuggedaboutit unless you're way, way smarter than I am. (I mean, I do realize that this particular venue has

lot of those people in it...but, really, those are people with extraordinary minds.) There are enough people in the world capable of doing 1 and not 2 that we can write software that usually mostly kinda works and often gets stuff done before collapsing in a puddle of nasty-smelling goo. There aren't many people at all capable of 2, and as the complexity of systems increases, that number shrinks. In short, this ends up being the same argument that comes around every so often, "why are you people still pretending that the computer is a PDP-11 when it clearly isn't?" Because, as with the keys and the streetlight, that's what we have available to us. Only a grossly oversimplified model fits into our heads. Adam On Fri, Sep 3, 2021 at 8:57 AM Warner Losh <imp(a)bsdimp.com> wrote: > > > On Wed, Sep 1, 2021 at 4:00 PM Dan Cross <crossd(a)gmail.com> wrote: > >> I'm curious about other peoples' thoughts on the talk and the overall >> topic? >> > > My comment is that the mental map that he presents has always been a

lie.

> At least it's been a lie from a very early time. > > Even in Unibus/Qbus days, the add-in cards had some kind of processor > on it from an early time. Several of the VAX boards had 68000 or

similar

> CPUs that managed memory. Even the simpler MFM boards had buffer > memory that needed to be managed before the DMA/PIO pulled it out > of the card. There's always been an element of different address spaces > with different degrees of visibility into those address spaces. > > What has changed is all of these things are now on the SoC die so > you have good visibility (well, as good as the docs) into these things. > The number of different things has increased, and the for cross domain > knowledge has increased. > > The simplistic world view was even inaccurate at the start.... > > Warner >

-- --- Larry McVoy lm at mcvoy.com http://www.mcvoy.com/lm

Jon Steinhart

5:46 p.m.

New subject: ATC/OSDI'21 joint keynote: It's Time for Operating Systems to Rediscover Hardware [ really a comment on SoCs ]

Adam Thornton writes:

...

2) a modern CPU? Let alone SoC? Fuggedaboutit unless you're way, way

Don't know how many of you are hardware/software folks as opposed to just software, but it turns out that SoCs are not immune to the problems that result from bolting disparate stuff together in the software world. A few years ago I was working on a project that included an Atmel SoC. Was having some weirdness, and contacted Atmel to ask them about timing issues between two of the DMA controllers that was unspecified in their documentation but critical to making the project work. Their initial response was "Well, we don't know, that's IP that we bought from ARM, call them and ask." I replied "I don't think that they're gonna talk to me because I'm not the one that bought that IP; you did and it's your job." Fortunately, at that time I knew someone who could escalate the issue for me, and only two months later got the timing numbers that I needed. One could be understandably think that more attention is given to bolting IP blocks together on an SoC than is given when bolting a thousand libraries onto a simple program but that turns out to not be true. Software folks need to be prepared for the fact that there may actually be nobody who knows how portions of the hardware actually work because it's not all designed at one place anymore. We have to be prepared for the SoC version of node.js. Jon

Dan Cross

16 Sep 16 Sep

6:38 p.m.

On Wed, Sep 1, 2021 at 5:58 PM Dan Cross <crossd(a)gmail.com> wrote:

...

[snip]

First, thank you for all of the thoughtful responses, both on-list and off. An interesting theme in many of the responses was essentially questioning whether the underlying OS still matters, since the focus on development has shifted to higher levels? E.g., we now provision components of our enormously large and complicated distributed applications with building blocks like containers, less physical machines, let alone processes etc. That is certainly a trend, but it strikes me that those containers have to run somewhere, and at some point, we've still got instructions executing on some CPU, modifying words of memory, registers, etc; presumably all of that runs under the control of an operating system. It is a worthwhile question to ask whether that operating system still matters at all: what we have works, and since it's so hidden behind layers upon layers of abstraction, do we really care what it is? But I claim that it does perhaps more than most folks realize. Certainly, there are metrics that people care about (tail latency, jitter, efficiency at the 90th, 95th, 99th percentile...) and OS effects can have outsized impacts there; Mothy's talk alludes to this when he talks about all of the hidden processing that's happening all over a modern computer, eventually some of that trickles onto the cores that are running one's containerized Node application or whatever (lookin' at you, SMM mode...). At the end of the day, the code we care about still runs in some process under some OS on some bit of physical hardware, regardless of all of the abstractions we've placed on top of those things. What that system does, and the abstractions that its interface provides to programs, still matters. Perhaps another question worth asking is, does it make sense to look at different models for those systems? My subjective impression is that, back in the 60s and 70s, there was much greater variation in system architectures than today. A common explanation for this is that we didn't know how to build systems at the time, so folks threw a lot of stuff at the wall to see what would stick. But we no longer do that...again, Mothy alludes to this in his brief survey of OSDI papers: basically, new systems aren't being presented. Rob Pike also lamented that state of affairs 20 years ago, so it's been going on for a while. Does that mean that we've come up with a recipe for systems that work and work well, and therefore we don't need to rethink those basic building blocks? Or does that mean that we're so used to our systems working well enough that we've become myopic about their architecture, and thus blind to their faults? - Dan C.

Jon Steinhart

7:34 p.m.

Dan Cross writes:

...

Well, I have two different thoughts on this question. The obvious one to me is that of course it matters. I'll start by drawing a lame analogy as our 12 week deck reconstruction project just finished up an hour ago. In a conversation with the contractor he said "You're technically doing a repair so you don't have to pull a permit and bring it up to current code. But, there are some things that I recommend that you do anyway because the technology is much better than what it was when this thing was built 30 years ago." My point here is that sure, we can survive with what's there, but that doesn't mean that we should ignore new technology. I had dinner with one of the people I mentor a few weeks ago. He told me that while he was generally ok working for Google and making gobs of money that he felt sort of empty. He told me that if his project got cancelled he didn't think that the world would notice, and as a mid-20s person seeing the planet that he's living on burning up, he's thinking about changing careers to something that's more meaningful (to him). I think that there is still plenty of work to do on things like efficiency that would be easy to justify if politically energy wasn't so heavily subsidized. Think back a few decades to all of the power-saving changes that the OLPC spawned. Most felt like things were fine the way they were back then too. I think that layers upon layers is wasteful. As I've said before, I'm having difficulty distinguishing the "full stack" in full stack programming from a compost heap. It's not OK to me from a security, safety, and reliability perspective to build on a rotting foundation. It's my opinion that the whole container thing sort of started as a "we can't secure the underlying system so we'll build something secure on top" combined with "it's no fun to fix the unnecessary incompatible mess among virtually identical systems that we've made so we'll build a new fix-it layer" ideologies. How long until problems are found with containers it's decided that the way to fix it is to build "safe deposit boxes" that run in container? Is there ever an end in sight? My second train of thought is to ask the question "what is a computer?" Having started in an era where computing meant basking in the warm glow of vacuum tubes, it's easy to think that it's the hardware. But somewhere along the line microcode made an appearance, followed by nanocode and so on. I'm guessing, especially from spending time with current CS students, that to many a computer is some high-level abstraction. So the question is, below what layer is considered a computer today?

Larry McVoy

7:41 p.m.

On Thu, Sep 16, 2021 at 12:34:15PM -0700, Jon Steinhart wrote:

...

As I've said before, I'm having difficulty distinguishing the "full stack" in full stack programming from a compost heap. It's not OK to me from a security, safety, and reliability perspective to build on a rotting foundation.

Amen.

...

It's my opinion that the whole container thing sort of started as a "we can't secure the underlying system so we'll build something secure on top" combined with "it's no fun to fix the unnecessary incompatible mess among virtually identical systems that we've made so we'll build a new fix-it layer" ideologies. How long until problems are found with containers it's decided that the way to fix it is to build "safe deposit boxes" that run in container? Is there ever an end in sight?

I think it is that the newer kids are less willing to understand stuff. So they build something on top that they understand. I agree that they will hit problems and likely build "safe deposit boxes" because the containers are "too complex". Oh, and get off my lawn!

Marshall Conover

11:14 p.m.

While I got a chuckle from this, it focuses on security, and I don't think security sold docker containers. I think what really sold containers was their ability to solve the old, hard problems of configuring and maintaining servers. Docker's use of per-process namespaces is a godsend for running different services on the same machine. I no longer run into two applications fighting over dependency versions, because both applications are running in their own world. This was somewhat possible in chroots, but as someone who tried to use chroots that way a decade ago, docker's made it trivial. Containers are also a godsend for applications that have to be deployed somewhere else. I know a container I deploy will have everything it needs wherever it goes, and will be exactly the thing I built and tested. It's hard to understate the benefits of this: when deploying, I no longer run into issues like "oh shoot, there was some configuration I forgot about on the dev server that I need for prod." I no longer have to create server configuration documentation either, as the documentation is "docker build," followed by "docker run." When we were first starting out on our current project, we built a container that runs our build system's agents. At one point the VM on which we were running those agents went down, and our stop-gap fix was to download and run a few copies of that container locally. As a result, we had builds going the entire time we worked to fix the issue. --------------- Separately, for the larger discussion, I think the abstraction-aerospace-engineering seen over the last few decades comes from the adage "necessity is the mother of invention." People writing business logic today are targeting an OS-independent platform: the browser. That's where developers need solutions, and that's where we see movement. Considering this, it's no surprise the browser has stumbled backwards from a markup language-renderer into a full platform for downloading and running applications and managing their resources, as well as providing complex abstractions for interacting with distributed systems. And it's no surprise those distributed systems have separated as much as possible from whatever's not the browser. In fact, we're seeing agreement in the browser ecosystem for problems like the directory system choice mentioned above. The OIDC workflow was born out of the internet's many-users-to-many-services issue. Now, it's such a decided approach for managing users' access to services that big names like Amazon and Google offer identity provider services using it, and I, as a service writer, can swap between any of them transparently. The services I run only care that the token they're handed is signed by the auth server they're configured to use, and that the token says the user is allowed to use the service contacted. The applications I write and use have no clue what the OS' permissions are for anything they deal with. For them, OS permissions have been made redundant. With this context, I think most of us here have learned by experience why the OS gets no more development, in every discussion they've had with management where they've said "we need to refactor some code that is wonky, but mostly works, because there will probably be errors and bugs and security issues in the future if we don't." Management - which in this case, means the world at large - demands new features, not unspecified heisen-benefits from redoing things that already work. For new features, the browser is their only recourse. And, to boot - if you change the thing under the browser, what if it breaks the browser? Cheers! Marshall On Thu, Sep 16, 2021 at 3:41 PM Larry McVoy <lm(a)mcvoy.com> wrote:

...

On Thu, Sep 16, 2021 at 12:34:15PM -0700, Jon Steinhart wrote:

Amen.

Rob Pike

11:44 p.m.

I believe that Docker would not have been nearly as successful if shared libraries had not taken over the Unix world. Docker would have been helpful for putting together Python and Java installations, with all their ancillary bits, but less so for C binaries. Now, though, we run an OS on the machine, put a VM on that, install a container on that, and run another VM (or JVM) inside the container, then maybe we're at the level where we can execute actual code. And that misses all the extra pieces that are distributed and layered and indirected and virtualized for management, logging, security, data warehousing, ... It's a wonder any of it works at all. Here's a map to guide you: https://landscape.cncf.io/ -rob On Fri, Sep 17, 2021 at 9:15 AM Marshall Conover <marzhall.o(a)gmail.com> wrote:

...

On Thu, Sep 16, 2021 at 12:34:15PM -0700, Jon Steinhart wrote: > As I've said before, I'm having difficulty distinguishing the "full

stack"

> in full stack programming from a compost heap. It's not OK to me from

security, safety, and reliability perspective to build on a rotting foundation.

Amen. > It's my opinion that the whole container thing sort of started as a "we > can't secure the underlying system so we'll build something secure on

top"

> combined with "it's no fun to fix the unnecessary incompatible mess

among

> virtually identical systems that we've made so we'll build a new fix-it > layer" ideologies. How long until problems are found with containers > it's decided that the way to fix it is to build "safe deposit boxes"

that

run in container? Is there ever an end in sight?

Larry McVoy

17 Sep 17 Sep

12:37 a.m.

On Fri, Sep 17, 2021 at 09:44:42AM +1000, Rob Pike wrote:

...

Now, though, we run an OS on the machine, put a VM on that, install a container on that, and run another VM (or JVM) inside the container, then maybe we're at the level where we can execute actual code. And that misses all the extra pieces that are distributed and layered and indirected and virtualized for management, logging, security, data warehousing, ... It's a wonder any of it works at all.

I think a huge part of why it works is Pugs exporting the IOMMU to user space. If I understand that work correctly, it really doesn't matter how many layers you have, if the layers use what he did, then the layers don't matter for I/O and I/O is usually where things get slow. Rob, that link you sent makes my head spin. Which was probably your intent. --lm

Jon Steinhart

1:38 a.m.

Marshall Conover writes:

...

Separately, for the larger discussion, I think the abstraction-aerospace-engineering seen over the last few decades comes from the adage "necessity is the mother of invention." People writing business logic today are targeting an OS-independent platform: the browser.

Wow. I think that it would be more accurate to say that people writing business logic today are targeting the browser because other people are going through the trouble of porting it to different platforms. Doesn't seem to me the best use of resources given that browsers are more complex than operating systems. And I've had many an experience with things that are not portable among browsers. Of course, given that firefox is trying to die by regularly alienating users and that to a first approximation much everything else is chrome or chrome based, you're effectively saying that people are targeting a single operating system even though we don't call a brower an OS. And while there's no going back, I think that browsers suck. Doesn't seem that anybody had the foresight in the early days to realize that they were really building a virtual machine, so the one that we have ended up with is a Rube Goldberg contraption. CSS is one of the brower components that I find especially appalling. I understand its genesis and all that. Would be lovely to be able to make stuff just work without having to "program". Despite revision after revision it's still impossible to lay out things as desired without reverting to JavaScript. While it didn't start that way, at this point there are so many properties with twisty and often undocumented interactions that it would be better to toss it and just write programs. Of course, programming is "hard" and it's supposedly easier to pore though online forums looking for answers to questions like "I think that this can be done but I have no idea how so can someone please share an incantation with me. I personally prefer a language that has a small number of primitives that can be combined in an understandable manner than a huge number of poorly documented primitives that no one person fully understands. And don't tell me that JavaScript is the answer; while it has some good, it suffers from being the dumping ground for people who were never able to get their favorite feature into other languages; it's an incoherent mess. I know that Larry and Clem and I agree on the value of past work. I was a proofreader for Alan Wirf-Brock's 20 years of JavaScript article. I was busy with other stuff when JavaScript began so wasn't familiar with some of the history. Kind of shook my head reading about Eich's big week-long sprint to get the parser up and running. Though to myself that it would have only been a half-day sprint at most had he used existing tools such as lex and yacc, and had he done so we wouldn't still be suffering from the optional semicolon problem 20 years later. Don't mean to offend anybody here, all just my opinion. Jon

John Cowan

3:54 a.m.

On Thu, Sep 16, 2021 at 7:15 PM Marshall Conover <marzhall.o(a)gmail.com> wrote:

...

I know a container I deploy will have everything it needs wherever it goes, and will be exactly the thing I built and tested.

Up to a point, Minister. You can mount a scratch monkey, but you can't mount a scratch Internet or a scratch AWS or a scratch Google.

...

At one point the VM on which we were running those agents went down, and our stop-gap fix was to download and run a few copies of that container locally.

That's true if the container isn't too vulgar big. I can run $EMPLOYER's whole application on my laptop in the root environment, but running it in Docker is too costly even though that's how it's deployed on AWS.

...

from the adage "necessity is the mother of invention." People writing business logic today are targeting an OS-independent platform: the browser.

Most actual business logic is still in the back end, at least at my part of the coal face. The browser is more of a programmable platform as time goes by, but it's still a Blit even if no longer just a 3270.

...

Management - which in this case, means the world at large - demands new features, not unspecified heisen-benefits from redoing things that already work.

There is a pressure toward that. But when $CLIENTS (who are a lot bigger than $EMPLOYER) start to complain about how often the application they are paying $$$$$$$ for falls over due to lack of robustness, things change. Not everything can be startup-grade.

Jon Steinhart

16 Sep 16 Sep

11:45 p.m.

Larry McVoy writes:

...

Like usual, we agree on this sort of stuff. A conundrum for me is that this stuff that "they understand" is in my opinion way more complicated than understanding computer hardware and/or an operating system. So I'm not sure where the win is. Jon

Al Kossow

17 Sep 17 Sep

12:06 a.m.

On 9/16/21 4:45 PM, Jon Steinhart wrote:

...

A conundrum for me is that this stuff that "they understand" is in my opinion way more complicated than understanding computer hardware and/or an operating system.

When a young'un says something like this to snapshot the data in a structure, I'm convinced my title at CHM should be "20th Century Software Curator" because it took me an hour to figure out what that sentence even meant. "You specialize some template for the structures you want to be able to save, and use a concept for a visitor to register members."

John Cowan

4:06 a.m.

On Thu, Sep 16, 2021 at 8:13 PM Al Kossow <aek(a)bitsavers.org> wrote:

...

Y'know, the guy who invented that jargon is a younker of 70 (and from the Labs at that).

Al Kossow

4:18 a.m.

On 9/16/21 9:06 PM, John Cowan wrote:

...

Y'know, the guy who invented that jargon is a younker of 70 (and from the Labs at that).

and we had a version of "Talking Moose" called "Talking Barney" at Apple in the late 80's making fun of him and his language. The tuft of hair on top would pop up when he said something exceptional. I've not been able to find a copy of it on my backups, sadly.

Larry McVoy

12:32 a.m.

On Thu, Sep 16, 2021 at 04:45:15PM -0700, Jon Steinhart wrote:

...

Larry McVoy writes:

Someone, sorry, I suck at names, I think he is in aerospace or similar, had a pretty rational view on why docker made things easier. It was today so should be easy to find. The part that I don't understand is why it seems so hard to deploy stuff today. We supported the same application, a pretty complicated one, 636K lines of code, on every Unix variant, Linux {32,64} {every arch including IBM 360}, MacOS {PPC, x86, and I'm working on M1}, Windows {XP..} and it wasn't that hard. Granted, of the core team, I'm the least intelligent so I hired well, but still.

David Arnold

16 Sep 16 Sep

11:54 p.m.

...

On 17 Sep 2021, at 05:41, Larry McVoy <lm(a)mcvoy.com> wrote:

<…>

...

Writing a new OS in the 70s or even early 80s meant that you had to replace or port a compiler toolchain, an editor, an email client, a news reader, an IRC client, a couple of simple games, and whatever applications your university/research lab needed to justify its grant money. It was a chunk of work, but it was surmountable by a small team or even a dedicated individual. It was demonstrably possible to build your own machine from CPU upward within a reasonable timeframe (eg. Wirth’s Oberon). It’s still possible (and perhaps even easier) to do that today, but no-one’s really happy with an OS that only provides a handful of applications. In particular, as has been widely stated, a modern web browser is orders of magnitude more work than an OS. But expectations for every type of application have moved on, and a modern editor, chat/messaging app, or game is also likely orders of magnitude more complex and feature-rich than what was once acceptable. For a while, it was possible to have a “POSIX emulation” module/layer/whatever (was Mach the first to go this route?) as a shortcut to this but the breadth of the APIs needed to run, eg. Chrome/ium is again orders of magnitude more work than what was needed to port vi/emacs/rn/etc. And it’s not just those applications: to have your new OS be useful, you need to support a dozen languages, a hundred protocols, thousands of libraries … a vast range of stuff that would take years, perhaps decades, to port over or reinvent in your new paradigm. The idea that you’d turn your back on the accumulated value of 50 years of countless people’s work because your set of system calls is slightly better than the one you’ve got now … that’s a very, very big call. So I think the notion that “the kids” are less willing to understand, or to drill deep, is doing them a disservice. They do understand, and they (mostly) make the choice to leverage that body of work rather than embark on the futility of starting afresh. d

Jon Steinhart

17 Sep 17 Sep

1:10 a.m.

David Arnold writes:

...

And it’s not just those applications: to have your new OS be useful, you need to support a dozen languages, a hundred protocols, thousands of libraries … a vast range of stuff that would take years, perhaps decades, to port over or reinvent in your new paradigm. The idea that you’d turn your back on the accumulated value of 50 years of countless people’s work because your set of system calls is slightly better than the one you’ve got now … that’s a very, very big call. So I think the notion that “the kids” are less willing to understand, or to drill deep, is doing them a disservice. They do understand, and they (mostly) make the choice to leverage that body of work rather than embark on the futility of starting afresh.

I have to respectfully disagree which is a challenge because being disagreeable comes more naturally than being respectful :-) We already have this. I kind of wonder what actual value could have been created with the resources that went into supporting the dozen languages, hundred protocols, and so on. Is there value to me that a library exists that lets me to something in python that is identical to the library that lets me do that same thing in perl or the one that lets me do it in php or the one that lets me do it in ...? No. You made a big assumption that I was suggesting tossing prior work and API specs which I wasn't. Would you have wanted to have the 32 bit system call API frozen because it worked and not wanted 64 bit versions? History shows plenty of good work going into compatibility when the underlying technology evolves. Don't know how much time you spend with "the kids" these days. I make it a point to do so when I can; SIGCOVID has cut into that unfortunately. One can get a CS degree without ever taking an OS course at many respectable institutions. Many are not so much making a choice as doing what they can which is inadequate in my opinion. Was discussing this with someone the other day. I'm glad that I have an engineering degree instead of a computer science degree. And I'm also glad that I prevailed with one of my mentees to double major in EE and CS when he wanted to bail on the EE. While it's a generalization, as an engineeer I was educated on how the universe worked - chemistry, physics, and so on. It was up to me to figure out how to apply that knowledge - I wasn't taught how to lay out a circuit board or to use a CAD package or to write an app. A modern CS degree at many institutions is vocational training in programming. It's not the same thing. Jon

Larry McVoy

1:28 a.m.

On Thu, Sep 16, 2021 at 06:10:11PM -0700, Jon Steinhart wrote:

...

Was discussing this with someone the other day. I'm glad that I have an engineering degree instead of a computer science degree.

I have and BS and MS in computer science, with most of a minor in systems arch from the EE department (but not all, I did it for fun, I should have finished it). That said, when people ask me what I am, I say I'm an engineer. And proud of it, I love being an engineer, to me it just means you are someone who figures stuff out.

Jon Steinhart

1:40 a.m.

Larry McVoy writes:

...

On Thu, Sep 16, 2021 at 06:10:11PM -0700, Jon Steinhart wrote:

Was discussing this with someone the other day. I'm glad that I have an engineering degree instead of a computer science degree.

I'm the opposite. Would have double-majored in EE and CS but my school wouldn't allow double majoring unless you stayed for 9 semesters. I finished my EE in 7 semesters and was 2 classes shy of the CS degree and couldn't justify another year just for 2 classes. In hindsight I'm not sure that that was the correct choice because I would have partied my ass off.

Larry McVoy

2:04 a.m.

On Thu, Sep 16, 2021 at 06:40:47PM -0700, Jon Steinhart wrote:

...

Larry McVoy writes:

On Thu, Sep 16, 2021 at 06:10:11PM -0700, Jon Steinhart wrote:

Was discussing this with someone the other day. I'm glad that I have an engineering degree instead of a computer science degree.

You do you, I had undiagnosed ADD so partys in college were just an awkward cringe fest for me. Looking backwards, now that I've figured that out, yeah, I can see it, sort of, if I knew then what I know now. I was pretty committed to learning in college. I'm not trying to judge or anything, it was just such a fun focussed time for me, I'd happily give up a party (where I was gonna get nothing) for a few hours on slovax where the BSD source was. Details aside, I think we both self identify as engineers and love it.

Jon Steinhart

2:21 a.m.

Larry McVoy writes:

...

OK, we're getting pretty far afield from UNIX. Probably wasn't using the word "party" in the traditional sense - of the drugs, sex, and rock-n-roll college was consumed by the last two. BSD wasn't a thing when I was in school; I do have a memory of coming in to work at BTL after dinner one evening when Ken was walking out the door with mag tapes under his arm on his way to Berkeley for sabbatical. I got to play with computers and UNIX at BTL, college was building and operating radio stations. Jon

Theodore Ts'o

2:48 a.m.

On Thu, Sep 16, 2021 at 06:10:11PM -0700, Jon Steinhart wrote:

...

You made a big assumption that I was suggesting tossing prior work and API specs which I wasn't. Would you have wanted to have the 32 bit system call API frozen because it worked and not wanted 64 bit versions? History shows plenty of good work going into compatibility when the underlying technology evolves.

Unfortunately, Implementing all of the old API specs is not easy if you're starting from scratch as you create your new OS. As David wrote in an earlier paragraph:

...

For a while, it was possible to have a “POSIX emulation” module/layer/whatever (was Mach the first to go this route?) as a shortcut to this but the breadth of the APIs needed to run, eg. Chrome/ium is again orders of magnitude more work than what was needed to port vi/emacs/rn/etc.

And this is the same observation Rob Pike made in his cri du coeur in 2000: * To be a viable computer system, one must honor a huge list of large, and often changing, standards: TCP/IP, HTTP, HTML, XML, CORBA, Unicode, POSIX, NFS, SMB, MIME, POP, IMAP, X, ... * A huge amount of work, but if you don't honor the standards, you're marginalized. * I estimate that 90-95% of the work in Plan 9 was directly or indirectly to honor externally imposed standards.

...

Don't know how much time you spend with "the kids" these days. I make it a point to do so when I can; SIGCOVID has cut into that unfortunately. One can get a CS degree without ever taking an OS course at many respectable institutions. Many are not so much making a choice as doing what they can which is inadequate in my opinion. Was discussing this with someone the other day. I'm glad that I have an engineering degree instead of a computer science degree. And I'm also glad that I prevailed with one of my mentees to double major in EE and CS when he wanted to bail on the EE. While it's a generalization, as an engineeer I was educated on how the universe worked - chemistry, physics, and so on. It was up to me to figure out how to apply that knowledge - I wasn't taught how to lay out a circuit board or to use a CAD package or to write an app. A modern CS degree at many institutions is vocational training in programming. It's not the same thing.

When I studied CS in the late 80's, MIT's EE/CS department required all EE's and CS's to take 2 foundational CS classes, and 2 foundational EE classes. This meant that CS folks needed to understand how to build op-amps from transitors (6.002) and descrete and continuous FFT's (6.003). And EE folks needed to be able understand Lambda calculus (6.001), and to build stack and general register computers using 74XX TTL chips on a breadbroad (6.004). Furthermore, CS students needed to take an OS course and/or a compiler course, so by the time you graduated, you understood computers from a "full stack" perspective --- from transitors, to AND/OR gates, to CPU design, to compilers, to OS's, to systems issues around desining big systems like Multics and the SABRE Airline Reservations systems. These days, at MIT, one of things students are taught is how figure out what an under-documented abstraction (implemented in Python), partially by reading the code (but it's way too complex), so it's mostly by deducing the abstraction by running experiments on the Python Library code in question. Given how complex computers have gotten, that's probably more realistic anticipation of what students will need once they graduate, but yeah, it does seem a bit like "vocational training in programming". And that's quite a change. When I was an undergraduate, MIT was proud of the fact that they didn't teach CS students the C language; after all, that would be *way* too practical/vocational. The assumption was after you learned Scheme and CLU, you'd be able pick up other languages on the fly. (And they didn't really *teach* Scheme per se; the first lecture in 6.001 was about the Lambda calculus, and if you couldn't figure out Scheme syntax from the reference manual so you could do the first problem set, well, the EE/CS department was heavily over-subscribed anyway, and it was a good way of filtering out the less committed undergraduates. :-) - Ted

Bakul Shah

5:39 p.m.

On Sep 16, 2021, at 4:54 PM, David Arnold <davida(a)pobox.com> wrote:

...

I have mixed feelings about this. Unix didn't "throw away" the mainframe world of computing. It simply created a new ecosystem, more suited for the microprocessor age. For IBM it was perhaps the classic Innovator's Dilemma. Similarly now we have (mostly) the Linux ecosystem, while the actual hardware has diverged a lot from the C memory model. There are security issues. There is firmware running on these system about which the OS knows nothing. We have processors like Esperanto Tech's 1088 64 bit Risc-V cores, each with its own vector/tensor unit, 160MB onchip sram and 23.8B transistors but can take only limited advantage of it. We have super performant GPUs but programming them is vendor dependent and a pain. If someone can see a clear path through all this, and create a new software system, they will simply generate a new ecosystem and not worry about 50 years worth of work.

Jon Steinhart

5:51 p.m.

Bakul Shah writes:

...

You're kind of reminding me of the HEP (heterogeneous element processor) talk that I saw at I think Usenix in Santa Monica. My opinion is that it was a "kitchen sink" project - let's throw in a few of these and a few of those and so on. Also analogous to what I saw in the housing market up here when people started cashing in their California huts for Oregon mansions - when we lived in California we could afford two columns out front but now we can afford 6 columns, 8 poticos, 6 dormers, 4 turrets, and so on. Just because you can built it doesn't keep it from being an ugly mess. So my question on many of these processors is, has anybody given any thought to system architecture? Most likely all of us have had to suffer with some piece of spiffy hardware that was pretty much unprogrammable. Do the performance numbers mean anything if they can't be achieved in an actual system configuration? Jon

Larry McVoy

6:07 p.m.

On Fri, Sep 17, 2021 at 10:51:38AM -0700, Jon Steinhart wrote:

...

So my question on many of these processors is, has anybody given any thought to system architecture? Most likely all of us have had to suffer with some piece of spiffy hardware that was pretty much unprogrammable. Do the performance numbers mean anything if they can't be achieved in an actual system configuration?

Apple, much to my surprise, has been thinking about this stuff. The M1 is one of the better thought out chips in a long time. I especially liked the idea of slow (and low battery draw) processors for some apps, and fast ones for apps that need fast. It's an obvious idea in retrospect but I hadn't thought to do it that way. Clever.

Derek Fawcus

9:03 p.m.

On Fri, Sep 17, 2021 at 11:07:10AM -0700, Larry McVoy wrote:

...

Apple, much to my surprise, has been thinking about this stuff. The M1 is one of the better thought out chips in a long time.

...

I especially liked the idea of slow (and low battery draw) processors for some apps, and fast ones for apps that need fast. It's an obvious idea in retrospect but I hadn't thought to do it that way. Clever.

I'd suggest that is not Apple, but ARM. That was sort of the whole point of their BIG.little architecture with performance and efficiency cores. DF

Larry McVoy

10:11 p.m.

On Fri, Sep 17, 2021 at 10:03:42PM +0100, Derek Fawcus wrote:

...

On Fri, Sep 17, 2021 at 11:07:10AM -0700, Larry McVoy wrote:

Apple, much to my surprise, has been thinking about this stuff. The M1 is one of the better thought out chips in a long time.

I'd suggest that is not Apple, but ARM. That was sort of the whole point of their BIG.little architecture with performance and efficiency cores.

Credit to ARM, but M1 was where I saw it first. The M1 performance is pretty impressive as well. Reminds me of LED bulbs: "Wait, these are brighter, use less power, and cost less? How is that a thing?"

Theodore Ts'o

19 Sep 19 Sep

4:05 a.m.

On Fri, Sep 17, 2021 at 03:11:49PM -0700, Larry McVoy wrote:

...

I'd suggest that is not Apple, but ARM. That was sort of the whole point of their BIG.little architecture with performance and efficiency cores.

ARM's BIG.little architecture dates back to 2011. Of course, they didn't *tell* anyone that they were doing this when before the chip was released, lest it get copied by their competitors. So they released a hacked-up version of the Linux kernel that supported their chip. And successive versions of BIG.little implemented by different ARM vendors had their own vendor-specific tweaks, with "board support kernels" that were heavily hacked up Linux kernels in code that was never sent upstream, since by the time vendors had released that chip, their kernel team was moved to working on a new project, where they would fork the current kernel version, and add completely different hacks for the next year's version of their System on a Chip (SOC). Proper support for differntly sized/powered cores didn't land in theb ppstream Linux until 2019, when Linux 5.0 finally acquired support for "Energy Aware Scheduling". So not only does hardware engineering take longer, but it's done in Sekrit, by software teams employed by chip manufacturers who often don't have any long-time attachment to the OS, and who will get reassigned to work the next year's SOC as soon as the current year's SOC is released. :-( Things have gotten a bit better in the last five years, but that's a pretty low bar, considering how horrible things were before that! - Ted

Bakul Shah

17 Sep 17 Sep

6:34 p.m.

On Sep 17, 2021, at 10:51 AM, Jon Steinhart <jon(a)fourwinds.com> wrote:

...

Bakul Shah writes:

If you look at the chip architecture, it is pretty regular. https://www.esperanto.ai/technology/ and very low power (0.01W/core as opposed to 7W/core on X86-64) https://www.servethehome.com/esperanto-et-soc-1-1092-risc-v-ai-accelerator-… The equivalent of turrets and porticos and columns and dormers are IO ports like USB, ethernet, and various GPIOs etc. but they use only a small portion of the available gates. IMHO the real issue is that the software folks are *not* providing and *can no*t provide any sort of guidance for general purpose computing, as the memory underlying modern programming languages is so far removed from reality. The situation is sorta like what happens with people with newly acquired incredible wealth but no background in how to spend or manage it wisely (and I don't mean *investing* to get more wealth). At least in that case there are people who can help you and a tremendous need. Here we can put billions and billions of gates on a chip and even do wafer scale integration but these gates are not fungible like money.

Jon Steinhart

6:56 p.m.

Bakul Shah writes:

...

IMHO the real issue is that the software folks are *not* providing and *can no*t provide any sort of guidance for general purpose computing, as the memory underlying modern programming languages is so far removed from reality. The situation is sorta like what happens with people with newly acquired incredible wealth but no background in how to spend or manage it wisely (and I don't mean *investing* to get more wealth). At least in that case there are people who can help you and a tremendous need. Here we can put billions and billions of gates on a chip and even do wafer scale integration but these gates are not fungible like money.

Are you sure about this? Or is it just that the hardware folks got to work without ever asking software/systems people for input? Reminds me of a UNIX workstation project that I was working on at a startup in the mid-1980s. The graphics hardware guy came from DEC and had his designs well under way before I was hired. I looked at his specs and asked WTF? He had put a lot of work, for example, into blazingly fast frame-buffer clears. I asked him why and he told me that that was a very important thing to DECs customers. He was crestfallen when I told him that I would never use that hunk of hardware because we were building a windowing system and not a standalone graphics terminal. So yeah, he built useless hardware because he never asked for system design help. It's not just compute hardware. Was working on a project in the 1990s that you might be connected to if you're ever in the emergency room. After I hired on, I looked over the schematics for the device and kept asking "What's this for?" Answer was consistently "Thought that it would be useful for you software folks." My response was "Hey, we're trying to build a low-cost device here. We don't need it, take it out." The common thread here, which I don't know if it applies to your example, is that hardware has traditionally had a longer lead time than software so commonly a hardware team is assembled and busy designing long before any software/systems types are brought on board. Sometimes it makes sense. I'll never forget a day that I expected to be walked to the door at a company but to my surprise management backed me up. A big failing of the company was that the lead chip designer was also a founder and on the board so was hard to hold to account. This person could always find a better way to do something that had already been done, so perpetual chip redesigns meant that the chip never happened. I was in a weekly status meeting where the hardware team presented its status and schedule. When it was question time, I asked something like "So, if I look at your schedule this has 2 weeks to go and this has 5 and so on, meaning that there's 20 weeks of work left. Does that mean that we're not gonna tape out next week like it says on the schedule?" I got reamed up and down for not showing respect for the ever so hard working hardware team. I think that what saved my butt was pointing out that I was not criticizing the hardware team, but that we had a big hiring plan for the software team once the hardware was on its way, and only so much money in the bank, and maybe we shouldn't be spending that money quite yet. Jon

Bakul Shah

7:16 p.m.

On Sep 17, 2021, at 11:56 AM, Jon Steinhart <jon(a)fourwinds.com> wrote:

...

Bakul Shah writes:

Are you sure about this? Or is it just that the hardware folks got to work without ever asking software/systems people for input?

Let me ask you. You have a budget of 24 Billion transistors (& a much more limited power budget). How would you spend it to maximize general purpose computing? Note that Esperanto Tech. was founded by Dave Ditzel, not exactly a h/w newbie. [EE/CS, worked with Dave Patterson, CTO @ Sun, Transmeta founder etc. etc.]. Now it is entirely possible that Esperanto folks are building such processors for special purpose applications like Machine learning but the fact is we simply have an incredible riches of transistors that we don't how to spend wisely, [My opinion, of course]

Jon Steinhart

7:35 p.m.

Bakul Shah writes:

...

On Sep 17, 2021, at 11:56 AM, Jon Steinhart <jon(a)fourwinds.com> wrote:

Bakul Shah writes:

Are you sure about this? Or is it just that the hardware folks got to work without ever asking software/systems people for input?

Tough question to answer. I would say that maybe we're getting to the point where a dream of mine could happen, which is eliminating the traditional multi-tasking operating system and just having one processor per process. One could even call it "hardware containers". Gak! Big unsolved problem the incompatibility between the processes for this sort of stuff and DRAM processes. Had hoped that some of the 3D stuff that I saw Dave D. talk about a few years ago would have panned out by now. So just because I've never heard anybody else say this, and because we have named laws like Cole's law, I have Steinhart's law. It states that a bad investment to give money to people who have had successful startup companies. My opinion is that people are lucky to get it right once, and very few get it right more than once, so I wouldn't bet on a company founded by people who have already done successful companies. A corollary is that to find a successful startup, look for someone who has been perfecting an idea over a handful of failed startup companies. In many respects, Apple fits that model; they let other people invent stuff and then they "do it right". The more interesting question that you raise is, why would you expect better stuff because we can now do tens of billions of transistors?

...

From the software side, we have a mind-boggling about of memory, at

least for those of us who began with 4K or less, and I can't see that it's used wisely. I've had people tell me that I was wasting my time being efficient because memory was cheaper than my time. Which I guess was true until you multiplied it by the number of systems. As an EE/CS guy, I don't really expect the HW people to be any different than the SW people now that designing HW is just writing SW. Jon

Bakul Shah

3:56 p.m.

On Sep 16, 2021, at 12:34 PM, Jon Steinhart <jon(a)fourwinds.com> wrote:

...

Recall that previously sysadmins used programs such as ghost to image a system. A completely operational system with all the required software could be created fairly quickly with minimum configuration. If your h/w crashed, you can get up and running fairly quickly on a new machine (provided your unique bits were backed up & restored). The same thing could be done for server machines. By minimizing differences you can apply security patches or put new machines in service quickly. A server machine needs much more than the main service program before it can be put in real service but machines providing the same service need pretty much the same things. When VMs and containers started getting used, the same model could be used for provisioning them. The docker folks simplified this further. Now you can spin up new servers almost trivially (even if later tooling via Kubernetes and such got quite complicated). Seems to me, this provisioning of whole systems is what users of technologies such as jail never quite got it. A couple of points on this: 1) I think this can be simplified even further if one can rely on a fast control plane connection by basically lazily pulling in the contents of each container. 2) If the underlying system provides a capability architecture, you can probably achieve the exact same functionality without containers as the required "many worlds" functionality is already built in. -- Bakul

ron minnich

6:24 p.m.

As a student weekend operator at Dupont, I saw the last 2 weeks of the 705 vacuum tube machine that ran payroll and then, over two years: the 7080 that ran the 705 in emulation; the 360 that ran the 7080 that ran the 705; the 370 that ran the 360 that ran the 7800 that ran the 705. The 370 sacrificed ASCII support so it could emulate a 360 (IBM ran out of bits in the PSW, so ASCII had to go). The Blue Gene compute node kernel (CNK) started out supporting the "most commonly used" 80 or so linux system calls; that grew by about 1 system call a month. As CNK became asymptotic to Linux, some sites threw in the towel and ran linux. Frequently, emulation is not enough. Linux itself supported SCO for a time; freebsd has a good linux subsystem; one could argue that Linux emulates Linux, given its binary compatibility guarantee. Unix had a JCL command through v5 and still has a GECOS field in the password. The x86 allocates 1/256 of its gigantic opcode space to the DAA instruction nobody uses. A multi-hundred core server can still boot CP/M and DOS 1.0 RISC-V started as clean-ish break with the past, and has followed the standard evilutionary path to x86-inspired 4k page sizes; big-endian support; and runtime BIOS. Successful systems rarely come as a complete break with the past. People frequently expect that the Unix "clean break" experience can be repeated, but. if anything, the history of Unix shows why that experience is unlikely to be repeated. But if you're going to do a new kernel, and you want broad usage, you're going to provide a VM to run Linux or something like it, or you're going to fail. Or target something invisible, like a security token, then you are free to do what you will -- the Google Open Titan security token runs a kernel written in Rust. Nobody knows about it so nobody cares. ron p.s. I remember the person who finally ported payroll from the 705 binary. "There's a bug in there, maybe, but i can't find it" -- 2 weeks later, in a not-to-be-repeated occurence, the paycheck of all us weekend students was 2x normal. We couldn't figure out who to report it to so we let it slide. p.p.s. The 7080 was the last mainframe I worked with that smelled of machine oil. On Fri, Sep 17, 2021 at 8:57 AM Bakul Shah <bakul(a)iitbombay.org> wrote:

...

On Sep 16, 2021, at 12:34 PM, Jon Steinhart <jon(a)fourwinds.com> wrote:

1433

days inactive

1451

days old

tuhs@tuhs.org

Manage subscription

51 comments

20 participants

tags (0)

participants (20)

Adam Thornton
Al Kossow
Bakul Shah
Clem Cole
Dan Cross
David Arnold
Derek Fawcus
Douglas McIlroy
John Cowan
John Floren
Jon Steinhart
Kevin Bowling
Larry McVoy
Lawrence Stewart
Marshall Conover
Rob Pike
ron minnich
Theodore Ts'o
Tony Finch
Warner Losh