Kevin, I think that's a great framing of why this talk seemed inverted
in its focus to me, and a good identification of why the presenter
might see OS development stalling out and ossifying around Linux.
I come from the opposite side of the presenter here: my frustration as
a backend dev and user has been that modern OSs still think presenting
an abstraction over my resources means making it easy to use one
single machine (or, as the presenter brings up, a subset of the
machine). Instead, my resources are spread out among many machines and
a number of remote web services, for which I'd like to have one
seamless interface - both for development and use. From an OS
perspective, Plan 9 and its daughter systems have come the closest
I've seen to addressing this by intentionally thinking about the
problem and creating an API system for representing resources that
reaches across networks, and a mutable namespace for using and
manipulating those APIs. Although prominent operating systems today
pull other ideas from Plan 9, they seem to have missed the importance
of having an opinion on the distributed nature of modern computing. As
a result, their development has been relegated to what they do: be a
platform for the things that actually provide an abstraction for my
resources.
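To make the Plan 9 idea concrete: resources (local or remote) expose the same file-like interface, and a mutable, per-process mount table decides which backend a path resolves to. Here's a toy Python sketch of that shape; every name in it (Namespace, LocalFile, RemoteService, the hosts) is invented for illustration, not any real Plan 9 or 9P API:

```python
# Toy model of a Plan 9-style mutable namespace: all resources answer
# the same read() call, and a per-process mount table (longest-prefix
# match) decides which backend serves a given path.

class LocalFile:
    def __init__(self, data):
        self.data = data

    def read(self):
        return self.data

class RemoteService:
    """Stand-in for a resource served over the network (e.g. via 9P)."""
    def __init__(self, host, data):
        self.host = host
        self.data = data

    def read(self):
        # A real client would speak the wire protocol to self.host here.
        return self.data

class Namespace:
    """A mutable mount table mapping path prefixes to backends."""
    def __init__(self):
        self.mounts = {}

    def bind(self, path, backend):
        self.mounts[path] = backend

    def open(self, path):
        # Longest matching mount point wins.
        match = max((p for p in self.mounts if path.startswith(p)), key=len)
        return self.mounts[match]

ns = Namespace()
ns.bind("/", LocalFile("local disk"))
ns.bind("/n/cpu2", RemoteService("cpu2.example.com", "remote disk"))

# The program just reads paths; whether a path is local or remote is a
# property of the namespace, not of the code using it.
print(ns.open("/home/me/notes").read())
print(ns.open("/n/cpu2/etc/motd").read())
```

The point of the sketch is the seam: swapping a local resource for a remote one is a `bind`, not a code change, which is what I mean by the OS having an opinion about the network.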
And userspace systems have filled the demand for abstracting
distributed resource usage to demonstrable business success, if
questionable architectural success (as in, they can still be a
confusing pain in the buns and require excess work sometimes). As a
dev, the systems that have come the closest to presenting one unified
abstraction over my resources are the meta-services offered by Google,
Microsoft, and Amazon, such as GCP, Azure, and AWS.
I think the distributed nature of things today is also potentially why
the focus of the conference is on distributed systems now, as lamented
by the presenter. Granted, I'm not the sharpest bulb in the drawer,
but I can't think of a way an OS taking more direct control of
the internal hardware of an individual computer would impact me beyond
the security issues mentioned in the talk. However, I can think of a
number of ways an OS being opinionated about working with networked
machines would greatly improve my situation. Boy, it would be great to
just spin up a cluster of machines, install one OS on all of them, and
treat them as one resource. That's the dream the k8s mentality
promises, and MS and Amazon are already walking towards being this
sort of one-stop shop: "want cluster computing? Press a button to spin
up a cluster with ECS, and store your containers in ECR. Want to run a
program or twelve somewhere on the cluster? Just tell us which one and
how many. Worried about storage? Just tell us what size storage it
needs. We've got you covered!" None of it is perfect, but it shows
that there's heavy demand for a system where users don't have to think
about how to architect and maintain arbitrary groupings of their
resources as necessitated by how OSs think of their job now, and
instead just want to feel as if they're writing and running programs
on one big 'thing'.
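That "one big thing" demand has a simple shape: the user names a program and a replica count, and something else decides where it lands. Here's a minimal Python sketch of that contract; the names (Cluster, run, the node names) are all hypothetical, and this is nothing like the actual ECS or Kubernetes APIs, just the shape of what users are asking for:

```python
# Toy sketch of the "one big thing" abstraction: the user says which
# program and how many copies; a scheduler picks machines from a pool
# the user never has to think about.

class Cluster:
    def __init__(self, machines, slots_per_machine=4):
        # Free slots per machine: the pooled resource hidden from the user.
        self.free = {m: slots_per_machine for m in machines}

    def run(self, program, replicas):
        """Place `replicas` copies of `program`, least-loaded node first."""
        placements = []
        for _ in range(replicas):
            machine = max(self.free, key=self.free.get)
            if self.free[machine] == 0:
                raise RuntimeError("cluster out of capacity")
            self.free[machine] -= 1
            placements.append((program, machine))
        return placements

cluster = Cluster(["node-a", "node-b", "node-c"])

# "Want to run a program or twelve? Just tell us which one and how many."
placements = cluster.run("web-server", 5)
for prog, node in placements:
    print(f"{prog} -> {node}")
```

The least-loaded placement rule is the whole trick in miniature: the caller never names a machine, which is exactly the interface k8s and the cloud vendors sell, and exactly the interface today's OSs don't offer.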
So I think the ossification around Linux mentioned in the talk might
come down to this: unless operating systems start doing something more
than being a host for the tools that actually provide an abstraction
over all my resources, there's no real reason to make them do anything
else. If you're not making it easier to use my resources than k8s or
Azure does, why would I want you?
Cheers,
Marshall
On Thu, Sep 2, 2021 at 11:42 AM Kevin Bowling <kevin.bowling(a)kev009.com> wrote:
On Wed, Sep 1, 2021 at 3:00 PM Dan Cross <crossd(a)gmail.com> wrote:
One of the things I really appreciate about participating in this community and studying
Unix history (and the history of other systems) is that it gives one firm intellectual
ground from which to evaluate where one is going: without understanding where one is and
where one has been, it's difficult to assert that one isn't going sideways or
completely backwards. Maybe either of those outcomes is appropriate at times (paradigms
shift; we make mistakes; etc) but generally we want to be moving mostly forward.
The danger when immersing ourselves in history, where we must consider and appreciate the
set of problems that created the evolutionary paths leading to the systems we are
studying, is that our thinking can become calcified in assuming that those systems
continue to meet the needs of the problems of today. It is therefore always important to
reevaluate our base assumptions in light of either disconfirming evidence or (in our
specific case) changing environments.
To that end, I found Timothy Roscoe's (ETH) joint keynote address at
ATC/OSDI'21 particularly compelling. He argues that what we consider the
"operating system" is only controlling a fraction of a modern computer these
days, and that in many ways our models for what we consider "the computer" are
outdated and incomplete, resulting in systems that are artificially constrained, insecure,
and with separate components that do not consider each other and therefore frequently
conflict. Further, hardware is ossifying around the need to present a system interface
that can be controlled by something like Linux (used as a proxy more generally for a
Unix-like operating system), simultaneously broadening the divide and making it ever more
entrenched.
Another theme in the presentation is that, to the limited extent the broader systems
research community is actually approaching OS topics at all, it is focusing almost
exclusively on Linux in lieu of new, novel systems; where non-Linux systems are featured
(something like 3 accepted papers between SOSP and OSDI in the last two years out of $n$),
the described systems are largely Linux-like. Here the presentation reminded me of Rob
Pike's "Systems Software Research is Irrelevant" talk (slides of which are
available in various places, though I know of no recording of that talk).
Roscoe's challenge is that all of this should be seen as both a challenge and an
opportunity for new research into operating systems specifically: what would it look like
to take a holistic approach towards the hardware when architecting a new system to drive
all this hardware? We have new tools that can make this tractable, so why don't we do
it? Part of it is bias, but part of it is that we've lost sight of the larger
picture. My own question is, have we become entrenched in the world of systems that are
"good enough"?
Things he does NOT mention are system interfaces to userspace software; he doesn't
seem to have any quibbles with, say, the Linux system call interface, the process model,
etc. He's mostly talking about taking into account the hardware. Also, in fairness,
his highlighting a "small" portion of the system and saying, "that's
what the OS drives!" sort of reminds me of the US voter maps that show vast tracts of
largely unpopulated land colored a certain shade as having voted for a particular
candidate, without normalizing for population (land doesn't vote, people do, though
in the US there is a relationship between how these things impact the overall election
for, say, the presidency).
I'm curious about other people's thoughts on the talk and the overall topic?
https://www.youtube.com/watch?v=36myc8wQhLo
- Dan C.
One thing I've realized is that as the unit of computing becomes more
and more abundant (one-off
HW -> mainframes -> minis -> micros -> servers -> VMs -> containers), the OS
increasingly becomes less visible and other software components become
more important. It's an implementation detail, like a language runtime,
and software developers are increasingly ill-equipped to work at this
layer. Public cloud/*aaS is a major blow to interesting general
purpose OS work in commercial computing since businesses increasingly
outsource more and more of their workloads. The embedded (which
includes phones/Fuchsia, accelerator firmware/payload, RTOS, etc.) and
academic (e.g. Cambridge CHERI) worlds may have to sustain OS research
for the foreseeable future.
There is plenty of systems work going on but it takes place in
different ways, userspace systems are completely viable and do not
require switching to microkernels. Intel's DPDK/SPDK as one
ecosystem, Kubernetes as another - there is a ton of rich systems work
in this ecosystem with eBPF/XDP etc, and I used to dismiss it but it
is no longer possible to do so rationally. I would go so far as to
say Kubernetes is _the_ datacenter OS and has subsumed Linux itself
as the primary system abstraction for the next while; even Microsoft
has a native implementation on Server 2022. It looks different and
smells different, but being able to program compute/storage/network
fabric with one abstraction is the holy grail of cluster computing and
interestingly it lets you swap the lower layer implementations out
with less risk but also less fanfare.
Regards,
Kevin