On 28 Jun 2018, at 18:09, Larry McVoy <lm@mcvoy.com> wrote:

> I'm not sure how people keep missing the original point. Which was:
> the market won't choose a bunch of wimpy cpus when it can get faster
> ones. It wasn't about the physics (which I'm not arguing with), it
> was about a choice between lots of wimpy cpus and a smaller number of
> fast cpus. The market wants the latter, as Ted said, Sun bet heavily
> on the former and is no more.
[I said I wouldn't reply more: I'm weak.]
I think we have been talking at cross-purposes, which is probably my fault. I think
you've been using 'wimpy' to mean 'intentionally slower than they
could be' while I have been using it to mean 'of very tiny computational power
compared to the power of the whole system'. Your usage is probably more correct in
terms of the way the term has been used historically.
But I think my usage tells you something important: that the performance of individual
cores will, inevitably, become increasingly tiny compared to the performance of the system
they are in, and will almost certainly become asymptotically constant (i.e. however much
money you spend on an individual core, it will not be very much faster than the one you can
buy off the shelf). So, if you want to keep seeing performance improvements (especially if
you want to keep seeing exponential improvements for any significant time), then you have
no choice but to start thinking about parallelism.
The place I work now is an example of this. Our machines have the fastest cores we could
get. But we need nearly half a million of them to do the work we want to do (this is
across three systems).
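To put a rough number on 'tiny' (illustrative arithmetic with round figures, not our
actual accounting):

    1 core / 500,000 cores = 0.0002% of the system

Even a money-no-object core running twice as fast as the off-the-shelf part would add
only another two millionths to the total; all the headroom that matters is in the core
count.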
I certainly don't want to argue that choosing intentionally slower cores than you can
get is a good idea in general (although there are cases where it may be, including,
perhaps, some HPC workloads).
---
However let me add something about the Sun T-series machines, which were 'wimpy
cores' in the 'intentionally slower' sense. When these started appearing I was working
at a canonical Sun customer: a big retail bank. And the reason we did not buy lots of
them had nothing to do with how fast they were (they were more than fast enough); it was
that Sun's software was inadequate.
To see why, consider what a retail bank's IT looked like in the late 2000s. We had a great
mass of applications, the majority of which ran on individual Solaris instances (at least
two, live and DR, per application). A very high proportion (not all) of these
applications had utterly negligible computational requirements. But they had very strong
requirements on availability, or at least the parts of the business which owned them said
they did, and we could not argue with that: this was 2008, and we knew that if we had a
visible outage there was a fair chance it would be misread as the bank failing, resulting
in a cascade failure of the banking system and the inevitable zombie apocalypse. No one
wanted that.
Some consolidation had already been done: we had a bunch of 25ks, many of which were split
into lots of domains. The smallest domain on a 25k was a single CPU board, which was 4
sockets and therefore 8 or 16 cores (I forget how many cores there were per socket). I
think you could not completely partition a 25k like that, because you ran out of IO
assemblies, so some domains had to be bigger.
This smallest domain was huge overkill for many of these applications, and 25ks were
terribly expensive as well.
So, along came the first T-series boxes and they were just obviously ideal: we could
consolidate lots and lots of these things onto a single T-series box, with a DR partner,
and it would cost some tiny fraction of what a 25k cost, and use almost no power
(data-centre power was and is a real problem).
But we didn't do that: we did some experiments, and some things moved I think, but on
the whole we didn't move. The reason we didn't move was nothing, at all, to do
with performance, it was, as I said, software, and in particular virtualisation. Sun had
two approaches to this, neither of which solved the problems that everyone had.
At the firmware level there were LDOMs (which I think did not work very well early on or
may not have existed) which let you cut up a machine into lots of smaller ones with a
hypervisor in the usual way. But all of these smaller machines shared the same hardware
of course. So if you had a serious problem on the machine, then all of your LDOMs went
away, and all of the services on that machine had an outage, at once. This was not the
case on a 25k: if a CPU or an IO board died it would affect the domain it was part of, but
everything else would carry on.
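(For flavour, carving a guest domain out of a machine looked roughly like this once the
LDOM tooling matured. This is a from-memory sketch: the names ldg1, vnet1 and vol1 are
invented, and the exact subcommands varied between releases:

    ldm add-domain ldg1                          # define a new logical domain
    ldm add-vcpu 8 ldg1                          # give it 8 hardware threads
    ldm add-memory 8G ldg1                       # and 8GB of RAM
    ldm add-vnet vnet1 primary-vsw0 ldg1         # virtual NIC via the control domain's switch
    ldm add-vdisk vdisk1 vol1@primary-vds0 ldg1  # virtual disk via the control domain's disk service
    ldm bind-domain ldg1                         # bind the resources
    ldm start-domain ldg1                        # boot it

Note that every one of those virtual devices is backed by the one physical machine: lose
the machine and you lose every domain on it.)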
At the OS level there were zones (containers). Zones had the advantage that they could
look like Solaris 8 (the machine itself, and therefore the LDOMs it got split into, could
only run Solaris 10), which all the old applications were running, and they could be very
fine-grained. But they weren't really very isolated from each other (especially in
hindsight), they didn't look *enough* like Solaris 8 for people to be willing to
certify the applications on them, and they still had the all-your-eggs-in-one-basket
problem if the hardware died.
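(Again for flavour, setting up a Solaris 8 branded zone went roughly like this; another
from-memory sketch, with invented zone and archive names:

    zonecfg -z app8 'create -t SUNWsolaris8; set zonepath=/zones/app8; commit'
    zoneadm -z app8 install -u -a /flars/app8-server.flar   # install from a flash archive
                                                            # of the original Solaris 8 box
    zoneadm -z app8 boot

The flash archive is an image of the original Solaris 8 system, which is why this looked
so promising for lift-and-shift, and why 'not enough like Solaris 8' was so damning.)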
The thing that really killed it was the eggs-in-one-basket problem. We had previous
experience with consolidating a lot of applications onto one OS & hardware instance,
and no-one wanted to go anywhere near that. If you needed to get an outage (say to
install a critical security patch, or because of failing hardware) you had to negotiate
this with *all* the application teams, all of whom had different requirements and all of
whom regarded their application as the most important thing the bank ran (some of them
might be right). It could very easily take more than a year to get an outage on the big
shared-services machines, and when the outage happened you would have at least 50 people
involved to stop and restart everything. It was just a scarring nightmare.
So, to move to the T-series machines what we would have needed was a way of partitioning
the machine in such a way that the partitions ran Solaris 8 natively, and in such a way
that the partitions could be moved, live, to other systems to deal with the
eggs-in-one-basket problem. Sun didn't have that, anywhere near (they knew this I
think, and they got closer later on, but it was too late).
So, the machines failed for us. But this failure was nothing, at all, to do with
performance, let alone performance per core, which was generally more than adequate. Lots
of wimpy, low-power CPUs were in fact exactly what we needed: we just needed the right
software on top of them, which was not there.
(As an addendum: what eventually happened, or is happening, I think, is that applications
are getting recertified on Linux/x86 sitting on top of ESX, *which can move VMs live
between hosts*, thus solving the problem.)