Clem suggests I comment on mixing ISA. I'm not sure how to respond. I
saw Bruce and Jerry demo process migration many times, particularly
during our dramatic Santa Monica meetings in October 1987, coincident
with the Whittier earthquake. However, I never got a chance to work with
this myself. (During the strongest aftershocks, Bruce and I would just
stare and hold on to our chairs. Having us Austin IBM folks in Santa
Monica to try to resolve the Austin/LCC disagreements seemed historic,
but probably not the cause of the October 19 Wall Street crash.)
In general, I was always impressed by what Bruce and Jerry did, but the
assertions that LCC could do everything exacerbated the ongoing
political challenges within IBM. To repeat from
https://notes.technologists.com/notes/2017/03/08/lets-start-at-the-very-beg…:
o "The former LCC person has mentioned that IBM then seemed like N
competing companies. Actually, it was more like M<sub>n</sub> competing
factions within N competing companies."
o "The traditional product organizations, e.g., those associated with
the 370 and the System 3x, saw little need for UNIX or a new hardware
architecture. The renegade but surprisingly successful PC organizations
looked askance for their own reasons. Even the Yorktown partners were
partly detrimental because of disdain for UNIX." [To amplify on this, in
1984 CEO John Akers told a gathering of Austin IBM managers that he
questioned the need for RISC processors and UNIX.]
o "Besides our technical concerns about distributed system issues, the
implicit question seemed an all or nothing proposition of continuing AIX
vs. IBM depending on LCC for UNIX."
And we could dwell on OSF, DCE, etc. On the day OSF was announced,
after appearing on stage with Ken Olsen, Akers flew across the country
to an awards event, where Glenn Henry, Larry Loucks and I received
substantial checks in recognition of AIX. When Akers shook my hand, he
told me how proud he was of what had happened that day.
When I saw the Register article, I knew that systemd folks hadn't
boasted '42% less Unix philosophy', that it was really someone on
mas.to, but I felt like stirring up discussion. Seems to have worked...
Charlie
On Mon, Jun 17, 2024 at 11:56 AM Clem Cole <clemc(a)ccc.com> wrote:
On Mon, Jun 17, 2024 at 1:51 AM Bakul Shah via TUHS
<tuhs(a)tuhs.org> wrote:
Forgot to mention LOCUS, which was the only distributed
Unix-compatible OS I am aware of. From anyone who has
user/implementer experience, I would love to hear what worked
well, what didn't, what was easy to implement, what was very
hard, and what you wished was added to it.
Jerry and Bruce's book is the complete reference:
https://www.amazon.com/Distributed-System-Architecture-Computer-Systems/dp/…
There were basically 3/4 versions... the original version on the
PDP-11, which is the SOSP paper, which morphed to include a VAX at
UCLA; IBM's AIX/370 and AIX/PS2, which included TCF (Transparent
Computing Facility); and LCC's TNC (Transparent Networking
Computing) "product," which comprised the 14 core technologies
used to build it. Parts of them landed in other systems: Tru64,
HP-UX, the Paragon, and even a later Linux implementation (which
sadly was done on the V2 kernel, so was lost when Linus did not
understand it).
What worked well were the different flavors of the DFS and the
later core idea of the VPROCS layer, which allowed process
migration -- which worked well, and boy did I miss it later in my
career. Admin of a Locus-based system was a dream because it was
just one system, for up to 4096 nodes in a Paragon. It also meant
you could migrate processes off a node, take the node down,
reboot/change it, and bring it back. Very cool. After the first
system was installed, adding a node was trivial, by the way. You
booted the node, "joined" the cluster, and were up. AIX used file
replication to then build the local disks as needed. BTW:
"checkpointing" was a freebie -- you just migrated the file to a disk.
Mixing ISAs like the 370 and PS/2 was a mixed bag -- I'll let
Charlie comment. With TNC we redid that model a bit; I'm not
sure we ever got it 100% right. The HP-UX version was probably
the best.
The biggest implementation issue is that UNIX has too many
different namespaces, with all sorts of rules that are particular
to each. For all of the concept of "everything is a file," when
you start to try to bring it together, you discover new and
weird^H^H^H^H^Hinteresting namespaces, from System V IPC to signals
to FIFOs and named pipes (similar but different). It seemed like
everywhere we looked, we would find another namespace we needed to
handle, and when we started to try to look at non-UNIX process
layers, it got even stranger.
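To make the namespace point concrete (standard calls, nothing
LOCUS-specific here): three everyday operations already touch three
unrelated namespaces, each of which a single-system-image layer must
make cluster-wide by its own rules.

    /* Three ordinary operations, three unrelated namespaces. */
    #include <fcntl.h>      /* path names */
    #include <signal.h>     /* process IDs */
    #include <sys/ipc.h>
    #include <sys/shm.h>    /* System V IPC keys */
    #include <sys/types.h>

    int main(void)
    {
        open("/tmp/f", O_RDONLY);        /* hierarchical path namespace */
        shmget((key_t)0x1234, 4096, 0);  /* flat numeric key namespace */
        kill((pid_t)1234, SIGTERM);      /* pid namespace -- per-node
                                            unless made cluster-wide */
        return 0;
    }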
The original UNIX protection model is a tad weak, but most people
had started to add ACLs, and POSIX was in the throes of
standardizing them -- so we based it on an early POSIX proposal
(mostly based on HP-UX, since they had them before the others did).
To be more specific, the virtual process layer (VPROC) attempted
to do for process management what VFS had done for the file system
in the core kernel. If you look at both of the original Locus
schemes, process control was ad hoc and thus very messy. LCC
realized that if we were going to succeed, we needed to make that
cleaner. But that still took major surgery -- although, like the
VFS layer, things were a lot clearer once done. Bruce, Roman, and
I came up with VPROCs.
BTW: one of the cool parts of VPROC is that, like VFS, it
conceptually made it possible to have other process models. We did
a prototype for OS/2 running inside of the OSF uK and were trying
to get a contract from DEC to do it for Tru64, adding VMS, before
we were sold (we had already developed CFS, TNC's Cluster File
System, for DEC as part of Tru64). Truth is, cheap VMs killed the
need for this idea, but it worked fairly well.
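For the flavor of what that layer buys you, here is a minimal sketch
in C, by analogy to a vnode-ops table (the names and the operation set
are illustrative guesses, not the actual VPROC interface):

    /* VFS-style indirection for processes. The core kernel calls
       through an ops vector and never needs to know whether the
       process is local, remote, or even running a non-UNIX model.
       All names here are illustrative, not the real VPROC interface. */
    #include <sys/types.h>

    struct vproc;                       /* opaque, cluster-wide handle */

    struct vproc_ops {                  /* analogous to vnode/VFS ops */
        int (*signal)(struct vproc *vp, int sig);
        int (*wait)(struct vproc *vp, int *status);
        int (*migrate)(struct vproc *vp, int target_node);
    };

    struct vproc {
        pid_t                   pid;    /* cluster-wide, not per-node */
        int                     node;   /* where the process lives now */
        const struct vproc_ops *ops;    /* local UNIX, remote proxy,
                                           OS/2 model, ... */
    };

    /* The kernel's kill() path reduces to an indirect call; a "remote"
       ops vector just forwards the request to the home node. */
    int vp_signal(struct vproc *vp, int sig)
    {
        return vp->ops->signal(vp, sig);
    }

A second ops vector gives you a second process model (that was the
OS/2 prototype), and a remote-proxy vector is what lets migration and
the single-system image avoid ad hoc special cases in every code path.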
After the core VPROCs layer, the hardest things were distributed
shared memory (DSM) and the distributed lock manager (DLM). DSM
was an example that offered pure transparency in operation,
/i.e.,/ test-and-set worked (operationally) correctly across the
DSM, but it was not "speed transparent." But if you rewrote to
use the DLM, then you could get full transparency and speed. The
DLM is one of the TNC technologies that lives on today. It ended
up in a number of systems -- Oracle wrote their own based on the
specs for the DEC DLM we built for the CFS for Tru64 (which is
from TNC). I believe a few other folks used it. It was in OSF's
DCE, and ISTR Microsoft picked it up.
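A sketch of the difference, with a made-up lock interface (the real
DLM in the VMS/Tru64 lineage enqueues requests on named resources with
modes like PR and EX; dlm_lock()/dlm_unlock() below are stand-ins):

    /* Correct but not "speed transparent": a test-and-set spinlock on
       a word that lives in a DSM page. It works across nodes, but each
       contended spin can drag the whole page over the interconnect. */
    #include <stdatomic.h>

    static atomic_flag lock_word = ATOMIC_FLAG_INIT;  /* imagine in DSM */

    void dsm_spin_lock(void)
    {
        while (atomic_flag_test_and_set(&lock_word))
            ;   /* each retry may fault the DSM page to this node */
    }

    void dsm_spin_unlock(void)
    {
        atomic_flag_clear(&lock_word);
    }

    /* Rewritten for a DLM: the lock is a named cluster resource, and
       only small lock-manager messages cross the wire -- no page
       ping-pong, so it is fast as well as correct. */
    enum dlm_mode { DLM_PR, DLM_EX };   /* protected read, exclusive */

    static int dlm_lock(const char *res, enum dlm_mode m)
    { (void)res; (void)m; return 0; }   /* stub for the sketch */

    static int dlm_unlock(const char *res)
    { (void)res; return 0; }            /* stub */

    void update_shared_record(void)
    {
        dlm_lock("db.record.42", DLM_EX);
        /* ... modify the shared data ... */
        dlm_unlock("db.record.42");
    }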
So a good question is: if TNC was so cool, why did Beowulf (a real
hack in comparison) stick around and TNC die? Well, a few things.
LCC/HP did not open-source the code until it was too late. So
Beowulf, which was around, was what folks (like me) used to build
big scientific clusters. And while Popek was "right" -- it takes
something like Locus/TNC to make a cluster fully transparent --
Beowulf ignored the seams, and in the end, that was "good enough."
But it makes setup and admin a PITA, and the programmer needs to
be careful -- the dragons are all over the place. So, when I went
to Intel, I was the Architect of Cluster Ready, which defined away
many of those seams and then provided tools to test for them and
help you admin.
Tools like the Cluster Checker and the whole Cluster Ready program
would not be needed if TNC had "stuck," and I think clusters in
general -- clusters of small computers on a LAN, not just clusters
on a high-speed/special interconnect like a supercomputer -- would
be more available today.
Clem
--
voice: +1.512.784.7526   e-mail: sauer@technologists.com
fax: +1.512.346.5240
Web: https://technologists.com/sauer/
Facebook/Google/LinkedIn/Twitter: CharlesHSauer