typo... like the VFS layer (not CFS layer)
On Mon, Jun 17, 2024 at 11:56 AM Clem Cole <clemc(a)ccc.com> wrote:
On Mon, Jun 17, 2024 at 1:51 AM Bakul Shah via TUHS <tuhs(a)tuhs.org> wrote:
Forgot to mention LOCUS, which was the only distributed Unix-compatible
OS I am aware of. To anyone who has user/implementer experience, I would
love to hear what worked well, what didn't, what was easy to implement,
what was very hard, and what you wish had been added to it.
Jerry and Bruce's book is the complete reference:
https://www.amazon.com/Distributed-System-Architecture-Computer-Systems/dp/…
There were basically three or four versions... the original version on the
PDP-11, which is the one in the SOSP paper and which morphed to include a
VAX at UCLA; IBM's AIX/370 and AIX PS/2, which included TCF (Transparent
Computing Facility); and LCC's TNC (Transparent Networking Computing)
"product," which was the set of 14 core technologies used to build it.
Parts of them landed in other systems, from Tru64 and HP-UX to the Paragon,
and there was even a later Linux implementation (which, sadly, was done on
the 2.x kernel and so was lost when Linus did not understand it).
What worked well were the different flavors of the DFS and the later core
idea of the VPROC layer, which I sorely miss: it allowed process migration,
which worked well, and boy did I miss it later in my career. Admin of a
Locus-based system was a dream because it was just one system, for up to
4096 nodes in a Paragon. It also meant you could migrate processes off a
node, take the node down, reboot/change it, and bring it back. Very cool.
After the first system was installed, adding a node was trivial, by the
way. You booted the node, "joined" the cluster, and were up. AIX used file
replication to then build the local disks as needed. BTW: "checkpointing"
was a freebie -- you just migrated the process to a file on disk.
Mixing ISAs like the 370 and PS/2 was a mixed bag -- I'll let Charlie
comment. With TNC we redid that model a bit; I'm not sure we ever got it
100% right. The HP-UX version was probably the best.
The biggest implementation issue is that UNIX has too many different
namespaces, with all sorts of rules that are particular to each. For all
the talk of "everything is a file," when you start to try to bring it all
together you discover new and weird^H^H^H^H^Hinteresting namespaces, from
System V IPC to signals to FIFOs and named pipes (similar but different).
It seemed like everywhere we looked we would find another namespace we
needed to handle, and when we started to look at non-UNIX process layers,
it got even stranger. The original UNIX protection model is a tad weak, but
most vendors had started to add ACLs, and POSIX was in the throes of
standardizing them -- so we based ours on an early POSIX proposal (mostly
based on HP-UX, since they had ACLs before the others did).
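To make the namespace problem concrete, here is a tiny illustrative C
sketch -- not LOCUS/TNC code, and all the names are made up. The point is
that once you want a single system image, each namespace has its own
naming rules and so needs its own cluster-wide lookup hook:

    /*
     * Hypothetical sketch: "everything is a file" breaks down because
     * each namespace below has its own key format and lookup rules.
     */
    struct ns_ops {
        const char *name;                           /* namespace being virtualized */
        int (*resolve)(const void *key, int *node); /* map a name/key to the node  */
                                                    /* that currently owns it      */
    };

    /* Stub resolvers, one per namespace, each with its own rules. */
    static int fs_resolve(const void *key, int *node)      { (void)key; *node = 0; return 0; }
    static int pid_resolve(const void *key, int *node)     { (void)key; *node = 0; return 0; }
    static int sysvipc_resolve(const void *key, int *node) { (void)key; *node = 0; return 0; }
    static int fifo_resolve(const void *key, int *node)    { (void)key; *node = 0; return 0; }

    static const struct ns_ops cluster_namespaces[] = {
        { "filesystem", fs_resolve },       /* path names, handled by VFS/DFS */
        { "pid",        pid_resolve },      /* process IDs, signal delivery   */
        { "sysv-ipc",   sysvipc_resolve },  /* System V IPC keys              */
        { "fifo",       fifo_resolve },     /* FIFOs / named pipes            */
        /* ...every new namespace we discovered meant another entry here */
    };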
To be more specific, the virtual process layer (VPROC) attempted to do for
process control in the core kernel what VFS had done for the FS layer. If
you look at both of the original Locus schemes, process control was ad hoc
and thus very messy. LCC realized that if we were going to succeed, we
needed to make that cleaner. It still took major surgery - although, like
the VFS layer, things were a lot clearer once done. Bruce, Roman, and I
came up with VPROCs. BTW: one of the cool parts of VPROC is that, like
VFS, it conceptually made it possible to have other process models. We did
a prototype for OS/2 running inside of the OSF microkernel, and we were
trying to get a contract from DEC to do the same for Tru64, adding VMS,
before we got sold (we had already developed CFS, TNC's Cluster File
System, for DEC as part of Tru64). Truth is, cheap VMs killed the need for
this idea, but it worked fairly well.
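For readers who know vnodes but not VPROCs, here is a minimal sketch of
the analogy. These are hypothetical names, not the actual LOCUS/TNC
interfaces; the idea is simply that process operations go through an ops
vector, so the process behind a vproc can be local, remote, or even a
different process model entirely:

    #include <sys/types.h>

    struct vproc;                               /* cluster-wide process handle */

    struct vproc_ops {
        int (*deliver_signal)(struct vproc *vp, int sig);
        int (*wait)(struct vproc *vp, int *status);
        int (*migrate)(struct vproc *vp, int target_node);
    };

    struct vproc {
        pid_t                   pid;    /* cluster-unique pid                 */
        int                     node;   /* node currently hosting the process */
        const struct vproc_ops *ops;    /* local or remote implementation     */
    };

    /* Callers never need to know where the process actually lives. */
    static int vproc_kill(struct vproc *vp, int sig)
    {
        return vp->ops->deliver_signal(vp, sig);
    }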
After the core VPROC layer, the hardest things were distributed shared
memory (DSM) and the distributed lock manager (DLM). DSM was an example
that offered pure transparency in operation, *i.e.,* test-and-set worked
(operationally) correctly across the DSM, but it was not "speed
transparent." If you rewrote to use the DLM, though, you could get full
transparency and speed. The DLM is one of the TNC technologies that lives
on today. It ended up in a number of systems - Oracle wrote their own
based on the specs for the DEC DLM we built for the Tru64 CFS (which came
from TNC). I believe a few other folks used it. It was in OSF's DCE, and
ISTR Microsoft picked it up.
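A small sketch of the DSM-vs-DLM point, again with hypothetical
interfaces. Over DSM an ordinary test-and-set spinlock is operationally
correct, because the coherence machinery faults the backing page to
whichever node touches it, but every contended acquire drags the page
across the interconnect:

    #include <stdatomic.h>

    static atomic_int lock_word;        /* lives in a DSM-backed page */

    static void dsm_spin_lock(void)
    {
        /* correct across nodes, but each retry may ping-pong the page */
        while (atomic_exchange(&lock_word, 1))
            ;
    }

    /*
     * Rewritten against a distributed lock manager (hypothetical names),
     * the same critical section costs a lock-grant message instead of
     * repeated page migrations -- which is where the speed comes back.
     */
    typedef struct dlm_lock dlm_lock_t;
    int dlm_lock(dlm_lock_t *lk, int exclusive);
    int dlm_unlock(dlm_lock_t *lk);

    static void update_shared_state(dlm_lock_t *lk)
    {
        dlm_lock(lk, 1);                /* one message to the lock master */
        /* ... touch the shared data ... */
        dlm_unlock(lk);
    }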
So a good question is: if TNC was so cool, why did Beowulf (a real hack in
comparison) stick around while TNC died? Well, a few things. LCC/HP did
not open-source the code until it was too late. So Beowulf, which was
already around, was what folks (like me) used to build big scientific
clusters. And while Popek was "right" -- it takes something like Locus/TNC
to make a cluster fully transparent -- Beowulf ignored the seams, and in
the end that was "good enough." But it makes setup and admin a PITA, and
programs need to be careful -- the dragons are all over the place. So,
when I went to Intel, I was the Architect of Cluster Ready, which defined
away many of those seams and then provided tools to test for them and to
help you administer the cluster.
Tools like the Cluster Checker and the whole Cluster Ready program would
not be needed if TNC had "stuck," and I think clusters in general -- a
cluster of small computers on a LAN, not just clusters on a
high-speed/special interconnect like a supercomputer -- would be more
widely available today.
Clem