What are the 1970’s & 1980’s Computing / IT skills “our grandkids won’t have”?
Whistling into a telephone while the modem is attached, because your keyboard has a stuck key
- something I absolutely don’t miss.
Having a computer in a grimy warehouse with 400 days of uptime & wondering how a reboot might go?
steve j
=========
9 Skills Our Grandkids Will Never Have
<https://blog.myheritage.com/2022/06/9-skills-our-grandkids-will-never-have/>
1. Using record players, audio cassettes, and VCRs
2. Using analog phones [ or an Analog Clock ]
3. Writing letters by hand and mailing them
4. Reading and writing in cursive
5. Using manual research methods [ this is a Genealogy site ]
6. Preparing food the old-fashioned way
7. Creating and mending clothing
8. Building furniture from scratch
9. Speaking the languages of their ancestors
--
Steve Jenkin, IT Systems and Design
0412 786 915 (+61 412 786 915)
PO Box 38, Kippax ACT 2615, AUSTRALIA
mailto:sjenkin@canb.auug.org.au http://members.tip.net.au/~sjenkin
Warner Losh:
Alcatel-Lucent gave an official grant to V8, V9 and V10. See
https://www.tuhs.org/Archive/Distributions/Research/Dan_Cross_v8/statement_…
====
Quite so. I believe this was announced on a mailing list called TUHS.
Those here who are interested in such things might want to subscribe;
I have and find it quite useful and interesting, with occasional
disappointment.
Norman Wilson
Toronto ON
(typing this on a train in Texas)
> I understand UNIX v7 is under this BSD-style license by Caldera Inc.
> https://www.tuhs.org/Archive/Caldera-license.pdf
The eqn document by Kernighan and Cherry also appears in the v10
manual, copyright by AT&T and published as a trade book. Wouldn't the
recent release of v10 also pertain to the manual?
Doug
Following an insightful post by Norman Wilson (https://minnie.tuhs.org/pipermail/tuhs/2022-June/025929.html) and re-reading a few old papers (https://minnie.tuhs.org/pipermail/tuhs/2022-June/026028.html) I was thinking about similarities and differences between the various Unix networking approaches in the 1975-1985 era and came up with the following observations:
- First something obvious: early Unix was organised around two classes of device: character based and block based. Arguably, it is better to think of these classes conceptually as “transient” and “memoizing”. A difference between the two is whether it makes conceptual sense to do a seek operation on them; pipes and networks are in the transient class.
- On the implementation side, this relates to two early kernel data structures: clists and disk buffers. Clists were designed for slow, low-volume traffic, and most early Unix network code creates a third kind: the mbufs of Arpanet Unix, BBN-TCP Unix and BSD, the packets of Chesson's V7 packet driver, Ritchie's streams, etc. These are all the same when seen from afar: higher-capacity replacements for clists.
- Typically devices are accessed via a filter. At an abstract level, there is not much difference between selecting a line discipline, pushing a stream filter or selecting a socket type. At the extreme end one could argue that pushing a TCP stack on a network device is conceptually the same as mounting a file system on a disk device. Arguably, both these operations could be performed through a generalised mount() call.
- Another implementation point is the organisation of the code. Is the network code in the kernel, or in user land? Conceptually connection management is different from stream management when connected (e.g. CMC and URP with Datakit, or RTP and BSP in Xerox Pups). In the BSD lineage all is in the kernel, and in the Research lineage connection management is done in a user space daemon.
Arpanet Unix (originally based on V5) had a curious solution: the network code was organised in a single process, but with code both in kernel mode and in user mode. The user code would make a special system call, and the kernel code would interact with the IMP driver, manage buffers and deliver packets. Only when a state-changing event happened would it return to user mode, and the user code would handle connection management (followed by a new call into kernel mode). Interestingly, this approach mostly hid the IMP connection, and this carried through to the BSD’s where the network devices were also buried in the stack. Arpanet Unix made this choice to conserve kernel address space and to minimize the amount of original kernel code that had to be touched.
- Early Unix has three ways to obtain a file descriptor: open, creat and pipe. Later also fifo. In this context adding more (like socket) does not seem like a mortal sin. Arguably, all could be rolled into one, with open() handling all cases. Some of this was done in 4.2BSD. It is possible to combine socket() & friends into open() with additional flags, much as was done in Arpanet Unix and BBN-TCP Unix.
- Network connections have different meta data than disk files, and in sockets this is handled via specialised calls. This seems a missed opportunity for unified mechanisms. The API used in BBN-TCP handles most of this via ioctl. However, one could (cheekily!) argue that V7 Unix has a somewhat byzantine meta data API, with the functionality split over seek, ioctl, fcntl, stat and fstat. These could all be handled in a generalised ioctl. Conceptually, this could also be replaced by using read/write on a meta data file descriptor, which could for example be the regular descriptor with the high bit set. But this, of course, never existed.
- A pain point in Arpanet Unix was that a listening connection (i.e. a server endpoint) would block until a client arrived and then turn into the connection with the client. This would fork out into a service process and the main server process would open a new listening socket for the next client. In sockets this was improved into a rendez-vous type server connection that spawns individual client connections via ‘accept’ (a minimal sketch of this pattern follows after this list). The V8/V9 IPC library took a similar approach, but also developed the mechanism into a generalized way to (i) create rendez-vous points and (ii) ship descriptors across local connections.
- The strict blocking nature of IO in early Unix was another pain point for writing early network code. The first solution to that was BBN’s await and capac primitives, which worked around the blocking nature. With SysIII, non-blocking file access appeared, and 4.1a BSD saw the arrival of 'select’ (also sketched after this list). Together these offer a much more convenient way to deal with multiple tty or network streams in a single-threaded process (although it did modify some of the early Unix philosophy). Non-blocking IO and select() also appeared in the Research lineage with 8th edition.
- The file system switch (FSS) arrived around 1983, during the gestation of 8th edition. This was just 1 or 2 years after the network interfaces for BSD and Datakit got their basic shape. Had the FSS been part of V7 (as it well could have been), probably the networking designs would have been a bit different, using virtual directories for networking connections. The ‘namei hack’ in MIT’s CHAOS network code already points in this direction. A similar approach could have been extended to named pipes (arriving in SysIII), where the fifo endpoint could have been set up through creating a file in a virtual directory, and making connections through a regular open of such a virtual file (and 9th edition appears to implement this.)
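As flagged in the items above, here are minimal sketches of the two patterns just described, written against the familiar BSD calls with modern prototypes. They are illustrations only, with error handling and address details trimmed, and are not code from any of the historical systems discussed. First the rendez-vous server, where the listening descriptor never turns into the client connection and accept() hands out a fresh descriptor per client:

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <unistd.h>

    /* Rendez-vous endpoint: accept() returns a new descriptor per client,
       and the listening descriptor stays available for the next caller. */
    void
    serve(int port)
    {
        struct sockaddr_in sin;
        int lfd, cfd;

        lfd = socket(AF_INET, SOCK_STREAM, 0);
        memset(&sin, 0, sizeof sin);
        sin.sin_family = AF_INET;
        sin.sin_addr.s_addr = INADDR_ANY;
        sin.sin_port = htons(port);
        bind(lfd, (struct sockaddr *)&sin, sizeof sin);
        listen(lfd, 5);

        for (;;) {
            cfd = accept(lfd, 0, 0);        /* blocks until a client dials in */
            if (fork() == 0) {              /* service process for this client */
                close(lfd);
                /* ... read/write on cfd ... */
                _exit(0);
            }
            close(cfd);                     /* parent keeps listening */
        }
    }

And the select() pattern for serving two streams from one single-threaded process, which non-blocking reads alone could only approximate by busy-waiting:

    #include <sys/select.h>
    #include <unistd.h>

    /* Wait on a tty and a network stream at once; read whichever is ready,
       so neither descriptor can stall the other. */
    void
    pump(int ttyfd, int netfd)
    {
        fd_set rfds;
        char buf[512];
        int n, nfds = (ttyfd > netfd ? ttyfd : netfd) + 1;

        for (;;) {
            FD_ZERO(&rfds);
            FD_SET(ttyfd, &rfds);
            FD_SET(netfd, &rfds);
            if (select(nfds, &rfds, 0, 0, 0) <= 0)
                break;
            if (FD_ISSET(ttyfd, &rfds) && (n = read(ttyfd, buf, sizeof buf)) > 0)
                write(netfd, buf, n);
            if (FD_ISSET(netfd, &rfds) && (n = read(netfd, buf, sizeof buf)) > 0)
                write(ttyfd, buf, n);
        }
    }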
oOo
To me it seems that the V1-V7 abstractions, the system call API, etc. were created with the experience of CTSS, Multics and others fresh in mind. The issues were understood and it combined the best of the ideas that came before. When it came to networking, Unix did not have this advantage and was necessarily trying to ride a bike whilst inventing it. Maybe in a different time line it would have been possible to pick the best ideas in this area as well and combine these into a coherent framework.
I concur with the observation that this list should be about discussion of what once was and only tangentially about what might have been, so it is only after considerable hesitation that I write the below.
Looking at the compare and contrast above (and having been tainted by what became dominant in later decades), I would say that the most “Unixy” way to add networking to V7/SysIII era Unix would have been something like the following (a purely hypothetical usage sketch follows the list):
- Network access via open/read/write/close, in the style of BBN-TCP
- Network namespace exposed via a virtual file system, a bit like V9
- Meta data via a generalised ioctl, or via read/write on a meta data descriptor
- Connection rendez-vous via a generalised descriptor shipping mechanism, in the style of V8/V9
- Availability of non-blocking access, together with a waiting primitive (select/poll/etc.), in the style of BSD
- Primary network device visible as any other device, network protocol mounted similar to a file system.
- Both connection management and stream management located in kernel code, in the style of BSD
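To make the list above concrete, here is a purely hypothetical user-space sketch of how such an API might have looked. None of the path names, behaviours or error conventions below existed in this exact form; the /net naming is loosely inspired by what V9 and later Plan 9 actually did, and the host name is filler:

    #include <fcntl.h>
    #include <unistd.h>

    /* Hypothetical client: the network lives in a virtual file system and a
       connection is just a descriptor obtained with open(). */
    int
    hypothetical_client(void)
    {
        char buf[512];
        int fd, n;

        /* open() names the endpoint and performs the 'dial' */
        fd = open("/net/tcp/server.example.com/smtp", O_RDWR);
        if (fd < 0)
            return -1;      /* 'line busy', 'number unknown', ... via errno */

        /* data transfer is ordinary read/write; meta data would sit behind
           a generalised ioctl or a companion meta data descriptor */
        write(fd, "hello\n", 6);
        while ((n = read(fd, buf, sizeof buf)) > 0)
            write(1, buf, n);

        close(fd);
        return 0;
    }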
i remember a fellow student debugging an lsi11 kernel using a form of analogue vectorscope.
i think it had a pair of DACs attached to the upper bits of the address bus. it generated a 2d pattern which you could recognise as particular code - interrupts are here, userspace is there, etc.
the brightness of the spot indicated the time spent, so you got a bit of profiling too - and deadlocks became obvious.
anyone remember these, what were they called? i think it was an HP or Tek product.
-Steve
Wanted to post my notes as plain text, but the bullets / sub-bullets get lost.
Here is a 2 page PDF with my notes on Research Datakit:
https://www.jslite.net/notes/rdk.pdf
The main takeaway is that connection build-up and tear-down is considerably more expensive than with TCP. The first cost is in the network, which builds up a dedicated path for each connection. Bandwidth is not allocated/reserved, but a path is and routing information is set up at each hop. The other cost is in the relatively verbose switch-host communication in this phase. This compares to the 3 packets exchanged at the hosts’ driver level to set up a TCP connection, with no permanent resources consumed in the network.
In compensation, the cost to use a connection is considerably lower: the routing is known and the host-host link protocol (“URP") can be light-weight, as the network guarantees in-order delivery without duplicates but packets may be corrupted or lost (i.e. as if the connection is a phone line with a modem). No need to deal with packet fragmentation, stream reassembly and congestion storms as in the TCP of the early 80’s.
Doing UDP traffic to a fixed remote host is easily mapped to using URP with no error correction and no flow control. Doing UDP where the remote host is different all the time is not practical on a Datakit network (i.e. a virtual circuit would be set up anyway).
A secondary takeaway is that Research Datakit eventually settled on a three-level ASCII namespace: “area/trunk/switch”. On each switch, the hosts would be known by name, and each connection request had a service name as parameter. In an alternate reality we might have used “ca/stclara/mtnview!google!www” to do a search.
> From: Rob Pike
> having the switch do some of the call validation and even maybe
> authentication (I'm not sure...) sounds like it takes load off the host.
I don't have enough information to express a judgement in this particular
case, but I can say a few things about how one would go about analyzing
questions of 'where should I put function [X]; in the host, or in the
'network' (which almost inevitably means 'in the switches')'.
It seems to me that one has to examine three points:
- What is the 'cost' to actually _do_ the thing (which might be in
transmission usage, or computing power, or memory, or delay), in each
alternative; these costs obviously generally cannot be amortized across
multiple similar transactions.
- What is the 'cost' of providing the _mechanism_ to do the thing, in each
alternative. This comes in three parts. The first is the engineering cost of
_designing_ the thing, in detail; this obviously is amortized across multiple
instances. The second is _producing_ the mechanism, in the places where it is
needed (for mechanisms in software, this cost is essentially zero, unless it
needs a lot of memory/computes/etc); this is not amortized across many. The
third is harder to measure: it's complexity.
This is probably a book by itself, but it has costs that are hard to
quantify, and are also very disparate: e.g. more complex designs are more
likely to have unforeseen bugs, which is very different from the 'cost' that
more complex designs are probably harder to evolve for new uses.
So far I haven't said anything that isn't applicable across a broad range of
information systems. The last influence on where one puts functions is much
more common in communication systems: the Saltzer/Clark/Reed 'End-to-end
Arguments in System Design' questions. If one _has_ to put a function in the
host to get 'acceptable' performance of that function, the
operation/implementation/design cost implications are irrelevant: one has to
grit one's teeth and bear them.
This may then feed back to design questions in the other areas. E.g. the
Version 2 ring at MIT deliberately left out hardware packet checksums -
because it was mostly intended for use with TCP/IP traffic, which provided a
pseudo-End-to-End checksum, so the per-unit hardware costs didn't buy enough
to be worth the costs of a hardware CRC. (Which was the right call; I don't
recall the lack of a hardware checksum ever causing a problem.)
And then there's the 'techology is a moving target' point: something that
might be unacceptably expensive (in computing cost) in year X might be fine
in year X+10, when we're lighting our cigars with unneeded computing power.
So when one is designing a communication system with a likely lifetime in
many decades, one tends to bias one's judgement toward things like End-to-End
analysis - because those factors will be forever.
Sorry if I haven't offered any answer to your initial query: "having the
switch do some of the call validation ... sounds like it takes load off the
host", but as I have tried to explain, these 'where should one do [X]'
questions are very complicated, and one would need a lot more detail before
one could give a good answer.
But, in general, "tak[ing] load off the host" doesn't seem to rate
highly as a goal these days... :-) :-(
Noel
> From: Paul Ruizendaal
> Will read those RFC's, though -- thank you for pointing them out.
Oh, I wouldn't bother - unless you are really into routing (i.e. path
selection).
RFC-1992 in particular; it's got my name on it, but it was mostly written by
Martha and Isidro, and I'm not entirely happy with it. E.g. CSC mode and CSS
mode (roughly, strict source route and loose source route); I wasn't really
sold on them, but I was too tired to argue about it. Nimrod was complicated
enough without adding extra bells and whistles - and indeed, LSR and SSR are
basically unused to this day in the Internet (at least, at the internet
layer; MPLS does provide the ability to specify paths, which I gather is used
to some degree). I guess it's an OK overview of the architecture, though.
RFC-1753 is not the best overview, but it has interesting bits. E.g. 2.2
Packet Format Fields, Option 2: "The packet contains a stack of flow-ids,
with the current one on the top." If this reminds you of MPLS, it should!
(One can think of MPLS as Nimrod's packet-carrying subsystem, in some ways.)
I guess I should mention that Nimrod covers more stuff - a lot more - than
just path selection. That's because I felt that the architecture embodied in
IPv4 was missing lots of things which one would need to do the internet layer
'right' in a global-scale Internet (e.g. variable length 'addresses' - for
which we were forced to invent the term 'locator' because many nitwits in the
IETF couldn't wrap their minds around 'addresses' which weren't in every
packet header). And separation of location and identity; and the introduction
of traffic aggregates as first-class objects at the internet layer. Etc, etc,
etc.
Nimrod's main focus was really on i) providing a path-selection system which
allowed things like letting users have more input to selecting the path their
traffic took (just as when one gets into a car, one gets to pick the path
one's going to use), and ii) controlling the overhead of the routing.
Of course, on the latter point, in the real world, people just threw
resources (memory, computing power, bandwidth) at the problem. I'm kind of
blown away that there are almost 1 million routes in the DFZ these days.
Boiling frogs...
Noel
I thought this comment was very good.
I went looking for “Clem’s Law” (presume Clem Cole) and struck out.
Any hints anyone can suggest or history on the comment?
steve j
==========
Larry McVoy wrote Fri Sep 17 10:44:25 AEST 2021
<https://minnie.tuhs.org/pipermail/tuhs/2021-September/024424.html>
Plan 9 is very cool but I am channeling my inner Clem,
Plan 9 didn't meet Clem's law.
It was never compelling enough to make the masses love it.
Linux was good enough.
==========
--
Steve Jenkin, IT Systems and Design
0412 786 915 (+61 412 786 915)
PO Box 38, Kippax ACT 2615, AUSTRALIA
mailto:sjenkin@canb.auug.org.au http://members.tip.net.au/~sjenkin
Just as the topic of TUHS isn't 'how _I_ could/would build a _better_ OS', but
'history of the OS that was _actually built_' (something that many posters
here seem to lose track of, to my great irritation), so too the topic
isn't 'how to build a better network' - or actually, anything network-centric.
I'll make a few comments on a couple of things, though.
> From: steve jenkin
> packet switching won over Virtual Circuits in the now distant past but
> in small, local and un-congested networks without reliability
> constraints, any solution can look good. ... Packet switching
> hasn't scaled well to Global size, at least IMHO.
The internetworking architecture, circa 1978, has not scaled as well as would
have been optimal, for a number of reasons, among them:
- pure scaling effects (e.g. algorithms won't scale up; subsystems which
handle several different needs will often need to be separated out at a larger
scale; etc)
- inherent lack of hindsight (unknown unknowns, to use Rumsfeld's phrase; some
things you only learn in hindsight)
- insufficiently detailed knowledge of complete requirements for a
global-scale network (including O+M, eventual business model, etc)
- limited personnel resources at the time (some things we _knew_ were going to
be a problem we had to ignore because we didn't have people to throw at the
problem, then and there)
- rapid technological innovation (and nobody's crystal ball is 100% perfect)
It has been possible to fix some aspects of the ca. 1978 system - e.g. the
addition of DNS, which I think has worked _reasonably_ well - but in other
areas, changes weren't really adequate, often because they were constrained by
things like upward compatibility requirements (e.g. BGP, which, among numerous
other issues, had to live with existing IP addressing).
Having said all that, I think your assertion that virtual circuits would have
worked better in a global-scale network is questionable. The whole point of
networks which use unreliable datagrams as a fundamental building block is
that by moving a lot of functionality into the edge nodes, it makes the
switches a lot simpler. Contemporary core routers may be complex - but they
would be much worse if the network used virtual circuits.
Something I suspect you may be unaware of is that most of the people who
devised the unreliable datagram approach of the internetworking architecture
_had experience with an actual moderately-sized, operational virtual circuit
network_ - the ARPANET. (Yes, it was basically a VC network. Look at things
like RFNMs, links {the specific ARPANET mechanism referred to by this term,
not the general concept}, etc.) So they _knew_ what a VC network would
involve.
So, think about the 'core routers' in a network which used VC's. I guess a
typical core router these days uses a couple of OC768 links. Assume an average
packet size of 100 bytes (probably roughly accurate, with the bimodal
distribution between data and acks). With 4 OC768's, that's 4*38.5G/800 =
~155M packets/second. I'm not sure of the average TCP connection length in
packets these days, but assume it's 100 packets or so (that's a 100KB Web
object). That's still roughly _1 million circuit setups per second_.
If the answer is 'oh, we'll use aggregation so core routers don't see
individual connections - or their setup/tear-down' - well, the same
can be done with a datagram system; that's what MPLS does. Work
through the details - VCs were not preferred, for good reasons.
> Ethernet only became a viable LAN technology with advent of Twisted
> pair: point to point + Switches.
It's really irritating that a lot of things labelled 'Ethernet' these days
_aren't_ _real_ Ethernet (i.e. a common broadcast bus allocated via CSMA-CD).
They use the same _packet format_ as Ethernet (especially the 48-bit
globally-unique address, which can usefully be blown into things at
manufacture time), but it's not Ethernet. In some cases, they also retain the
host interface<->network physical interface - but the thing on the other side
of the interface is totally different (such as the hub-based systems common
now - as you indicate, it's a bunch of small datagram packet switches plugged
together with point-point links).
Interfaces are forever, like the screw-in light bulb. These days, it's likely
an LED bulb on one side, powered by a reactor on the other - two technologies
which were unforeseen (and unforeseeable) when the interface was defined, well
over 100 years ago.
Noel
On Tue, Jun 21, 2022 at 05:56:02PM -0600, Jacob Moody wrote:
> I recently stumbled across the existence of datakit
> when going through the plan9foundation source archives.
> Would be curious to hear more about its involvement
> with plan9.
There are at least 2 versions of Datakit. In my current understanding there are “Datakit”, which is the research version, and “Datakit II”, which seems to be the version that was broadly deployed into the AT&T network in the late 80’s -- but very likely the story is more complicated than that. Plan9 is contemporaneous with Datakit II.
In short, Sandy Fraser developed the “Spider” network in 1970-1974 and this was actively used with early Unix (at least V4, maybe earlier). Sandy was dissatisfied with Spider and used its learnings to start again. The key ideas seem to have gelled together around 1977 with the first switches being available in 1979 or so. The first deployment into the Bell system was around 1982 (initially connecting a handful of Bell sites).
In 1979/1980 there were two Datakit switches, one in the office of Greg Chesson who was writing the first iteration of its control software, and one in the office/lab of Gottfried Luderer et al., who used it to develop a distributed Unix.
Datakit at this time is well described in two papers that the ACM recently moved from behind its paywall:
https://dl.acm.org/doi/pdf/10.1145/1013879.802670 (mostly about 1980 Datakit)
https://dl.acm.org/doi/pdf/10.1145/800216.806604 (mostly about distributed Unix)
The Chesson control software was replaced by new code written by Lee McMahon around 1981 (note: this is still Datakit 1). The Datakit driver code in V8 is designed to work with this revised Datakit. Three aspects of Datakit show through in the design of the V8-V10 networking code:
- a separation in control words and data words (this e.g. comes back in ‘streams')
- it works with virtual circuits; a connection is expensive to set up (‘dial’), but cheap to use
- it does not guarantee reliable packet delivery, but it does guarantee in-order delivery
Probably you will see echoes of this in early Plan9 network code, but I have not studied that.
> From: Paul Ruizendaal
> it would seem to me that Sandy had figured out a core problem some 30
> years before the TCP/IP world would come up with a similar solution. I
> would not even be surprised if I learned that modern telco routers
> transparantly set up virtual circuits for tcp traffic.
To fully explore this topic would take a book, which I don't have the energy
to write, and nobody would bother to read, but...
Anyway, I'm not up on the latest and greatest high-speed routers: I saw some
stuff from one major vendor under NDA about a decade ago, but that's my most
recent - but at that point there was nothing that looked even _vaguely_ like
virtual circuits. (The stuff Craig was alluding to was just about
connectivity for getting bits from _interface_ to _interface_ - if you don't
have a giant crossbar - which is going to require buffering on each input
anyway - how exactly do you get bits from board A to board Q - a single
shared bus isn't going to do it...)
A problem with anything like VC's in core switches is the growth of per-VC
state - a major high-speed node will have packets from _millions_ of TCP
connections flowing through it at any time. In the late-80's/early-90's - well
over 30 years ago - I came up with an advanced routing architecture called
Nimrod (see RFC-1992, "The Nimrod Routing Architecture"; RFC-1753 may be of
interest too); it had things called 'flows' which were half way between pure
datagrams (i.e. no setup - you just stick the right destination address in the
header and send it off) and VCs (read the RFCs if you want to know why), and it
went to a lot of trouble to allow flow aggregation in traffic going to core
switches _precisely_ to limit the growth of state in core switches, which
would have traffic from millions of connections going through them.
I have barely begun to even scratch the surface, here.
Noel
I’ve been wondering about the growth of Unix and if there’s any good data available.
There’s the Early Unix Epoch, which probably ends with the Unix Support Group assuming the distribution role, plus providing / distributing their version of the code.
Later there’s commercial Unix:
System III and System V, I guess.
BSD, until the lawsuit was resolved, required a Source code license, but their installation count is important in pre-Commercial Unix.
Large licensees like SUN, HP & IBM (AIX) may not have published license counts for their versions - but then, were their derivatives “Unix” or something else?
Warner Losh’s paper has data to around 1978 [below].
I’ve no idea where to find data for USG issued licences, or if the number of binary & source licences were ever reported in the Commercial Era by AT&T.
I’ll not be the first person who’s gone down this road, but my Search Fu isn’t good enough to find them.
Wondering if anyone on the list can point me at resources, even a bunch of annual reports.
I don’t mind manually pulling out the data I’m interested in. But why reinvent the wheel if the work is already done?
steve
===============
numbers extracted from Warner Losh’s paper.
<https://papers.freebsd.org/2020/FOSDEM/losh-Hidden_early_history_of_Unix.fi…>
2nd Edn June 1972 10 installations
3rd Edn February 1973 16
4th Edn November 1973 >20, or 25
July 74 CACM paper “The UNIX Time-Sharing System”, after which external interest exploded
6th Edn 1975 ???
7th Edn March 1978 600+, >300 inside Bell System, "even more have been licensed to outside users”
===============
--
Steve Jenkin,
0412 786 915 (+61 412 786 915)
PO Box 38, Kippax ACT 2615, AUSTRALIA
mailto:sjenkin@canb.auug.org.au http://members.tip.net.au/~sjenkin
> From: Dan Cross
> I believe that's actually a menu
Hence the "erroneous _impression_" (emphasis added).
I'm curious as to how they decided which models to run which editions on.
Although V4 _ran_ on the /45, split I+D wasn't supported - for user or kernel
- until V6. (I'm assuming a number of things - both in the kernel, and
applications - started hitting the 64KB limit, which led to its support.)
Speaking of split I+D, there's an interesting little mystery in V6 that at
one point in time I thought involved split I+D - but now that I look closely,
apparently not. The mystery involves a 'tombstone' in the V6 buf.h:
#define B_RELOC 0200 /* no longer used */
I had created (in my mind) an explanation what this is all about - but now
that I look, it's probably all wrong!
My explanation involves the slightly odd layout of the kernel in physical
memory, with split I+D; data below the code, at physical 0. This actually
makes a lot of sense; it means the virtual address of any data (e.g. a
buffer) is the same as its physical address (needed for DMA). It does require
the oddness of 'sysfix', to invert the order of code+data in the system
binary, plus odd little quirks in the assembler startup (e.g. copying the
code up to make room for BSS).
So I thought that B_RELOC was a hangover from a time, at the start of split
I+D, when data _wasn't_ at physical 0, so a buffer's virtual and physical
addresses differed.
But that must be wrong (at least in any simple way). B_RELOC was in buf.h as
of V4 - the first kernel version in C - with no split I+D. So my theory has
to be wrong.
However, I am unable to find any code in the V4 kernel which uses it! So
unless someone who remembers the very early PDP-11 kernel can enlighten us,
its purpose will always remain a mystery!
Noel
> From: Paul Ruizendaal
> [c] Fifth Edition UNIX PDP-11/40 June 1974
> [d] Sixth Edition UNIX PDP-11/45 May 1975
> [e] Seventh Edition UNIX PDP-11/70 January 1979
This table gives an erroneous impression of which versions supported which
PDP-11 models. 4th Edition supported only the /45; 5th Edition added support
for the /40; and the /70 appeared in 6th edition.
Noel
Sandy Fraser died June 13. The moving spirit behind Datakit, Sandy
served as director then executive director responsible for computing
science at Bell Labs in the era of v8, v9, and v10. He became VP at
AT&T Shannon Labs after the split with Lucent.
Doug
Excited as I was to see this history of Unix code in a single repository:
https://github.com/dspinellis/unix-history-repo
it continues the long-standing tradition of ignoring all the work done at
Bell Labs after v7. I consider v8, v9, and v10 to be worthy of attention, even
influential, but to hear this list talk about it - or discussions just
about anywhere else - you'd think they never existed. There are exceptions,
but this site does reinforce the broadly known version of the story.
It's doubly ironic for me because people often mistakenly credit me for
working on Unix, but I landed at the Labs after v7 was long dispatched. At
the Labs, I first worked on what became v8.
I suppose it's because the history flowed as this site shows, with BSD
being the driving force for a number of reasons, but it feels to me that a
large piece of Unix history has been sidelined.
I know it's a whiny lament, but those neglected systems had interesting
advances.
-rob
While I know that there are people here who like good old ed...I've been playing with UTS under VM/370. This version is from 1981 and I think it's v7. But the important thing is that Tom Lyon wrote a 3270 terminal driver, and it comes with ned, which is a screen editor that feels a lot like XEDIT--which wasn't even in CMS at that point, although EE has been added to the VM370 Community Edition I'm using. And the man pages are fullscreen as well.
UTS is very, very usable because of that. This really is a wonderful terminal driver.
So, thank you, Tom!
Adam
> I don't know the exact history of RFS a la System V, but I
> don't think it was Peter Weinberger's stuff, and it certainly
> wasn't his code.
Peter’s code is available in the V8 and V9 trees on TUHS.
The Sys V repositories on Github appear to include RFS code in all of R3.0, R3.1 and R3.2.
At first glance, it seems quite different from the V8/V9 code.
> Peter, being a self-described fan of cheap hacks, also wasn't
> inclined to spend much time thinking about general abstractions;
> in effect he just turned various existing kernel subroutines
> (when applied to a network file system) into RPCs. The
> structure of the file system switch was rather UNIX-specific,
> reflecting that.
Yes, well put. I’ve back-ported his filesystem switch to V6/V7 and it is a very light touch: on the PDP11 it added only some 500 bytes of kernel code (after some refactoring).
With hindsight it seems such a logical idea, certainly in a context where the labs were experimenting with remote system calls in the mid 70’s (Heinz Lycklama's work on satellite Unix) and early 80’s (Gottfried Luderer et al. on distributed Unix — another forgotten version). It is such a powerful abstraction, but apparently very elusive to invent.
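To illustrate why the back port can be that small: the switch is conceptually just a per-file-system-type table of operations that the generic kernel code calls through, and a remote file system fills the slots with routines that turn each operation into an RPC. A minimal sketch of the idea, with illustrative names rather than the actual V8 (or SysV RFS) declarations:

    struct inode;                   /* opaque here; the generic in-core inode */

    /* One entry per file system type; the generic namei/read/write paths
       call indirectly through the mounted file system's table. */
    struct fs_ops {
        int (*f_namei)(struct inode *dir, char *name, struct inode **ipp);
        int (*f_open)(struct inode *ip, int mode);
        int (*f_read)(struct inode *ip, char *buf, unsigned count, long offset);
        int (*f_write)(struct inode *ip, char *buf, unsigned count, long offset);
        int (*f_close)(struct inode *ip);
    };

    #define NFSTYP 4                /* illustrative: local, remote, proc, ... */
    struct fs_ops *fs_switch[NFSTYP];   /* indexed by a per-mount type code */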
Paul
> I can think of at least 4 things, some big, some small, where post-V7
> Research Unix was influential
Besides streams, the file system switch, /proc, and /dev/fd, v8 had the
Blit. Though Rob's relevant patent evoked disgruntled rumblings from
MIT that window systems were old hat, the Blit pioneered multiple
windows as we know them today. On the contemporary Lisp Machine, for
example, active computation happened in only one window at a time.
V8 also had Peter Weinberger's Remote File System. Unlike NFS, RFS
mapped UIDS, thus allowing files to be shared among computers in
different jurisdictions with different UID lists. Unfortunately, RFS
went the way of Reiser paging.
And then there was Norman Wilson, who polished the kernel and
administrative tools. All kinds of things became smaller and
cleaner--an inimitable accomplishment.
> No clue what was new in V10
This suggests I should put on my to-do list an update of the Research
Unix Reader's combined table of man-page contents, which covers only
v1-v9. I think it's fair to say, though, that nothing introduced in
v10 was as influential as the features mentioned above.
Doug
I don't know the exact history of RFS a la System V, but I
don't think it was Peter Weinberger's stuff, and it certainly
wasn't his code. Nor his name: he called his first version
neta and his second netb (he knew it would be changing and
allowed for it in the name from the start).
I don't remember us ever calling it RFS, or even remote
file systems, inside 1127; we called it network file systems
(never NFS because the Sun stuff existed by then).
For those who don't know it, Peter's goal was quite different
from that of NFS. The idea behind NFS seems always to have
been to mount a remote file system as if it were local, with
a base assumption early on that everything was within the
same administrative domain so it was OK to make assumptions
about userids matching up, and running code as super-user.
Peter described his intent as `I want to be able to use your
disks, and that's a lot simpler if I don't have to get you
to add code to your kernel, or even to run a program as
super-user.' Hence the entirely-user-mode server program,
which could use super-user privileges to afford access as
any user if it had them, but also worked fine when run as
an ordinary user with only that user's file permissions.
We did in fact normally run it as super-user so each of
our 15 or so VAXes could see the file system tree on each
other, but we also occasionally did it otherwise.
That was one reason device files worked as they did, accessing
the device on the server end rather than acting like a local
special file on the client: we didn't care about running
diskless clients, but we did occasionally care about accessing
a remote system's tape drive.
Peter, being a self-described fan of cheap hacks, also wasn't
inclined to spend much time thinking about general abstractions;
in effect he just turned various existing kernel subroutines
(when applied to a network file system) into RPCs. The
structure of the file system switch was rather UNIX-specific,
reflecting that.
That also means Peter's code was a bit ad-hoc and wonky in
places. He cleaned it up considerably between neta and netb,
and I did further cleanup later. I even had a go at a library
to isolate the network protocol from the server proper, converted
the netb server to use it, and made a few demo servers of my own
like one to read and write raw FILES-11 file systems--useful for
working with the console file system on the VAX 8800 series,
which was exported to the host as a block device--and a daemon
to allow a tar archive to be mounted as a read-only file system.
In modern systems, you can do the same sort of things with FUSE,
and set up the same I-want-to-use-your-disks (or I want to get
at my own files from afar without privileges) scheme with sshfs.
I would be very surprised to learn that either of those borrowed
from their ancient cousins in Research UNIX; so far as I know
they're independent inventions. Either way I'm glad they exist.
Norman Wilson
Toronto ON
interesting to know the vax was a complete dead end.
i do remember jmk (rip) reporting on 9fans, maybe even releasing, the vax plan9 kenc compiler he discovered in a dusty corner of the dump filesystem.
I was intrigued and asked if there was anything else, but he said there were no kernel or driver fragments to go with it.
-Steve
For those interested in a quick feel for V8 and early SysV, I recommend the excellent unix50 stuff:
SSH to unix50: "ssh unix50@unix50.org"
Password is “unix50”
You end up in a menu with:
SDF Public Access UNIX System presents ...
/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/
/~/~ H Y S T E R I C A L ~ U N I X ~ S Y S T E M S ~/~/
/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/~/
[a] UNICS (Version Zero) PDP-7 Summer 1969
[b] First Edition UNIX PDP-11/20 November 1971
[c] Fifth Edition UNIX PDP-11/40 June 1974
[d] Sixth Edition UNIX PDP-11/45 May 1975
[e] Seventh Edition UNIX PDP-11/70 January 1979
[f] Research UNIX 8 VAX-11/750 1984
[g] AT&T UNIX System III PDP-11/70 Fall 1982
[h] AT&T UNIX System V PDP-11/70 1983
[i] AT&T UNIX System V 3b2/400 1984
[j] 4.3 BSD MicroVAX June 1986
[k] 2.11 BSD PDP-11/70 January 1992
[w] What's running now?
[q] QUIT (and run away in fear!)
User contributed tutorials are at https://sdf.org/?tutorials/unix50th
Want persistent images? networking? more ttys? Join https://sdf.org
To exit from a run, press Ctrl-E to return to the simulator, type 'exit', then type 'q'
I just tried V8 and it still works, although the boot log suggests that an image reset may be in order.
Many, many thanks to whoever is maintaining this!
As one of the few remaining people who has actually worked
with the original dmr stream I/O system, I'd love to dive
into the debate here, but I don't have time. I can't resist
commenting on a few things that have flown by, though. Maybe
I can find time to engage better on the weekend.
-- If you think there were no record delimiters in the stream
system, you don't know enough about it. A good resource to
repair that is Dennis's BSTJ paper:
https://www.bell-labs.com/usr/dmr/www/st.pdf
It's an early description and some of the details later
evolved (for example we realized that once pipes were
streams, we no longer needed pseudoterminals) but the
fundamentals remained constant.
See the section about Message blocks, and the distinction
between data and control blocks. Delimiters were one kind
of control; there were others, some of them potentially
specific to the network at hand. In particular, Datakit
(despite being a virtual-circuit network) had several sorts
of control words, including message delimiters. Delimiters
were necessary even just for terminals, though: how else
does the generic read(2) code know to return early, before
filling the buffer, when a complete line has arrived?
-- It's hard to compare sockets to streams and make much
sense, because they are very different kinds of thing.
When people talk about sockets, especially when complaining
that sockets are a mess, they usually mean the API: system
calls like connect and listen and getpeername and so on.
The stream system was mainly about the internal structure--
the composable modules and the general-purpose queue interface
between them, so that you could take a stream representing
an (already set up) network connection and push the tty
module on it and have a working remote terminal, no need
for user-mode programs and pseudo-terminals.
It's not inconceivable to build a system with socket-like
API and stream internals.
-- Connection setup was initially done with network-specific
magic messages and magic ioctls. Later we moved the knowledge
of that messy crap into network-specific daemons, so a user
program could make a network call just by calling
fd = ipcopen("string-destination-name")
without knowing or caring whether the network transport was
TCP or Datakit or involved forwarding over Datakit to a
gateway that then placed a TCP call to the Internet or whatnot.
That's what the connection server was all about:
https://www.bell-labs.com/usr/dmr/www/spe.pdf
Again, the API is not specific to the stream system. It
wouldn't be hard to write a connection server that provided
the same universal just-use-a-string interface (with the
privileged parts or network details left up to daemons)
on a system with only socket networking; the only subtle
bit is that it needs to be possible to pass an open file
descriptor from one process to another (on the same system),
which I don't think the socket world had early on but I
believe they added long ago.
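For reference, the descriptor-passing facility the socket world did eventually grow is the SCM_RIGHTS control message over a Unix-domain socket. A minimal sketch of the sending side, using the modern POSIX interfaces with error handling omitted:

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Send an already-open descriptor 'fd' to the process at the other end
       of the Unix-domain socket 'sock'. */
    int
    send_fd(int sock, int fd)
    {
        union {                             /* properly aligned control buffer */
            char buf[CMSG_SPACE(sizeof(int))];
            struct cmsghdr align;
        } u;
        struct msghdr msg;
        struct iovec iov;
        struct cmsghdr *cmsg;
        char byte = 0;

        memset(&msg, 0, sizeof msg);
        iov.iov_base = &byte;               /* at least one byte of real data */
        iov.iov_len = 1;
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = u.buf;
        msg.msg_controllen = sizeof u.buf;

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;       /* the payload is a descriptor */
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(sock, &msg, 0);
    }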
-- There's nothing especially hard about UDP or broadcast.
It's not as if the socket abstraction has some sort of magic
datagram-specific file descriptor. Since every message sent
and every message received has to include the far end's
address info, you have to decide how to do that, whether
by specifying a format for the data (the first N bytes are
always the remote's address, for example) or provide an
out-of-band mechanism (some ioctl mess that lets you
supply it separately, a la sendto/recvfrom, and encodes it
as a control message).
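For comparison, the sockets-world choice was the out-of-band route: per-message addressing through sendto/recvfrom. A minimal sketch of a one-shot UDP echo, modern prototypes, error handling omitted:

    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Receive one datagram and echo it back to whoever sent it; the peer's
       address travels with each message rather than with the descriptor. */
    void
    echo_once(int port)
    {
        struct sockaddr_in me, peer;
        socklen_t len = sizeof peer;
        char buf[512];
        int fd, n;

        fd = socket(AF_INET, SOCK_DGRAM, 0);
        memset(&me, 0, sizeof me);
        me.sin_family = AF_INET;
        me.sin_addr.s_addr = INADDR_ANY;
        me.sin_port = htons(port);
        bind(fd, (struct sockaddr *)&me, sizeof me);

        n = recvfrom(fd, buf, sizeof buf, 0, (struct sockaddr *)&peer, &len);
        if (n > 0)
            sendto(fd, buf, n, 0, (struct sockaddr *)&peer, len);
        close(fd);
    }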
There was an attempt to make UDP work in the 9th/10th edition
era. I don't think it ever worked very cleanly. When I
took an unofficial snapshot and started running the system
at home in the mid-1990s, I ended up just tossing UDP out,
because I didn't urgently need it (at the time TCP was good
enough for DNS, and I had to write my own DNS resolver anyway).
I figured I'd get around to fixing it later but never did.
But I think the only hard part is in deciding on an interface.
-- It's certainly true that the Research-system TCP/IP code
was never really production-quality (and I say that even
though I used it for my home firewall/gateway for 15 years).
TCP/IP wasn't taken as seriously as it ought to have been
by most of us in 1127 in the 1980s. But that wasn't because
of the stream structure--the IP implementation was in fact
a copy of that from 4.2 (I think) BSD, repackaged and
shoehorned into the stream world by Robert T Morris, and
later cleaned up quite a bit by Paul Glick. Maybe it would
have worked better had it been done from scratch by someone
who cared a lot about it, as the TCP/IP implementors in the
BSD world very much did. Certainly it's a non-trivial design
problem--the IP protocols and their sloppy observance of
layering (cf the `pseudo header' in the TCP and UDP standards)
make them more complicated to implement in a general-purpose
framework.
Or maybe it just can't be done, but I wish someone had
tried in the original simpler setup rather than the
cluttered SVr4 STREAMS.
Norman Wilson
Toronto ON
> Sockets (which btw, totally SUCK PUS) were coded into things
> and even (YECHH) made POSIX and IETF spec status. Streams didn't stand
> a chance.
The question that originally pulled me into researching Unix networking 1975-1985 was more or less “how did we end up with sockets?”. That was 7 years or so ago; I now have a basic feel for how it came to be, and I also have a better appreciation of the trade-offs. What the most “Unixy” form of networking would be (as in the API and its semantics) is not a question with an easy answer.
If I limit myself to the 1975-1985 time frame, I see three approaches:
1. The API used in Arpanet Unix, which was also used by BBN in its first reference implementation of TCP/IP
2. The BSD sockets API, in two flavours: the Joy flavour in BSD4.1a, and the Karels flavour in BSD4.1c and later
3. The Ritchie/Presotto IPC library / API from V8/V9. This evolved into SysV networking, but the original is the clean idea
At a high level of abstraction, there is a lot of similarity; close-up they are quite different. I like all three solutions!
One thing that caught my attention was that the Ritchie/Presotto IPC library has the concept of “calling” a host, where the host/network can reply with a response code (“line busy”, “number unknown”, “not authorised”, etc.). BSD sockets do not cover that. I guess it derives from Spider/Datakit having that functionality, and Arpanet / TCP/IP not having it (resorting to a connection ‘reset’ or dead line instead). Sockets have a more elegant solution for connectionless datagrams (imo), and for the same reason, I guess.
Sure, sockets has too much of the implementation sticking through the abstractions, but it is IMO not a bad design. That it became dominant I think is in equal measure due to economics and due to being “good enough”.
If someone has a proposal for a network API that is cleaner and better than what was out there, and would have worked with the hardware and use cases of the early 80’s, I’m all ears. But maybe better on COFF...
Paul