TUHS

tuhs@tuhs.org

16 participants
6536 discussions

by Warren Toomey

All, Matt e-mailed this to me and the TUHS list, but it doesn't seem to have made it through, so I'm punting on a copy ... Warren

----- Forwarded message from Matt Gilmore -----

Subject: Documents for UNIX Collections

Good afternoon everyone, my name is Matt Gilmore, and I recently worked with some folks here to help facilitate the scanning and release of the "Documents for UNIX" package as well as a few odds and ends pertinent to UNIX/TS 4.0. I've been researching pretty heavily the history of published memoranda and how they ultimately became the formal documents that Western Electric first published with UNIX/TS 5.0 and System V. Think the User's Guide, Graphics Guide, etc.

In my research, I've found that document sets in a similar spirit have been published since at least Research Version 6. I've been able to track down a few that are on the TUHS source archive in original *ROFF format (links given as paths in the tree to avoid hyperlink mangling):

Research V6: V6/usr/doc
Mini-UNIX: Mini-Unix/usr/doc
PWB/UNIX 1.0: PWB1/usr/man/man0/documents (note, I'm not sure where the actual docs are, this is just a TOC, Operators Manual is in op in the base man folder)
Wollongong 7/32 Version: Interdata732/usr/doc (only 7/32 relevant docs, allegedly)
Research V7: V7/usr/doc
UNIX/32V: 32V/usr/doc

There are probably others, but these are the ones I'm aware of on the archive for Bell-aligned revisions prior to the commercialization of UNIX/TS as System III.

On the note of System III, I seem to have an archive that is slightly different than what is on TUHS, namely in that it has this same documents collection. I can't find it in the System III section on the site, so I'm assuming it isn't hosted anywhere presently. One of the projects I'm working on (slowly) is comparing these documents with the 4.0 docs I scanned for Arnold and making edits to the *ROFF sources with the hopes I could then use them to produce 1:1 clean copies of the 4.0 docs, while providing an easy means for diff'ing the documents as well (to flush out changes between 3.0 and 4.0). Happy to provide this dump to Warren for comparison with what is currently hosted.

Usenix also published documentation sets for 4.2 and 4.3BSD in the 80's which served the same purpose for BSD users. There seems to be a 4.4BSD set as well, although I haven't looked at these yet. I've got a random smattering between 4.2 and 4.3 of the comb-bound Usenix manuals, but I assume the 4.4 set is in a similar vein, with reference guides and supplementary documents. Looks like a lot of the same, but with added documents regarding developments at Berkeley.

Now for my reasons for mailing, there are a few:

1. Is anyone aware of whether similar document sets were compiled for MERT, UNIX/RT, USG Program Generic, or CB-UNIX? Or would users of those systems have simply been referred to the collection most closely matching the version they're forked from?

2. Was there ever any such document set published in this nature as "Documents for UNIX" consisting of memoranda for 5.0/System V? Or did USG immediately begin by providing just the published trade manuals? The implication here is if USG published no such documents, then the Documents for UNIX 4.0 represents the last time USG compiled the memoranda as they were written (of course with version-related edits) with original authorship and references as a documentation set.

3. Have there been any known efforts to analyze the history and authorship of these documents, explicitly denote errata and revisions, and map out the evolution of the system from a documentation perspective like this?

Thanks for any insight anyone can provide!

- Matt G.

P.S. I'd be interested in doing more preservation work. If anyone else has documents that need preserving, I'll happily coordinate shipment and scanning.

P.P.S. Ccing Warren, I don't know if I'm able to send emails to this list or not, so pardon the extraneous email if not necessary.

----- End forwarded message -----

3 years

ed: multiple addresses (with semicolons)

by markus schnalke

Hoi,

via a recent message from Chris Pinnock to the list I became aware of the book ``Ed Mastery'' by Michael W. Lucas. At once I bought and read it. Although it is not on the mastery level it claims and I would have liked it to be, it still was fun to read.

This brought me back to my ed interest. I like ed a lot and despite my young age, I've actually programmed with ed for fun and have prepared the troff slides for a talk on early Unix tools (like ed) with ed alone. I use the Heirloom version of ed.

Anyways, I wondered about the possibility to give multiple addresses ... more than two for relative address searches. For example, to print the context of the first occurrence of `argv' within the main function, you can use:

/^main(/;/\<argv\>/-2;+4n

For the last occurrence it's even one level more:

/^main(/;/^}/;?\<argv\>?-2;+4n

(The semicolons mean that the next search or relative addressing starts at the result of the previous one. I.e. in this case: We go to the `main' function, from there go to the function end, then backwards to `argv' minus two lines and print (with line numbers) this line and four lines more.)

The manpage of 6th Edition mentions this possibility to give more than two addresses:

Commands may require zero, one, or two addresses. Commands which require no addresses regard the presence of an address as an error. Commands which accept one or two addresses assume default addresses when insufficient are given. If more addresses are given than such a command requires, the last one or two (depending on what is accepted) are used.

http://man.cat-v.org/unix-6th/1/ed

You can see it in the sources as well:

https://www.tuhs.org/cgi-bin/utree.pl?file=V6/usr/source/s1/ed.c

(Search for ';' to find the line. There's a loop processing the addresses.)

V5 ed(1) is in assembler, however, which I cannot read. Thus there must have been a complete rewrite, maybe introducing this feature at that point. (I don't know where to find the v5 manpage to check that as well.)

I wonder how using multiple addresses for setting starting points for relative searches came to be. When was it implemented and what use cases drove this feature back in the day? Or was it more an accident that was introduced by the implementation, which turned out to be useful? Or maybe it existed already in earlier versions of ed, although maybe undocumented.

For reference, POSIX writes:

Commands accept zero, one, or two addresses. If more than the required number of addresses are provided to a command that requires zero addresses, it shall be an error. Otherwise, if more than the required number of addresses are provided to a command, the addresses specified first shall be evaluated and then discarded until the maximum number of valid addresses remain, for the specified command.

https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ed.html

Here is more explanation from the rationale section:

Any number of addresses can be provided to commands taking addresses; for example, "1,2,3,4,5p" prints lines 4 and 5, because two is the greatest valid number of addresses accepted by the print command. This, in combination with the <semicolon> delimiter, permits users to create commands based on ordered patterns in the file. For example, the command "3;/foo/;+2p" will display the first line after line 3 that contains the pattern foo, plus the next two lines. Note that the address "3;" must still be evaluated before being discarded, because the search origin for the "/foo/" command depends on this.

As far as I can see, multiple addresses only make sense with the semicolon separator, because the comma separator does not change the state, thus previous addresses can have no effect on later addresses. The implementation just does not forbid them, for simplicity reasons.

meillo
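To make the semicolon/comma distinction concrete, here is a minimal Python sketch of the address loop described above. It is not taken from any ed source: the function names are made up for illustration, patterns use Python regex syntax rather than ed's, escaped delimiters inside a pattern are not handled, and only numbers, `.`, `$`, `/re/`, `?re?` and `+n`/`-n` offsets are understood.

```python
import re

OFFSET = re.compile(r"([+-])(\d*)")
NUMBER = re.compile(r"\d+")


def search(pattern, lines, dot, forward):
    """Find the next matching line after (or before) dot, wrapping around."""
    n = len(lines)
    step = 1 if forward else -1
    for i in range(1, n + 1):
        candidate = (dot - 1 + step * i) % n + 1  # 1-based, wraps at the ends
        if pattern.search(lines[candidate - 1]):
            return candidate
    raise ValueError("no match")


def parse_address(spec, pos, lines, dot):
    """Parse one address starting at spec[pos]; return (line, new_pos)."""
    ch = spec[pos]
    if ch.isdigit():
        m = NUMBER.match(spec, pos)
        line, pos = int(m.group()), m.end()
    elif ch == ".":
        line, pos = dot, pos + 1
    elif ch == "$":
        line, pos = len(lines), pos + 1
    elif ch in ("/", "?"):
        end = spec.index(ch, pos + 1)  # escaped delimiters are not handled
        line = search(re.compile(spec[pos + 1:end]), lines, dot, ch == "/")
        pos = end + 1
    elif ch in ("+", "-"):
        line = dot  # a bare offset such as "+2" is relative to dot
    else:
        raise ValueError("bad address")
    m = OFFSET.match(spec, pos)
    while m:  # apply any trailing +n / -n offsets
        delta = int(m.group(2) or 1)
        line += delta if m.group(1) == "+" else -delta
        pos = m.end()
        m = OFFSET.match(spec, pos)
    if not 1 <= line <= len(lines):
        raise ValueError("address out of range")
    return line, pos


def eval_addresses(spec, lines, dot, max_addrs=2):
    """Evaluate every address in order, then keep only the last max_addrs."""
    addresses, pos = [], 0
    while pos < len(spec):
        line, pos = parse_address(spec, pos, lines, dot)
        addresses.append(line)
        if pos < len(spec):
            if spec[pos] == ";":
                dot = line  # semicolon: later addresses start from here
            elif spec[pos] != ",":
                raise ValueError("unexpected character")
            pos += 1
    return addresses[-max_addrs:]


buffer = ["a", "foo one", "b", "c", "foo two", "d", "e", "f"]
print(eval_addresses("3;/foo/;+2", buffer, dot=1))  # semicolons move dot
print(eval_addresses("3,/foo/,+2", buffer, dot=1))  # commas leave dot alone
```

With semicolons the search for foo starts at line 3, so the result is lines 5 to 7; with commas every address is evaluated relative to the unchanged current line, which is why the earlier addresses have no effect on the later ones.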

3 years

Stuart Feldman's EFL

by arnold＠skeeve.com

Hi. EFL was definitely a part of BSD Unix. But I don't see it in the V7 stuff in the TUHS archives. When did it first appear? Was it part of 32V and I should look there? It is definitely in the V8 and V10 stuff. Did anyone actually use it? I have the feeling that ratfor had already caught on and spread far, and that it met people's needs, and so EFL didn't really catch on that much, even though it provided more features on top of Fortran. Thanks, Arnold

3 years

Re: EFL

by Steve Simon

I remember reading the EFL docs in the paper manuals for a sysv.3.2 honeywell 68k tower machine i worked on circa 1987. i never tried it though.

3 years

Re.: is networking different?

by Paul Ruizendaal

> On Sun, Jul 3, 2022 at 1:33 PM Marc Donner wrote: > > I've been ruminating on the question of whether networks are different from > disks (and other devices). Here are a couple of observations: [...] From my perspective most of these things are not unique to networks, they happen with disks and/or terminals. Only out-of-order delivery seems new. However, in many early networking contexts (Spider/Arpanet/Datakit/UUCP) this aspect was not visible to the host (and the same holds for a single segment ethernet). To me, in some ways networks are like tty’s (e.g. completing i/o can take arbitrarily long, doing a seek() does not make sense), in other ways they are like disks (raw devices are organised into byte streams, they have a name space). Uniquely, they have two end-points, only one of which is local (but pipes come close). Conceptually, a file system does two things: (i) it organises raw blocks into multiple files; these are the i-nodes and (ii) it provides a name space; these are directories and the namei routine. A network stack certainly does the first: a raw network device is organised into multiple pipe-like connections; depending on the network, it optionally offers a naming service. With the first aspect one could refer to any file by “major device number, minor device number, i-node number”. This is not very different from referring to a network stream by “network number, host number, port number” in tcp/ip (and in fact this is what bind() and connect() in the sockets API do), or “switch / host / channel” in Datakit. For disks, Unix offers a clean way to organise the name spaces of multiple devices into a unified whole. How to do this with networks is not so easy, prior to the invention of the file system switch. Early on (Arpanet Unix), it was tried to incorporate host names into a net directory by name (RFC 681) but this is not scalable. Another way would be to have a virtual directory and include only names for active connections. 
The simple way would be to use a text version of the numeric name as described above - but that is not much of an improvement. Better to have a network variant of namei that looks up symbolic names in a hosts file or in a network naming service. The latter does not look very performant on the hardware of 40 years ago, but it appears to have worked well on the Alto / PuPs network at Xerox PARC.

With the above one could do

open("/net/inet/org.tuhs.www:80", O_RDWR | O_STREAM)

to connect to the TUHS web server, and do

open("/net/inet/any:80", O_RDWR | O_STREAM | O_CREAT, 0600)

to create a 'listening' (rendez-vous) socket.

Paul
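As a rough illustration of how such a path could map onto the sockets API that exists today, here is a small Python sketch. Everything in it is an assumption made for the example: the helper names `parse_net_path` and `net_open` are hypothetical, the host part is taken to be a DNS name with its labels reversed (so "org.tuhs.www" means www.tuhs.org), and "any" is taken to mean a passive, listening endpoint.

```python
import socket


def parse_net_path(path):
    """Split '/net/<family>/<host>:<port>' into (family, hostname, port).

    hostname is None for the special host 'any' (assumed to mean 'listen').
    """
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "net":
        raise ValueError("not a /net path: " + path)
    family, endpoint = parts[1], parts[2]
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("expected <host>:<port> in " + path)
    if host == "any":
        return family, None, int(port)
    # Assumption: labels are written most-significant first, as in the post.
    return family, ".".join(reversed(host.split("."))), int(port)


def net_open(path):
    """Hypothetical open() for network names, built on ordinary sockets."""
    family, host, port = parse_net_path(path)
    if family != "inet":
        raise ValueError("only the 'inet' family is sketched here")
    if host is None:
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.bind(("", port))
        listener.listen()
        return listener
    return socket.create_connection((host, port))


print(parse_net_path("/net/inet/org.tuhs.www:80"))
print(parse_net_path("/net/inet/any:80"))
```

Only the name parsing is exercised here; `net_open` would need network access (and, for low ports, privileges) to run.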

3 years

Re: Thoughts on Licenses

by Larry McVoy

On Sun, Jul 03, 2022 at 05:55:15PM +1000, steve jenkin wrote:
> > On 3 Jul 2022, at 12:27, Larry McVoy <lm(a)mcvoy.com> wrote:
> >
> > I love the early Unix releases because they were so simple, processors
> > were simple then as well.
>
> Bell's Observation on Computer Classes has brought surprises
> - we've had some very popular new devices appear at the bottom end of the market and sell in the billions.

Yes, and they all run Linux or some tiny OS. Has anyone ported v7 to any of these devices and seen it take off? Of course not, it doesn't have TCP/IP.

3 years

is networking different?

by Marc Donner

On June 28 Rob Pike wrote:

"One of the reasons I'm not a networking expert may be relevant here. With networks, I never found an abstraction to hang my hat on. Unlike with file systems and files, or even Unix character devices, which provide a level of remove from the underlying blocks and sectors and so on, the Unix networking interface always seemed too low-level and fiddly, analogous to making users write files by managing the blocks and sectors themselves."

I've been ruminating on the question of whether networks are different from disks (and other devices). Here are a couple of observations:

1 - Two different packets may take two different paths from the sender to the receiver.
1a - The transit time for one packet may vary widely from that of the other.
1b - The two packets may arrive in an order different from the order in which they were transmitted.

(Note - recently I have been reading Bob Gezelter's monograph [and PhD dissertation] and I've learned that modern high-performance disk systems behave more like networks in 1a and 1b.)

2 - A packet may never arrive.
3 - Behavior 2 is not a sign of hard failure for networks, whereas it is generally considered so for other I/O devices.

There is probably more to why networks are weird, but these are some of the big dissonances that seem to me to make Rob's comment resonate so loudly.

Best,
Marc
=====
nygeek.net
mindthegapdialogs.com/home <https://www.mindthegapdialogs.com/home>
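Observations 1b, 2 and 3 can be illustrated with a small Python sketch of a receiver that puts packets back in order. The sequence numbering scheme and the name `reassemble` are assumptions made for this example, not anything from the post.

```python
def reassemble(arrivals):
    """Deliver payloads in sequence order from (seq, payload) arrivals.

    Returns (delivered, next_seq, held): what could be handed to the
    application, the sequence number still awaited, and the packets that
    arrived early and are parked until the gap is filled.
    """
    held = {}
    delivered = []
    next_seq = 0
    for seq, payload in arrivals:
        held[seq] = payload
        # Release every packet that is now contiguous with what was delivered.
        while next_seq in held:
            delivered.append(held.pop(next_seq))
            next_seq += 1
    return delivered, next_seq, held


# Packet 1 overtakes packet 0, and packet 2 has not shown up (yet).
print(reassemble([(1, "b"), (0, "a"), (3, "d")]))
```

The receiver ends up holding packet 3 while waiting for packet 2, and from the arrivals alone it cannot tell whether packet 2 is merely late (1a) or gone for good (2); that ambiguity has no real counterpart when reading blocks from a local disk.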

3 years

Thoughts on Licenses

by Clem Cole

As part of some simh work, I've been immersed in some licensing discussions. Thanks for the V8-10, Plan-9 and Inferno notes - they are relevant.

Anyway, WRT TUHS, at least in the case of the Unix style bits, I propose a small change to Warren's top-level directory. Add a new dir called something like 'Legal Docs' or 'Copyrights+Licenses'. Then move the Caldera document and Warren's current note into that area. Then add copies of anything we can collect, like Dan Cross's V8-10, anything WRT Plan9/Inferno, or anything else from the UNIX world - such as something Sun, DEC, HP or the like might have added. Maybe add a subdirectory with the AT&T/USL case details. And maybe add a sub-directory with known FOSS licenses used by the UNIX community and add a copy of the 3-clause BSD and maybe even the two GPLs.

Then update the README in the current top-level dir, adding to the contents something like "*the IP contained on this website is covered by different licenses depending on the specific IP. Copies of these can be found with the source code itself, but have also been all collected together in the top-level directory: ...*."

I think these all have both historical value and practical value. As I said, I was not sure myself and I think others would be less ignorant if they could find it all easily. In the case of the practical, as a for instance, in an email with some lawyers last week, I had pointed them at the Caldera document. I'd have loved to have been able to say look in this directory. The Caldera and later Nokia Licenses are what we are considering as examples.

Thoughts?

3 years

Research Datakit notes

by Geoff Pool

I've enjoyed reading this thread as networking has always been a passion of mine. Lawrence Livermore had, at one time, their own networking system they called Spider. Is this the same Spider technology that Sandy Fraser references in his Datakit notes? Geoff

3 years

Re: "9 skills our grandkids won't have" - Is this a TUHS topic?

by Nelson H. F. Beebe

>> I don't know the answer to Ctrl-D.

The Unix command "man ascii" has the answer:

Oct   Dec   Hex   Char                        Oct   Dec   Hex   Char
------------------------------------------------------------------------
000   0     00    NUL '\0'                    100   64    40    @
001   1     01    SOH (start of heading)      101   65    41    A
002   2     02    STX (start of text)         102   66    42    B
003   3     03    ETX (end of text)           103   67    43    C
004   4     04    EOT (end of transmission)   104   68    44    D
....

Ctrl-D signifies end of transmission. Some other O/Ses have used Ctrl-Z for that purpose, presumably because Z is the final letter of numerous alphabets.

A good account of the history of character sets (pre-Unicode) is given in the book described at this URL:

http://www.math.utah.edu/pub/tex/bib/master.html#Mackenzie:1980:CCS

Bob Bemer (1920--2004), known as Dr. ASCII to some of us, was a key person in the standardization of character sets:

https://en.wikipedia.org/wiki/Bob_Bemer
https://en.wikipedia.org/wiki/ASCII

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                                                          -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe(a)math.utah.edu -
- 155 S 1400 E RM 233                       beebe(a)acm.org  beebe(a)computer.org -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------
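The two halves of the table line up for a reason: each control character sits exactly octal 100 (hex 40) below its letter, so holding Ctrl amounts to clearing that one bit. A tiny Python sketch of the mapping follows; the helper name `ctrl` is just for illustration.

```python
def ctrl(ch):
    """Return the ASCII code sent by Ctrl plus the given key ('@' through '_')."""
    code = ord(ch.upper())
    if not 0x40 <= code <= 0x5F:
        raise ValueError("no control code for " + repr(ch))
    return code ^ 0x40  # clear the bit that separates the two table columns


print(ctrl("D"))  # EOT, end of transmission
print(ctrl("Z"))  # SUB, the end-of-file marker used by some other systems
```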

3 years

"9 skills our grandkids won't have" - Is this a TUHS topic?

by steve jenkin

What are the 1970's & 1980's Computing / IT skills "our grandkids won't have"?

Whistling into a telephone while the modem is attached, because your keyboard has a stuck key - something I absolutely don't miss.

Having a computer in a grimy warehouse with 400 days of uptime & wondering how a reboot might go?

steve j

=========

9 Skills Our Grandkids Will Never Have
<https://blog.myheritage.com/2022/06/9-skills-our-grandkids-will-never-have/>

1. Using record players, audio cassettes, and VCRs
2. Using analog phones [ or an Analog Clock ]
3. Writing letters by hand and mailing them
4. Reading and writing in cursive
5. Using manual research methods [ this is a Genealogy site ]
6. Preparing food the old-fashioned way
7. Creating and mending clothing
8. Building furniture from scratch
9. Speaking the languages of their ancestors

--
Steve Jenkin, IT Systems and Design
0412 786 915 (+61 412 786 915)
PO Box 38, Kippax ACT 2615, AUSTRALIA
mailto:sjenkin@canb.auug.org.au http://members.tip.net.au/~sjenkin

3 years

Re: Typesetting Mathematics by Kernighan and Cherry, retypeset

by Norman Wilson

Warner Losh: Alcatel-Lucent gave an official grant to V8, V9 and V10. See https://www.tuhs.org/Archive/Distributions/Research/Dan_Cross_v8/statement_… ==== Quite so. I believe this was announced on a mailing list called TUHS. Those here who are interested in such things might want to subscribe; I have and find it quite useful and interesting, with occasional disappointment. Norman Wilson Toronto ON (typing this on a train in Texas)

3 years

Re: Typesetting Mathematics by Kernighan and Cherry, retypeset

by Douglas McIlroy

> I understand UNIX v7 is under this BSD-style license by Caldera Inc. > https://www.tuhs.org/Archive/Caldera-license.pdf The eqn document by Kernighan and Cherry also appears in the v10 manual, copyright by AT&T and published as a trade book. Wouldn't the recent release of v10 also pertain to the manual? Doug

3 years

Compare, contrast and a "Unixy" networking API

by Paul Ruizendaal

Following an insightful post by Norman Wilson (https://minnie.tuhs.org/pipermail/tuhs/2022-June/025929.html) and re-reading a few old papers (https://minnie.tuhs.org/pipermail/tuhs/2022-June/026028.html) I was thinking about similarities and differences between the various Unix networking approaches in the 1975-1985 era and came up with the following observations: - First something obvious: early Unix was organised around two classes of device: character based and block based. Arguably, it is better to think of these classes conceptually as “transient” and “memoizing”. A difference between the two would be whether or not it makes conceptual sense to do a seek operation on them; pipes and networks are in the transient class. - On the implementation side, this relates to two early kernel data structures: clists and disk buffers. Clists were designed for slow, low volume traffic and most early Unix network code creates a third kind: the mbufs of Arpanet Unix, BBN-TCP Unix and BSD, the packets of Chesson's V7 packet driver, Ritchie's streams etc. These are all the same when seen from afar: higher capacity replacements for clists. - Typically devices are accessed via a filter. At an abstract level, there is not much difference between selecting a line discipline, pushing a stream filter or selecting a socket type. At the extreme end one could argue that pushing a TCP stack on a network device is conceptually the same as mounting a file system on a disk device. Arguably, both these operations could be performed through a generalised mount() call. - Another implementation point is the organisation of the code. Is the network code in the kernel, or in user land? Conceptually connection management is different from stream management when connected (e.g. CMC and URP with Datakit, or RTP and BSP in Xerox Pups). In the BSD lineage all is in the kernel, and in the Research lineage connection management is done in a user space daemon. 
Arpanet Unix (originally based on V5) had a curious solution: the network code was organised in a single process, but with code both in kernel mode and in user mode. The user code would make a special system call, and the kernel code would interact with the IMP driver, manage buffers and deliver packets. Only when a state-changing event happened would it return to user mode, and the user code would handle connection management (followed by a new call into kernel mode). Interestingly, this approach mostly hid the IMP connection, and this carried through to the BSD’s where the network devices were also buried in the stack. Arpanet Unix made this choice to conserve kernel address space and to minimize the amount of original kernel code that had to be touched. - Early Unix has three ways to obtain a file descriptor: open, creat and pipe. Later also fifo. In this context adding more (like socket) does not seem like a mortal sin. Arguably, all could be rolled into one, with open() handling all cases. Some of this was done in 4.2BSD. It is possible to combine socket() & friends into open() with additional flags, much as was done in Arpanet Unix and BBN-TCP Unix. - Network connections have different meta data than disk files, and in sockets this is handled via specialised calls. This seems a missed opportunity for unified mechanisms. The API used in BBN-TCP handles most of this via ioctl. However, one could (cheekily!) argue that V7 unix has a somewhat byzantine meta data API, with the functionality split over seek, ioctl, fcntl, stat and fstat. These could all be handled in a generalised ioctl. Conceptually, this could also be replaced by using read/write on a meta data file descriptor, which could for example be the regular descriptor with the high bit set. But this, of course, never existed. - A pain point in Arpanet Unix was that a listening connection (i.e. a server endpoint) would block until a client arrived and then turn into the connection with the client. 
This would fork out into a service process and the main server process would open a new listening socket for the next client. In sockets this was improved into a rendez-vous type server connection that would spawn individual client connections via ‘accept’. The V8/V9 IPC library took a similar approach, but also developed the mechanism into a generalized way to (i) create rendez-vous points and (ii) ship descriptors across local connections. - The strict blocking nature of IO in early Unix was another pain point for writing early network code. The first solution to that were BBN’s await and capac primitives, which worked around the blocking nature. With SysIII, non-blocking file access appeared and 4.1a BSD saw the arrival of 'select’. Together these offer a much more convenient way to deal with multiple tty or network streams in a single threaded process (although it did modify some of the early Unix philosophy). Non-blocking IO and select() also appeared in the Research lineage with 8th edition. - The file system switch (FSS) arrived around 1983, during the gestation of 8th edition. This was just 1 or 2 years after the network interfaces for BSD and Datakit got their basic shape. Had the FSS been part of V7 (as it well could have been), probably the networking designs would have been a bit different, using virtual directories for networking connections. The ‘namei hack’ in MIT’s CHAOS network code already points in this direction. A similar approach could have been extended to named pipes (arriving in SysIII), where the fifo endpoint could have been set up through creating a file in a virtual directory, and making connections through a regular open of such a virtual file (and 9th edition appears to implement this.) oOo To me it seems that the V1-V7 abstractions, the system call API, etc. were created with the experience of CTSS, Multics and others fresh in mind. The issues were understood and it combined the best of the ideas that came before. 
When it came to networking, Unix did not have this advantage and was necessarily trying to ride a bike whilst inventing it. Maybe in a different time line it would have been possible to pick the best ideas in this area as well and combine these into a coherent framework. I concur with the observation that this list should be about discussion of what once was and only tangentially about what might have been, so it is only after considerable hesitation that I write the below. Looking at the compare and contrast above (and having been tainted by what became dominant in later decades), I would say that the most “Unixy” way to add networking to V7/SysIII era Unix would have been something like: - Network access via open/read/write/close, in the style of BBN-TCP - Network namespace exposed via a virtual file system, a bit like V9 - Meta data via a generalised ioctl, or via read/write on a meta data descriptor - Connection rendez-vous via a generalised descriptor shipping mechanism, in the style of V8/V9 - Availability of non-blocking access, together with a waiting primitive (select/poll/etc.), in the style of BSD - Primary network device visible as any other device, network protocol mounted similar to a file system. - Both connection management and stream management located in kernel code, in the style of BSD
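The point about a waiting primitive letting a single-threaded process serve several streams can be shown with a short Python sketch. It uses the standard `select` and `socket` modules; the helper name `drain_ready` and the use of local socket pairs as stand-ins for network connections are assumptions made for the example.

```python
import select
import socket


def drain_ready(socks, timeout=1.0):
    """Read from whichever sockets have data, without blocking on the others.

    Returns a dict mapping each readable socket to the bytes read from it.
    """
    readable, _, _ = select.select(socks, [], [], timeout)
    return {s: s.recv(4096) for s in readable}


# Two independent connections; only the first one has data waiting.
a_local, a_remote = socket.socketpair()
b_local, b_remote = socket.socketpair()
a_remote.sendall(b"hello")

ready = drain_ready([a_local, b_local])
print(ready.get(a_local), b_local in ready)

for s in (a_local, a_remote, b_local, b_remote):
    s.close()
```

A plain blocking read on the idle connection would have stalled the whole process, which is the pain point the post attributes to early Unix IO.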

3 years

kernel debugging in analogue

by Steve Simon

i remember a fellow student debugging an lsi11 kernel using a form of analogue vectorscope. i think it had a pair of DACs attached to the upper bits of the address bus. it generated a 2d pattern which you could recognise as particular code - interrupts are here, userspace is there, etc. the brightness of the spot indicated the time spent, so you got a bit of profiling too - and deadlocks became obvious. anyone remember these, what were they called? i think it was an HP or Tek product. -Steve

3 years

Research Datakit notes

by Paul Ruizendaal

Wanted to post my notes as plain text, but the bullets / sub-bullets get lost. Here is a 2 page PDF with my notes on Research Datakit: https://www.jslite.net/notes/rdk.pdf The main takeaway is that connection build-up and tear-down is considerably more expensive than with TCP. The first cost is in the network, which builds up a dedicated path for each connection. Bandwidth is not allocated/reserved, but a path is and routing information is set up at each hop. The other cost is in the relatively verbose switch-host communication in this phase. This compares to the 3 packets exchanged at the hosts’ driver level to set up a TCP connection, with no permanent resources consumed in the network. In compensation, the cost to use a connection is considerably lower: the routing is known and the host-host link protocol (“URP") can be light-weight, as the network guarantees in-order delivery without duplicates but packets may be corrupted or lost (i.e. as if the connection is a phone line with a modem). No need to deal with packet fragmentation, stream reassembly and congestion storms as in the TCP of the early 80’s. Doing UDP traffic to a fixed remote host is easily mapped to using URP with no error correction and no flow control. Doing UDP where the remote host is different all the time is not practical on a Datakit network (i.e. a virtual circuit would be set up anyway). A secondary takeaway is that Research Datakit eventually settled on a three-level ascii namespace: “area/trunk/switch”. On each switch, the hosts would be known by name, and each connection request had a service name as parameter. In an alternate reality we would maybe have used “ca/stclara/mtnview!google!www” to do a search.

3 years

Re: Research Datakit notes

by jnc＠mercury.lcs.mit.edu

> From: Rob Pike

> having the switch do some of the call validation and even maybe
> authentication (I'm not sure...) sounds like it takes load off the host.

I don't have enough information to express a judgement in this particular case, but I can say a few things about how one would go about analyzing questions of 'where should I put function [X]; in the host, or in the 'network' (which almost inevitably means 'in the switches')'.

It seems to me that one has to examine three points:

- What is the 'cost' to actually _do_ the thing (which might be in transmission usage, or computing power, or memory, or delay), in each alternative; these costs obviously generally cannot be amortized across multiple similar transactions.

- What is the 'cost' of providing the _mechanism_ to do the thing, in each alternative. This comes in three parts. The first is the engineering cost of _designing_ the thing, in detail; this obviously is amortized across multiple instances. The second is _producing_ the mechanism, in the places where it is needed (for mechanisms in software, this cost is essentially zero, unless it needs a lot of memory/computes/etc); this is not amortized across many. The third is harder to measure: it's complexity. This is probably a book by itself, but it has costs that are hard to quantify, and are also very disparate: e.g. more complex designs are more likely to have unforeseen bugs, which is very different from the 'cost' that more complex designs are probably harder to evolve for new uses.

So far I haven't said anything that isn't applicable across a broad range of information systems. The last influence on where one puts functions is much more common in communication systems: the Saltzer/Clark/Reed 'End-to-end Arguments in System Design' questions. If one _has_ to put a function in the host to get 'acceptable' performance of that function, the operation/implementation/design cost implications are irrelevant: one has to grit one's teeth and bear them.

This may then feed back to design questions in the other areas. E.g. the Version 2 ring at MIT deliberately left out hardware packet checksums - because it was mostly intended for use with TCP/IP traffic, which provided a pseudo-End-to-End checksum, so the per-unit hardware costs didn't buy enough to be worth the costs of a hardware CRC. (Which was the right call; I don't recall the lack of a hardware checksum ever causing a problem.)

And then there's the 'technology is a moving target' point: something that might be unacceptably expensive (in computing cost) in year X might be fine in year X+10, when we're lighting our cigars with unneeded computing power. So when one is designing a communication system with a likely lifetime in many decades, one tends to bias one's judgement toward things like End-to-End analysis - because those factors will be forever.

Sorry if I haven't offered any answer to your initial query: "having the switch do some of the call validation ... sounds like it takes load off the host", but as I have tried to explain, these 'where should one do [X]' questions are very complicated, and one would need a lot more detail before one could give a good answer. But, in general, "tak[ing] load off the host" doesn't seem to rate highly as a goal these days... :-) :-(

Noel

3 years

Re: Research Datakit notes

by jnc＠mercury.lcs.mit.edu

> From: Paul Ruizendaal > Will read those RFC's, though -- thank you for pointing them out. Oh, I wouldn't bother - unless you are really into routing (i.e. path selection). RFC-1992 in particular; it's got my name on it, but it was mostly written by Martha and Isidro, and I'm not entirely happy with it. E.g. CSC mode and CSS mode (roughly, strict source route and loose source route); I wasn't really sold on them, but I was too tired to argue about it. Nimrod was complicated enough without adding extra bells and whistles - and indeed, LSR and SSR are basically unused to this day in the Internet (at least, at the internet layer; MPLS does provide the ability to specify paths, which I gather is used to some degree). I guess it's an OK overview of the architecture, though. RFC-1753 is not the best overview, but it has interesting bits. E.g. 2.2 Packet Format Fields, Option 2: "The packet contains a stack of flow-ids, with the current one on the top." If this reminds you of MPLS, it should! (One can think of MPLS as Nimrod's packet-carrying subsystem, in some ways.) I guess I should mention that Nimrod covers more stuff - a lot more - than just path selection. That's because I felt that the architecture embodied in IPv4 was missing lots of things which one would need to do the internet layer 'right' in a global-scale Internet (e.g. variable length 'addresses' - for which we were forced to invent the term 'locator' because many nitwits in the IETF couldn't wrap their minds around 'addresses' which weren't in every packet header). And separation of location and identity; and the introduction of traffic aggregates as first-class objects at the internet layer. Etc, etc, etc. Nimrod's main focus was really on i) providing a path-selection system which allowed things like letting users have more input to selecting the path their traffic took (just as when one gets into a car, one gets to pick the path one's going to use), and ii) controlling the overhead of the routing. 
Of course, on the latter point, in the real world, people just threw resources (memory, computing power, bandwidth) at the problem. I'm kind of blown away that there are almost 1 million routes in the DFZ these days. Boiling frogs... Noel

3 years

Clem's Law.

by steve jenkin

I thought this comment was very good. I went looking for “Clem’s Law” (presume Clem Cole) and struck out. Any hints anyone can suggest or history on the comment? steve j ========== Larry McVoy wrote Fri Sep 17 10:44:25 AEST 2021 <https://minnie.tuhs.org/pipermail/tuhs/2021-September/024424.html> Plan 9 is very cool but I am channeling my inner Clem, Plan 9 didn't meet Clem's law. It was never compelling enough to make the masses love it. Linux was good enough. ========== -- Steve Jenkin, IT Systems and Design 0412 786 915 (+61 412 786 915) PO Box 38, Kippax ACT 2615, AUSTRALIA mailto:sjenkin@canb.auug.org.au http://members.tip.net.au/~sjenkin

3 years

Re: Research Datakit notes

by jnc＠mercury.lcs.mit.edu

Just as the topic of TUHS isn't 'how _I_ could/would build a _better_ OS', but 'history of the OS that was _actually built_' (something that many posters here seem to lose track of, to my great irritation), so too the topic isn't 'how to build a better network' - or actually, anything network-centric. I'll make a few comments on a couple of things, though. > From: steve jenkin > packet switching won over Virtual Circuits in the now distant past but > in small, local and un-congested networks without reliability > constraints, any solution can look good. ... Packet switching > hasn't scaled well to Global size, at least IMHO. The internetworking architecture, circa 1978, has not scaled as well as would have been optimal, for a number of reasons, among them: - pure scaling effects (e.g. algorithms won't scale up; subsystems which handle several different needs will often need to be separated out at a larger scale; etc) - inherent lack of hindsight (unknown unknowns, to use Rumsfeld's phrase; some things you only learn in hindsight) - insufficiently detailed knowledge of complete requirements for a global-scale network (including O+M, eventual business model, etc) - limited personnel resources at the time (some things we _knew_ were going to be a problem we had to ignore because we didn't have people to throw at the problem, then and there) - rapid technological innovation (and nobody's crystal ball is 100% perfect) It has been possible to fix some aspects of the ca. 1978 system - e.g. the addition of DNS, which I think has worked _reasonably_ well - but in other areas, changes weren't really adequate, often because they were constrained by things like upward compatibility requirements (e.g. BGP, which, among numerous other issues, had to live with existing IP addressing). Having said all that, I think your assertion that virtual circuits would have worked better in a global-scale network is questionable. 
The whole point of networks which use unreliable datagrams as a fundamental building block is that by moving a lot of functionality into the edge nodes, it makes the switches a lot simpler. Contemporary core routers may be complex - but they would be much worse if the network used virtual circuits. Something I suspect you may be unaware of is that most of the people who devised the unreliable datagram approach of the internetworking architecture _had experience with an actual moderately-sized, operational virtual circuit network_ - the ARPANET. (Yes, it was basically a VC network. Look at things like RFNMs, links {the specific ARPANET mechanism referred to by this term, not the general concept}, etc.) So they _knew_ what a VC network would involve. So, think about the 'core routers' in a network which used VC's. I guess a typical core router these days uses a couple of OC768 links. Assume an average packet size of 100 bytes (probably roughly accurate, with the bimodal distribution between data and acks). With 4 OC768's, that's 4*38.5G/800 = ~155M packets/second. I'm not sure of the average TCP connection length in packets these days, but assume it's 100 packets or so (that's a 100KB Web object). That's still roughly _1 million circuit setups per second_. If the answer is 'oh, we'll use aggregation so core routers don't see individual connections - or their setup/tear-down' - well, the same can be done with a datagram system; that's what MPLS does. Work through the details - VCs were not preferred, for good reasons. > Ethernet only became a viable LAN technology with advent of Twisted > pair: point to point + Switches. It's really irritating that a lot of things labelled 'Ethernet' these days _aren't_ _real_ Ethernet (i.e. a common broadcast bus allocated via CSMA-CD). They use the same _packet format_ as Ethernet (especially the 48-bit globally-unique address, which can usefully be blown into things at manufacture time), but it's not Ethernet. 
In some cases, they also retain the host interface<->network physical interface - but the thing on the other side of the interface is totally different (such as the hub-based systems common now - as you indicate, it's a bunch of small datagram packet switches plugged together with point-point links). Interfaces are forever; like the screw-in light-bulb. These days, it's likely an LED bulb on one side, powered by a reactor on the other - two technologies which were unforeseen (and unforeseeable) when the interface was defined, well over 100 years ago. Noel
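The back-of-envelope circuit-setup estimate in the message above can be rechecked with a few lines of Python. This is only a sketch: the link rate, link count, packet size and connection length are the figures assumed in the message, and the constant and function names are made up for illustration. With the inputs as stated, the arithmetic comes out nearer 190M packets per second and about 2 million setups per second, the same order of magnitude as the estimate in the message, so the conclusion does not change.

```python
# Rough check of the per-second circuit-setup load on a hypothetical
# virtual-circuit core router. All inputs are assumptions taken from the
# message, not measurements.
OC768_PAYLOAD_BPS = 38.5e9      # usable bits/second per OC-768 link
LINKS = 4                       # links on the router
AVG_PACKET_BYTES = 100          # assumed average packet size
PACKETS_PER_CONNECTION = 100    # assumed average TCP connection length


def packets_per_second(links, link_bps, packet_bytes):
    """Aggregate packet rate across all links, in packets/second."""
    return links * link_bps / (packet_bytes * 8)


def setups_per_second(pps, packets_per_connection):
    """Circuit setups/second if every connection needs its own circuit."""
    return pps / packets_per_connection


pps = packets_per_second(LINKS, OC768_PAYLOAD_BPS, AVG_PACKET_BYTES)
setups = setups_per_second(pps, PACKETS_PER_CONNECTION)
print(f"{pps / 1e6:.1f}M packets/s, {setups / 1e6:.1f}M circuit setups/s")
```

Changing any one input (say, a longer average connection) scales the setup rate linearly, which makes it easy to see how sensitive the argument is to each guess.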

3 years

Re: forgotten versions

by Paul Ruizendaal

On Tue, Jun 21, 2022 at 05:56:02PM -0600, Jacob Moody wrote: > I recently stumbled across the existence of datakit > when going through the plan9foundation source archives. > Would be curious to hear more about its involvement > with plan9. There are at least 2 versions of Datakit. In my current understanding there are “Datakit” which is the research version, and “Datakit II” which seems to be the version that was broadly deployed into the AT&T network in the late 80’s -- but very likely the story is more complicated than that. Plan9 is contemporaneous with Datakit II. In short, Sandy Fraser developed the “Spider” network in 1970-1974 and this was actively used with early Unix (at least V4, maybe earlier). Sandy was dissatisfied with Spider and used its learnings to start again. The key ideas seem to have gelled together around 1977 with the first switches being available in 1979 or so. The first deployment into the Bell system was around 1982 (initially connecting a handful of Bell sites). In 1979/1980 there were two Datakit switches, one in the office of Greg Chesson who was writing the first iteration of its control software, and one in the office/lab of Gottfried Luderer et al., who used it to develop a distributed Unix. Datakit at this time is well described in two papers that the ACM recently moved from behind its paywall: https://dl.acm.org/doi/pdf/10.1145/1013879.802670 (mostly about 1980 Datakit) https://dl.acm.org/doi/pdf/10.1145/800216.806604 (mostly about distributed Unix) The Chesson control software was replaced by new code written by Lee McMahon around 1981 (note: this is still Datakit 1). The Datakit driver code in V8 is designed to work with this revised Datakit. Three aspects of Datakit show through in the design of the V8-V10 networking code: - a separation in control words and data words (this e.g. comes back in ‘streams’) - it works with virtual circuits; a connection is expensive to set up (‘dial’), but cheap to use - it does not guarantee reliable packet delivery, but it does guarantee in-order delivery Probably you will see echoes of this in early Plan9 network code, but I have not studied that.

3 years

Re: Research Datakit notes

by jnc＠mercury.lcs.mit.edu

> From: Paul Ruizendaal > it would seem to me that Sandy had figured out a core problem some 30 > years before the TCP/IP world would come up with a similar solution. I > would not even be surprised if I learned that modern telco routers > transparently set up virtual circuits for tcp traffic. To fully explore this topic would take a book, which I don't have the energy to write, and nobody would bother to read, but... Anyway, I'm not up on the latest and greatest high-speed routers: I saw some stuff from one major vendor under NDA about a decade ago, but that's my most recent - but at that point there was nothing that looked even _vaguely_ like virtual circuits. (The stuff Craig was alluding to was just about connectivity for getting bits from _interface_ to _interface_ - if you don't have a giant crossbar - which is going to require buffering on each input anyway - how exactly do you get bits from board A to board Q - a single shared bus isn't going to do it...) A problem with anything like VC's in core switches is the growth of per-VC state - a major high-speed node will have packets from _millions_ of TCP connections flowing through it at any time. In the late-80's/early-90's - well over 30 years ago - I came up with an advanced routing architecture called Nimrod (see RFC-1992, "The Nimrod Routing Architecture"; RFC-1753 may be of interest too); it had things called 'flows' which were half way between pure datagrams (i.e. no setup - you just stick the right destination address in the header and send it off) and VCs (read the RFCs if you want to know why), and it went to a lot of trouble to allow flow aggregation in traffic going to core switches _precisely_ to limit the growth of state in core switches, which would have traffic from millions of connections going through them. I have barely begun to even scratch the surface, here. Noel

3 years

Early Unix Growth: Number of “Installations” or Licences?

by steve jenkin

I’ve been wondering about the growth of Unix and if there’s any good data available. There’s the Early Unix Epoch, which probably ends with the Unix Support Group assuming the distribution role, plus providing / distributing their version of the code. Later there’s commercial Unix: System III and System V, I guess. BSD, until the lawsuit was resolved, required a Source code license, but their installation count is important in pre-Commercial Unix. Large licensees like SUN, HP & IBM (AIX) may not have published license counts for their versions - but then, were their derivatives “Unix” or something else? Warner Losh’s paper has data to around 1978 [below]. I’ve no idea where to find data for USG issued licences, or if the number of binary & source licences were ever reported in the Commercial Era by AT&T. I’ll not be the first person who’s gone down this road, but my Search Fu isn’t good enough to find them. Wondering if anyone on the list can point me at resources, even a bunch of annual reports. I don’t mind manually pulling out the data I’m interested in. But why reinvent the wheel if the work is already done? steve
===============
numbers extracted from Warner Losh’s paper.
<https://papers.freebsd.org/2020/FOSDEM/losh-Hidden_early_history_of_Unix.fi…>
2nd Edn  June 1972      10 installations
3rd Edn  February 1973  16
4th Edn  November 1973  >20, or 25
         July 74        CACM paper “Unix Time Sharing System”, after which external interest exploded
6th Edn  1975           ???
7th Edn  March 1978     600+, >300 inside Bell System, “even more have been licensed to outside users”
===============
-- Steve Jenkin, 0412 786 915 (+61 412 786 915) PO Box 38, Kippax ACT 2615, AUSTRALIA mailto:sjenkin@canb.auug.org.au http://members.tip.net.au/~sjenkin

3 years

Re: forgotten versions

by jnc＠mercury.lcs.mit.edu

> From: Dan Cross > I believe that's actually a menu Hence the "erroneous _impression_" (emphasis added). I'm curious as to how they decided which models to run which editions on. Although V4 _ran_ on the /45, split I+D wasn't supported - for user or kernel - until V6. (I'm assuming a number of things - both in the kernel, and applications - started hitting the 64KB limit, which led to its support.) Speaking of split I+D, there's an interesting little mystery in V6 that at one point in time I thought involved split I+D - but now that I look closely, apparently not. The mystery involves a 'tombstone' in the V6 buf.h: #define B_RELOC 0200 /* no longer used */ I had created (in my mind) an explanation of what this is all about - but now that I look, it's probably all wrong! My explanation involves the slightly odd layout of the kernel in physical memory, with split I+D; data below the code, at physical 0. This actually makes a lot of sense; it means the virtual address of any data (e.g. a buffer) is the same as its physical address (needed for DMA). It does require the oddness of 'sysfix', to invert the order of code+data in the system binary, plus odd little quirks in the assembler startup (e.g. copying the code up to make room for BSS). So I thought that B_RELOC was a hangover from a time, at the start of split I+D, when data _wasn't_ at physical 0, so a buffer's virtual and physical addresses differed. But that must be wrong (at least in any simple way). B_RELOC was in buf.h as of V4 - the first kernel version in C - with no split I+D. So my theory has to be wrong. However, I am unable to find any code in the V4 kernel which uses it! So unless someone who remembers the very early PDP-11 kernel can enlighten us, its purpose will always remain a mystery! Noel

3 years

Re: forgotten versions

by jnc＠mercury.lcs.mit.edu

> From: Paul Ruizendaal > [c] Fifth Edition UNIX PDP-11/40 June 1974 > [d] Sixth Edition UNIX PDP-11/45 May 1975 > [e] Seventh Edition UNIX PDP-11/70 January 1979 This table gives an erroneous impression of which versions supported which PDP-11 models. 4th Edition supported only the /45; 5th Edition added support for the /40; and the /70 appeared in 6th Edition. Noel

3 years
