On 12 Aug 2023, at 07:08, Warner Losh
<imp(a)bsdimp.com> wrote:
On Fri, Aug 11, 2023 at 3:05 AM Paul Ruizendaal <pnr(a)planet.nl> wrote:
Bill Joy of CSRG concluded that the BBN stack did not perform according to his
expectations. Note that CSRG was focused on usage over (thick) ethernet links, and BBN was
focused on usage over Arpanet and other wide-area networks (with much lower bandwidth, and
higher latency and error rates). He then in 1982 rewrote the stack to match the CSRG
environment, changing the design to use software interrupts instead of a kernel thread and
optimising the code (e.g. checksumming and fast code paths). It was a matter of debate how
new the code was, with the extremes being that it was written from scratch using the spec
versus it being mostly copied. Looking at it from a distance of roughly forty years, it seems
to be in between: small bits of surviving SCCS history suggest that CSRG started with parts of
the BBN code, followed by rapid, massive modification; the end result is quite different but retained
the ‘mbuf’ core data structure and a BBN bug (off-by-one for OOB TCP segments).
When Kirk McKusick tells the story, UCB got a beta release (or early access) of the BBN
stack. UCB was supposed to add the socket interface to whatever was there. But Bill Joy
found it performed terribly (sometimes multiple seconds to connect, single-digit kB/s over
10 Mb/s media, etc.). He optimized it to make it perform well. This was a combination of
rewriting chunks and tweaking other chunks, which matches your analysis of SCCS. When BBN
came back with their new, release ready stack Bill supposedly said something like 'no
thanks, we already got one that works way better.' This is why much of the structure
of the original BBN stack survived the rewrite: where there wasn't a big issue, the
design and mechanisms wound up being conserved by this effort. It was too much work to
move from mbuf to something else, and too little gain.
I tried to find a good link, but they are in his BSD history retrospective talks to
differing degrees. Sorry I don't have an exact reference.
Warner
===
UCB got a beta release (or early access) of the BBN
stack.
This is certainly true. There are four surviving BBN tapes in the CSRG archive, from
memory the first was from August 1981 and the last from early 1982. Again from memory, the
oldest SCCS entries for the rewrite are from Oct ’81. All of this is on Kirk’s DVD.
Indeed the first tape is early code, written just to interface with Arpanet IMPs. Things
like routing are rudimentary or hard-coded, etc. This is all filled out by the time of the
last tape, which also includes a token ring driver (written by Noel Chiappa, if I remember
correctly).
There are two issues of Mike Muuss’ TCP-DIGEST mailing list with posts on performance
from Joy and Gurwitz respectively (both from late 1981):
https://groups.google.com/g/fa.tcp-ip/c/WNE_j4mAbAE/m/3nCB79uvNcUJ
https://groups.google.com/g/fa.tcp-ip/c/zfYZh-kRlMg/m/pl-5oLQtYxIJ
Jonathan Gray sent me a good link to one of Kirk’s talks that addresses this bit of
history:
https://youtu.be/DEEr6dT-4uQ?t=706
===
Maybe Kirk was using some hyperbole in that talk. As far as I can tell, the main issues
were the following:
- BBN coded the checksum routine in C and it compiled badly. Even Joy's
hand-optimised version still took some 25% of CPU when traffic maxed out (a sketch of the
basic algorithm follows this list).
- The time-out constants were 2s and 5s. This makes sense for the Arpanet of the time. For
local ethernet these were changed to 0.2s and 0.5s.
- The BBN code used a lot of bit fields. This too compiled badly, and was later changed
to AND/OR operations with #define’d constants (see the second sketch below).
- The BBN code took a very layered approach, often abstracting small bits of functionality
into a separate routine. Without compiler inlining, and with a somewhat slow VAX subroutine
mechanism, this had a cost. Some of these functions were later changed to #define’d macros
instead of function calls.
- Although Kirk singles out replacing the state machine with a big switch statement in the
above talk, I’m not sure this was a major performance boost. It is certainly the most
visible/recognisable change in the source though. Somehow, this seems to have become core
to the debate at the time (ref. the BBN talk at the Summer '84 Usenix conference).
Maybe I underestimate the impact of this aspect.
- The TCP management process ran as a kernel thread with normal scheduling. On a loaded
machine this meant that it could take seconds for this process to be scheduled again.
Changing this to a software interrupt mechanism (somewhat similar to the runrun flag,
although Joy appears to have credited VMS for the idea) made things more responsive and
avoided context switches.
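To make the checksum point concrete, below is a minimal sketch of the textbook Internet
(ones'-complement) checksum in plain C. This is my own illustration, not the BBN or Joy
code, and the name is mine; it just shows the kind of per-word loop that, compiled
naively by the compilers of the day, could end up eating a quarter of the CPU.

#include <stddef.h>
#include <stdint.h>

/* Minimal sketch of the RFC 793/1071-style ones'-complement checksum;
 * an illustration only, not the BBN or 4.2BSD routine. */
uint16_t
cksum_sketch(const void *data, size_t len)
{
        const uint8_t *p = data;
        uint32_t sum = 0;

        while (len > 1) {                       /* sum 16-bit words */
                sum += ((uint32_t)p[0] << 8) | p[1];
                p += 2;
                len -= 2;
        }
        if (len == 1)                           /* pad an odd trailing byte */
                sum += (uint32_t)p[0] << 8;
        while (sum >> 16)                       /* fold carries back in */
                sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)~sum;                  /* ones' complement of the sum */
}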
Of all the above, it would seem to me that only the last point was fundamental to the
design. The other things appear relatively easy to fix.
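For the bit-field and small-function points, the flavour of the change was roughly the
following. A hypothetical illustration, with names invented by me rather than taken from
the BBN or 4.2BSD sources:

#include <stdint.h>

/* Bit-field style, which the compilers of the day handled poorly:
 *
 *     struct tcb_flags {
 *             unsigned fin_sent : 1;
 *             unsigned ack_due  : 1;
 *     };
 *
 * Mask-and-#define style, as in the later code: */
#define TF_FIN_SENT     0x01
#define TF_ACK_DUE      0x02

/* Small helpers as #define'd macros instead of separate functions,
 * saving a subroutine call per use: */
#define SET_FLAG(f, b)  ((f) |= (b))
#define CLR_FLAG(f, b)  ((f) &= ~(b))
#define HAS_FLAG(f, b)  (((f) & (b)) != 0)

int
flag_example(void)
{
        uint16_t flags = 0;

        SET_FLAG(flags, TF_ACK_DUE);
        if (HAS_FLAG(flags, TF_ACK_DUE))
                CLR_FLAG(flags, TF_ACK_DUE);
        return HAS_FLAG(flags, TF_FIN_SENT);    /* 0 */
}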
===
Doug McIlroy wrote: "I recall expressions of surprise (dismay?) at the size of the
BSD internet code, but without it we'd have lost our place in the Unix
community."
An interesting question is how much of this size is unavoidable. Just counting file lines
(i.e. ‘wc -l’), 4.2BSD has some 2200 lines of TCP code (i.e. just the TCP part), the ’82
BBN implementation some 2400 lines, 8th Edition some 2400 lines, and the first version of
Plan9 some 2200 lines.
Some parts of TCP could perhaps be simplified (pseudo headers and ‘urgent’ segments
come to mind), but not much. Maybe this is just the code size that it takes.
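For reference, the pseudo header in question is the twelve bytes of IP-layer information
that the TCP checksum covers in addition to the segment itself. A sketch, with field names
of my own choosing:

#include <stdint.h>

/* The (conceptual) pseudo header prepended to a TCP segment when its
 * checksum is computed, per RFC 793; the fields come from the IP layer,
 * which is exactly the layering wrinkle referred to above. */
struct tcp_pseudo_hdr {
        uint32_t src_addr;      /* IPv4 source address */
        uint32_t dst_addr;      /* IPv4 destination address */
        uint8_t  zero;          /* always zero */
        uint8_t  protocol;      /* 6 for TCP */
        uint16_t tcp_length;    /* TCP header plus payload length */
};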
URP is smaller, but if I remember correctly it does not handle out-of-order packets or packet
duplicates (neither of which occurs in a (virtual) circuit-switched context).
On the other hand, these sizes compare with 900 lines for the IL protocol in early Plan9.