More from Yost below.
My purpose in relating this was to point out that the original unix
implementation choices were mostly fine; they just had to be tweaked a
bit. Clearly an independent implementation such as in Linux would veer
off in a different direction, done in a different era and with different prior
experience. I was a bit surprised that Bruce didn't make this same
tweak to cblock size, but there is no way of knowing his reasons now.
Begin forwarded message:
From: Dave Yost
Subject: Re: [TUHS] 386BSD released
Date: July 16, 2021 at 9:21:53 AM PDT
To: Bakul Shah
Plz forward this
thanks
This was in early 1983 or late 1982.
We got the serial driver to go 19200 out and 9600 in.
I did 2 things in the Fortune Systems 68k serial driver:
• hand-coded asm pseudo-DMA, suggested by Robert P Warnock III
• cblock size 128 bytes instead of 8, count ’em, 8.
From Lions,
https://cs3210.cc.gatech.edu/r/unix6.pdf
the unix v6 serial driver used a clist of cblocks, like this:
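As a rough illustration only, a clist can be sketched in modern C as a character queue stored in a chain of small fixed-size cblocks. This is not the V6 source: the field names, the CBSIZE value, the c_nblocks counter and the use of malloc are all assumptions made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CBSIZE 8 /* data bytes per cblock; illustrative value, not taken from V6 */

struct cblock {
    struct cblock *c_next;  /* next block in the chain */
    char c_info[CBSIZE];    /* the characters themselves */
};

struct clist {
    int c_cc;               /* characters currently queued */
    struct cblock *c_first; /* block holding the oldest character */
    struct cblock *c_last;  /* block receiving new characters */
    int c_head;             /* read offset within c_first */
    int c_tail;             /* write offset within c_last */
    int c_nblocks;          /* blocks allocated so far (hypothetical, for illustration) */
};

/* Append one character; returns 0 on success, -1 if no block is available. */
int cl_putc(struct clist *q, char c)
{
    if (q->c_last == NULL || q->c_tail == CBSIZE) {
        struct cblock *b = malloc(sizeof *b);
        if (b == NULL)
            return -1;
        b->c_next = NULL;
        if (q->c_last != NULL) {
            q->c_last->c_next = b;
        } else {
            q->c_first = b;
            q->c_head = 0;
        }
        q->c_last = b;
        q->c_tail = 0;
        q->c_nblocks++;
    }
    q->c_last->c_info[q->c_tail++] = c;
    q->c_cc++;
    return 0;
}

/* Remove and return the oldest character, or -1 if the list is empty. */
int cl_getc(struct clist *q)
{
    if (q->c_cc == 0)
        return -1;
    assert(q->c_first != NULL);
    int c = (unsigned char)q->c_first->c_info[q->c_head++];
    q->c_cc--;
    if (q->c_head == CBSIZE || q->c_cc == 0) {
        /* First block is used up: unlink and release it. */
        struct cblock *b = q->c_first;
        q->c_first = b->c_next;
        if (q->c_first == NULL)
            q->c_last = NULL;
        free(b);
        q->c_head = 0;
    }
    return c;
}
```

With blocks this small, a 20-character line already spans three blocks, so the block bookkeeping runs every few characters.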
The pseudo-DMA interrupt handler was a function made up of a few hand-coded 68k
instructions, entered into C code as hex data. That code transferred one byte into or out
of a cblock, and at the end of the cblock it grabbed the next cblock from a queue and rang
the “doorbell” hardware interrupt, which caused a “software interrupt” at lower priority
for further processing. Rob put the doorbell into the architecture with a couple of gates
on the board because he was well aware of this software interrupt trick, which was already
used in bsd. For some reason I didn’t look at the bsd code, probably because Rob’s
explanation was lucid and sufficient.
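The handler described above can be modelled in portable C, purely as a sketch. The real one was a few hand-coded 68k instructions, and every name here (xblock, port, tx_interrupt, ring_doorbell, make_chain, drain) is hypothetical. The doorbell is reduced to a counter so that the effect of block size on the number of lower-priority software interrupts is visible.

```c
#include <assert.h>
#include <stddef.h>

struct xblock {                 /* hypothetical stand-in for a cblock */
    struct xblock *next;
    size_t len;
    const unsigned char *data;
};

struct port {
    struct xblock *cur;         /* block currently being transmitted */
    size_t pos;                 /* next byte within cur */
    unsigned long doorbells;    /* times the "doorbell" was rung */
    unsigned long sent;         /* bytes handed to the device */
    unsigned char last;         /* stand-in for the UART data register */
};

static void ring_doorbell(struct port *p)
{
    p->doorbells++;             /* real hardware would raise a low-priority interrupt */
}

/* One transmit-ready interrupt: move one byte; at the end of a block,
 * take the next block from the queue and ring the doorbell.
 * Returns 0 when there is nothing left to send. */
int tx_interrupt(struct port *p)
{
    if (p->cur == NULL)
        return 0;
    p->last = p->cur->data[p->pos++];
    p->sent++;
    if (p->pos == p->cur->len) {
        p->cur = p->cur->next;
        p->pos = 0;
        ring_doorbell(p);
    }
    return 1;
}

/* Split buf[0..n) into a chain of blocks of at most blksize bytes.
 * Returns the number of blocks used. */
size_t make_chain(struct xblock *blocks, size_t maxblocks,
                  const unsigned char *buf, size_t n, size_t blksize)
{
    size_t nb = 0;
    assert(blksize > 0);
    for (size_t off = 0; off < n && nb < maxblocks; off += blksize, nb++) {
        blocks[nb].data = buf + off;
        blocks[nb].len = (n - off < blksize) ? n - off : blksize;
        blocks[nb].next = NULL;
        if (nb > 0)
            blocks[nb - 1].next = &blocks[nb];
    }
    return nb;
}

/* Send a whole chain and report how many doorbells it cost. */
unsigned long drain(struct port *p, struct xblock *head)
{
    p->cur = head;
    p->pos = 0;
    p->doorbells = 0;
    p->sent = 0;
    while (tx_interrupt(p))
        ;
    return p->doorbells;
}
```

In this model, sending 1024 bytes rings the doorbell 128 times with 8-byte blocks and 8 times with 128-byte blocks. The per-byte path is the same in both cases; only the slower per-block work shrinks.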
I once had occasion to mention this, and specifically the relaxing of the draconian 8
byte cblock size, to Dennis Ritchie. He said, sure, why not, the 8 byte cblock size was
just a neglected holdover from early days.
This approach was just an interrupt version of what I had proposed to Rick Kiessig as a
first project at Fortune Systems: to get a 30x speed up when writing to the Fortune
Systems memory-mapped character display hardware. I had done the same thing a few years
earlier in Z80 in C code in a serial CRT terminal. It’s simple and obvious: make the inner
loop do as little as possible. The most primitive operation needs to be a block operation,
not a byte-at-a-time operation.
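A minimal sketch of that principle, assuming a made-up 80x24 character display whose video memory is modelled as a plain array. The names (display, disp_putc, disp_write_slow, disp_write_fast) and the wrap-around behaviour are illustrative, not the Fortune Systems code, and the sketch makes no attempt to reproduce the 30x figure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COLS 80
#define ROWS 24

struct display {
    unsigned char cells[ROWS * COLS]; /* stand-in for memory-mapped video RAM */
    size_t cursor;                    /* next cell to write */
    unsigned long checks;             /* how often the bookkeeping ran */
};

/* Byte-at-a-time primitive: bookkeeping runs for every character. */
void disp_putc(struct display *d, unsigned char c)
{
    d->checks++;
    if (d->cursor >= sizeof d->cells)
        d->cursor = 0;                /* wrap to the top of the screen */
    d->cells[d->cursor++] = c;
}

void disp_write_slow(struct display *d, const unsigned char *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        disp_putc(d, s[i]);
}

/* Block primitive: bookkeeping runs once per contiguous run,
 * and the inner loop is just a memory copy. */
void disp_write_fast(struct display *d, const unsigned char *s, size_t n)
{
    while (n > 0) {
        d->checks++;
        if (d->cursor >= sizeof d->cells)
            d->cursor = 0;
        size_t room = sizeof d->cells - d->cursor;
        size_t run = n < room ? n : room;
        assert(run > 0);              /* guarantees the loop makes progress */
        memcpy(d->cells + d->cursor, s, run);
        d->cursor += run;
        s += run;
        n -= run;
    }
}
```

Both functions leave the same contents on the screen; the difference is how many times the per-call bookkeeping executes for a given amount of text.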