Nelson H. F. Beebe:
P.S. Jay was the first to get Steve Johnson's Portable C Compiler,
pcc, to run on the 36-bit PDP-10, and once we had pcc, we began the
move from writing utilities in Pascal and PDP-10 assembly language to
doing them in C.
======
How did that C implementation handle ASCII text on the DEC-10?
Were it a from-scratch UNIX port it might make sense to store
four eight- or nine-bit bytes to a word, but if (as I sense it
was) it was C running on TOPS-10 or TOPS-20, it would have had
to work comfortably with DEC's convention of five 7-bit characters
(plus a spare bit used by some programs as a flag).
Norman Wilson
Toronto ON
More from Yost below.
My purpose in relating this was to point out that the original unix
implementation choices were mostly fine; they just had to be tweaked a
bit. Clearly an independent implementation such as in Linux would veer
off in a different direction, done in a different era and with different prior
experience. I was a bit surprised that Bruce didn't make this same
tweak to cblock size but no way of knowing his reasons now.
> Begin forwarded message:
>
> From: Dave Yost
> Subject: Re: [TUHS] 386BSD released
> Date: July 16, 2021 at 9:21:53 AM PDT
> To: Bakul Shah
>
> Plz forward this
> thanks
>
> This was in early 1983 or late 1982.
>
> We got the serial driver to go 19200 out and 9600 in.
>
> I did 2 things in the Fortune Systems 68k serial driver:
> • hand-coded asm pseudo-DMA, suggested by Robert P Warnock III
> • cblock size 128 bytes instead of 8, count ’em, 8.
>
> From Lyons,
> https://cs3210.cc.gatech.edu/r/unix6.pdf <https://cs3210.cc.gatech.edu/r/unix6.pdf>
> the unix v6 serial driver used a clist of cblocks, like this:
>
>
> The pseudo-DMA interrupt handler was a function made up of a few hand-coded 68k instructions, entered into C code as hex data. That code transferred one byte into or out of a cblock, and at the end of the cblock it grabbed the next cblock from a queue and rang the “doorbell” hardware interrupt, which caused a “software interrupt” at lower priority for further processing. Rob put the doorbell into the architecture with a couple of gates on the board because he was well aware of this software interrupt trick, which was already used in bsd. For some reason I didn’t look at the bsd code, probably because Rob’s explanation was lucid and sufficient.
>
> I once had occasion to mention this, and specifically the relaxing of the draconian 8 byte cblock size, to Dennis Ritchie. He said, sure, why not, the 8 byte cblock size was just a neglected holdover from early days.
>
> This approach was just an interrupt version of what I had proposed to Rick Kiessig as a first project at Fortune Systems: to get a 30x speed up when writing to the Fortune Systems memory-mapped character display hardware. I had done the same thing a few years earlier in Z80 in C code in a serial CRT terminal. It’s simple and obvious: make the inner loop do as little as possible. The most primitive operation needs to be a block operation, not a byte-at-a-time operation.
Doug McIlroy asks about the Rosetta Stone table relating TOPS-20
commands to Unix command in my ``Unix for TOPS-20 Users'' document:
>> I was puzzled, though, by the Unix command "leave", which is
>> not in the manuals I edited, nor is it in Linux. What does
>> (or did) it do?
I reread that 1987 document this morning, and found a few small
mistakes, but on the whole, I still agree with what I wrote 34 years
ago, and I'm pleased that almost everything there about Unix still
applies today.
I confess that I had forgotten about the TOPS-20 ALERT command and its
Unix equivalent, leave. As Doug noted, leave is not in Linux systems,
but it still exists in the BSD world, in DragonFlyBSD, FreeBSD,
NetBSD, OpenBSD, and their many derivatives. From a bleeding-edge
FreeBSD 14 system, I find
% man leave
LEAVE(1) FreeBSD General Commands Manual LEAVE(1)
NAME
leave – remind you when you have to leave
SYNOPSIS
leave [[+]hhmm]
DESCRIPTION
The leave utility waits until the specified time, then reminds you that
you have to leave. You are reminded 5 minutes and 1 minute before the
actual time, at the time, and every minute thereafter. When you log off,
leave exits just before it would have printed the next message.
...
-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- University of Utah FAX: +1 801 581 4148 -
- Department of Mathematics, 110 LCB Internet e-mail: beebe(a)math.utah.edu -
- 155 S 1400 E RM 233 beebe(a)acm.org beebe(a)computer.org -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------
Clem Cole asks:
>> Did you know that before PCC the 'second' C compiler was a PDP-10
>> target Alan Snyder did for his MIT Thesis?
>> [https://github.com/PDP-10/Snyder-C-compiler]
I was unaware of that compiler until sometime in the 21st Century,
long after our PDP-10 was retired on 31-Oct-1990.
The site
https://github.com/PDP-10/Snyder-C-compiler/tree/master/tops20
supplies a list of some of Snyder's files, but they don't match
anything in our TOPS-20 archives of almost 180,000 files.
I then looked into our 1980s-vintage pcc source tree and compared
it with a snapshot of the current pcc source code taken three
weeks ago. The latter has support for these architectures
aarch64 hppa m16c mips64 pdp11 sparc64
amd64 i386 m68k nova pdp7 superh
arm i86 mips pdp10 powerpc vax
and the pdp10 directory contains these files:
CVS README code.c local.c local2.c macdefs.h order.c table.c
All 5 of those *.c files are present in our TOPS-20 archives. I then
grepped those archives for familiar strings:
% find . -name '*.[ch]' | sort | \
xargs egrep -n -i 'scj|feldman|johnson|snyder|bell|at[&]t|mit|m.i.t.'
./code.c:8: * Based on Steve Johnson's pdp-11 version
./code2.c:19: * Based on Steve Johnson's pdp-11 version
./cpp.c:1678: stsym("TOPS20"); /* for compatibility with Snyder */
./local.c:4: * Based on Steve Johnson's pdp-11 version
./local2.c:4: * Based on Steve Johnson's pdp-11 version
./local2.c:209: case 'A': /* emit a label */
./match.c:2: * match.c - based on Steve Johnson's pdp11 version
./optim.c:318: * Turn 'em into regular PCONV's
./order.c:5: * Based on Steve Johnson's pdp-11 version
./pftn.c:967: * fill out previous word, to permit pointer
./pftn.c:1458: register commflag = 0; /* flag for labelled common declarations */
./pftn2.c:1011: * fill out previous word, to permit pointer
./pftn2.c:1502: register commflag = 0; /* flag for labelled common declarations */
./reader.c:632: p2->op = NOASG p2->op; /* this was omitted in 11 & /6 !! */
./table.c:128: " movei A1,1\nZN", /* ZN = emit branch */
./xdefs.c:13: * symbol table maintainence
Thus, I'm confident that Jay's work was based on Steve Johnson's
compiler, rather than Alan Snyder's.
Norman Wilson asks:
>> ...
>> How did that C implementation handle ASCII text on the DEC-10?
>> Were it a from-scratch UNIX port it might make sense to store
>> four eight- or nine-bit bytes to a word, but if (as I sense it
>> was) it was C running on TOPS-10 or TOPS-20, it would have had
>> to work comfortably with DEC's convention of five 7-bit characters
>> (plus a spare bit used by some programs as a flag).
>> ...
Our pcc compiler treated char* as a pointer to 7-bit ASCII strings,
stored in the top 35 bits of a word, with the low-order bit normally
zero; a 1-bit there meant that the word contained a 5-digit line
number that some compilers and editors would report. Of course, that
low-order non-character bit meant that memset(), memcpy(), and
memmove() had somewhat dicey semantics, but I no longer recall their
specs.
kcc later gave us access to the PDP-10's 1- to 36-bit byte
instructions.
For text processing, 5 x 7b + 1b bits matched the conventions for all
other programming languages on the PDP-10. When it came time to
implement NFS, and exchange files and data with 32-bit-word machines,
we needed the ability to handle files of 4 x 8b + 4b and 9 x 8b (in
two 36-bit words), and kcc provided that.
The one's-complement 36-bit Univac 1108 machines chose instead to
store text in a 4 x 9b format, because that architecture had
quarter-word load/store instructions, but not the general variable
byte instructions of the PDP-10. Our campus had an 1108 at the
University of Utah Computer Center, but I chose to avoid it, because
it was run in batch mode with punched cards, and never got networking.
By contrast, our TOPS-20, BSD, RSX-11, SunOS, and VMS systems all had
interactive serial-line terminals, and there was no punched card
support at all.
-------------------------------------------------------------------------------
- Nelson H. F. Beebe Tel: +1 801 581 5254 -
- University of Utah FAX: +1 801 581 4148 -
- Department of Mathematics, 110 LCB Internet e-mail: beebe(a)math.utah.edu -
- 155 S 1400 E RM 233 beebe(a)acm.org beebe(a)computer.org -
- Salt Lake City, UT 84112-0090, USA URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------
>> -r is weird because it enables backwards reading, but only as
>> limited by count. Better would be a program, say revfile, that simply
>> reads backwards by lines. Then tail p has an elegant implementation:
>> revfile p | head | revfile
> tail -n can be smarter in that it can simply read the last K bytes
> and see if there are n lines. If not, it can read back further.
> revfile would have to read the whole file, which could be a lot
> more than n lines! tail -n < /dev/tty may never terminate but it
> will use a small finite amount of memory.
Revfile would work the same way. When head has seen enough
and terminates, revfile will get SIGPIPE and stop. I agree that,
depending on scheduling and buffer management, revfile might
read more than tail -n, but it wouldn't read the whole of a
humongous file.
Doug
> Arguably ancient PDP-10 operating systems like ITS, WAITS, TENEX were
> somewhat "open" and "free", but it's not a clear cut case.
The open source movement was a revival of the old days of SHARE and other
user groups.
SAP, the SHARE assembly program for the IBM 704, was freely available--with
source code--to all members of the SHARE user group. I am not aware of any
restrictions on redistribution.
Other more specialized programs were also freely available through SHARE. In
particular, Fortran formatted IO was adopted directly from a SHARE program
written by Roy Nutt (who also wrote SAP and helped write Fortran I).
Bell Labs freely distributed the BESYS operating system for the IBM 704.
At the time (1958) no operating system was available from IBM.
IBM provided source code for the Fortran II compiler. In the
fashion of the time, I spent a memorable all-night session with
that code at hand, finding and fixing a bizarre bug (a computed GOTO
bombed if the number of branches was 74 mod 75) with a bizarre cause
(the code changed the index-register field in certain instructions on the
fly--inconsistently). And there was no operating system to help, because
BESYS swapped itself out to make room for the compiler.
Doug
This somewhat stale note was sent some time ago, but was ignored
because it was sent from an unregistered email address.
> And if the Unix patriarchs were perhaps mistaken about how useful
> "head" might be and whether or not it should have been considered
> verboten.
Point well taken.
I don't know which of head(1) and sed(1) came first. They appeared in
different places at more or less the same time. We in Research
declined to adopt head because we already knew the idiom "sed 10q".
However one shouldn't have to do related operations in unrelated ways.
We finally admitted head in v10.
Head was independently invented by Mike Lesk. It was Lesk's
program that was deemed superfluous.
Head might not have been written if tail didn't exist. But, unlike head,
tail strayed from the tao of "do one thing well". Tail -r and tail -f are
as cringeworthy as cat -v.
-f is a strange feature that effectively turns a regular file into a pipe
with memory by polling for new data, A clean general alternative
might be to provide an open(2) mode that makes reads at the current
file end block if some process has the file open for writing.
-r is weird because it enables backwards reading, but only as
limited by count. Better would be a program, say revfile, that simply
reads backwards by lines. Then tail p has an elegant implementation:
revfile p | head | revfile
Doug
>> -f is a strange feature that effectively turns a regular file into a pipe
>> with memory by polling for new data, A clean general alternative
>> might be to provide an open(2) mode that makes reads at the current
>> file end block if some process has the file open for writing.
> OTOH, this would mean adding more functionality (read: complexity)
> into the kernel, and there has always been a general desire to avoid
> pushing <stuff> into the kernel when it can be done in userspace. Do
> you really think using a blocking read(2) is somehow more superior
> than using select(2) to wait for new data to be appended to the file?
I'm showing my age. tail -f antedated select(2) and was implemented
by alternately sleeping and reading. select(2) indeed overcomes that
clumsiness.
> I'll note, with amusement, that -r is one option which is *NOT* in the
> GNU version of tail. I see it in FreeBSD, but this looks like a
> BSD'ism.
-r came from Bell Labs. This reinforces the point that the ancients
had their imperfections.
Doug