`quotes'
> rules used ... to create British spelling from an American
> English database often leave a lot to be desired.
Among the BUGS listed for spell(1) in v7 was "British spelling was
done by an American".
Nevertheless, at least one British expat thanked me for spell -b.
He had been using the original "spell", and ignoring its reports
of British "misspellings". But, he said, long exposure to American
writing had infected his writing. Spell -b was a blessing, for it
revealed where his usage wobbled between traditions.
> I am curious if anyone on the list remembers much
> about the development of the first spell checkers in Unix?
Yes, intimately. They had no relationship to the PDP 10.
The first one was a fantastic tour de force by Bob Morris,
called "typo". Aside from the file "eign" of the very most common
English words, it had no vocabulary. Instead it evaluated the
likelihood that any particular word came from a source with the
same letter-trigram frequencies as the document as a whole. The
words were then printed in increasing order of likelihood. Typos
tended to come early in the list.
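The idea behind typo can be sketched in C under a much simplified scoring rule, which is an assumption and not Morris's actual statistic: count every letter trigram in the document, then rate each word by the mean document frequency of its own trigrams, so words built from rare trigrams score low and would sort to the top of the list. The real program also used the "eign" list; `count_trigrams` and `word_score` are hypothetical names.

```c
#include <ctype.h>
#include <string.h>

#define NL 26  /* letters a..z */

/* Document-wide counts for every letter trigram, 26^3 entries. */
static unsigned long tri[NL * NL * NL];

static int letter(char c) { return tolower((unsigned char)c) - 'a'; }

/* Count each run of three consecutive letters inside a word;
   any non-letter ends the current word. */
void count_trigrams(const char *text) {
    int a = -1, b = -1;   /* the two previous letters, -1 if none */
    memset(tri, 0, sizeof tri);
    for (const char *p = text; *p; p++) {
        if (isalpha((unsigned char)*p)) {
            int c = letter(*p);
            if (a >= 0)
                tri[(a * NL + b) * NL + c]++;
            a = b;
            b = c;
        } else {
            a = b = -1;
        }
    }
}

/* Mean document frequency of the word's trigrams: low values mean the
   word is made of letter sequences that are rare in this document.
   Returns -1 for words shorter than 3 letters or containing non-letters. */
double word_score(const char *w) {
    size_t len = strlen(w);
    unsigned long sum = 0;
    if (len < 3)
        return -1.0;
    for (size_t i = 0; i < len; i++)
        if (!isalpha((unsigned char)w[i]))
            return -1.0;
    for (size_t i = 0; i + 2 < len; i++)
        sum += tri[(letter(w[i]) * NL + letter(w[i + 1])) * NL + letter(w[i + 2])];
    return (double)sum / (double)(len - 2);
}
```

Calling `count_trigrams` on the whole text and then sorting its distinct words by `word_score` in ascending order gives a typo-style listing, likeliest typos first.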
Typo, introduced in v3, was very popular until Steve Johnson wrote
"spell", a remarkably short shell script that (efficiently) looks
up a document's words in the wordlist of Webster's Collegiate
Dictionary, which we had on line. The only "real" coding he did
was to write a simple affix-stripping program to make it possible
to look up plurals, past tenses, etc. If memory serves, Steve's
program is described in Kernighan and Pike. It appeared in v5.
Steve's program was good, but the dictionary isn't an ideal source
for real text, which abounds in proper names and terms of art.
It also has a lot of rare words that don't pull their weight in
a spell checker, and some attractive nuisances, especially obscure
short words from Scots, botany, etc, which are more likely to
arise in everyday text as typos than by intent. Given the basic
success of Steve's program, I undertook to make a more useful
spelling list, along with more vigorous affix stripping (and a
stop list to avert associated traps, e.g. "presenation" =
"pre+senate+ion"). That has been described in Bentley's "Programming
Pearls" and in http://www.cs.dartmouth.edu/~doug/spell.pdf.
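To make the "presenation" trap concrete, here is a toy C sketch of affix stripping guarded by a stop list. The word lists, the four suffix rules, the single "pre" prefix, and the way the stop list is consulted (a plain list of words never to accept) are all illustrative assumptions, not the algorithm in the paper cited above; `spelled_ok` and the other names are hypothetical.

```c
#include <string.h>

/* Toy word lists, hypothetical; the real lists were far larger. */
static const char *dict[] = {"senate", "station", "walk", 0};
static const char *stop[] = {"presenation", 0};

/* Suffix rules: strip suf, then append rep (e.g. "ion" -> "e"). */
struct rule { const char *suf, *rep; };
static const struct rule rules[] = {
    {"s", ""}, {"ed", ""}, {"ing", ""}, {"ion", "e"}, {0, 0}
};

static int member(const char **list, const char *w) {
    for (; *list; list++)
        if (strcmp(*list, w) == 0)
            return 1;
    return 0;
}

/* Is stem a dictionary word, or one after undoing a single suffix? */
static int derivable(const char *stem) {
    size_t n = strlen(stem);
    if (member(dict, stem))
        return 1;
    for (const struct rule *r = rules; r->suf; r++) {
        size_t k = strlen(r->suf);
        char buf[64];
        if (n <= k || n - k + strlen(r->rep) >= sizeof buf)
            continue;
        if (strcmp(stem + n - k, r->suf) != 0)
            continue;
        memcpy(buf, stem, n - k);
        strcpy(buf + n - k, r->rep);
        if (member(dict, buf))
            return 1;
    }
    return 0;
}

/* Accept w directly, by suffix stripping, or after also removing a
   "pre" prefix; with use_stop set, stop-listed words are refused first. */
int spelled_ok(const char *w, int use_stop) {
    if (use_stop && member(stop, w))
        return 0;
    if (derivable(w))
        return 1;
    return strncmp(w, "pre", 3) == 0 && derivable(w + 3);
}
```

Without the stop list, "presenation" is accepted as pre + "senation", which the "ion" rule turns into the dictionary word "senate"; with it, the word is refused before any stripping is tried.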
Morris's program and mine labored under space constraints, so they
have some pretty ingenious coding tricks. In fact Morris has
a patent on the way he counted frequencies of the 26^3 trigrams
in 26^3 bytes, even though the counts could exceed 256. I did
some heroic (and probabilistic) encoding to squeeze a 30,000
word dictionary into a 64K data space.
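One well-known way to keep counts larger than 255 in a byte is probabilistic counting: store roughly the logarithm of the count and bump it only with probability 2^-c. The C sketch below assumes base 2 for simplicity; the message does not say exactly which variant Morris's patented scheme used, and a practical version would pick a smaller base to trade range for accuracy. The names are hypothetical.

```c
#include <stdlib.h>

/* One-byte approximate counter: the stored value c stands for about
   2^c - 1 events. */
typedef unsigned char approx_count;

/* Record one event: bump c with probability 2^-c, i.e. only if c fair
   coin flips all come up "heads". */
void ac_increment(approx_count *c) {
    if (*c == 255)
        return;                              /* saturate instead of wrapping */
    for (int i = 0; i < *c; i++)
        if (rand() > RAND_MAX / 2)
            return;                          /* a "tails": no increment */
    (*c)++;
}

/* Estimate of the number of events recorded so far (2^c - 1). */
double ac_estimate(approx_count c) {
    double e = 1.0;
    for (int i = 0; i < c; i++)
        e *= 2.0;
    return e - 1.0;
}
```

The estimate 2^c - 1 is unbiased for this base-2 scheme, but its variance is large, which is why a finer base is the usual design choice when only relative frequencies matter.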
Doug
Hi,
I found this paper by bwk referenced in the Unix manpages,
in v4 as: TROFF Made Trivial (unpublished),
in v5 as: TROFF Made Trivial (internal memorandom),
also in the v6 "Unix Reading List",
but not anymore in v7.
Anyone have a copy or a scan?
--
Leah Neukirchen <leah(a)vuxu.org> http://leah.zone
> From: Larry McVoy
> So tape I can see being more weird, but isn't raw disk just "don't put
> it in buffer cache"?
On machines/controllers that are capable of it, with raw devices DMA happens
directly into the buffers in the process (which obviously has to be resident
while the I/O is happening).
Noel
> From: Will Senn
> I don't quite know how to investigate this other than to pore through the
> pdp11/40 instruction manual.
One of these:
https://www.ebay.com/itm/Digital-pdp-Programming-Card-8-Pages/142565890514
is useful; it has a list of all the opcodes in numerical order; something none
of the CPU manuals have, to my recollection. Usually there are a flock of
these "pdp11 Programming Cards" on eBait, but I only see this one at the
moment.
If you do any amount of work with PDP-11 binary, you'll soon find yourself
recognizing the common instructions. E.g. MOV is 01msmr (octal), where 'm' is
a mode specifier, and s and r are source and destination register
numbers. (That's why PDP-11 people are big on octal; the instructions are easy
to read in octal.) More here:
http://gunkies.org/wiki/PDP-11_architecture#Operands
So 0127xx is a move of an immediate operand.
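To make the octal field layout concrete, here is a small C sketch that splits a double-operand instruction word into its 3-bit fields; `struct dop` and the function names are hypothetical. Applied to 012746, which appears in the boot-block dump in this thread, it yields a MOV whose source is mode 2 on R7 (immediate) and whose destination is mode 4 on R6, i.e. -(sp).

```c
/* Fields of a PDP-11 double-operand instruction: bits 15-12 opcode,
   then source mode/register and destination mode/register, 3 bits
   each, which is why the word reads naturally in octal. */
struct dop {
    unsigned op, smode, sreg, dmode, dreg;
};

struct dop decode_dop(unsigned w) {
    struct dop d;
    d.op    = (w >> 12) & 017;  /* 01 MOV, 02 CMP, 011 MOVB, ... */
    d.smode = (w >> 9) & 07;
    d.sreg  = (w >> 6) & 07;
    d.dmode = (w >> 3) & 07;
    d.dreg  = w & 07;
    return d;
}

/* Mode 2 (autoincrement) on R7, the PC, is an immediate operand:
   "$x" in UNIX as, "#x" in DEC's MACRO-11. */
int src_is_immediate(struct dop d) {
    return d.smode == 2 && d.sreg == 7;
}
```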
>> You don't need to mount it on a DECtape drive - it's just blocks. Mount
>> it as an RK05 image, or a magtape, or whatever.
> I thought disk (RK05) and tape (magtape) blocks were different...
Well, you need to differentiate between DECtape and magtape - very different
beasts.
DECtape on a PDP-11 _only_ supports 256 word (i.e. 512 byte) blocks, the same
as most disks. (Floppies are an exception when it comes to disks - sort
of. The hardware supports 128/256 byte sectors, but the usual driver - not in
V6 or V7 - invisibly makes them look like 512-byte blocks.)
Magtapes are complicated, and I don't remember all the details of how Unix
handles them, but the _hardware_ is prepared to write very long 'blocks', and
there are also separate 'file marks' which the hardware can write, and notice.
But a magtape written in 512-byte blocks, with no file marks, can be treated
like a disk; that's what the V6 distribution tapes look like:
http://gunkies.org/wiki/Installing_UNIX_Sixth_Edition#Installation_tape_con…
and IIRC 'tp' format magtape tapes are written the same way, hardware-wise (so
they look just like DECtapes).
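Since such images are just a sequence of 512-byte blocks, reading block n is a seek to n*512. A minimal C sketch, assuming a plain image file with no per-record metadata (a SIMH-style tape file with in-band record lengths would need unpacking first); `read_block` is a hypothetical name.

```c
#include <stdio.h>

#define BSIZE 512  /* 256 PDP-11 words */

/* Read block n of a raw image file into buf; returns 1 on success,
   0 on a seek error or a short read (e.g. past the end of the image). */
int read_block(FILE *f, long n, unsigned char buf[BSIZE]) {
    if (fseek(f, n * BSIZE, SEEK_SET) != 0)
        return 0;
    return fread(buf, 1, BSIZE, f) == BSIZE;
}
```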
Noel
> From: Will Senn
> (e) UNIX assembler uses the characters $ and "*" where the DEC
> assemblers use "#" and "@" respectively.
Amusing: the "UNIX Assembler Reference Manual" says:
The syntax of the address forms is identical to that in DEC assemblers,
except that "*" has been substituted for "@" and "$" for "#"; the
UNIX typing conventions make "@" and "#" rather inconvenient.
What's amusing is that in almost 40 years, it had never dawned on me that
_that_ was why they'd made the @->*, etc change! "Duhhhh" indeed!
Interesting side note: the UNIX erase/kill characters are described as being
the same as Multics', but since Bell pulled out of the Multics project fairly
early, I wonder if they'd used it long enough to get '@' and '#' hardwired
into their fingers. So I recently had the thought 'Multics was a follow-on to
CTSS, maybe CTSS used the same characters, and that's how they got burned in'.
So I looked in the "CTSS Programmer's Guide" (2nd edition), and no, according
to it (pg. AC.2.02), the erase and kill characters on CTSS were '"' and
'?'. So, so much for that theory!
> (l) The names "_edata" and "_end" are loader pseudo variables which
> define the size of the data segment, and the data segment plus the bss
> segment respectively.
That one threw me, too, when I first started looking at the kernel!
I don't recall if I found documentation about it, or just worked it out: it is
in the UPM, although not in ld(1) like one might expect (at least, not in the
V6 UPM; although in V7:
http://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/man/man1/ld.1
it is there), but in end(3):
http://minnie.tuhs.org/cgi-bin/utree.pl?file=V6/usr/man/man3/end.3
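The same loader symbols are still visible from C on some current systems. A minimal sketch, assuming a Unix-like system whose linker defines `etext`, `edata` and `end` (GNU ld on Linux does; macOS, for example, does not). In C they are declared without the leading underscore that the V6 C compiler prepended to external names, which is why assembly code refers to `_edata` and `_end`. `bss_scratch` and `bss_size` are hypothetical names.

```c
#include <stdint.h>

/* Loader-defined symbols (see end(3)); only their addresses matter. */
extern char etext, edata, end;

/* An uninitialized global, which the linker places in bss. */
char bss_scratch[4096];

/* Bytes from the end of initialized data (edata) to the end of
   data plus bss (end). */
uintptr_t bss_size(void) {
    return (uintptr_t)&end - (uintptr_t)&edata;
}
```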
Noel
Why does the first of these incantations not present text, but the
second does (word is a file)? Neither errors out.
$ <word | sed 20q
$ <word sed 20q
Thanks,
Will
--
GPG Fingerprint: 68F4 B3BD 1730 555A 4462 7D45 3EAA 5B6D A982 BAAF
> From: Clem Cole <clemc(a)ccc.com>
> IIRC Tom Lyons started a 370 port at Princeton and finished it at
> Amdahl. But I think that was using VM
Maybe this is my lack of knowledge of VM showing, but how did having VM help
you over running on the bare hardware?
Noel
https://en.wikipedia.org/wiki/Leonard_Kleinrock#ARPANET
``The first permanent ARPANET link was established on November 21, 1969,
between the IMP at UCLA and the IMP at the Stanford Research Institute.''
And thus from little acorns...
--
Dave Horsfall DTM (VK2KFU) "Those who don't understand security will suffer."
> From: Will Senn
> he is addressing an aspect that was not addressed in either of the
> manual's entries and is very helpful for making the translation between
> PDP-11 Macro Assembler and unix as.
I'm curious - what aspect was that?
Noel
> From: Will Senn <will.senn(a)gmail.com>
> To bone up on assembly language, Lions's commentary is exceptionally
> helpful in explaining assembly as it is implemented in V6. The manual
> itself is really thin
Err, which manual are you referring to there? Not the "UNIX Assembler
Reference Manual":
http://minnie.tuhs.org/cgi-bin/utree.pl?file=V6/usr/doc/as/as
I would assume, but the 'as(I)' page in the UPM?
Noel
> From: Will Senn
> I'm off to refreshing my pdp-11 assembly language skills...
A couple of things that might help:
- assemble mboot.s and 'od' the result, so when you see something that matches
in the dump of the 0th block, you can look back at the assembler source, to see
what the source looks like
- read the boot block into a PDP-11 debugger ('db' or 'cdb' on V6, 'adb' on
V7; I _think_ 'adb' was available on V7, if not, there are some BSD's that
have it) and use that to disassemble the code
Noel
> The 0th block does seem to contain some PDP-11 binary - a bootstrap of
> some sort. I'll look in more detail in a bit.
OK, I had a quick look, and it seems to be a modified version of mboot.s:
http://minnie.tuhs.org/cgi-bin/utree.pl?file=V6/usr/source/mdec/mboot.s
I had a look through the rest of the likely files in 'mdec', and I didn't find
a better match. I'm too lazy busy to do a complete dis-assembly, and work out
exactly how it's different, though.
A few observations:
000: 000407 000606 000000 000000 000000 000000 000000 000001
An a.out header, with the 0407 'magic' here performing its original intended
function - to branch past the header.
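The header words can be pulled apart mechanically. A small C sketch, assuming the eight-word layout of the V6 a.out(V) page (magic, text size, data size, bss size, symbol table size, entry, unused, relocation flag) and low-byte-first PDP-11 words; `struct aout`, `le16` and `parse_aout` are hypothetical names. On the header above it gives a text size of 0606 bytes and a relocation flag of 1.

```c
#include <stdint.h>

/* The 8-word header at the start of a V6 a.out file, in file order. */
struct aout {
    uint16_t magic;   /* 0407 doubles as a "br" past the header */
    uint16_t tsize;   /* text size in bytes */
    uint16_t dsize;   /* initialized data size */
    uint16_t bsize;   /* bss size */
    uint16_t ssize;   /* symbol table size */
    uint16_t entry;   /* entry location */
    uint16_t unused;
    uint16_t relflg;  /* nonzero: relocation bits suppressed */
};

/* PDP-11 words are stored low byte first. */
static uint16_t le16(const unsigned char *p) {
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Fill h from the first 16 bytes of b; returns 1 if the magic number
   is one of the executable types 0407, 0410 or 0411. */
int parse_aout(const unsigned char *b, struct aout *h) {
    h->magic  = le16(b);
    h->tsize  = le16(b + 2);
    h->dsize  = le16(b + 4);
    h->bsize  = le16(b + 6);
    h->ssize  = le16(b + 8);
    h->entry  = le16(b + 10);
    h->unused = le16(b + 12);
    h->relflg = le16(b + 14);
    return h->magic == 0407 || h->magic == 0410 || h->magic == 0411;
}
```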
314: 105737 177560 002375
Some console I/O stuff - this two instruction loop waits for the input
ready bit to be set.
326: 042700 177600 020027 000101 103405 020027 000132 101002
More character processing - the first instruction clears the high bits of R0,
and the next two pairs of instructions each compare its contents with a
character (0101 and 0132, i.e. 'A' and 'Z') and branch.
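Several of the words in these snippets are conditional branches, whose targets are easy to miscompute by hand: the low byte is a signed offset in words from the updated PC. A small C sketch with hypothetical names, decoding only the simple branch group (000400 to 003777 and 100000 to 103777 octal). With it, 000407 at address 0 is a br to 020, just past the 16-byte header; 002375 at 0320 is a bge back to 0314, the tstb that opens the console loop; and 103405 is a bcs (also written blo), an unsigned "below" test after the cmp.

```c
/* Mnemonic for a PDP-11 branch instruction, or 0 if w is not one.
   The high byte selects the condition. */
const char *branch_name(unsigned w) {
    static const char *const low[8] =
        {0, "br", "bne", "beq", "bge", "blt", "bgt", "ble"};
    static const char *const high[8] =   /* bcc = bhis, bcs = blo */
        {"bpl", "bmi", "bhi", "blos", "bvc", "bvs", "bcc", "bcs"};
    unsigned h = (w >> 8) & 0377;
    if (h >= 1 && h <= 7)
        return low[h];
    if (h >= 0200 && h <= 0207)
        return high[h - 0200];
    return 0;
}

/* Target of a branch at address addr: the low byte is a signed offset
   in words, relative to the PC after the fetch (addr + 2). */
unsigned branch_target(unsigned addr, unsigned w) {
    int off = (int)(w & 0377);
    if (off & 0200)
        off -= 0400;                     /* sign-extend the 8-bit offset */
    return (unsigned)((int)addr + 2 + 2 * off) & 0177777;
}
```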
444: 000207 005000 021027 000407 001004 016020
460: 000020 020006 103774 012746 137000 005007
This seems like the code that checks to see if the thing is an a.out file
(note the 'cmp *r0, $0407'), but the code is different from that code in
mboot.s; in that, the instruction before the 'clr r0' (at 0446 here) is a
'jsr', whereas in this it's an 'rts pc'. And the code after the 'cmp r0, sp'
and branch is different too. I love the '05007' - not very often you see
_that_ instruction!
502: 012700 177350 012701 177342 012711 000003 105711
Clearly the code at 'taper:' (TC11 version).
Noel
So, I came across this tape:
http://bitsavers.trailing-edge.com/bits/DEC/pdp11/dectape/TU_DECtapes/unix6…
I was curious what was on it, so I read the description at:
http://bitsavers.trailing-edge.com/bits/DEC/pdp11/dectape/TU_DECtapes.txt
UNIX1 PURDUE UNIX TAPES
UNIX2
UNIX4
UNIX6
HARBA1 HARVARD BASIC TAPE 1
HARBA2 HARVARD BASIC TAPE 2
MEGTEK MEGATEK UNIX DRIVER
RAMTEK RAMTEK UNIX DRIVER
Cool, sounds interesting, so I downloaded the unix6.dta file and fired
up simh - after some fiddling, I figured out that I could get a boot
prompt (is that actually from the tape?) if I:
set cpu 11/40
set en tc
att tc0 unix6.dta
boot tc0
=
At that point, I was stuck - the usual tmrk, htrk, and the logical
corollary tcrk didn't do anything except return me to the boot prompt.
I was thinking this was a sixth edition install tape of some sort, but
if it is, I'm not able to figure it out. I thought I would load the tape
into v7 and look at its content using tm or tp, but then I realized that
I didn't have a device set up for TU56 and even if I did, I didn't know
how to do a dir on a tape - yeah, I know, I will go read the manual(s)
in chagrin.
In the meantime, my question for y'all is similar to my other recent
questions, and it goes like this:
When you received an unmarked tape back in the day, how did you go about
figuring out what was on it? What was your process (open the box, know
by looking at it that it was an x rather than a y, load it into the tape
reader and read some bytes off it and know that it was a z, use unix to
read the tape using tm, tp, tar, dd, cpio or what, and so on)? What
advice would you give a future archivist to help them quickly classify
bit copies of tapes :).
Thanks,
Will
I don't think we had the Fourth Research Edition Unix Programmer's
Manual available in typeset form. I played a bit with the troff manual
pages on TUHS and managed to typeset it into PDF. You can find the PDF
document at https://dspinellis.github.io/unix-v4man/v4man.pdf.
I modernized the old shell scripts and corrected some minor markup
glitches through commits that are recorded on a GitHub repository:
https://github.com/dspinellis/unix-v4man. The process was surprisingly
smooth. The scripts for generating the table of contents and the
permuted index are based on the original ones. The few problems I
encountered in the troff source had to do with missing spaces after
requests, the ^F hyphenation character causing groff to complain, a
failure of groff to honor .li requests followed by a line starting with
a ., and two uses of a lowercase letter for specifying a font. I wrote
from scratch a script to typeset everything into one volume. I could
not find a shell script for typesetting the whole manual in any of the
Research Editions. I assume the process of running the typesetter was
so cumbersome, error prone, and time-consuming that it was manually
performed on a page-by-page basis. Correct me if I'm wrong here.
Diomidis Spinellis
It can be hard to visualise what is on a tape when you have no idea
what is on there.
Attached is a simple tool I wrote "back then", shamelessly copying an
idea by Paul Scorer at Leeds Poly (my video systems lecturer).
It is called tm (tape mark).
-Steve
> From: Arthur Krewat
> For anyone reading old tapes, I implore you to attempt to read data past
> the soft EOT ;)
The guy who read my tape does in fact do that; you'll notice my program has an
option for looking for data after the soft EOT.
Noel
> From: Will Senn
> I think I understand- the bytes that we have on hand are not device
> faithful representations, but rather are faithful representations of
> what is presented to the OS. That is, back in the day, a tape would be
> stored in various formats as would disks, but unix would show these
> devices as streams of bytes, and those streams of bytes are what
> have been preserved.
Yes and no.
To start with, one needs to differentiate three different levels; i) what's
actually on the medium; ii) what the device controller presented to the CPU;
and iii) what the OS (Unix in this case) presented to the users.
With the exception of magtapes (which had some semantics available through
Unix for larger records, and file marks, the details of which escape me - but
try looking at the man page for 'dd' in V6 for a flavour of it), you're correct
about what Unix presented to the users.
As to what is preserved; for disks and DECtapes, I think you are broadly
correct. For magtapes, it depends.
E.g. SIMH apparently can consume files which _represent_ magtape contents (i,
above), and which include 'in band' (i.e. part of the byte stream in the file)
meta-data for things like file marks, etc. At least one of the people who
reads old media for a living, when asked to read an old tape, gives you back
one of these files with meta-data in it. Here:
http://ana-3.lcs.mit.edu/~jnc/tech/pdp11/tools/rdsmt.c
is a program which reads one of those files and converts the contents to a file
containing just the data bytes. (I had a tape with a 'dd' save of a
file-system on it, and wanted just the file-system image, on which I deployed
a tool I wrote to grok 4.2 filesystems.)
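The unpacking step can be sketched in C, assuming the SIMH standard tape format: each record is a 32-bit little-endian length, the data padded to an even byte count, then the same length again; a zero word is a tape mark and 0xFFFFFFFF marks end of medium. This is not the rdsmt.c linked above; `tap_extract` is a hypothetical name, and masking the length to its low 24 bits to drop SIMH's flag bits is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t le32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Copy the data bytes of a SIMH-format tape image (in, n bytes) to out,
   dropping length words, padding and tape marks.  Returns the number of
   data bytes written, or -1 if a record is truncated or its trailing
   length does not match.  *marks receives the number of tape marks. */
long tap_extract(const unsigned char *in, size_t n, unsigned char *out, int *marks) {
    size_t i = 0;
    long o = 0;
    *marks = 0;
    while (i + 4 <= n) {
        uint32_t hdr = le32(in + i);
        i += 4;
        if (hdr == 0) {                      /* tape mark */
            (*marks)++;
            continue;
        }
        if (hdr == 0xFFFFFFFFu)              /* end of medium */
            break;
        uint32_t len = hdr & 0x00FFFFFFu;    /* assumed: ignore flag bits */
        size_t padded = len + (len & 1);     /* odd records carry a pad byte */
        if (i + padded + 4 > n || le32(in + i + padded) != hdr)
            return -1;
        for (uint32_t k = 0; k < len; k++)
            out[o++] = in[i + k];
        i += padded + 4;
    }
    return o;
}
```

Keeping the tape-mark count separate is useful because a 'dd' save with no file marks, like the one described above, should report zero or only trailing marks.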
Also, for disks, it should be remembered that i) and ii) were usually quite
different, as what was actually on the disk included things like preambles,
headers, CRCs, etc, none of which the CPU usually could even see. (See here:
http://gunkies.org/wiki/RX0x_floppy_drive#Low-level_format
for an example. Each physical drive type would have its own specific low-level
hardware format.) So what's preserved is just an image of what the CPU saw,
which is, for disks and DECtapes, generally the same as what was presented to
the user - i.e. a pile of bytes.
Noel
> From: Will Senn
> So, I came across this tape:
> ...
> I was curious what was on it
'od' is your friend!
If you look here:
http://mercury.lcs.mit.edu/~jnc/tech/V6Unix.html#dumpf
there's a thing which is basically 'od' and 'dd' rolled in together, which
allows you to dump any block you want in a variety of formats (ASCII, 16-bit
words in octal [very useful for PDP-11 binary], etc). I wrote it under CygWin,
for Windows, but it only uses the StdIO library, and similar programs (e.g. my
usassembler) written that way work fine under Losenux.
Try downloading it and compiling it - if it doesn't work, please let me know;
it'd be worth fixing it so it does work on Linux.
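The core of such a dumper is a loop that prints 16-bit little-endian words in octal. A minimal C sketch of one output line, in the "address: eight words" layout of the boot-block listing earlier in the thread; this is not dumpf itself, and `dump_line` is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>

/* Format 8 little-endian 16-bit words starting at buf+off as one line
   of octal ("000: 000407 000606 ...").  Returns the length written, as
   snprintf would; output is truncated safely if outsz is too small. */
int dump_line(const unsigned char *buf, size_t off, char *out, size_t outsz) {
    int n = snprintf(out, outsz, "%03o:", (unsigned)off);
    for (size_t i = 0; i < 16 && n > 0 && (size_t)n < outsz; i += 2) {
        unsigned w = buf[off + i] | buf[off + i + 1] << 8;  /* low byte first */
        n += snprintf(out + n, outsz - n, " %06o", w);
    }
    return n;
}
```

Looping `dump_line` over offsets 0, 16, 32, ... of a block read from the image reproduces an `od`-style word dump.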
> after some fiddling, I figured out that I could get a boot prompt (is
> that actually from the tape?)
The 0th block does seem to contain some PDP-11 binary - a bootstrap of some
sort. I'll look in more detail in a bit.
> I was thinking this was a sixth edition install tape of some sort, but
> if it is, I'm not able to figure it out.
From what I can see, it's probably a tp-format tape: the 1st block contains
some filenames which I can see in an ASCII dump of it:
speakez/sbrk.s
dcheck.c
df.c
intel/as80.c
intel/optab.8080
> v7 and look at its content using tm or tp, but then I realized that I
> didn't have a device set up for TU56
You don't need to mount it on a DECtape drive - it's just blocks. Mount it as
an RK05 image, or a magtape, or whatever.
> When you received an unmarked tape back in the day, how did you go about
> figuring out what was on it?
Generally there would have been some prior communication, and the person
sending it would have told you what it was (e.g. '800 bpi tar', or whatever).
> What advice would you give a future archivist to help them quickly
> classify bit copies of tapes :).
Like I said: "'od' is your friend!"!! :-)
Noel
Random memories, possibly wrong.
In 1977/78 I was at udel and had done a fair amount of work on unix but as
a lowly undergrad did not get to go to the Columbia Usenix meeting. Ed
Szurkowski of udel went. Ed was the grad student who did hardware design
for 11s for Autotote (another story) but also stood up a lot of the early
unix 11s at udel beginning in 1976 with an 11/70. Mike Muuss used
to come up and visit us at udel and Mike and Ed would try to ask questions
the other could not answer. Mike always had a funny story or two.
Ed later went to Bell Labs and I lost track of him.
The directions for the MTA were fairly clear: it listed a stop that you
under no circumstances should get off at, and if you did get off at, you
should not go up to the street, lest you never return. This was no joke.
Some places in NY were pretty hazardous in those days.
I *think* this was the meeting where Ken showed up with a bunch of
magtapes, and Ed claimed that, in Ken's words, they were "... found in the
street."
This part I remember well: Ed returning with two magtapes and our desire to
upgrade. We at udel, like many places, had done lots of our own mods to the
kernel, which we wanted to keep. So we ran a diff between trees, and I
wrote a merge with TECO and ed which got it all put together. I later
realized this was a very early form of 'patch', as it used patterns, not
line numbers, to figure out how to paste things back together. I really got
to love regex in those years.
Except for one file: the tools just would not merge them. Ed later realized
there was one key difference that we had not noticed, a missing comment,
namely, the Western Electric copyright notice ...
I'm kinda sorry that our "udel Unix" is lost to the great /dev/null, it
would be interesting to see it now.
ron
> From: Clem Cole
> stp is from the Harvard distribution.
The MIT PWB1 system I have has the source; the header says:
M. Ferentz
Brooklyn College of CUNY
September 1976
If it can't be found on TUHS, I can upload it.
No man page, though. :-(
Noel
Ralph Corderoy:
ed(1) pre-dates pipes. When pipes came along, stderr was needed, and
lots of new idioms were found to make use of them. Why didn't ed gain a
`filter' command to accompany `r !foo' and `w !bar'?
===
I sometimes wonder that too.
When I use `ed,' it is usually really qed, an extended ed
written by the late-1970s UNIX crowd here at U of T. (Rob
Pike, Tom Duff, David Tilbrook, and Hugh Redelmeier, I think.)
qed is something of a kitchen sink, full of clumsy programmability
features that I never use. The things that keep me using it are:
-- Multiple buffers, each possibly associated with a different
file or just anonymous
-- The ability to copy or move text (the traditional t and m
commands) between buffers as well as within one
-- The ability to send part or all of a buffer to a shell command,
to read data in from a shell command, or to send data out and
replace it with that from the shell command:
>mail user ...
<ps -ef
|tr a-z A-Z
I use the last quite often; it makes qed sort of a workbench for
manipulating and mining text. One can do the same with the shell
and temporary files, but using an editor buffer is much handier.
sam has similar abilities (but without all the needless programmability).
Were sam less clumsy to use in its non-graphical mode, I'd probably
have abandoned qed for sam.
Norman Wilson
Toronto ON (for real now)