So tape I can see being more weird, but isn't raw disk just "don't put it
in buffer cache"?
From what I've been able to gather, early tape in Unix was dicey, something
about the driver doing seeks. Was there more to it than that?
On Tue, Nov 21, 2017 at 02:33:55AM +0000, Clem Cole wrote:
> It's not so much that they don't mix, it's not quite the same. Some
> coprocessor ideas work really well into the Unix I/O model, others don't.
> Raw disk and tape I/O ala a PDP11 or VAX for instance is not easy on an
> IBM channel controller or a CDC PPU.
>
> On Mon, Nov 20, 2017 at 6:45 PM Larry McVoy <lm(a)mcvoy.com> wrote:
>
> > On Mon, Nov 20, 2017 at 06:43:28PM -0500, Ron Natalie wrote:
> > >
> > >
> > > > I get that PDP-11 and VAX used memory mapped I/O but was that somehow
> > > exposed above the device driver layer? If so, I missed that, because I
> > had
> > > no conceptual or technical problem with talking to an I/O
> > >
> > > > channel, it was pretty easy. And I suck at writing drivers.
> > >
> > > There's nothing that restricts a device driver to memory mapped I/O.
> > You
> > > do what ever you have to do to initiate the I/O. Even the x86's
> > originally
> > > used special instructions to start the I/O (in/out). The DENELCOR HEP
> > > supercomputer (we did this port around 1983) we had to bounce I/O
> > requests
> > > off a separate I/O processor different from where the kernel was running.
> > > Similar constructs were used on other machines.
> >
> > Yeah, that's what I thought. But other people were saying that I/O
> > processors and Unix didn't mix. I don't get that, seems like whatever
> > the model is, is hidden under the driver; that's the whole point of the
> > driver design, is it not?
> >
> --
> Sent from a handheld expect more typos than usual
--
---
Larry McVoy                lm at mcvoy.com             http://www.mcvoy.com/lm
> From: Charles Anthony
> Entry points are usually defined as "foo$bar", where "foo" is the
> segment name, and "bar" an entry point in the segment symbol table. I
> believe that the degenerate case of "foo$" is treated as "foo$foo" by the
> shell.
So I'm curious about how this, and additional names, interact. (For those who
aren't familiar with Multics, a segment [file, sort of] can have multiple
names. This is sort of like 'hard links' in Unix, except that in Multics one
name, the "primary name" is very slightly preeminent. See here:
http://web.mit.edu/multics-history/source/Multics/mdds/mdd006.compout
page 2-5, for more, if you're interested.)
So if I have a segment with primary name 'foo', and additional names 'bar' and
'zap', and I say 'zap' to the Multics shell, I assume it does a call to
zap$zap, which finds the segment with the primary name 'foo', and calls the
'zap' entry therein?
> Multics rulez; UNIX droolz
Dude, you clearly have Very Large brass ones to send that to this list! :-)
Noel
Early on, when multics was understood to have
one big, segmented address space, it was expected
that PL/I name qualification ($) would serve to address
segments. I do not know whether that idea was
actually implemented.
Doug
My favorite reduction to absurdity was /bin/true. Someone decided we
needed shell commands for true and false. Easy enough to add a script that
said "exit 0" or exit 1" as its only line.
Then someone realized that the "exit 0" in /bin true was superfluous, the
default return was 0. /bin/true turned into an empty, yet executable, file.
Then the lawyers got involved. We got a version of a packaged UNIX (I
think it was Interactive Systems). Every shell script got twelve lines of
copyright/license boilerplate. Including /bin/true.
The file had nothing but useless comments in it.
> Multics had some kind of `attach' and `detach' of I/O streams, well
> known to Ossanna, so perhaps dup(2), and a Thompson-shell syntax to go
> with it meant `>' was earmarked early on.
According to "The Evolution of the Unix Timesharing System", full path names
arrived later than I/O redirection, so by the time they needed a separator,
'>' and '<' were gone. '/' also has the advantage of being a non-shift
character!
Noel
PS: Re-reading that, I see that early Unix did not have an exec() call (as I
was just discussing); it was done in user mode, with normal read and write
calls.
> On Tue, Nov 28, 2017 at 2:36 PM, Paul Winalski <paul.winalski(a)gmail.com>
> wrote:
>
>> MS/DOS patterned its command line
>>
>> syntax after RT-11 and inherited the slash as a command option
>> introduction from there.
>
> Minor correction... To do a CDC style patient zero history ;-) RT11 took
> it from DOS/8, CP/M took it from RT11, then finally DOS-86 which became
> PC-DOS née MS/DOS took it from CP/M.
I think Gary Kildall was very much into the PDP-8 when teaching at the
Naval Postgraduate School in the early 70's (doing the FORTRAN/8 compiler
for instance). Can't find the link now, but I read somewhere that his
work with the 8008 and 8080 was guided by the idea of having a PDP-8 like
machine in his home office. For CP/M's command syntax RT11 probably did
not come into it. I just had a quick glance through the CP/M 1.4 - 2.2
manuals, and I did not see slash options (or any other option character).
Microsoft bought QDOS as a base for PC-DOS/MS-DOS. The QDOS system calls
were done such that converting existing 8080 CP/M code with Intel's source
level 8080-to-8086 asm converter would generate the correct code. The FAT
file system was modeled after the one used by MS Disk BASIC for the 8086.
Not sure where the QDOS command line came from, other than CP/M. MS did a
lot of its early development on a PDP-10: perhaps that was an inspiration
too.
Sorry for getting off-Unix-topic...
> From: Doug McIlroy
> But if that had been in D space, it couldn't have been executed.
Along those lines, I was wondering about modern OS's, which I gather for
security reasons prevent execution of data, and prevent writing to code.
Programs which emit these little 'custom code fragments' (I prefer that term,
since they aren't really 'self-modifying code' - which I define as 'a program
which _changes_ _existing_ instructions') must have some way of having a chunk
of memory into which they can write, but which can also be executed.
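For concreteness, a minimal sketch of how a program might get such a chunk on a
modern Unix, assuming POSIX mmap()/mprotect(); the function name here is
illustrative, not any particular JIT's API:

    /* Sketch: obtain memory we can write a code fragment into and then
     * execute.  Error handling abbreviated. */
    #include <stddef.h>
    #include <string.h>
    #include <sys/mman.h>

    typedef int (*frag_fn)(void);

    frag_fn make_fragment(const unsigned char *code, size_t len)
    {
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;

        memcpy(p, code, len);                 /* emit the custom fragment */

        /* Flip the page to executable (and no longer writable). */
        if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0)
            return NULL;

        return (frag_fn)p;
    }

The write-then-flip order is what keeps the OS happy: at no point is the page
both writable and executable.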
> Where is the boundary between changing one instruction and changing them
> all? Or is this boundary a figment of imagination?
Well, the exec() call only overwrites existing instruction memory because of
the semantics of process address space in Unix - there's only one, so it has
to be over-written. An OS operating in a large segmented single-level memory
could implement an exec() as a jump....
BTW, note that although exec() in a single address-space OS is conventionally
something the OS does, this functionality _could_ be moved into the user
space, provided the right address space primitives were provided by the OS,
e.g. 'expand instruction space'. So the exec() code in user space would i)
find the executable, ii) see how much of each kind of memory it needs, iii)
get the OS to give it a block of memory/address space where the exec() code
can live while it's reading in the new code, iv) move itself there, v) use
standard read() calls to read the new image in, and then vi) jump to it.
Yes, it's probably simpler to implement it in the OS, but if one's goal is to
minimize the functionality in the kernel...
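A sketch of those steps in C, just to make the idea concrete; grow_text(),
relocate_self() and jump_to() are hypothetical stand-ins for the address-space
primitives the kernel would have to provide, and the header format is invented:

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    extern void *grow_text(long nbytes);        /* hypothetical: extend instruction space */
    extern void  relocate_self(void *newbase);  /* hypothetical: move this code aside */
    extern void  jump_to(void *entry);          /* hypothetical: transfer control, no return */

    struct hdr { long textsize; long entryoff; };   /* toy header, not a real a.out */

    void user_exec(const char *path)
    {
        char buf[4096];
        struct hdr h;
        long n;

        int fd = open(path, O_RDONLY);          /* i) find the executable */
        read(fd, &h, sizeof h);                 /* ii) see how much memory it needs */

        /* iii, iv) get a block where this code can live, and move there;
         * conceptually we now continue running from the scratch copy. */
        relocate_self(grow_text(4096));

        char *img = grow_text(h.textsize);      /* room for the new text */
        char *p = img;
        while ((n = read(fd, buf, sizeof buf)) > 0) {   /* v) read the new image in */
            memcpy(p, buf, n);
            p += n;
        }
        close(fd);
        jump_to(img + h.entryoff);              /* vi) jump to it */
    }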
Noel
In case you missed it:
https://www.spectrum.ieee.org/view-from-the-valley/tech-history/silicon-rev…
It is important to keep the conversations alive and not to file your
memory boxes away in the attic. Thanks for sharing what you know and
especially for making your documents and bits available.
-will
--
GPG Fingerprint: 68F4 B3BD 1730 555A 4462 7D45 3EAA 5B6D A982 BAAF
OK, we were discussing terminals this morning with some other old guys. If
I knew the answer to this I've forgotten.
Every PDP-11 UNIX I ever used had the console KL-11 as /dev/tty8. The
question is why. My guess is that for some reason
an 8-terminal multiplexor (DZ-11?) was stuck at tty0, but why?
> From: Larry McVoy
>> they aren't really 'self-modifying code' - which I define as 'a program
>> which _changes_ _existing_ instructions
> Isn't that how dtrace works?
I'm not familiar with dtrace(), but if it modifies some other routine's code,
then it would not be "self" modifying, right?
Oh, another category, sort of like biological viruses (which are in a grey
zone between 'alive' and not): the PDP-11 paper tape bootstrap:
http://ana-3.lcs.mit.edu/~jnc/tech/pdp11/bootloader.mac
in which the program's own code _is_ modified - but not by program
instructions, but by data on the paper tape it is reading in. It's
entertainingly convoluted (the copy above should be well-enough commented to
make it pretty easy to understand what's going on).
Noel
We lost J.F. Ossanna on this day in 1977; he had a hand in developing
Unix, and was responsible for "roff" and its descendants. Remember him,
the next time you see "jfo" in Unix documentation.
--
Dave Horsfall DTM (VK2KFU) "Those who don't understand security will suffer."
> From: Kevin Bowling
> The earliest stuff may be covered by Novell's grant of early code.
> ...
> Would be fun to run *ix on any of them.
Alas, the Bell port of Unix to the /370 needs that underlying layer of code
from IBM, and that's probably not going to escape. Too bad, it would be pretty
cool.
Noel
I am curious about how the Harvard Architecture relates to Unix,
historically. If the Harvard Architecture is predicated on the
separation of code from data in order to prevent self-modifying code (my
interpretation), then it would seem to me to be somewhat at odds with a
Unix philosophy of extreme abstraction (code, data, it's all 0's and
1's, after all). In my naive understanding, the PDP-11 itself, with the
Unibus and apparently agnostic ISA seem to summarily reject the Harvard
Architecture...
My question is - was there tension around Harvard and Von Neumann
architectures in Unix circles and if so, how was it resolved?
Thanks,
Will
--
GPG Fingerprint: 68F4 B3BD 1730 555A 4462 7D45 3EAA 5B6D A982 BAAF
There are some little bits in the public V7 source code that
suggest that it had support for Datakit, but that it was scrubbed
from the public release:
There is a Datakit header file:
http://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/include/dk.h
and Datakit state bits are defined in 'sys/tty.h':
http://minnie.tuhs.org/cgi-bin/utree.pl?file=V7/usr/include/sys/tty.h
Does anyone know if this assumption (Datakit support in V7) is correct?
Perhaps more specifically, was there a remote login program for V7/Datakit,
or for V6/Spider? For V8/Datakit there was 'dcon', but perhaps this
was built on earlier programs.
(I'm aware of 'cu' of course, but that does not support Datakit or
Spider).
Being able to login to another host on the network seems so useful
that it is hard to believe that a precursor to 'dcon' did not exist for
V6 and/or V7.
Thanks for explaining that. I think it may be for 10th edition though.
I searched for ipcopen() and 'gated' in the 8th edition source and could
not find them. In that search I did find a few bits that strongly suggest that
IP over Datakit was what was used in late '85 (when dmr posted about this).
In /usr/src/cmd/inet/READ_ME there is an example configuration that seems
to match with dmr's example. In that file an IP over a Datakit channel
appears to be configured.
(see http://chiselapp.com/user/pnr/repository/v8unix/artifact/6d09b05c7f06a2cc?l…)
The program 'dkipconfig' sets up a circuit and pushes the IP discipline on
the stream, both on the local end and on the remote end. It sets fixed local
and remote addresses, much the same as with a 'slip' line.
(see http://chiselapp.com/user/pnr/repository/v8unix/artifact/6c5f3267b58721a6?l…)
On Sat, Nov 25, 2017 at 4:50 PM, William Cheswick <ches at cheswick.com> wrote:
>
> Nope, not IP over Datakit, as I recall. It was quite interesting to work at
> a site (Bell Labs) where there were two distinct network technologies.
>
> [--snip--]
>
> This library was socks about seven years before socks, originally written by
> Presotto and Howard Trickey. The relay program was originally called
> “gated”, but that wouldn’t do after a while. I renamed it “proxyd”, and
> that is the first use of “proxy" in this context that I am aware of.
>
> If you were on AT&T’s intranet and wanted to connect externally, you ripped
> out the entire socket dance and replaced it with an ipcopen call. I also
> distributed common modified clients, like ptelnet, pftp, pfinger, etc.
>
> I still have all this code, and I suppose it ought to go in an archival
> repository. I can’t imagine that AT&T/Lucent/Alcatel/Nokia would care at
> this point. Anyone want it?
>
I'm trying to figure out how tcp/ip networking worked in 8th edition Unix.
I'm starting from dmr's paper about streams (http://cm.bell-labs.co/who/dmr/st.html), the V8 man pages (http://man.cat-v.org/unix_8th/3/), and browsing the source code (tarball here: http://www.tuhs.org/Archive/Distributions/Research/Dan_Cross_v8/).
In the below I use 'socket' to mean a file descriptor bound to a network connection. My current understanding is like this:
- The hardware interface is exposed as a character device; for tcp/ip only ethernet is supported. Directly accessing this device reads/writes ethernet frames.
- One can push an 'ip' module (only) onto an ethernet device; this module also handles ARP. Once this is done IP messages are queued to the virtual ip devices, /dev/ipXX. The device minor number XX equals the protocol number, i.e. the ip packets are demultiplexed into separate virtual devices. IP packets from multiple ethernet cards all end up on the same virtual ip devices. I'm not sure if one can still read/write to the ethernet device after pushing the ip module, but I think you can, creating a raw IP interface so to say.
- On /dev/ip6 one can push a TCP module. The TCP module handles the TCP protocol and demultiplexes incoming traffic to the virtual /dev/tcpXX devices. On /dev/ip17 one can push a UDP module. The UDP module handles the UDP protocol and demultiplexes incoming traffic to the virtual /dev/udpXX devices. Not sure whether the ip6 and ip17 devices can still be read/written after pushing these disciplines.
- There are 100 udp devices, /dev/udpXX. To open a UDP socket, one opens an unused udp device (through linear search). This socket accepts binary commands ('struct udpuser') through the read()/write() system calls. There is a command to set the local port (effectively 'bind') and a command to also set the foreign address and port (effectively 'bind+connect'). As long as the socket is not connected, actual datagrams are preceded by a command header with the address/port information (effectively 'sendto'/'recvfrom'). Once the socket is connected, it is no longer possible to send further commands, but each write/read is a datagram. For udp sockets it is not possible to specify the local address: it is chosen by the system to match with the given foreign address.
- There are 100 tcp devices /dev/tcpXX. Initial connection is always over an odd numbered device. To open a TCP socket, one opens an unused tcp device (through linear search). This socket accepts binary commands ('struct tcpuser') through the read()/write() system calls. There is a command to actively connect (effectively 'connect' with optional 'bind'), and a command to passively listen (effectively 'bind'+'listen'). If the connect command is sent, one can read one more response block and then the socket becomes a regular tcp socket. If the listen command is sent, one can read multiple response blocks, one for each new client (effectively 'accept'). Those response blocks contain a device number for the new client connection, i.e. one has to subsequently open device /dev/tcpXY to talk to the client. This number is always even, i.e. locally initiated tcp connections are over odd numbered tcp devices, and remotely initiated connections are over even numbered tcp devices - not sure what the significance of this is.
- The above seems to be modeled on the Datakit setup, where the network is exposed as 520 virtual devices, one for each channel, as /dev/dk/dkXXX. These channels then also seem to accept binary command blocks through the read()/write() interface, with a 'connect' type command changing the connection into a data-only channel.
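If that understanding is right, opening an outgoing connection would start with
something like the sketch below. It only shows the "linear search for an unused
device" step, written in modern C; the zero-padded device names, the assumption
that opening a busy channel fails, and the 'struct tcpuser' layout (not shown)
are guesses on my part:

    #include <fcntl.h>
    #include <stdio.h>

    #define NTCP 100    /* 100 tcp devices, per the description above */

    /* Find a free tcp channel for an outgoing connection; odd minors are
     * assumed to be for locally initiated connections, even for incoming. */
    int open_tcp_channel(void)
    {
        char name[32];
        int i, fd;

        for (i = 1; i < NTCP; i += 2) {
            snprintf(name, sizeof name, "/dev/tcp%02d", i);
            fd = open(name, O_RDWR);
            if (fd >= 0)
                return fd;   /* next: write() a 'struct tcpuser' connect command */
        }
        return -1;           /* all channels busy */
    }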
Anybody on the list with 8th edition experience who can confirm that the above understanding is about correct?
Paul
> From: Will Senn <will.senn(a)gmail.com>
> I am curious about how the Harvard Architecture relates to Unix,
> historically. If the Harvard Architecture is predicated on the
> separation of code from data in order to prevent self-modifying code (my
> interpretation)
That's not the 'dictionary' definition, which is 'separate paths for
instructions and data'. But let's go with the 'no self-modifying code' one for
the moment.
The thing is that self-modifying code is pretty much an artifact of the dawn
of computers, before the economics of gates moved from that of tubes, to
transistors, and also before people understood how important good support for
subroutines was. (This latter is a reference to how Whirlwind did subroutines,
with self-modifying code.) Once people had index registers, and lots of
registers in general, self-modifying code (except for a few small, special
hacks like bootstraps which had to fit in tiny spaces) became as dead as the
dodo.
It's just a Bad Idea.
> then it would seem to me to be somewhat at odds with a Unix philosophy
> of extreme abstraction (code, data, it's all 0's and 1's, after all).
The people who built Unix were fundamentally very practical. Self-modifying
code is not 'practical'. (And note that Unix from V4:
http://minnie.tuhs.org/cgi-bin/utree.pl?file=V4/nsys/ken/text.c
onward has support for pure text - for practical reasons).
> the PDP-11 itself, with the Unibus and apparently agnostic ISA seem to
> summarily reject the Harvard Architecure...
You could say that of a zillion computers. The only recent computer I can
think of offhand with separate instruction and data paths was the AMD 29K
(nice chip, I used it in a product we built at Proteon). They had separate
ports for instructions and data purely for performance reasons. (Our card had
a pathway which allowed the CPU to write the instruction memory, needed during
booting, obviously; the details as to how we did it escape me now.)
> From: Jon Steinhart
> For all intents and purposes instructions were separate from data from
> the PDP 11/70 on.
s/70/45/.
And the other -11 memory management (as on the /40, /23, etc) does allow for
execute-only 'segments' (they call them 'pages' in the later versions of the
manual, but they're not) - again, separating code from data. Unix used this
for shared pure texts.
And note that those machines with separate I+D space don't meet the dictionary
definition either, because they only have one bus from the CPU to memory,
shared between data and instruction fetches.
Noel
> From: Doug McIlroy
> Optimal code for bitblt (raster block transfers) in the Blit
Interesting case. I'm not familiar with BitBLT codes, do they actually modify
the existing program, or rather do they build small custom ones? Only the
former is what I was thinking of.
Noel
From the discussion of self-modifying code:
>> Optimal code for bitblt (raster block transfers) in the Blit
>
> Interesting case. I'm not familiar with BitBLT codes, do they actually modify
> the existing program, or rather do they build small custom ones? Only the
> former is what I was thinking of.
>
It built small custom fragments of code. But if that had been in D
space, it couldn't have been executed.
>> Surely JIT compiling must count as self-modifying code.
>
> If it does, then my computer just runs one program from when I turn it
> on. It switches memory formats and then is forever extending itself and
> throwing chunks away.
Exactly. That is the essence of stored-program computers. The exec
system call is self-modification with a vengeance.
Fill memory-and-execute is the grandest coercion I know. What is
data one instant is code the next.
It's all a matter of viewpoint and scale. Where is the boundary
between changing one instruction and changing them all? Or is
this boundary a figment of imagination?
Doug
> From: "Ron Natalie"
> Every PDP-11 UNIX I ever used had the console KL-11 as /dev/tty8.
> The question is why.
Blast! I have this memory of reading an explanation for that somewhere - but
I cannot remember what it was, or where! I've done a grep through my hoard of
Unix documents, looking for "tty8", but no hits.
Noel
> The thing is that self-modifying code is pretty much an artifact of the dawn
> of computers, [...]
>
> It's just a Bad Idea.
Surely JIT compiling must count as self-modifying code.
Optimal code for bitblt (raster block transfers) in the Blit
Repeat, slightly modified, of a previous post that got
shunted to the attachment heap.
> I am curious if anyone on the list remembers much
> about the development of the first spell checkers in Unix?
Yes, intimately. They had no relationship to the PDP 10.
The first one was a fantastic tour de force by Bob Morris,
called "typo". Aside from the file "eign" of the very most common
English words, it had no vocabulary. Instead it evaluated the
likelihood that any particular word came from a source with the
same letter-trigram frequencies as the document as a whole. The
words were then printed in increasing order of likelihood. Typos
tended to come early in the list.
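For the flavor of the idea (this is a reconstruction, not Morris's code): tally
letter trigrams over the whole document, then score each word by the average
log-frequency of its own trigrams, so words built from trigrams that are rare
in that particular document sort toward the top of the list:

    #include <ctype.h>
    #include <math.h>
    #include <string.h>

    static double trig[26][26][26];   /* document-wide trigram counts */
    static double total;

    static int ix(char c) { return tolower((unsigned char)c) - 'a'; }

    /* Pass 1: call for every (alphabetic, lower-cased) word in the document. */
    void count_trigrams(const char *w)
    {
        size_t n = strlen(w);
        for (size_t i = 0; i + 2 < n; i++) {
            trig[ix(w[i])][ix(w[i + 1])][ix(w[i + 2])]++;
            total++;
        }
    }

    /* Pass 2: lower score = less like the rest of the document = earlier
     * in the printed list, i.e. a better typo candidate. */
    double score(const char *w)
    {
        double s = 0;
        size_t n = strlen(w), k = 0;
        for (size_t i = 0; i + 2 < n; i++, k++)
            s += log((trig[ix(w[i])][ix(w[i + 1])][ix(w[i + 2])] + 1) /
                     (total + 26.0 * 26 * 26));
        return k ? s / k : 0;
    }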
Typo, introduced in v3, was very popular until Steve Johnson wrote
"spell", a remarkably short shell script that (efficiently) looks
up a document's words in the wordlist of Webster's Collegiate
Dictionary, which we had on line. The only "real" coding he did
was to write a simple affix-stripping program to make it possible
to look up plurals, past tenses, etc. If memory serves, Steve's
program is described in Kernighan and Pike. It appeared in v5.
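A toy illustration of the affix-stripping step, with made-up rules rather than
Steve's, and in_wordlist() standing in for the dictionary lookup:

    #include <string.h>

    extern int in_wordlist(const char *w);   /* assumed: lookup in the on-line word list */

    /* Try the word as-is, then with a few common suffixes removed. */
    int spell_ok(const char *word)
    {
        static const char *suffix[] = { "s", "es", "ed", "ing", "ly", NULL };
        char stem[64];
        size_t wl = strlen(word);

        if (in_wordlist(word))
            return 1;
        for (int i = 0; suffix[i]; i++) {
            size_t sl = strlen(suffix[i]);
            if (wl > sl && wl < sizeof stem &&
                strcmp(word + wl - sl, suffix[i]) == 0) {
                memcpy(stem, word, wl - sl);
                stem[wl - sl] = '\0';
                if (in_wordlist(stem))
                    return 1;
            }
        }
        return 0;             /* flag as a possible misspelling */
    }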
Steve's program was good, but the dictionary isn't an ideal source
for real text, which abounds in proper names and terms of art.
It also has a lot of rare words that don't pull their weight in
a spell checker, and some attractive nuisances, especially obscure
short words from Scots, botany, etc, which are more likely to
arise in everyday text as typos than by intent. Given the basic
success of Steve's program, I undertook to make a more useful
spelling list, along with more vigorous affix stripping (and a
stop list to avert associated traps, e.g. "presenation" =
pre+senate+ion"). That has been described in Bentley's "Programming
Pearls" and in http://www.cs.dartmouth.edu/~doug/spell.pdf.
Morris's program and mine labored under space constraints, so
have some pretty ingenious coding tricks. In fact Morris has
a patent on the way he counted frequencies of the 26^3 trigrams
in 26^3 bytes, even though the counts could exceed 255. I did
some heroic (and probabilistic) encoding to squeeze a 30,000
word dictionary into a 64K data space, without severely
affecting lookup time.
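The counting trick is presumably the probabilistic counting Morris later
published (CACM, 1978), though that isn't stated above; so take this as a
sketch of that general technique, not of the patented code: each byte stores
roughly the base-2 logarithm of its trigram's count, and is bumped only with
probability 2^-c.

    #include <math.h>
    #include <stdlib.h>

    /* Bump an 8-bit approximate counter: increment with probability 2^-c,
     * so c tracks roughly log2 of the true count. */
    unsigned char bump(unsigned char c)
    {
        for (int i = 0; i < c; i++)
            if (rand() & 1)          /* any nonzero coin flip: don't increment */
                return c;
        return c < 255 ? c + 1 : c;
    }

    /* Expected true count represented by a stored value c. */
    double estimate(unsigned char c)
    {
        return ldexp(1.0, c) - 1;    /* 2^c - 1 */
    }

With one byte per trigram, 26^3 bytes covers the whole table, yet counts far
beyond 255 can still be represented approximately.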
Doug
> From: "Nelson H. F. Beebe"
> The PDF URLs for bstj.bell-labs.com no longer work, and the ones for
> www.alcatel-lucent.com ... now redirect to an HTML page.
With any luck, someone scraped them before they went.
I've gotten in the habit of scraping all the Web content I look at, since it
has (as above) a distressing tendency to vapourize.
Noel