Interesting. My "speak" program had a trivial lexer that
recognized literal tokens, many of which were prefixes
of others, by maximum-munch binary search in a list of
1600 entries. Entries gave token+translation+rewrite.
The whole thing fit in 15K.
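A minimal sketch of the idea, not the actual "speak" code: literal tokens are kept in a sorted list, and at each input position the longest entry that matches is found by binary search, trying the longest candidate first. The table entries and the names TABLE, longest_token and tokenize are invented for illustration, and the rewrite field is left out.

```python
from bisect import bisect_left

# Hypothetical miniature table of (token, translation) pairs. Many tokens
# are prefixes of others, as in the table described above.
TABLE = sorted([
    ("a", "AH"),
    ("ab", "AB"),
    ("abo", "ABO"),
    ("about", "ABOUT"),
    ("b", "B"),
    ("t", "T"),
    ("th", "TH"),
    ("the", "THE"),
])
KEYS = [tok for tok, _ in TABLE]
MAX_LEN = max(len(k) for k in KEYS)


def longest_token(text, pos):
    """Return the (token, translation) entry for the longest table token
    that prefixes text[pos:], or None if no entry matches."""
    for length in range(min(MAX_LEN, len(text) - pos), 0, -1):
        candidate = text[pos:pos + length]
        i = bisect_left(KEYS, candidate)   # binary search in the sorted keys
        if i < len(KEYS) and KEYS[i] == candidate:
            return TABLE[i]
    return None


def tokenize(text):
    """Split text into table tokens, always taking the longest match."""
    out = []
    pos = 0
    while pos < len(text):
        hit = longest_token(text, pos)
        if hit is None:
            raise ValueError("no token matches at position %d" % pos)
        out.append(hit[0])
        pos += len(hit[0])
    return out
```

With this table, tokenize("abot") takes "abo" and then "t" rather than stopping at "a" or "ab".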
Many years later I wrote a regex recognizer that special-cased
alternations of lots of literals. I believe GNU's regex.c does
that, too. (My regex also supported conjunction and negation--
legitimate regular-language operations--implemented by
continuation-passing to avoid huge finite-state machines.)
We have here a case of imperfect communication in 1127. Had I
been conscious of the lex-explosion problem, I might have
thought of speak and put support for speak-like tables
into lex. As it happened, I only used yacc/lex once, quite
successfully, for a small domain-specific language.
Doug
Steve Johnson wrote:
I also gave up on lex for parsing fairly early. The problem was
reserved words. These looked like identifiers, but the state machine to
pick out a couple of dozen reserved words out of all identifiers was too
big for the PDP-11. When I wrote spell, I ran into the same problem.
I had some rules that wanted to convert plurals to singular forms that
would be found in the dictionary. Writing a rule to recognize .*ies
and convert the "ies" to "y" blew out the memory after only a handful of
patterns. My solution was to pick up words and reverse them before
passing them through lex, so I looked for the pattern "sei.*", converted
it to "y" and then reversed the word again. As it turned out, I only
owned spell for a few weeks because Doug and others grabbed it and ran
with it.
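The reversal trick can be sketched as follows. This is only an illustration of the transformation, written with Python's re module rather than lex, so it says nothing about the state-machine size that motivated it. The function name singularize is hypothetical.

```python
import re

# "sei" is "ies" reversed, so the suffix rule becomes a prefix rule
# that is anchored at the start of the reversed word.
REVERSED_IES = re.compile(r"sei(.*)")


def singularize(word):
    """Turn a trailing "ies" into "y" by matching on the reversed word."""
    rev = word[::-1]
    m = REVERSED_IES.match(rev)
    if m is None:
        return word                      # rule does not apply
    # Replace the leading "sei" with "y", then reverse back.
    return ("y" + m.group(1))[::-1]
```

For example, "ponies" reverses to "seinop", becomes "ynop", and reverses back to "pony".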
From the 2.11 BSD sources I understand that the PDP-11/70 MMU address
and data registers, KDSA and KDSD, start at 0172360 and 0172320
respectively [1]. Yet when I read the register contents I don't get
what I would expect to see: values increasing by 0200 for KDSA and the
same constant value for KDSD [2]. I checked this by looking at
/dev/mem.
# od -o /dev/mem 0172360 | head -1
0172360 000002 000016 001403 012700 000400 000402 012700 000200
# od /dev/mem 0172320 | head -1
0172320 101016 005064 000026 005067 175456 016467 000006 175430
I get the same results when I examine the memory through SIMH:
sim> examine 172360
172360: 000002
sim> examine 172362
172362: 000016
sim> examine 172364
172364: 001403
sim> examine 172320
172320: 101016
sim> examine 172322
172322: 005064
The MMU kernel instruction registers, KISA and KISD, contain similarly
nonsensical values, as do the registers located at the different memory
locations (0772320, 0772360) indicated in another source [3]. What am I
missing?
My goal is to access from the console the kernel's u area. According to
mem(4) and the symbols in /unix, this should be at address 0140000.
Indeed, accessing it through /dev/kmem I get the expected results for
e.g. u_comm and u_uid. However, I have been unable to find it in the
machine's physical memory, hence my question regarding the MMU's operation.
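For reference, a sketch of how the PDP-11 MMU forms a physical address from a 16-bit virtual one: the top three bits select one of eight pages, and the page address register for that page supplies a base in 64-byte units. The register values below are made up for illustration and are not read from any running system; the value 0177600 for page 7 is chosen only because it places that page at the top of the 22-bit physical space.

```python
PAGE_SHIFT = 13             # bits 15-13 of the virtual address select the page
OFFSET_MASK = 0o17777       # low 13 bits: offset within the 8 KB page
CLICK_SHIFT = 6             # a page address register counts 64-byte units
PHYS_MASK_22 = 0o17777777   # 22-bit physical address space (11/70)


def translate(vaddr, par):
    """Map a 16-bit virtual address to a physical address, given the
    eight page address register values for the current mode and space."""
    page = (vaddr >> PAGE_SHIFT) & 0o7
    offset = vaddr & OFFSET_MASK
    return ((par[page] << CLICK_SHIFT) + offset) & PHYS_MASK_22


# Hypothetical kernel data-space PAR contents, one entry per page.
EXAMPLE_PAR = [0, 0o200, 0o400, 0o600, 0o1000, 0o1200, 0o5432, 0o177600]
```

With these invented values, virtual 0140000 (page 6, offset 0) would land at physical 0543200, and virtual 0172360 (page 7) at physical 017772360. The actual location of the u area depends on whatever the kernel has loaded into the register for page 6.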
[1]
https://github.com/RetroBSD/2.11BSD/blob/master/usr/sys/pdpstand/M.s#L346
[2]
https://github.com/RetroBSD/2.11BSD/blob/master/usr/sys/pdpstand/M.s#L247
[3] https://gunkies.org/wiki/PDP-11_Memory_Management
Diomidis
This time looking into non-blocking file access. I realise that the term has wider application, but right now my scope is “communication files” (tty’s, pipes, network connections).
As far as I can tell, prior to 1979 non-blocking access did not appear in the Spider lineage, nor did it appear in the NCP Unix lineage. First appearance of non-blocking behaviour seems to have been with Chesson’s multiplexed files where it is marked experimental (an experiment within an experiment, so to speak) in 1979.
The first appearance resembling the modern form appears to have been with SysIII in 1980, where open() gains an O_NDELAY flag with two uses: (i) on TTY devices it makes open() return without waiting for a carrier signal (and subsequent read() / write() calls on the descriptor return 0 until the carrier/data is there); and (ii) on pipes and FIFOs, read() and write() will not block on an empty/full pipe, but return 0 instead. This behaviour seems to have continued into SysVR1. I’m not sure when EAGAIN came into use as a return value for this use case in the SysV lineage. Maybe with SysVR3 networking?
In the Research lineage, the above SysIII approach does not seem to exist, although the V8 manual page for open() says under BUGS "It should be possible [...] to optionally call open without the possibility of hanging waiting for carrier on communication lines.” In the same location for V10 it reads "It should be possible to call open without waiting for carrier on communication lines.”
The July 1981 design proposals for 4.2BSD note that SysIII non-blocking files are a useful feature and should be included in the new system. In Jan/Feb 1982 this appears to be coded up, although not all affected files are under SCCS tracking at that point in time. Non-blocking behaviour is changed from the SysIII semantics, in that EWOULDBLOCK is returned instead of 0 when progress is not possible. The non-blocking behaviour is extended beyond TTY’s and pipes to sockets, with additional errors (such as EINPROGRESS). At this time EWOULDBLOCK is not the same error number as EAGAIN.
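As a point of comparison, here is a small sketch of what the error-returning form looks like on a present-day POSIX system, using Python's os module rather than the historical interfaces. It assumes a Unix-like host; the function name is hypothetical. The read on an empty non-blocking pipe fails with an error instead of returning 0 as under the SysIII semantics.

```python
import errno
import os


def read_empty_nonblocking_pipe():
    """Read from an empty pipe in non-blocking mode and report what happens."""
    r, w = os.pipe()
    try:
        os.set_blocking(r, False)        # sets O_NONBLOCK on the read end
        try:
            data = os.read(r, 1)
        except BlockingIOError as exc:   # raised for EAGAIN / EWOULDBLOCK
            return ("error", exc.errno)
        return ("data", data)
    finally:
        os.close(r)
        os.close(w)


kind, value = read_empty_nonblocking_pipe()
```

Whether errno.EAGAIN and errno.EWOULDBLOCK are the same number depends on the platform, so the sketch does not rely on it.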
It would seem that the differences between the BSD and SysV lineages in this area persisted until around 2000 or so.
Is that a fair summary?
- - -
I’m not quite sure why the Research lineage did not include non-blocking behaviour, especially in view of the man page comments. Maybe it was seen as against the Unix philosophy, with select() offering sufficient mechanism to avoid blocking (with open() the hard corner case)?
In the SysIII code base, the FNDELAY flag is stored on the file pointer (i.e. with struct file). This has the effect that the flag is shared between processes using the same pointer, but can be changed in one process (using fcntl) without the knowledge of others. It seems more logical to me to have made it a per-process flag (i.e. with struct user) instead. In this aspect the SysIII semantics carry through to today’s Unix/Linux. Was this semantic a deliberate design choice, or simply an overlooked complication?
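The sharing can be observed on a current POSIX system with two descriptors that refer to the same open file. This sketch uses dup() within one process instead of fork(), on the assumption that both give descriptors backed by the same file structure; the function name is hypothetical.

```python
import os


def flag_is_shared():
    """Set non-blocking mode through one descriptor and check whether a
    duplicate of it sees the change. Returns (before, after) as reported
    by os.get_blocking() on the duplicate."""
    r, w = os.pipe()
    r2 = os.dup(r)                       # second descriptor, same open file
    try:
        before = os.get_blocking(r2)
        os.set_blocking(r, False)        # change the flag via the original
        after = os.get_blocking(r2)      # observed via the duplicate
        return before, after
    finally:
        for fd in (r, r2, w):
            os.close(fd)
```

The duplicate starts out blocking and reports non-blocking after the change, although nothing was ever done to it directly.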
> I am now writing code in assembly for the PDP-11. I remember reading
> somewhere that the output from "AS" (my caps) is a bit meagre. I can't find
> an option to produce a text listing. Is it possible from AS, using command
> options (I can't see one) or perhaps from "LD"?
>
> Paul
>
> *Paul Riley*
I had the same problem. As I was porting to a different mini I had to write a new assembler. As you have undoubtedly seen, early ‘as’ was written in assembler and not so easy to use as a base. Hence I used Richard Miller’s AS for the Interdata as a base (available on TUHS):
https://www.tuhs.org/cgi-bin/utree.pl?file=Interdata732/usr/source/as
Later I discovered that the TUHS archive has source code for the original ‘as’ rewritten in C, a work by Roger Jaeger:
https://minnie.tuhs.org/Archive/Distributions/USDL/Mini-Unix/
Maybe adding a listing module to this version of ‘as’ is another possible route.
below...
On Thu, Jun 11, 2020 at 9:04 AM Paul Riley <pdr0663(a)icloud.com> wrote:
> Clem,
>
> Thanks for that. So this would compile on modern machines to a cross
> compiler for V6 also running on a modern machine? I note you say macro11,
> so not a Unix “as” style syntax, is that right?
>
Yes - the AT&T syntax was much simpler/less sugar than the DEC assembler.
But the differences are pretty easy to see. IIRC that assembler generates
DEC style linker objects and there is a companion linker that can create
DEC binary objects (*i.e.* 'obj' files) as well as traditional UNIX a.out
format. The entire tool suite was created originally to move code from
RT-11 to UNIX at Harvard and passed around the nascent USENIX community.
IIRC that version was forked from a BSD 2.x/NetBSD source repository and
folks were adding some fields/features in the DEC obj format that RSX
supported that RT-11 did not.
Go hunting and see what you find. My memory was that with the BSD 2.x
project, somebody added a DEC obj to UNIX binary (a.out) converter tool, so
that you could use ld(1) instead of using the DEC style linker that had
been included in the original.
It has been >>years<< since I was really familiar with any of this stuff.
A question about it came up last fall/winter on the simh mailing list,
which is where I found the URL.
FWIW: I offered the modern port, assuming you might want to run some of it
as a cross-system on a newer OS with a modern compiler. But if you are
content running this on V6, then you might just want to go back to the
original. As I said, my memory is that's in the original USENIX Harvard
tape. All that should be in Warner's archives if not other places on the
Internet.
Just remember that a big problem with the original code is that it will be
written in pre-'White Book' C (that many of us learned years ago - not
even ANSI or Second Edition - this used Lesk's portable C library etc.).
It sometimes looks a little strange to modern eyes. Also if you go
looking, IIRC, someone at Harvard ported the DEC Macro RT-11 library to
UNIX v6. In the late 1970s, I remember tjk, Danny Klein, Tron McConnell
and I, plus some of the folks over in the bio-med group (whose names I have
forgotten) had to port a number of assembler codes that had been written for the
earlier RT-11 systems to Unix for one of the projects we had. Some of it
got re-written in C, but I do remember we managed to use the Harvard
assembler somehow for parts of it. If my memory is correct, early VMS and
messing with BLISS compatibility could have been mixed up in the project
somehow, but I've long forgotten the details of what we were doing at the
time.
Have fun.
Team,
I am now writing code in assembly for the PDP-11. I remember reading
somewhere that the output from "AS" (my caps) is a bit meagre. I can't find
an option to produce a text listing. Is it possible from AS, using command
options (I can't see one) or perhaps from "LD"?
Paul
*Paul Riley*
I'm seeding this URL to TUHS as one would expect them to be interested in
the work from Warren and friends. FWIW: I tried to browse their archives
and was not impressed (I couldn't find anything).
https://www.softwareheritage.org/
> Steve Johnson's position paper on optimising compilers may amuse you:
> https://dl.acm.org/doi/abs/10.1145/567532.567542
Indeed. This passage struck a particular chord:
"I contend that the class of applications that depend on, for example, loop
optimization and dead code elimination for their efficient solution is of
modest size, growing smaller, and often very susceptible to expression in
applicative languages where the optimization is built into the individual
applicative operators."
I don't know whether I saw that note at the time, but since then I've
come to believe, particularly in regard to C, that one case of dead-code
elimination should be guaranteed. That case is if(0), where 0 is the
value of a constant expression.
This guarantee would take the place of many--possibly even
most--ifdefs. Every ifdef is an ugly intrusion and a pain to read.
Syntactically it occurs at top level completely out of sync with the
indentation and flow of text. Conversion to if would be a big win.
Doug
Does anybody have any good resources on the history of the popularity of C?
I'm looking for data to resolve a claim that C is so prolific and
influential because it's so easy to write a C compiler.
Tyler
> It's another similar to the last two. I've uploaded a version to youtube until the conference has theirs ready. It's a private link, but should work for anybody that has it. Now that I've given my talk it's cool to share more widely.
> The link at the end is wrong. https://github.com/bsdimp/bsdcan2020-demos is the proper link.
> Please let me know what you think.
Watched it & liked it a lot!
I have one nit-pick in the section on early networking: BBN's VAX TCP did not allow the ‘/dev/net/host’ syntax. That particular semantic comes from UoI’s NCP Unix, where the 8-bit host number was encoded in the minor number of character special file ‘host’ - but it did not carry through to the BBN code.
Other systems used something similar. The Chaos network code made namei() break when it recognised the Chaos driver and left the remainder of the path for the driver to fetch & parse. I’m also being told that Greg Chesson experimented with using the given name of a Datakit channel device as the connection string for the switch, but that this approach was abandoned early on.
In my view, exposing the host names through integration in the Unix file name space makes a lot of conceptual sense, but it unfortunately falls down on the practicalities, with the host name set being hard to enumerate (it is large, distributed and not stable - even back then).
One open question is pinning the start of Unix networking to V4 / 1974. Yes, that is the earliest evidence we currently have. However, Sandy Fraser says that Spider came into operation in 1972 and it must have connected to something. Maybe that something was a lab-bench test setup, but it could have been a computer - maybe even one running Unix.
There is another candidate for earliest Unix networking as well. The tech memos from Heinz Lycklama include one on the Glance terminal. That memo includes a section on the network used, referencing a 1973 report by D.R. Weller, “A High-Speed I/O Loop Communication System for the DEC PDP-11 Computer”. That computer appears to be an 11/45 running Unix and the loop is not Spider (nor the Pierce loop discussed in 1970/71 BSTJ). I have an off-list question outstanding to better understand this.