> From: Will Senn
> $c
> 0177520: ~signal(016,01) from ~sysinit+034
> 0177542: ~sysinit() from ~main+010
> 0177560: _main() from start+0104
> If this means it got signal 16... or 1 from the sysinit call (called
> from main)
I'm not sure that interpretation is correct. I think that trace shows signal()
being called from sysinit().
On V6, signal() was a system call which one could use to set the handlers for
signals (or set them to be ignored, or back to the default action). In 2.11
it seems to be a shim layer which provides the same interface, but uses
the Berserkly signal system interface underneath:
https://www.tuhs.org/cgi-bin/utree.pl?file=2.11BSD/include/signal.h
https://www.tuhs.org/cgi-bin/utree.pl?file=2.11BSD/man/cat3/signal.0
So maybe the old binary for kermit is still trying to use the (perhaps
now-removed) signal system call?
Noel
> From: Lars Brinkhoff
> the Dover printer spooler was written using Snyder's C compiler
I'm not sure if that's correct. I don't remember with crystal clarity all the
details of how we got files to the Dover, but here's what I recall (take with
1/2 a grain of salt, my memory may have dropped some bits). To start with,
there were different paths from the CHAOS and TCP/IP worlds. IIRC, there was a
spooler on the Alto which ran the Dover, and the two worlds had separate paths
to get to it.
From the CHAOS world, there was a protocol translation which ran on whatever
machine had the AI Lab's 3Mbit Ethernet interface - probably MIT-AI's
CHAOS-11? If you look at the Macro-11 code from that, you should see it - IIRC
it translated (on the fly) from CHAOS to EFTP, the PUP protocol which the
spooler ran 'natively'.
From the IP world, IIRC, Dave Clark had adapted his Alto TCP/IP stack (written
in BCPL) to run in the spooler alongside the PUP software; it included a TFTP
server, and people ran TFTP from TCP/IP machines to talk to it. (IP access to
the 3Mbit Ethernet was via another UNIBUS Ethernet interface which was plugged
into an IP router which I had written. The initial revision was in Macro-11; a
massive kludge which used hairy macrology to produce N^2 discrete code paths,
one for every pair of interfaces on the machine. Later that was junked, and
replaced with the 'C Gateway' code.)
I can, if people are interested, look on the MIT-CSR machine dump I have
to see how it (a TCP/IP machine) printed on the Dover, to confirm that
it used TFTP.
I don't recall a role for any PDP-10 C code, though. I don't think there was a
spooler anywhere except on the Dover's Alto. Where did that bit about the
PDP-10 spooler in C come from, may I enquire? Was it a CMU thing, or something
like that?
Noel
> My unscientific survey of summer students was that they either came
> from scouts, or were people working on advanced degrees in college.
Not all high-school summer employees were scouts (or scout equivalents -
kids who had logins on BTL Unix machines). I think in particular of Steve
Johnson and Stu Feldman, who eventually became valued permanent employees.
The labs also hired undergrad summer employees. I was one.
Even high-school employees could make lasting contributions. I am
indebted to Steve for a technique he conceived during his first summer
assignment: using macro definitions as if they were units of associative
memory. This view of macros stimulated previously undreamed-of uses.
Doug
I'm running 211bsd pl 431 in SimH on FreeBSD. I've got networking
working on a tap interface both inbound and outbound. I still have a few
issues hanging around that are bugging me, but I'll eventually get to
them. One that is of concern at the moment is kermit. It is in the
system under /usr/new/kermit. When I call it, I get:
kermit
Bad system call - core dumped
I don't see core anywhere and if I did, I'd need to figure out what to
do with it anyway (maybe adb), but I'm wondering if anyone's used kermit
successfully who is on pl 431 or knows what's going on?
Thanks,
Will
--
GPG Fingerprint: 68F4 B3BD 1730 555A 4462 7D45 3EAA 5B6D A982 BAAF
I've always been intrigued with regexes. When I was first exposed to
them, I was mystified and lost in the greediness of matches. Now, I use
them regularly, but still have trouble using them. I think it is because
I don't really understand how they work.
My question for y'all has to do with early unix. I have a copy of
Thompson, K. (1968). Regular expression search algorithm. Communications
of the ACM, 11(6), 419-422. It is interesting as an example of
Thompson's thinking about regexes. In this paper, he presents a
non-backtracking, efficient, algorithm for converting a regex into an
IBM 7094 (whatever that is) program that can be run against text input
that generates matches. It's cool. It got me to thinking maybe the way
to understand the unix regex lies in a careful investigation into how it
is implemented (original thought, right?). So, here I am again to ask
your indulgence as the latecomer wannabe unix apprentice. My thought is
that ed is where it begins and might be a good starting point, but I'm
not sure - what say y'all?
I also have a copy of the O'Reilly Mastering Regular Expressions book,
but that's not really the kind of thing I'm talking about. My question
is more basic than how to use regexes practically. I would like to
understand them at a parsing level/state change level (not sure that's
the correct way to say it, but I'm really new to this kind of lingo).
When I'm done with my stepping through the source, I want to be able to
reason that this is why that search matched that text and not this text
and why the search was greedy, or not greedy because of this logic here...
If my question above isn't focused or on topic enough, here's an
alternative set to ruminate on and hopefully discuss:
1. What's the provenance of regex in unix (when did it appear, in what
form, etc)?
2. What are the 'best' implementations throughout unix (keep it pre 1980s)?
3. What are some of the milestones along the way (major changes, forks,
disagreements)?
4. Where, in the source, or in a paper, would you point someone to
wanting to better understand the mechanics of regex?
Thanks!
Will
> 1. What's the provenance of regex in unix (when did it appear, in what form, etc)?
> 2. What are the 'best' implementations throughout unix (keep it pre1980s)?
> 3. What are some of the milestones along the way (major changes, forks, disagreements)?
The editor ed was in Unix from day 1. For the necessarily tiny
implementation, Ken discarded various features
from the ancestral qed. Among the casualties was alternation
in regular expressions. It has never fully returned.
Ken's original paper described a method for simulating all paths
of a nondeterministic finite automaton in parallel, although he
didn't describe it in these exact terms. This meant he had to
keep track of up to n possible states, where n is the number of
terminal symbols in the regular expression.
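
The parallel simulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Ken's 7094 code and not what ed does: the tiny dialect (literals, concatenation, '|', '*', parentheses), the whole-string match, and every name here (compile_nfa, closure, matches) are made up for the example, and malformed patterns are not checked. The point is the loop in matches(): it carries a set of live states forward one input character at a time and never backtracks.

```python
def compile_nfa(pattern):
    """Build an NFA with epsilon moves from a tiny regex dialect.

    Supported: literal characters, concatenation, '|', '*', parentheses.
    Returns (start, accept, trans, eps).
    """
    trans = {}  # state -> (char, next_state): consumes one input character
    eps = {}    # state -> list of states reachable without consuming input
    pos = 0

    def new_state():
        state = len(eps)
        eps[state] = []
        return state

    def peek():
        return pattern[pos] if pos < len(pattern) else None

    def alternation():
        nonlocal pos
        start, end = concatenation()
        while peek() == '|':
            pos += 1
            start2, end2 = concatenation()
            fork, join = new_state(), new_state()
            eps[fork] += [start, start2]
            eps[end].append(join)
            eps[end2].append(join)
            start, end = fork, join
        return start, end

    def concatenation():
        start = end = new_state()  # empty fragment to chain onto
        while peek() not in (None, '|', ')'):
            start2, end2 = repetition()
            eps[end].append(start2)
            end = end2
        return start, end

    def repetition():
        nonlocal pos
        start, end = atom()
        while peek() == '*':
            pos += 1
            fork, join = new_state(), new_state()
            eps[fork] += [start, join]  # enter the loop or skip it
            eps[end] += [start, join]   # repeat the loop or leave it
            start, end = fork, join
        return start, end

    def atom():
        nonlocal pos
        char = peek()
        pos += 1
        if char == '(':
            start, end = alternation()
            pos += 1  # skip the closing ')'
            return start, end
        start, end = new_state(), new_state()
        trans[start] = (char, end)
        return start, end

    start, accept = alternation()
    return start, accept, trans, eps


def closure(states, eps):
    """All states reachable from `states` by epsilon moves alone."""
    seen = set(states)
    stack = list(states)
    while stack:
        for nxt in eps[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def matches(pattern, text):
    """True if the whole of `text` matches `pattern`; no backtracking."""
    start, accept, trans, eps = compile_nfa(pattern)
    current = closure({start}, eps)
    for ch in text:
        # Advance every live state at once on this one character.
        stepped = {trans[s][1] for s in current
                   if s in trans and trans[s][0] == ch}
        current = closure(stepped, eps)
    return accept in current
```

For example, matches("a(b|c)*d", "abcbd") holds and matches("a(b|c)*d", "abx") does not. The set `current` can never hold more states than the NFA has, which is why the work per input character is bounded by the size of the expression.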
"Computing Reviews" published a scathing critique of the paper:
everyone knows a deterministic automaton can recognize regular
expressions with one state transition per input character; what
a waste of time to have to keep track of multiple states! What the
review missed was that the size of the DFA can be exponential in n.
For one-shot use, as in an editor, it can take far longer to construct
the DFA than to run it.
This lesson came home with a vengeance when Al Aho wrote egrep,
which implemented full regular expressions as DFA's. I happened
to be writing calendar(1) at the same time, and used egrep to
search calendar files for dates in rather free formats for today
and all days through the next working day. Here's an example
(egrep interprets newline as "|"):
(^|[ (,;])(([Aa]ug[^ ]* *|(08|8)/)0*1)([^0123456789]|$)
(^|[ (,;])((\* *)0*1)([^0123456789]|$)
(^|[ (,;])(([Aa]ug[^ ]* *|(08|8)/)0*2)([^0123456789]|$)
(^|[ (,;])((\* *)0*2)([^0123456789]|$)
(^|[ (,;])(([Aa]ug[^ ]* *|(08|8)/)0*3)([^0123456789]|$)
(^|[ (,;])((\* *)0*3)([^0123456789]|$)
Much to Al's chagrin, this regular expression took the better
part of a minute to compile into a DFA, which would then run in
microseconds. The trouble was that the DFA was enormously
bigger than the input--only a tiny fraction of the machine's
states would be visited; the rest were useless. That led
him to the brilliant idea of constructing the machine on
the fly, creating only the states that were pertinent to
the input at hand. That innovation made the DFA again
competitive with an NFA.
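
The on-the-fly idea can be sketched like this. It is a toy under stated assumptions, not Al's egrep code: the NFA is written out by hand (it recognizes strings over a and b that end in "abb"), it has no epsilon moves, and the names (LazyDFA, step, cache) are invented for the example. Each DFA state is a set of NFA states, and a DFA transition is computed only the first time the input asks for it, then remembered.

```python
# Hand-written NFA: (state, char) -> set of next states.
NFA = {
    (0, 'a'): {0, 1},
    (0, 'b'): {0},
    (1, 'b'): {2},
    (2, 'b'): {3},
}
START, ACCEPT = 0, 3


class LazyDFA:
    def __init__(self, nfa, start, accept):
        self.nfa = nfa
        self.start = frozenset({start})
        self.accept = accept
        self.cache = {}  # (dfa_state, char) -> dfa_state, filled on demand

    def step(self, state, ch):
        key = (state, ch)
        if key not in self.cache:
            # First visit: build this one DFA transition from the NFA.
            nxt = set()
            for s in state:
                nxt |= self.nfa.get((s, ch), set())
            self.cache[key] = frozenset(nxt)
        return self.cache[key]

    def matches(self, text):
        state = self.start
        for ch in text:
            state = self.step(state, ch)
        return self.accept in state
```

Running LazyDFA(NFA, START, ACCEPT).matches("aababb") builds only the handful of transitions that this particular input touches; later inputs that walk the same path pay one dictionary lookup per character, as a precompiled DFA would.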
Doug
This topic is still primarily UNIX but is getting near the edge of COFF, so
I'll CC there if people want to follow up.
As I mentioned to Will, during the time Research was doing the work/put out
their 'editions', the 'releases' were a bit more ephemeral - really a set
of bits (binary and hopefully matching source, but maybe not always)
that became a point in time. With 4th (and I think 5th) Editions it was a
state of disk pack when the bits were copied, but by 6th edition, as Noel
points out, there was a 'master tape' that the first site at an
institution received upon execution of a signed license, so the people at
each institution (MIT, Purdue, CMU, Harvard) passed those bits around
inside.
But what is more, is what Noel pointed out, we all passed source code and
binaries between each other, so DNA was fairly mixed up [sorry Larry - it
really was 'Open Source' between the licensees]. Sadly, it means some
things that were actually sourced at one location and one system are
sometimes credited to some other place because the >>wide<< release was
in USG or BSD [think Jim Kulp's Job control, which ended up in the kernel
and csh(1) as part of 4BSD, our recent discussions on the list about
more/pg/less, the different networking changes from all of MIT/UofI/Rand,
Goble's FS fixes to make the thing more crash resilient, the early Harvard
ar changes - *a.k.a.* newar(1) which became ar(1), CMU fsck, *etc.*].
Eventually, the AT&T Unix Support Group (USG) was stood up in Summit, as I
understand it, originally for the Operating Companies as they wanted to use
UNIX (but not for the licenses, originally). Steve Johnson moved from
Research over there and can tell you many more of the specifics.
Eventually (*i.e.* post-Judge Greene), distribution to the world moved from
MH's Research and the Patent Licensing teams to USG and AT&T North Carolina
business folks.
That said, when the distribution of UNIX moved to USG in Summit, things started
to get a bit more formal. But there were still differences inside, as we have
tried to unravel. PWB/TS and eventually System x. FWIW, BSD went
through the same thing. The first BSD's are really the binary state of the
world on the Cory 11/70, later 'Ernie.' By the time CSRG gets stood
up, with an official job (like USG) of supporting Unix for DARPA, Sam
and company are acting a bit more like traditional SW firms with alpha/beta
releases and a more formal build process. Note that 2.X never really
went through that, so we are all witnessing the wonderful efforts to try to
rebuild early 2.X BSD, and see that the ephemeral nature of the bits has
become more obvious.
As a side story ... the fact is that even for professional SW houses, it
was not as pure as it should be. To be honest, knowing the players and
processes involved, I highly doubt DEC could rebuild early editions of VMS,
particularly since the 'source control' system was a physical flag in
Cutler's office.
The fact is that the problem of which bits were used to make what other
bits was widespread enough throughout the industry that in the mid-late 80s
when Masscomp won the bid to build the system that NASA used to control the
space shuttle post-Challenger, a clause of the contract was that we had to
put an archive of the bits running on the build machine ('Yeti'), a copy of
the prints and even microcode/PAL versions so that Ford Aerospace (the
prime contractor) could rebuild the exact system we used to build the
binaries for them if we went bankrupt. I actually had a duplicate of that
Yeti as my home system ('Xorn') in my basement when I made some money for a
couple of years as a contract/on-call person for them every time the
shuttle flew.
Anyway - the point is that documentation and actual bits not being 100% in
sync is nothing new. Companies work hard to try to keep it together, but
different projects work at different speeds. In fact, the 'train release'
model is usually what people fall into. You schedule a release of
some piece of SW and anything that goes with it, has to be on the train or
it must wait for the next one. So developers and marketing people in firms
argue about what gets to be the 'engine' [hint: often it's HW releases,
which is a terrible idea, but that's a topic for COFF].
> From: Warner Losh
> 8 I think was the limit.
IIRC, you could use longer names than that (in C), but external references
only used the first 7 (in C - C symbols had a leading '_' tacked on; I used to
know why), 8 (in assembler).
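
To make the arithmetic concrete, here is a small Python sketch. It assumes the figures recalled above are right (8 significant characters in an external symbol, with the C compiler's leading '_' counting as one of them); external_name() and collisions() are made-up helper names, not anything from the V7 toolchain.

```python
C_PREFIX = "_"
SIGNIFICANT = 8  # assumed: significant characters in an external symbol


def external_name(ident, from_c=True):
    """Name as the linker would see it under the assumptions above."""
    name = (C_PREFIX + ident) if from_c else ident
    return name[:SIGNIFICANT]


def collisions(identifiers):
    """Group C identifiers that truncate to the same external name."""
    seen = {}
    for ident in identifiers:
        seen.setdefault(external_name(ident), []).append(ident)
    return {name: group for name, group in seen.items() if len(group) > 1}
```

So buffer_read and buffer_write, which are distinct inside one C file, would both come out as "_buffer_" at link time, while the same name written in assembler would keep 8 of its own characters ("buffer_r").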
> Could that cause this error?
Seems unlikely - see below.
> The error comes from lookloc. This is called for external type
> relocations. It searches the local symbol table for something that
> matches the relocation entry. This error happens when it can't find
> it...
Someone who actually looked at the source:
https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/ld.c
instead of just guessing. Give that man a star!
I spent a while looking at the code, trying to figure out i) how it works, and
ii) what's going wrong with that message, but I don't have a definitive
answer. The code is not super well commented, so one has to actually
understand what it's doing! :-)
It seems to my initial perusal that it maintains two symbol tables, one for
globals (which accumulates as each file is processed), and one for locals
(which is discarded/reset for each file). As Warner mentioned, the message
appears when a local symbol referenced in the relocation information in the
current file can't be found (in the local symbol table).
It's not, I think, simply due to too many local symbols in an input file -
there seems to be a check for that as it's reading the input file symbol
table:
	if (lp >= &local[NSYMPR])
		error(1, "Local symbol overflow");
	*lp++ = symno;
	*lp++ = sp;
although of course there could be a bug which breaks this check. It seems to
me that this is an 'impossible' error, one which can only happen due to i) a
bug in the loader (a fencepost error, or something), or ii) an error in the
input a.out file.
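
For anyone following along, here is how I picture the two tables, as a Python toy. It is modeled loosely on my reading above and is not a transcription of ld.c: the class and method names, the (name, is_external) input format, and the max_locals limit (standing in for NSYMPR) are all assumptions made for the illustration.

```python
class LinkError(Exception):
    pass


class Loader:
    def __init__(self, max_locals=1000):
        self.globals = {}             # name -> value; accumulates over files
        self.local = []               # (symno, name) pairs; reset per file
        self.max_locals = max_locals  # hypothetical stand-in for NSYMPR

    def load_file(self, symbols, relocs):
        """symbols: list of (name, is_external) in file order.
        relocs: symbol numbers referenced by external relocations.
        Returns the global names those relocations resolve to."""
        self.local = []  # the per-file table starts empty every time
        for symno, (name, is_external) in enumerate(symbols):
            if not is_external:
                continue
            if len(self.local) >= self.max_locals:
                raise LinkError("Local symbol overflow")
            self.globals.setdefault(name, None)
            self.local.append((symno, name))
        return [self.lookloc(symno) for symno in relocs]

    def lookloc(self, symno):
        """Find the per-file entry for a relocation's symbol number."""
        for number, name in self.local:
            if number == symno:
                return name
        raise LinkError("Local symbol botch")
```

In this model the botch can only fire when a relocation names a symbol number that never made it into the per-file table, e.g. a number past the end of the file's symbol table or one belonging to a non-external symbol, which is why it looks like either a loader bug or a bad input file.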
I don't want to spend more time on it, since I'm not sure if you've managed to
bypass the problem. If not, let me know, and we'll track it down. (This may
involve you adding some printf's so we have more info about the details.)
Noel
I finally munged lbforth.c (https://gist.github.com/lbruder/10007431) into
compiling cleanly on mostly-stock v7 with the system compiler (lbforth
itself does fine on 211BSD, but it needs a little help to build in a real
K&R environment).
Which would be nice, except that when it gets to the linker....
$ cc -o 4th forth.c
ld:forth.o: Local symbol botch
WTF?
How do I begin to debug this?
Adam
> From: Will Senn
> it finally clicked that it is just one (of many) bit buckets out there
> with the moniker v6. ... I am coming from a world where OS version
> floppy/cd/dvd images are copies of a single master ... These tape things
> could be snapshots of the systems they originate from at very different
> times and with different software/sources etc.
Well, sort of. Do remember that everyone with V6 had to have a license, which
at that point you could _only_ get from Western Electric. So every
_institution_ (which is not the same as every _machine_) had to have had
dealings with them. However, once your institution was 'in the club', stuff
just got passed around.
E.g. the BBN V6 system with TCP/IP:
https://www.tuhs.org/cgi-bin/utree.pl?file=BBN-V6
I got that by driving over to BBN, and talking to Jack Haverty, and he gave us
a tape (or maybe a pack, I don't recall exactly). But we had a V6 license, so
we could do that.
But my particular machine, it worked just the way you described: we got our V6
from the other V6 machine in the Tech Sq building (RTS/DSSR), including not
only Bell post-V6 'leakage' like adb, but their local hacks (e.g. their TTY
driver, and the ttymod() system call to get to its extended features; the
ability to suspend selected applications; yadda, yadda). We never saw a V6
tape.
Noel