On 2017-06-07 22:14, "Walter F.J. Mueller"<w.f.j.mueller(a)retro11.de> wrote:
> Hi,
>
> a few remarks on the feedback on the kernel panic after a 'here document' in tcsh.
>
> To Michael Kjörling's question:
> > I'm curious whether the same thing happens if you try that in some
> > other shell? (Not sure how widely here documents were supported back
> > then, but I'm asking anyway.)
> And Johnny Billquist's remark
> > Not sure if any of the other shells have this.
>
> 'here documents' are available and work fine in sh and csh.
> And are in fact used, examples
Ah. Thanks. Too lazy to check.
> To Michael Kjörling's remark
> > The PC value in the panic report ("pc 161324") strikes me as high
> and Johnny Billquist's remark
> > This is in kernel mode, and that is in the I/O page.
>
> 211bsd uses split I/D space and uses all 64 kB I space for code.
D'oh! Color me stupid. I should have thought of that.
> The top 8 kB are in fact the overlay area, and the crash happened
> in overlay 4 (as indicated by ov 4). With a simple
>
> nm /unix | sort | grep " 4"
>
> one gets
>
> 161254 t ~psignal 4
> 162302 t ~issignal 4
>
> so the crash is just 050 bytes after the entry point of psignal. So the
> PC address is fine and not the problem. For psignal look at
>
> http://www.retro11.de/ouxr/211bsd/usr/src/sys/sys/kern_sig.c.html#s:_psignal
>
> the crash must be in one of the first lines. psignal is an internal kernel
> function, called from
>
> http://www.retro11.de/ouxr/211bsd/usr/src/sys/sys/kern_sig.c.html#xref:s:_p…
>
> and has nothing to do with the libc function psignal
>
> http://www.retro11.de/ouxr/211bsd/usr/man/cat3/psignal.0.html
> http://www.retro11.de/ouxr/211bsd/usr/src/lib/libc/gen/psignal.c.html
The libc function would be in user mode, so that one was pretty clear.
Ok. Digging through this a little for real then.
psignal gets called with a signal from the trap handler. The actual
signal is weird. It would appear to be 0160750, which would be -7704 if
I'm counting right. That does not make sense as a signal.
The psignal code pulls a value based on the signal number, which is the
line:
prop = sigprop[sig];
which uses the signal number as an index. With a random, weird signal
number, this accesses wherever that index happens to land, which is
where you get the crash.
On my system, sigprop is at address 0012172, which, with a signal of
-7704 ends up at address 0173142, which by (un)luck happens to be in the
middle of the diagnostics bootstrap rom space. So I don't get a Unibus
timeout error, while you do. Probably because sigprop is at a slightly
different address in your kernel.
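
The arithmetic above can be checked with a small C sketch. It assumes that
sigprop is an array of single bytes, so the index is added to the base
address unscaled, and that addresses wrap around in the 16-bit D space. The
function names are made up for this illustration and are not kernel code.

```c
#include <stdint.h>

/* Reinterpret a 16-bit PDP-11 word as a signed (two's complement) integer. */
static int16_t as_signed16(uint16_t word)
{
    return (word & 0x8000u) ? (int16_t)((int32_t)word - 0x10000)
                            : (int16_t)word;
}

/*
 * Address touched by sigprop[sig], ASSUMING one byte per entry and
 * wrap-around within a 16-bit address space.
 */
static uint16_t sigprop_entry_addr(uint16_t sigprop_base, int16_t sig)
{
    return (uint16_t)((uint32_t)sigprop_base + (uint32_t)(int32_t)sig);
}
```

With the values from the backtrace, as_signed16(0160750) gives -7704, and
sigprop_entry_addr(0012172, -7704) gives 0173142, matching the numbers above.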
So, the real question is how trap can be calling psignal with such a
broken signal number.
I might dig further down that question another day. But unless you
already got this far, I might have saved you a few minutes of digging. I
did start looking into the trap code, which is in pdp/trap.c, but this
is not entirely straightforward. It goes through a bunch of things
trying to decide what signal to send, before actually calling psignal.
> To Johnny Billquist's remark
> > Could you (Walter) try the latest version of 2.11BSD and see if you
> > still get that crash?
>
> very interesting that you see a core dump of tcsh rather than a kernel panic.
Indeed.
> Whatever tcsh does, it should not lead to a kernel panic, and if it does,
> it is primarily a bug of the kernel. It looks like there are two issues,
> one in tcsh, and one in the kernel. I've a hunch where this might come from,
> but that will take a weekend or two to check on.
Agree that the kernel should not crash on this.
Also, tcsh should not really crash either, but it's a separate issue,
even though one might have triggered the other here.
But yes, there are two bugs in here.
If you can recreate the kernel crash on the latest version, that would
be good.
But it smells like trap.c has some path where it does not even set what
signal to deliver, and then calls psignal with whatever the variable i
held at function entry, which would be some random stuff on the stack.
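
A minimal C sketch of that suspected failure mode, and of one defensive way
around it, is given here. This is not the 2.11BSD trap.c code: the trap
types, the signal values and NSIG_SKETCH are all hypothetical placeholders
chosen for illustration only.

```c
#define NSIG_SKETCH 32   /* assumed signal count, not copied from 2.11BSD */

enum { TRAP_A = 1, TRAP_B = 2 };   /* hypothetical trap types */

/* Pick a signal for a trap; 0 means "no signal chosen". */
static int choose_signal(int trap_type)
{
    int sig = 0;   /* initialised, so a forgotten path cannot leak stack junk */

    switch (trap_type) {
    case TRAP_A:
        sig = 4;   /* illustrative value only */
        break;
    case TRAP_B:
        sig = 5;   /* illustrative value only */
        break;
    default:
        break;     /* the kind of path that assigns nothing */
    }
    return sig;
}

/* Nonzero if sig may safely index a per-signal table such as sigprop[]. */
static int sig_in_range(int sig)
{
    return sig > 0 && sig <= NSIG_SKETCH;
}
```

A caller would deliver the signal only when sig_in_range(choose_signal(t))
is true. A value like -7704 fails that check instead of being used as an
array index.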
Johnny
--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: bqt(a)softjar.se           ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol
On 2017-06-08 22:17, Dave Horsfall<dave(a)horsfall.org> wrote:
>
> Just to diverge from this thread a little, it probably isn't all that
> remarkable that programming languages tend to reflect the hardware for
> which they were designed.
>
> Thus, for example, we have the C construct:
>
> do { ... } while (--i);
>
> which translated right into the PDP-11's "SOB" instruction (and
> reminiscent of FORTRAN's insistence that DO loops are run at least once
> (there was a CACM article about that once; anyone have a pointer to it?)).
>
> And of course the afore-mentioned FORTRAN, which really reflects the
> underlying IBM 70x architecture (shudder).
FORTRAN stopped insisting that loops run at least once with FORTRAN 77.
The last version that ran every loop at least once was FORTRAN IV.
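
The difference between the two loop styles can be sketched in C. The
function names are invented for this illustration, and the post-tested
variant tests `--i > 0` rather than the quoted `--i` so that a zero count
ends after one pass instead of wrapping around.

```c
/* Body runs once before the first test, as with do { ... } while (--i). */
static int count_post_tested(int i)
{
    int runs = 0;

    do {
        runs++;
    } while (--i > 0);
    return runs;
}

/* Body is skipped entirely when the trip count is zero. */
static int count_pre_tested(int i)
{
    int runs = 0;

    while (i-- > 0)
        runs++;
    return runs;
}
```

For a count of 3 both functions return 3. For a count of 0 the post-tested
loop still runs once, while the pre-tested loop does not run at all.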
Johnny
I learned the other day that array indexes in some languages start at 1
instead of 0. This seems to be an old trend that changed around the 70s?
Who started this? Why was the change made?
It seems to have come about around the same time as C, but interestingly
enough Lua is kinda in between (you can start an array at 0 or 1).
Smalltalk can probably have a 0 base index just by its nature, but I
wonder whether that would work in a 40 year old interpreter.
> Basically, until C came along, the standard practice was for indices
> to start at 1. Certainly Fortran and Pascal did it that way.
Mercury Autocode used 0.
http://www.homepages.ed.ac.uk/jwp/history/mercury/manual/autocode/4.jpg
-- Richard
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
Hi,
a few remarks on the feedback on the kernel panic after a 'here document' in tcsh.
To Michael Kjörling's question:
> I'm curious whether the same thing happens if you try that in some
> other shell? (Not sure how widely here documents were supported back
> then, but I'm asking anyway.)
And Johnny Billquist's remark
> Not sure if any of the other shells have this.
'here documents' are available and work fine in sh and csh,
and are in fact used. Examples:
/usr/adm/daily (a /bin/sh script)
    su uucp << EOF
    /etc/uucp/clean.daily
    EOF
/usr/crash/why (a /bin/csh script)
    adb -k {unix,core}.$1 << 'EOF'
    version/sn"Backtrace:"n
    $c
    'EOF'
To Michael Kjörling's remark
> The PC value in the panic report ("pc 161324") strikes me as high
and Johnny Billquist's remark
> This is in kernel mode, and that is in the I/O page.
211bsd uses split I/D space and uses all 64 kB of I space for code.
The top 8 kB are in fact the overlay area, and the crash happened
in overlay 4 (as indicated by ov 4). With a simple
    nm /unix | sort | grep " 4"
one gets
    161254 t ~psignal 4
    162302 t ~issignal 4
so the crash is just 050 bytes after the entry point of psignal. So the
PC address is fine and not the problem. For psignal look at
http://www.retro11.de/ouxr/211bsd/usr/src/sys/sys/kern_sig.c.html#s:_psignal
The crash must be in one of the first lines. psignal is an internal kernel
function, called from
http://www.retro11.de/ouxr/211bsd/usr/src/sys/sys/kern_sig.c.html#xref:s:_p…
and has nothing to do with the libc function psignal
http://www.retro11.de/ouxr/211bsd/usr/man/cat3/psignal.0.html
http://www.retro11.de/ouxr/211bsd/usr/src/lib/libc/gen/psignal.c.html
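
The lookup done by eye on the nm output can be sketched in C: find the
symbol with the highest address not above the PC and report the offset. The
struct and function names are invented for this illustration, and the table
holds only the two overlay-4 entries quoted above.

```c
#include <stddef.h>

struct sym {
    unsigned addr;      /* symbol address (octal in the nm listing) */
    const char *name;
};

/* Return the symbol with the highest address <= pc, or NULL if none. */
static const struct sym *nearest_symbol(const struct sym *tab, size_t n,
                                        unsigned pc)
{
    const struct sym *best = NULL;

    for (size_t i = 0; i < n; i++)
        if (tab[i].addr <= pc && (best == NULL || tab[i].addr > best->addr))
            best = &tab[i];
    return best;
}

/* The two overlay-4 symbols from the nm output quoted above. */
static const struct sym ov4[] = {
    { 0161254, "~psignal" },
    { 0162302, "~issignal" },
};
```

Calling nearest_symbol(ov4, 2, 0161324) returns the ~psignal entry, and
0161324 minus its address is 050, the offset mentioned above.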
To Johnny Billquist's remark
> Could you (Walter) try the latest version of 2.11BSD and see if you
> still get that crash?
Very interesting that you see a core dump of tcsh rather than a kernel panic.
Whatever tcsh does, it should not lead to a kernel panic, and if it does,
it is primarily a bug of the kernel. It looks like there are two issues,
one in tcsh, and one in the kernel. I've a hunch where this might come from,
but that will take a weekend or two to check on.
With best regards, Walter
On 2017-06-06 04:00, Michael Kjörling <michael(a)kjorling.se> wrote:
>
> On 5 Jun 2017 16:12 +0200, from w.f.j.mueller(a)retro11.de (Walter F.J. Mueller):
>> I'm using 211bsd (Version 447) and found that a 'here document' in tcsh
>> leads to a kernel panic. It's absolutely reproducible on my system, both
>> when running it on my FPGA PDP-11 or in simh. Just doing
>>
>> tcsh
>> cat << EOF
> I'm curious whether the same thing happens if you try that in some
> other shell? (Not sure how widely here documents were supported back
> then, but I'm asking anyway.)
Not sure if any of the other shells have this. We're basically talking
csh, sh and ksh unless I remember wrong.
But it's a good question. If no one else has tried it by tomorrow, I
could check.
>> is enough, and I get
>>
>> ka6 31333 aps 147472
>> pc 161324 ps 30004
>> ov 4
>> cpuerr 20
>> trap type 0
>> panic: trap
>> syncing disks... done
>>
>> looking at the crash dump gives
>>
>> cd /etc/crash
>> ./why 4
>> Backtrace:
>> 0147372: _boot(05000,0100) from ~panic+072
>> 0147414: _etext(011350) from ~trap+0350
>> 0147450: ~trap() from call+040
>> 0147516: _psignal(0101520,0160750) from ~trap+0364
>> 0147554: ~trap() from call+040
>>
>> so the crash is in psignal, which is afaik the kernel internal
>> mechanism to dispatch signals.
> The PC value in the panic report ("pc 161324") strikes me as high, but
> 161324 octal is 58068 decimal, so it's not excessively so, and perhaps
> in line with what one might expect to see with a kernel pinned near
> top of memory. Are the offsets in the backtrace constant, i.e. does it
> always crash on the same code?
161324 is way high. This is in kernel mode, and that is in the I/O page.
Basically no code lives in the I/O page (some boot roms and hardware
diagnostics excepted). This smells like corrupted memory (pointer or
stack), or something else very funny.
> Not knowing what cpuerr 20 is specifically doesn't help, and at least
> http://www.retro11.de/ouxr/29bsd/usr/src/sys/sys/trap.c.html#n:112
> (which doesn't seem to be too far from what you are running) isn't
> terribly enlightening; CPUERR is simply a pointer into a memory-mapped
> register of some kind, as seen at
> http://www.retro11.de/ouxr/29bsd/usr/include/sys/iopage.h.html#m:CPUERR,
> and at least pdp11_cpumod.c from the simh source code at
> http://simh.trailing-edge.com/interim/pdp11_cpumod.c wasn't terribly
> enlightening, though of course I could be looking in entirely the
> wrong place.
Like others said - the cpu error register is documented in the processor
handbook.
020 means Unibus Timeout, which is consistent with trying to access
something in the I/O page, where there is no device configured to
respond to that address.
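
For reference, decoding the register can be sketched in C. The bit
assignments below are given as recalled from the PDP-11/70 and KDJ11
documentation, so check them against the handbook for your CPU. The function
name is invented for this illustration.

```c
/*
 * Name of a single CPU Error Register bit (mask given in octal).
 * Bit layout as recalled from the 11/70 and KDJ11 manuals.
 */
static const char *cpuerr_bit_name(unsigned mask)
{
    switch (mask) {
    case 0200: return "illegal HALT";
    case 0100: return "odd address error";
    case 0040: return "non-existent memory";
    case 0020: return "Unibus timeout";
    case 0010: return "yellow zone stack limit";
    case 0004: return "red zone stack limit";
    default:   return "unassigned";
    }
}
```

To decode a full register value, step a mask from 0200 down to 0004 and
report cpuerr_bit_name(mask) for every mask where (reg & mask) is nonzero.
For the panic above only 020 is set.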
I just tried the same thing on a simh system here, and I do not get a
crash. This is on 2.11BSD at patch level 449, running on an emulated 11/94.
I do however get tcsh to crash.
    simh:/home/bqt> su -
    Password:
    erase, kill ^U, intr ^C
    # tcsh
    simh:/# cat << EOF
    Illegal instruction - core dumped
    #
    Suspended (tty input)
    simh:/home/bqt>
    simh:/home/bqt> cat /VERSION
    Current Patch Level: 448
    Date: January 5, 2010
Yes, it says patch level 448, but it really is 449. This was the system
where I worked together with Steven when doing the 449 patch set, but I
never got around to actually updating the VERSION file itself.
Also, this was while running on the console.
Could you (Walter) try the latest version of 2.11BSD and see if you
still get that crash?
Johnny
Hi,
I'm using 211bsd (Version 447) and found that a 'here document' in tcsh
leads to a kernel panic. It's absolutely reproducible on my system, both
when running it on my FPGA PDP-11 and in simh. Just doing
    tcsh
    cat << EOF
is enough, and I get
    ka6 31333 aps 147472
    pc 161324 ps 30004
    ov 4
    cpuerr 20
    trap type 0
    panic: trap
    syncing disks... done
looking at the crash dump gives
    cd /etc/crash
    ./why 4
    Backtrace:
    0147372: _boot(05000,0100) from ~panic+072
    0147414: _etext(011350) from ~trap+0350
    0147450: ~trap() from call+040
    0147516: _psignal(0101520,0160750) from ~trap+0364
    0147554: ~trap() from call+040
so the crash is in psignal, which is afaik the kernel internal
mechanism to dispatch signals.
Questions:
1. Has anybody seen this before?
2. Any idea what the reason could be?
With best regards, Walter
> From: Jacob Ritorto
> Where might one find the list of trap_types
Look in:
http://minnie.tuhs.org/cgi-bin/utree.pl?file=2.11BSD/sys/pdp/scb.s
which maps from trap vector locations (built into the hardware; consult a
PDP-11 CPU manual for details) to trap type numbers, which are defined here:
http://minnie.tuhs.org/cgi-bin/utree.pl?file=2.11BSD/sys/pdp/trap.h
and handled here:
http://minnie.tuhs.org/cgi-bin/utree.pl?file=2.11BSD/sys/pdp/trap.c
> and cpuerrs?
That just prints the contents of the CPU Error Register; see an appropriate
PDP-11 CPU manual - 11/70, /44, /73, /83 or /84 for what all the bits mean.
Also the "KDJ11-A CPU Module User's Guide", which also documents it.
In theory, there's also a KDJ11-B UG, but it's not online. If anyone has one,
can we please get it scanned? Thanks!
Noel
> The people working on TCP/IP did know of the Spider work (like they knew of
> the Cambridge ring work), but it didn't really have any impact; it was a
> totally different direction than the one we were going in.
I'm aware of that, and I think it was the same the other way around. My
interest is tracing how the networking API of Unix developed in the very
early days, and that's where there is a link.
When I asked a few months back why Bell Labs did not jump onto the work
done at UoI, Doug observed that the lab's focus was on Datakit and that
triggered my interest.
>>>> it turns out that the TIU driver was in Warren's repo all along:
>
> V4?! Wow. I'd have never guessed it went that far back.
My current understanding is that Spider development began in 1969 and
that it was first operational in 1972. By '73/'74 it connected a dozen
computers at Murray Hill and Unix had gained basic network programs.
From Sandy Fraser's "Origins of ATM" video lecture I understand that the
Spider learnings included that using a mini to simulate a switch/router
was too slow and too costly, and that doing flow control inside the network
induced avoidable complexity (I guess Fraser/Cerf/Pouzin all learned that
lesson around the same time). The follow-on, custom designed Datakit switch
was to correct these issues.
Work started in 1974 and I guess that prototypes may have been available
around 1978 (when Spider was apparently switched off at Murray Hill).
By 1981 a multi-site Datakit network connected various Bell labs and by
1983 Datakit was introduced as a commercial service.
As to the Spider network API, it currently seems that it was relatively
simple: it exposed the switch as a group of character mode devices, with
the user program responsible for doing all protocol work. Interestingly,
Spider used a high speed DMA based I/O board (DR11-B), whereas the
Datakit switch was apparently connected to a low speed polled I/O board
(DR11-C).
I did not find the Datakit device driver(s) in the V7 source tree (only a
few references in tty.h), so it is hard to be sure of anything. However,
it seems that in V7 the Datakit switch was used as "a fancy modem" so to
speak, supporting the uucp software stack.
There is source for a Datakit driver in the V8 tree, but I currently
have no time to study that (and perhaps it is beyond my scope anyway).
All input and corrections much appreciated.
> From: Paul Ruizendaal
>>> The report I have is: "SPIDER-a data communication experiment"
>>> ...
>>> I think it can be public now, but doing some checks.
OK, that would be great to have online. I _think_ the hardcopy I have
(somewhere! :-) is that report, but my memory should not be trusted.
The people working on TCP/IP did know of the Spider work (like they knew of
the Cambridge ring work), but it didn't really have any impact; it was a
totally different direction than the one we were going in.
>>> it turns out that the TIU driver was in Warren's repo all along:
V4?! Wow. I'd have never guessed it went that far back.
>>> The code calls snstat()
>> The object code for snstat() is in libc.a in the dmr's V5 image.
>> Reconstructed, the source code is here:
>> ...
>> In short, snstat() is a modified stty call
Yes, I looked and found the original source, appended below.
>>> Could that be the tiu sys call (#45) in the sysent.c table for V4-V6?
I wonder if we'll ever be able to find a copy of the kernel code for that
tiu() system call. And I wonder what it did?
> [1] Oldest alarm() code I can find is in PWB1
> ...
> Either alarm existed in V5 and V6 .. or it was added after V6 was
> released, perhaps soon after. In the latter case the 'nfs' code that we
> have must be later than 1974
Remember, that source came from the MIT system, which is a modified PWB1.
So it's not surprising it's using PWB1 system calls.
Noel
--------
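/ C interface to spider status call

.globl	_snstat
.globl	cerror

_snstat:
	mov	r5,-(sp)	/ save caller's frame pointer
	mov	sp,r5		/ set up our own frame
	mov	4(r5),r0	/ first C argument goes in r0
	mov	6(r5),0f	/ second argument -> word 0 of the block at 0:
	mov	8(r5),0f+2	/ third argument -> word 1 of the block
	sys	stty; 0f	/ stty system call, given the block's address
	bec	1f		/ carry clear: success
	jmp	cerror		/ carry set: hand the error to cerror
1:
	clr	r0		/ return 0
	mov	(sp)+,r5	/ restore caller's frame pointer
	rts	pc
.data
0:	.=.+6			/ reserve the three-word argument block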