Wow. This is a terrible paper. It's full of of incorrect, unsubstiantiated,
and meaningless statements. It's so bad in my opinion that I'm not sorry
that I dropped my ACM membership a while ago. These folks really needed an
editor. The paper annoys me so much that I've decided to write a lengthy
note to the authors which I'll post here once I'm done.
Jon
>From what I can gather the only way to reasonably examine the disassembly
of a program in the early days of Unix was adb. Is this true? Was there a
way to easily produce a full disassembly? I'll confess to being fairly
ignorant of adb use since I always had dbx or the equivalent available.
The first tool I'm aware of to purposefully/primarily produce a full
listing is MIPS dis (ca. 1986?) but there must have been something before
that for other systems, no?
-Henry
On 7/1/21, scj(a)yaccman.com <scj(a)yaccman.com> wrote:
> When PCC came along and started running on 32-bit machines, I started
> thinking about algorithms for optimization. A problem that I had no
> good solution for could be illustrated by a simple piece of code:
>
> x = *p;
>
> y = *q;
>
> q gets changed
>
> *q = z;
>
> The question is, do I need to reload x now because q might have been
> changed to point to the same place as p?
Yes, this is a very well-known problem in scalar optimization in
compiler engineering. It's called pointer disambiguation and is part
of the more general problem of data flow analysis. As you observed,
getting it wrong can lead to very subtle and hard-to-debug correctness
problems. In the worst case, one has to throw out all current data
flow analysis of global and currently active local variables and start
over. In your example, the statement "*q = z" may end up forcing the
compiler to toss out all data flow information on x and z (and maybe p
and q as well). If q could possibly point to x and x is in a
register, the assignment forces x to be reloaded before its next use.
Ambiguous pointers prohibit a lot of important optimizations. This
problem is the source of a lot of bugs in compilers that do aggressive
optimizations.
Fortunately a bit of knowledge of just how "q gets changed" can rescue
the situation. In strongly-typed languages, for example, if x and z
are different data types, we know the assignment of z through q can't
affect x. We also know that the assignment can't affect x if x and z
have disjoint scopes.
The 'restrict' keyword in C pointer declarations was added to help
mitigate this problem.
Some compilers also have a command line option that allows the user to
say, "I solemnly swear that I won't do this sort of thing".
-Paul W.
> From: Paul Riley <paul(a)rileyriot.com>
>> (I wrote a note, BITD, explaining how all this worked; I'll upload it
>> to the CHWiki when I get a chance.)
Now here:
https://gunkies.org/wiki/PDP-11_C_stack_operation
along with simple examples of args and auto variables, which are both
referenced via the FP.
> As a non-C consumer of printf, should I point R5 at some space for a
> stack and call printf in the same manner as the C example I cited?
Not necessary to do anything with R5 (you can leave it blank); the only things
a PDP-11 C routine needs are:
- a good stack
- the arguments, and return point, on the top of the stack
csv will set up the frame pointer, making no assumptions about the old
contents of R5 - see the source:
http://ana-3.lcs.mit.edu/~jnc/tech/unix/lib/csv.s
although it does save the old R5 contents, and restore them on exit.
Noel
> From: Paul Riley
> I want to use printf from an assembly language program, in V6. ... the
> substitutional arguments for printf are pushed onto the stack in reverse
> order, then the address of the string, and then printf is called. After
> this, 6 is added to the stack pointer.
This is all down to the standard C environment / calling sequence on the
PDP-11 (at least, in V6 C; other compilers may do it differently). Calls to
printf() are in no way special.
Very briefly, there's a 'frame pointer' (R5) which points to the current stack
frame; all arguments and automatics are relative to that. A pair of special
routines, csv and cret (I couldn't find the source on TUHS, but it happens to
be here:
http://ana-3.lcs.mit.edu/~jnc/tech/unix/lib/csv.s
if you want to see it), set up and tear down the frame on entry/exit to that
routine. The SP (R6) points to a blank location on the top (i.e. lower address;
PDP-11 stacks grow down) of the stack.
To call a subroutine, the arguments are pushed, the routine is called (which
pushes the return PC), and on return (which pops the return PC), the arguments
are discarded by the caller.
(I wrote a note, BITD, explaining how all this worked; I'll upload it to the
CHWiki when I get a chance.)
> I assume that the printf routine pops the address of the string off the
> stack, but leaves the other values on the stack
No, all C routines (including printf()) leave the stack more or less alone,
except for CSV/CRET hackery, allocating space for automatic variables on
routine entry (that would be at L1; try looking at the .s for a routine with
automatic variables), and popping the return PC on exit. The exception to this
is the stuff around calling _enother_ routine (sketched above).
Another exception is alloca() (source here:
http://ana-3.lcs.mit.edu/~jnc/tech/unix/lib/alloca.s
again, couldn't find it in TUHS), which allocated a block of memory on
the stack (automatically discarded when the routine which called alloca()
returns). Note that since all automatic variables and incoming arguments
are relative to the FP, alloca is _really_ simple; justs adjusts the
SP, and it's done.
> What troubles me is that the stack pointer is not decremented before the
> first mov, in the example below. Is this some C convention? I would
> assume that the first push in my example would overwrite the top of the
> stack.
That's right; that's because in the C runtime environment, the top location
on the stack is a trash word (set up by csv).
> I understand db only works on files like a.out or core dumps. If I
> wanted to break the assembly language program to examine values, how can
> I force a termination and core dump elegantly, so I can examine some
> register values?
Use 'cdb':
https://minnie.tuhs.org//cgi-bin/utree.pl?file=V6/usr/man/man1/cdb.1
which can do interactive debugging.
Noel
Hi,
I want to use printf from an assembly language program, in V6. It seems
that the Unix Programmer's Manual doesn't show how to use it from assembly,
so I wrote a short C program and captured the assembler output, for some
clues. Listings below.
In my example, the substitutional arguments for printf are pushed onto the
stack in reverse order, then the address of the string, and then printf is
called. After this, 6 is added to the stack pointer. I assume that the
printf routine pops the address of the string off the stack, but leaves the
other values on the stack, hence the need to add 2x3=6 to the stack after
calling printf in my example.
What troubles me is that the stack pointer is not decremented before the
first mov, in the example below. Is this some C convention? I would assume
that the first push in my example would overwrite the top of the stack.
Perhaps I'm not used to PDP-11 stack conventions.
I understand db only works on files like a.out or core dumps. If I wanted
to break the assembly language program to examine values, how can I force a
termination and core dump elegantly, so I can examine some register values?
Paul
*Paul Riley*
Email: paul(a)rileyriot.com
int a, b, c;
int main(){
printf("printf: start\n");
a = 1;
b = 2;
c = 3;
printf("A = %d, B = %d, C = %d", a, b, c);
printf("printf: end\n");
}
.comm _a,2
.comm _b,2
.comm _c,2
.globl _main
.text
_main:
~~main:
jsr r5,csv
jbr L1
L2:mov $L4,(sp)
jsr pc,*$_printf
mov $1,_a
mov $2,_b
mov $3,_c
mov _c,(sp)
mov _b,-(sp)
mov _a,-(sp)
mov $L5,-(sp)
jsr pc,*$_printf
add $6,sp
mov $L6,(sp)
jsr pc,*$_printf
L3:jmp cret
L1:jbr L2
.globl
.data
L4:.byte 160,162,151,156,164,146,72,40,163,164,141,162,164,12,0
L5:.byte
101,40,75,40,45,144,54,40,102,40,75,40,45,144,54,40,103,40,75,40,45,144,0
L6:.byte 160,162,151,156,164,146,72,40,145,156,144,12,0
#
> The demand paging code for SysVR2 was written by Keith A. Kelleman and Steven J. Buroff, and in contemporary conference talks they were saying that they wanted to combine the best parts of demand-paged 32V and BSD. They may have some additional memories that could help with getting a better understanding of the final version of 32V.
>
> Does anybody have contact details for these two gentlemen?
I’ve managed to contact Keith Kelleman and he had some interesting remarks. The paging code in SVR2 was all new code, with a focus the 3B dual processor. It does not derive at the code level from 32V and in fact he does not recall working with the 32V paging code. This kills the hope that the SVR2 code had clues about the 32V code. Keith did suggest that I tried to contact Tom Raleigh, who might have worked with the later 32V code base. Anybody with suggestions for locating him?
===
Besides functionality, the people that remember paged 32V all recall it being very fast. I wonder what made it fast.
First to consider is “faster than what?”. Maybe Rob P. has a specific memory, but I would assume faster than 4BSD: if the comparison was with the "scatter loading, partial swapping” version of 32V people would have expected the better performance and not remember it as impressive 40 years later. Possibly the comparison is with 8th Edition which would have used the 4BSD paging code by then.
If the comparison is with 4BSD, then the CoW feature in paging 32V would have been mostly matched by the vfork mechanism in 4BSD: it covers 80% of the use and it leaves the code paths simpler. If the comparison is with 8th edition, this may be the difference that people remember.
The next possibility is that paging 32V had a better page-out algorithm. Joy/Babaoglu mention that the cost of the clock process is noticable. Maybe paged 32V used a VMS-like FIFO/second chance algorithm that did not need a separate kernel process/thread. Arguably this is not enough for a convincing speed difference.
It is also possible that JFR found a more clever way to do LRU approximation. He remembers that his code used ‘strict LRU’, but not the algorithm. On Tenex - his conceptual reference - that was natural to do, because the MMU hardware maintains a table with 4 words of 36 bits for each frame with statistical data. With the VAX hardware it is a challenge. Considering his mathematical prowess it is maybe plausible that JFR found an efficient way. A slightly better page hit rate gives a significant speed improvement.
All speculation of course: only finding the source will truly tell.
> So there you have it. Just a line of B code that wasn't updated to C.
>
> Cheers,
> aap
I love posts like this, thank you! “Sherlock Holmes and the mysterious case of the excessive parameter list"
Greetings,
It has always bugged me that the bsd-family-tree file got 2.8BSD slightly
wrong.
It has the relationship V6 -> 1BSD -> 2BSD -> 2.79BSD -> 2.8BSD
with V7 -> 32V -> 3BSD ... -> 4.1BSD -> 2.8BSD
Now, as far as it goes, that's not terrible. But it's missing something.
First, there was no V6 code in 1BSD or 2BSD through 2.79BSD. It was for V6
at first then for both V6 and V7, but w/o V7 code. There weren't even
patches for V6 or new drivers on the early 1BSD and 2BSDs. However,
starting with 2.8BSD, there's a V7 kernel. Or should I say a heavily
patched V7 kernel with a ton of #ifdefs for all the fixes and enhancements
collected by Berkeley and a few minor build system tweaks.
Also, the code from 4.1BSD that's in 2.8 is rather minimal from what I can
tell with my analysis so far, except indirectly in some of the patches to
the V7 kernel appear to also be in 4.1BSD. The biggest thing that's in
2.8BSD from 4.1BSD is the job control (confined to its own directory with
big warnings that basically say you'll need to update the code and even
then the system is unstable). 2.9BSD has much better 4.xBSD integration,
but 2.8 was quite early days for rejoining the two lines. 4.1BSD didn't
have many berkeley rewrites of userland code, and the 2.8 tape has only a
few of them (eg ls). So although it's not as complete as one would hope,
there was a decent amount of code from 4.1BSD flowing into 2.8BSD.
Now, my request. I've created a code review for FreeBSD to fix this.
https://reviews.freebsd.org/D30883 is the review. We use phabricator in the
FreeBSD project. Anybody can view this, but if you don't want to create an
account, please send me email with a comment about the change and/or the
commit message. It just adds an arc from V7 to 2.8BSD.
Thanks for any time and/or insight you might have here. I'm judging the
above entirely on the archived code rather than any historical knowledge...
Warner
On 23/06/21, silas poulson wrote:
> I’m aware line 2238’s famous “You are not expected to understand
> this.” Comment is due to odd PDP-11/45 behaviour.
Actually there are two different takes on what it is exactly that you're
not expected to understand. The "obvious" one in Lions' book, (i.e. saved
stack being overwritten when swapping so you have to restore from the
special swap-saved stack), but dmr had a different take on it, that it
had to do with functions really not being happy if you switch the stack
underneath them. You can find dummy variables in the interdata port that
make sure the stack frames of some functions (newproc and swtch?) match.
So not really a hardware thing but a consequence of the compiler.
> Do you know if other sections of the C show remnants of B or the PDP?
> Or is it just those spots?
For the kernel I'm just aware of this one. printf was copied over a
number of times (the c compiler also includes a version) but the kernel
was never written in B, so i wouldn't expect any B-ness in there
otherwise.
aap