On 12/9/15 9:07 AM, Clem Cole wrote:
Clem,
Nice!
In full disclosure, I read about magic numbers being PDP
architecture related somewhere in the recent past, but I didn't
really understand the connection. So, when you mentioned it, rather
than google it and spoil the fun, I took the hint and thought a bit
about it in the context of our discussion. As it turns out,
according to my now handy-dandy PDP11/40 processor handbook, the 16
bits holding the word 000407, which appears as the first word of most
of my v6 binaries, are 0 000 000 100 000 111b. The magic number is a
BR instruction: the high 8 bits (000400) are the opcode and the low 8
bits are a signed offset, counted in words, that is added to the PC
after the PC has already stepped past the word. So, the PDP loads the
object, reads the first word, determines it is a branch instruction:
BR PC+7
and jumps forward 7 words from the second word, to start reading the
9th word of the program. Why the 9th word? According to man 5 a.out,
the a.out header always contains 8 words.
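The offset arithmetic can be checked with a small C sketch. The
function names (is_br, br_target_byte, br_target_word) are made up for
illustration. The only things assumed are that the word is 16 bits,
that BR is opcode 000400 with a signed word offset in the low byte,
and that the PC has already advanced by 2 when the offset is applied.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if the 16-bit word is a PDP-11 BR,
 * i.e. 000400 plus an 8-bit offset in the low byte. */
int is_br(uint16_t word)
{
    return (word & 0177400) == 0000400;
}

/* Byte address that a BR located at byte address `addr` branches to.
 * The low 8 bits are a signed offset counted in words; it is doubled
 * and added to the PC, which already points past the BR itself. */
long br_target_byte(long addr, uint16_t word)
{
    int offset = word & 0377;
    if (offset & 0200)          /* sign-extend the 8-bit offset */
        offset -= 0400;
    return addr + 2 + 2L * offset;
}

/* 1-based word number of the target for a BR sitting in word 1. */
long br_target_word(uint16_t word)
{
    return br_target_byte(0, word) / 2 + 1;
}
```

With these definitions, br_target_byte(0, 0407) comes out as 16
(octal 20) and br_target_word(0407) as 9, i.e. the first word after
the 8-word header.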
So, given a binary, like write:
dd if=write bs=128 count=1|od
1+0 records in
1+0 records out
0000000 000407 001176 000032 000150 000000 000000 000000 000001
0000020 022627 000002 001412 003006 012700 000001 104404 000657
...
In light of the information discovered so far, the analysis would
be:
word 1 at 0000000 000407 Magic Number (BR PC+7)
word 2 at 0000002 001176 Size of the program text segment (638
bytes)
word 3 at 0000004 000032 Size of the initialized data segment (26
bytes)
word 4 at 0000006 000150 Size of the uninitialized data segment (104
bytes)
word 5 at 0000010 000000 Size of the symbol table (stripped in this
case)
word 6 at 0000012 000000 The entry location 0 at present
word 7 at 0000014 000000 Unused
word 8 at 0000016 000001 Relocation bit suppression flag (I think
this means the file contains absolute memory references)
word 9 at 0000020 022627 The target of the first branch, contains
instructions to CMP (R6)+,(R7)+ (I think), which compares the word
popped off the stack with the next word in the file, 000002...
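As a cross-check on the table above, here is a minimal C sketch that
pulls the eight header words out of the first 16 bytes of a file. The
struct and its field names (aout_hdr, text_size, and so on) and the
function aout_parse are my own labels for illustration, not copied
from any system header. It assumes the PDP-11 byte order, where each
16-bit word is stored low byte first.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative layout of the 8-word header; names are hypothetical. */
struct aout_hdr {
    uint16_t magic;       /* word 1: 0407, 0410, ... */
    uint16_t text_size;   /* word 2: program text size in bytes */
    uint16_t data_size;   /* word 3: initialized data size in bytes */
    uint16_t bss_size;    /* word 4: uninitialized data size in bytes */
    uint16_t syms_size;   /* word 5: symbol table size in bytes */
    uint16_t entry;       /* word 6: entry location */
    uint16_t unused;      /* word 7 */
    uint16_t reloc_flag;  /* word 8: relocation bits suppressed */
};

/* Decode the header from raw bytes. Returns 0 on success, -1 if the
 * buffer is shorter than the 16 bytes the header occupies. */
int aout_parse(const unsigned char *buf, size_t len, struct aout_hdr *h)
{
    uint16_t w[8];
    size_t i;

    if (len < 16)
        return -1;
    for (i = 0; i < 8; i++)   /* low byte first, then high byte */
        w[i] = (uint16_t)(buf[2 * i] | (buf[2 * i + 1] << 8));

    h->magic = w[0];
    h->text_size = w[1];
    h->data_size = w[2];
    h->bss_size = w[3];
    h->syms_size = w[4];
    h->entry = w[5];
    h->unused = w[6];
    h->reloc_flag = w[7];
    return 0;
}
```

Fed the 16 bytes behind the first line of the od output above, this
should give magic 0407, text_size 638, data_size 26 and bss_size 104,
matching the sizes worked out by hand.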
According to man 5 a.out on v7, the different magic numbers
represent normal, read-only text, separate I&D, and overlay:
#define A_MAGIC1 0407 /* normal */
#define A_MAGIC2 0410 /* read-only text */
#define A_MAGIC3 0411 /* separated I&D */
#define A_MAGIC4 0405 /* overlay */
This means that the branches are to the 9th, 10th, 11th and 7th
words, respectively. It'll be a while before I really understand what
the ramifications are.
Oh and by the way, jumping between octal and decimal is weird, but
convenient once you get the hang of it - 512 decimal is 1000 octal,
which is nifty and makes finding buffer boundaries in an octal dump
easy :).
Thanks,
Will