> From: Will Senn
> I noticed that the sum utility from v6 reports a different checksum
> than it does using the sum utility from v7 for the same file.
> ... does anyone know what's going on here?
> Why is sum reporting different checksum's between v6 and v7?
The two use different algorithms to accumulate the sum (I have added comments
to the relevant portion of the V6 assembler one, to help understand it):
V6:
mov $buf,r2 / Pointer to buffer in R2
2: movb (r2)+,r4 / Get new byte into R4 (sign extends!)
add r4,r5 / Add to running sum
adc r5 / If overflow, add carry into low end of sum
sob r0,2b / If any bytes left, go around again
Read the description of MOVB in the PDP-11 Processor manual.
V7:
while ((c = getc(f)) != EOF) {
nbytes++;
if (sum&01)
sum = (sum>>1) + 0x8000;
else
sum >>= 1;
sum += c;
sum &= 0xFFFF;
}
I'm not clear on some of that, so I'll leave its exact workings as an
exercise, but I'm pretty sure it's not a equivalent algorithm (as in,
something that would produce the same results); it's certainly not
identical. (The right shift is basically a rotate, so it's not a straight sum,
it's more like the Fletcher checksum used by XNS, if anyone remembers that.)
Among the parts I don't get, for instance, sum is declared as 'unsigned',
presumably 16 bits, so the last line _should_ be a NOP!? Also, with 'c' being
implicitly declared as an 'int', does the assignment sign extend? I have this
vague memory that it does. And does the right shift if the low bit is one
really do what the code seems to indicate it does? I have this bit that ASR on
the PDP-11 copies the high bit, not shifts in a 0 (check the processor
manual). That is, of course, assuming that the compiler implements the '>>'
with an ASR, not a ROR followed by a clear of the high bit, or something.
Noel
Ok, it definitely sounds like the v6tar source is around somewhere so
if someone could point me in the right direction...
I've only seen the binary, and I can't remember where I got it.
Mark
All,
While working on the latest episode of my saga about moving files
between v6 and v7, I noticed that the sum utility from v6 reports a
different checksum than it does using the sum utility from v7 for the
same file. To confirm, I did the following on both systems:
# echo "Hello, World" > hi.txt
# cat hi.txt
Hello, World
Then on v6:
# sum hi.txt
1106 1
But on v7:
# sum hi.txt
37264 1
There is no man page for the utility on v6, and it's assembler. On v7,
there's a manpage and it's C:
man sum
...
Sum calculates and prints a 16-bit checksum for the named
file, and also prints the number of blocks in the file.
...
A few questions:
1. I'll eventually be able to read assembly and learn what the v6
utility is doing the hard way, but does anyone know what's going on here?
2. Why is sum reporting different checksum's between v6 and v7?
3. Do you know of an alternative to check that the bytes were
transferred exactly? I used od and then compared the text representation
of the bytes on the host using diff (other than differences in output
between v6 and v7 related to duplicate lines, it worked ok but is clunky).
Thanks,
Will
All,
In my exploration of v6, I followed the advice in "Setting up Unix -
Seventh Edition" and copied v6tar from v7 to v6. Life is good. However,
tar is using mt1 and it is hard coded into the source, tar.c:
char magtape[] = "/dev/mt1";
As the subject line suggested, I have two questions for those of you who
might know:
1. Why is it hard coded?
2. Why is it the second device and not the first?
Interestingly, it took me a little while to figure out it was doing this
because I didn't actually move files between v6 and v7 until today.
Before this my tests had been limited to separate tests on v6 and v7
along the lines of:
cd /wherever
tar c .
followed by
tar t
list of files
cd /elsewhere
tar x
files extracted and matching
What it was doing was writing to the non-existant /dev/mt1, which it
then created, tarring up stuff, and exiting. Then when I listed the
contents of the tarfile, or extracted the contents, it was successful.
But, when I went to move the tape between v6 and v7, the tape (mt0) was
blank, of course. It was at this point that I followed Noel's advice
and "Used the source", and figured out that it was hard-coded as you see
above.
Thanks,
Will
That's exactly right. ld performs the same task as LOAD did on BESYS,
except it builds the result in the file system rather than user
space. Over time it became clear that "linker" would be a better
term, but that didn't warrant canning the old name. Gresham's law
then came into play and saddled us with the ponderous and
misleading term, "link editor".
Doug
> My understanding, which predates my contact with Unix, is that the
> original toochains for single-job machines consisted of the assembler
> or compiler, the output of which was loaded directly into core with
> the loader. As things became more complicated (and slow), it made
> sense to store the memory image somewhere on drum, and then load that
> image directly when you wanted to run it. And that in some systems
> the name "loader" stuck, even though it no longer loaded. Something
> like the modern ISP use of the term "modem" to mean "router". But I
> don't have anything to back up this version; comments welcome.
> estabur (who thought these names up, I know 8 characters is limiting,
> but c'mon)
'establish user mode registers'
> the 411 header is read by a loader
Actually, it's read by the exec() system call (in sys1.c).
Noel
> From: Dave Horsfall
> I love those PDP-11 instructions, such as "blos" and "sob" :-)
Yes, but alas, there is no 'jump [on] no carry' instruction! (Yes, yes, I
know about BCC! :-) Although I guess the x86 has one...
Noel
> Yes the V6 kernel runs in split I and D mode, but it doesn't end up
> supporting any more data. I.e. the kernel is still a 407 (or 410) file.
> _etext/_edata/_end are still referencing the same 64K space.
Err, actually, not really.
The thing is that to build the split-I/D kernel, one sets the linker to
produce an output file which still contains the relocation bits. That is then
post-processed by 'sysfix', which does wierd magic (moves the text up to
020000, in terms of address space; and puts the data _below_ the text, in the
actual output file). So while the files concerned may have a '407' in their
header, they definitely aren't what one normally finds in a linked 407 or 410
file.
In particular, data addresses start at 0, and can run up to 0140000 (i.e. up
to 56KB), while text addreses start at 020000 and can run up to 0160000. So,
_etext/_edata/_end are not, in fact, in the same 64K space. And the total of
data (initialized and un-initialized) together with the text can be much
larger than 64KB - up to 112KB (modor so.
Noel