> >Since this OCR is independent of the other work that has been done, a
> >diff should provide an opportunity to fix any errors in the comments
> >that would not have been caught by the assembler.
> >
> >Is there a place to upload this without a Google account? The assembler
> >listing is about 416K.
>
> If you email me a tar file (or zip) I can put it on my website. I can
> receive email up to about 10 MB. If it's larger than that I can set you
> up with an ftp account.
A compressed .tar.gz won't be very large. I can send it along when I get
home late this evening. It's got 100% of the kernel source pages, so
it should be able to fill in any holes.
> I'd also be happy to write a script to diff the files.
"diff -b" works fine, as does tkdiff :-).
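For anyone who does want a script rather than diff itself, here is a rough Python sketch of the same idea using the standard difflib module. It only approximates "diff -b" by collapsing runs of whitespace and dropping trailing whitespace before comparing; the function name ws_insensitive_diff is made up for illustration.

```python
import difflib
import re


def _normalize(line):
    # Collapse every run of whitespace to one space and drop trailing
    # whitespace, roughly what "diff -b" ignores.
    return re.sub(r"\s+", " ", line).rstrip()


def ws_insensitive_diff(a_lines, b_lines, a_name="a", b_name="b"):
    """Return unified-diff lines for two lists of text lines,
    ignoring differences in the amount of whitespace."""
    a_norm = [_normalize(line) for line in a_lines]
    b_norm = [_normalize(line) for line in b_lines]
    return list(difflib.unified_diff(a_norm, b_norm, a_name, b_name, lineterm=""))
```

Note that the output shows the normalized lines, not the original spacing, so it is only good for spotting real differences between two transcriptions.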
James Markevitch
Guys, I'm writing a PDP-11 a.out disassembler. I think it will be useful for
a couple of reasons:
- we will be able to convert the extant 1972 binaries back into some form
of source code. It won't be as good as the real thing, but it will be
better than the binary.
- we have some source code in fragmentary form on the s1 tape, see
http://minnie.tuhs.org/UnixTree/1972_stuff/. Some of the fragments
are identifiable, some are not. We might be able to use the
disassembled binaries to identify some of the fragments, and even
reconstruct a hybrid original/disassembled version of the source
for some of the 1972 applications.
Right now, here's what I've got: disassembly of the top of 1972 ls:
sys break: 00
mov $01,044260
mov sp,r5
mov (r5)+,043732
tst (r5)+
dec 043732
mov 043732,043734
bgt 040056
mov $042542,r5
mov (r5)+,r4
cmpb (r4)+,$055
bne 040174
dec 043734
and the top of the frag19 file:
sys break; end+512.
mov $1,obuf
mov sp,r5
mov (r5)+,count
tst (r5)+
dec count
mov count,ocount
bgt loop
mov $dotp,r5
loop:
mov (r5)+,r4
cmpb (r4)+,$'-
bne 1f
dec ocount
At the moment it's a 1-pass disassembler. I want to make it 2-pass: on the
first pass I will try to identify labels for branches, functions, strings and
variable locations (and give them arbitrary names); on the second pass
I'll print out the instructions with reference to the labels.
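A minimal sketch of the two-pass idea in Python, covering branch labels only. It assumes the instructions have already been decoded into (address, text, branch target) tuples; the function names, the L1/L2 naming scheme and the SAMPLE addresses are made up for illustration and are not taken from the actual disassembler.

```python
def assign_labels(instructions):
    """Pass 1: collect every address used as a branch target and
    give each one an arbitrary name (L1, L2, ... in address order)."""
    targets = sorted({target for _, _, target in instructions if target is not None})
    return {addr: "L%d" % i for i, addr in enumerate(targets, start=1)}


def render(instructions, labels):
    """Pass 2: print each instruction, emitting a label line before any
    labelled address and replacing branch targets with label names."""
    lines = []
    for addr, text, target in instructions:
        if addr in labels:
            lines.append(labels[addr] + ":")
        if target is not None:
            text = "%s %s" % (text, labels[target])
        lines.append("\t" + text)
    return lines


# Illustrative input: (address, instruction text, branch target or None).
SAMPLE = [
    (0o40050, "bgt", 0o40056),
    (0o40052, "mov $042542,r5", None),
    (0o40056, "mov (r5)+,r4", None),
    (0o40060, "cmpb (r4)+,$055", None),
    (0o40064, "bne", 0o40174),
    (0o40066, "dec 043734", None),
]

if __name__ == "__main__":
    print("\n".join(render(SAMPLE, assign_labels(SAMPLE))))
```

A target that lies outside the decoded range (like the bne target here) still gets a name, but its label line is only printed once that address is part of the input. Functions, strings and variables would need their own collection rules in pass 1.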
None of the binaries have symbol tables, unfortunately.
It's a start, anyway.
Warren
I got a chance to do some work on the UNIX V1 sources this evening. I
took the output of my OCR software and with a couple of hours of editing,
it successfully assembles with a MACRO11 assembler modified for "as" syntax,
with the only exception being that "fpsym" is undefined. It looks like
the floating point emulation code is missing.
Since this OCR is independent of the other work that has been done, a
diff should provide an opportunity to fix any errors in the comments
that would not have been caught by the assembler.
Is there a place to upload this without a Google account? The assembler
listing is about 416K.
I wrote much of the bootstrap code a few weeks ago, so it ought to be
straightforward to get this up and running under simulation.
James Markevitch
A while ago, I heard someone (I can't remember who) say that he had a
paper listing of (at least part of) PDP-7 Unix. How much is there in the
way of surviving listings of PDP-7 Unix (if any)? With all of the
discussion of OCRing the V1 Unix kernel listing, I was wondering if
something similar could be done with PDP-7 Unix if enough listings have
survived (which is sort of unlikely, but you never know).
> I have dug up another listing of the PDP-11 assembly language
> version, which seems to be about contemporary with the
> one you have. The files mostly bear a copyright date
> of 1972, but like other printouts from the time,
> the datestamps only give month and day, not year.
> They are generally from May. It is post 11/45,
> and has segmentation and floating-point support.
Very cool! (fpsym, presumably)
> I replied and asked if we could get either a scan copy of the "other listing",
> or if he could send a photocopy to Tim.
As usual, the key is a high resolution, high quality scan. There is a huge
difference between 300dpi and 400dpi/600dpi for this old stuff, since the
signal to noise ratio is much better with the better scans.
This sounds like a broken record, but there was a 1200 page listing where
the first 400 pages were at 300dpi and the remaining 800 pages were at
400dpi. When you zoomed in, the differences were astounding and the
OCR results reflected that (the person needed to do a lot of editing on
the first third of the document to get it to compile).
If someone can get me a hardcopy, I'll scan it at 600dpi, as I am sure
Al would, if Tim isn't set up to scan stuff like this.
James Markevitch
Guys, I got this message from Dennis.
Warren
----- Forwarded message from Dennis Ritchie -----
Subject: Re: Trying to restore 1972 UNIX
Date: Thu, 1 May 2008 00:55:35 -0400
About the assembler, I am pretty sure that it's substantially
the same as that on the 5th edition tape, so it's likely
that a modified version, without the syscall definitions,
could be produced.
I have dug up another listing of the PDP-11 assembly language
version, which seems to be about contemporary with the
one you have. The files mostly bear a copyright date
of 1972, but like other printouts from the time,
the datestamps only give month and day, not year.
They are generally from May. It is post 11/45,
and has segmentation and floating-point support.
Incidentally, it doesn't use any of the system call names
as such; 'read' is at sysread: and so on.
About assembling it, I'm pretty sure we just did
'as u?.s' and the a.out was ready. This was before
make, after all.
Dennis
----- End forwarded message -----
I replied and asked if we could get either a scan copy of the "other listing",
or if he could send a photocopy to Tim.
Cheers,
Warren
I went through all the errors on the code checked in so far and made
edits consistent (I hope :-) with the pdf.
I also added the missing KE11A addresses (memory mapped EAE).
The remaining errors seem to be only due to missing pages.
-brad
> Can you show me how you are running it? (and feel free to cc the list)
(I think it's mentioned in an earlier post already). I copy the
files to my 7ed system (make a tar, put it on a tape image, and
attach it in simh, then tar x to get contents). Probably easier
if you're using apout and local filesystem... I'm using the following
script (in my tools but not checked in because I'm using nonstandard
conv2):
tools/rebuild
(cd rebuilt; gtar -O -cf ../u.tar u?.s)
./conv2 -o tape.tm u.tar
cp tape.tm ~/work/simh/unix-v7-4/run/
Anyway, to assemble I run:
as - sys.s u0.s u1.s ux.s
btw, I noticed some unicode characters in the files you committed.
I haven't had a chance to spend time editing it yet. The OCR
often uses unicode for things like "-".
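A quick way to locate those characters, sketched in Python under the assumption that the sources are meant to be pure ASCII. The function name find_non_ascii is made up for illustration.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line number, column, character, Unicode name) for every
    character outside 7-bit ASCII; line and column are 1-based."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits
```

Something like find_non_ascii(open("u0.s", encoding="utf-8").read()) would then list each offending character with its position, so it can be replaced by hand with the plain ASCII one.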
> I think there is a binary format. I think I figured it out once and
> wrote something to turn an a.out into it. hmmm. I'll go digging.
a.out is so simple, it wouldn't be hard to reproduce if we had to.
> I checked in the missing pages from e3, e4 and e8. I have not tried
> to assemble them yet, however.
I noticed that. Thank you.
> -brad
Tim Newsham
http://www.thenewsh.com/~newsham/
> I can happily deal with the jsr pc,do type of jsr, but the ones
> involving r5 have me stumped, e.g.:
>
> jsr r5,questf; < nonexistent\n\0>; .even
I have encountered this type of construct a lot when doing disassemblers
over the years. My usual strategy for dealing with this is:
1. If it's quick and dirty and I am not running huge amounts of code,
then the disassembler allows the user to provide a list of "hints" to
it. The hints for this would describe the arguments to each subroutine.
For illustrative purposes, you might have a side file that contains
the following:
subr 002004 questf string
meaning that location 002004 is a subroutine named questf that expects
a null-terminated string as the argument. As an additional benefit,
you get a nice name for the subroutine that the disassembler can put
into the output.
And if a subroutine takes two 16-bit arguments, you might have:
subr 003436 mysub arg16 arg16
If the disassembler identifies each of the targets of the jsr
instructions, then you can usually do a quick look at the code to
see what it expects, then add to the side file, then re-run the
disassembler.
2. If you want to be less quick and dirty, you can have the disassembler
do a partial flow analysis of the code to figure out what is expected
for arguments. This is usually much more involved and you still often
need to add hints for cases where the '60s or '70s programmer did some
kind of "neat trick" when coding.
My philosophy on these is to use tools to get to the 95%+ level of
automation and provide hints to pick up the rest. Using strategy
number 1 above will probably get you a lot of success with a small
amount of coding in your disassembler.
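Strategy 1 could be sketched roughly as follows in Python. It assumes the side-file format shown above ("subr", an octal address, a name, then zero or more argument kinds), that a "string" argument is NUL-terminated and padded to an even address as in the jsr r5,questf example, and that "arg16" is one little-endian 16-bit word. The function names parse_hints and read_inline_args are made up for illustration.

```python
def parse_hints(text):
    """Parse hint lines into {address: (name, [argument kinds])}."""
    hints = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "subr":
            continue  # skip blank lines and anything that is not a subr hint
        addr = int(fields[1], 8)  # addresses are written in octal
        hints[addr] = (fields[2], fields[3:])
    return hints


def read_inline_args(image, pos, kinds):
    """Decode the inline arguments that follow a jsr r5,... at offset pos.
    Returns (decoded arguments, offset of the next instruction)."""
    args = []
    for kind in kinds:
        if kind == "string":
            end = image.index(0, pos)  # find the terminating NUL byte
            args.append(image[pos:end].decode("ascii"))
            pos = end + 1
            pos += pos & 1  # skip the padding byte added by .even
        elif kind == "arg16":
            args.append(int.from_bytes(image[pos:pos + 2], "little"))
            pos += 2
        else:
            raise ValueError("unknown argument kind: " + kind)
    return args, pos
```

When the disassembler meets a jsr r5 whose target is in the hints table, it would call read_inline_args with the bytes after the jsr, print the arguments as data, and resume decoding instructions at the returned offset.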
James Markevitch