[ This whole article is just a flight of fancy; feel free to ignore it or
at least treat it as the whimsy that it is. ]
The PDP-7 Unix system is the first step in the evolution of Unix as we know it.
We have a snapshot of the system around the end of 1970/beginning of 1971 at
http://www.tuhs.org/Archive/PDP-11/Distributions/research/McIlroy_v0/
and a reconstructed and working partial system at
https://github.com/DoctorWkt/pdp7-unix
PDP-7 Unix was a playground for Ken, Dennis and others to try out ideas and
implementations, and it was quickly superseded by 1st Edition PDP-11 Unix.
Details of how it evolved are at https://www.bell-labs.com/usr/dmr/www/hist.html
and https://www.bell-labs.com/usr/dmr/www/chist.html
All fine and good. However, I keep wondering, how far could they have taken
Unix on the PDP-7 platform?
The Kernel
----------
The reconstructed kernel only occupies 3070 words of 4096 words available,
so there is room left for more code. There is already an alternative
reconstruction where the "dd" concept has been replaced with the ".."
concept (see https://github.com/DoctorWkt/pdp7-unix/tree/master/src/alt)
PDP-7 Unix doesn't have the concept of absolute or relative filenames
(e.g. /usr/bin/ls or a/b/c or ../../file). Could the nami kernel function
be modified to do this? It would probably mean changing from two characters
packed into a word to a single character per word (to make searching for
'/' easier), and this would turn it into something more recognisable as Unix.
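To make the idea concrete, here is a rough sketch of the lookup logic: split the name on '/', start from the root for an absolute name or from the current directory for a relative one, and handle ".." by stepping up a level. This is Python, not PDP-7 assembly, and the names (split_path, lookup) and the nested-dict model of directories are purely my own assumptions; the real nami works on directory blocks on disk.

```python
# Hypothetical model: directories are nested dicts, files are plain strings.
# None of these names come from the PDP-7 kernel.

def split_path(path):
    """Split a pathname into components, ignoring empty ones (e.g. 'a//b')."""
    return [part for part in path.split("/") if part]


def lookup(root, cwd, path):
    """Resolve path against root (absolute) or cwd (relative).

    cwd is a list of component names leading from the root.
    Returns the list of components naming the target, or None if it
    does not exist.
    """
    where = [] if path.startswith("/") else list(cwd)
    for part in split_path(path):
        if part == ".":
            continue
        if part == "..":
            if where:          # ".." at the root stays at the root
                where.pop()
            continue
        where.append(part)

    # Walk the resolved components to check that each one exists.
    node = root
    for part in where:
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return where
```

With a tree containing /usr/bin/ls and /a/b/c, lookup(root, ["usr", "bin"], "../../a/b/c") resolves to the components a, b, c. Note that ".." is resolved lexically here, which is a simplification.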
What about pipes? They should not be too hard to implement. Even sixteen
pipes with a 16-word buffer each would only be 256+ extra data words in
the kernel. And a hundred words of code?
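A sketch of the data structure I have in mind, again in Python rather than assembly: each pipe is a small ring buffer with a read position and a count. The class and field names are made up for illustration, and the "sleep" behaviour is only hinted at by the return values; a real kernel version would block the process instead.

```python
NPIPES = 16      # number of pipes, as suggested in the text
PIPESIZE = 16    # words of buffer per pipe
WORD_MASK = 0o777777  # the PDP-7 has 18-bit words


class Pipe:
    """A fixed-size ring buffer of words (hypothetical layout)."""

    def __init__(self):
        self.buf = [0] * PIPESIZE
        self.head = 0    # index of the next word to read
        self.count = 0   # number of words currently buffered

    def write(self, word):
        """Store one word; return False if the pipe is full."""
        if self.count == PIPESIZE:
            return False  # a real writer would sleep here
        self.buf[(self.head + self.count) % PIPESIZE] = word & WORD_MASK
        self.count += 1
        return True

    def read(self):
        """Return the next word, or None if the pipe is empty."""
        if self.count == 0:
            return None   # a real reader would sleep here
        word = self.buf[self.head]
        self.head = (self.head + 1) % PIPESIZE
        self.count -= 1
        return word


pipes = [Pipe() for _ in range(NPIPES)]
buffer_words = NPIPES * PIPESIZE   # 256 words of buffer in total
```

The "+" in "256+" would be the two bookkeeping words (head and count) per pipe in this layout, i.e. another 32 words.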
There are only a few special devices in the kernel: ttyin, ttyout, keyboard,
display, pptin, pptout. What about a disk block device? Was there a PDP-7
tape device, and if so, why not a tape driver and block device?
Filesystem
----------
The implementation of the filesystem is, in some places, quite inefficient.
The free block list is implemented as follows. In each block, there are
10 free block numbers then a pointer to the next part of the free list.
However, each block can hold 64 block numbers, so why are only 10 free
block numbers stored in each block? By using the whole of a block to store
free block numbers, there would actually be more free blocks to use!
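A back-of-the-envelope sketch of the saving. It assumes, as the paragraph above implies, that the blocks holding the free list are overhead, and that one word per block is kept for the pointer to the next part of the list (so at most 63 numbers fit alongside it in a 64-word block). The function name and the figure of 1000 free blocks are only for illustration.

```python
BLOCK_WORDS = 64  # words per disk block


def chain_blocks(n_free, per_block):
    """Blocks needed to list n_free block numbers, per_block to a block."""
    if not 1 <= per_block <= BLOCK_WORDS - 1:
        raise ValueError("per_block must leave room for the next pointer")
    return -(-n_free // per_block)  # ceiling division


current = chain_blocks(1000, 10)   # 100 blocks at 10 numbers per block
packed = chain_blocks(1000, 63)    # 16 blocks at 63 numbers per block
saved = current - packed           # 84 blocks handed back to the filesystem
```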
Each i-node (size 12 words, 7 of which are direct or indirect pointers)
has one word which holds a unique value. This doesn't seem to be used at
all. If it was reused as a block pointer, this would allow files to be
up to 8*64=512 (small) or 8*64*64=32768 (large) words in length, instead of
7*64=448 words (small) or 7*64*64=28672 (large) words.
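The arithmetic, spelled out as a small Python function (the name is mine). A small file uses each pointer for one 64-word block; a large file uses each pointer for an indirect block of 64 further pointers.

```python
BLOCK_WORDS = 64  # words per block, and block pointers per indirect block


def max_file_words(pointers, large):
    """Maximum file length in words for an i-node with this many pointers."""
    per_pointer = BLOCK_WORDS * BLOCK_WORDS if large else BLOCK_WORDS
    return pointers * per_pointer


# Existing layout: 7 pointers
assert max_file_words(7, large=False) == 448
assert max_file_words(7, large=True) == 28672
# With the unused word reclaimed as an eighth pointer
assert max_file_words(8, large=False) == 512
assert max_file_words(8, large=True) == 32768
```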
The system is set up to only use one side of the two-sided disk device.
It looks like the other side was used to back up (snapshot) a working
system in case of catastrophic filesystem corruption: they could simply
copy the blocks from the "snapshot" side back to the working side. We
could double the size of the filesystem quite easily.
Macro Assembler
---------------
The kernel is written using fairly tight assembly code, and there probably
isn't a way to translate it into a high-level language. The PDP-7 has an
arcane instruction set, and the existing assembler syntax is nothing special.
What about a macro assembler that makes it easier to write code, especially
readable code? Here are some ideas based on the existing kernel:
  u.rq := 8           ==>     lac 8; dac u.rq

  function swap       ==>     swap: 0
  {
    return;                   jmp swap i
  }

  subroutine .fork    ==>     fork: line 1      // i.e. not a function
  {
    line 1
  }

  loop                ==>     1:
  {
  }                           jmp 1b

  if (sad dm1)        ==>     sad dm1
  {                             jmp 1f
    code1                     code1
  } else {                    1: code2
    code2
  }

  betwen(o101,o132)   ==>     jms betwen; o101; o132
There are probably a bunch more that could be added. The aim is to
make the control structures easier to read and write. The programmer
still has to grok the PDP-7 instruction set.
B (or other) Language
---------------------
PDP-7 Unix has a B compiler which compiles source down to a virtual
instruction set which is interpreted by the B interpreter. We have the
B interpreter code, and Robert Swierczek managed to rewrite the B compiler,
see https://github.com/DoctorWkt/pdp7-unix/blob/master/src/other/b.b
At first glance, the PDP-7 architecture is not that amenable to high level
languages, but it turns out that it is indeed possible to write a compiler for
a C subset that targets the PDP-7, see https://github.com/DoctorWkt/a-compiler.
So, could the B compiler be modified to actually output PDP-7 assembly code?
If so, we could rewrite the utilities (cp, mv, ls etc.) in a high-level
language and make the system easier to maintain. I would recommend treating
int and char as the same and only storing one char per word.
And then, even though the PDP-7 architecture doesn't support it, how hard
would it be to add int/char types and bring the language one step closer to C?
Conclusion
----------
All of this is pie in the sky. It can certainly be done, but a) who has
the time and b) it would be a "tour de force", nothing really useful. But
imagine if, at the beginning of 1970, Unix had a proper B or C compiler,
utilities written in this high-level language, a kernel written in a semi-high
level language, and a system with pipes and proper pathnames.
Cheers, Warren
Let me be the first to say that the International Earth Rotation Service
has announced that there will be a Leap Second inserted at 23:59:59 UTC on
the 31st December, due to the earth slowly slowing down. It's fun to
listen to see how the time beeps handle it; will your GPS clock display
23:59:60, or will it go nuts like my last one did (I had to power cycle
the thing)?
My recording of the last one: horsfall.org/leapsecond.webm .
--
Dave Horsfall DTM (VK2KFU) "Those who don't understand security will suffer."
> So was "/usr/bin" initially only for user-contributed binaries, or was
> it from its inception a place for binaries that were not essential for
> system boot and could not fit in the root partition?
The latter is my understanding, but early on the two
interpretations would have been nearly coextensive.
Remember, though, that even Ken wrote some "user-contributed"
code.
Doug
> I don't think I ever heard the appellation "phototypesetter C"
> before.
Interesting data point; thanks for passing that along.
> Certainly C and the phototypesetter developed independently, though in
> the same room. But the explanation that they got linked by appearing in
> the same tape release makes perfect sense.
I have this vague memory of being told, or reading somewhere, that many of the
changes from 'vanilla V6' C to 'phototypesetter' C were added because they were
needed for that project, hence the name. Alas, I have no idea where I might
have gotten that from (I had a quick look at a few likely documentary sources,
but no joy).
It's quite possible that this was a supposition on someone outside Bell's part
(or perhaps inside Bell, but outside the Unix group), because the two came out
in the same tape.
Reading the notes about the upgrades (in particular, "newstuff.nr") makes it
seem like a more likely driver of _some_ of the changes was the Interdata port
(which was also happening at around the same time, if I have the timeline
correct). And of course some might have been driven by general utility (e.g.
the ability to initialize structures).
It would be interesting to see if anyone remembers why these changes were made.
Noel
The document
http://www.tuhs.org/Archive/PDP-11/Distributions/research/1972_stuff/Readme
discusses the uncertainty regarding the epoch used for the file timestamps.
"The biggest problem here is to pin down the epoch for the files. In the
early version of UNIX, timestamps were in 1/60th second units. A 32-bit
counter using these units overflows in 2.5 years, so the epoch had to
be changed periodically, and I believe 1970, 1971, 1972 and 1973 were
all epochs at one stage or another."
"Given that the C compiler passes, and the library, are dated in June
of the epoch year, and that Dennis has said ``1972-73 were the truly
formative years in the development of the C language'', it's therefore
unlikely that the epoch for the s2 tape is 1971: it is more likely to
be 1972. The tape also contains several 1st Edition a.out binaries,
which also makes it unlikely to be 1973."
"Therefore, Warren's decoding of the s2-bits file, in s2-bits.tar.gz,
uses 1972 as the epoch. However, Dennis decoding in s2.tar.gz uses 1973."
"Finally, the date(1) a.out on the tape uses 1971 as its archive. How
annoying! After a bit of discussion, Dennis and Warren have agreed that
1972 is the most probable epoch for s2-bits."
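As a quick check on the overflow figure quoted above, here is the calculation as a small Python function (the name and defaults are mine). An unsigned 32-bit counter of 1/60th-second ticks wraps after a little under 2.3 years, which is in the same ballpark as the "2.5 years" in the Readme and explains why the epoch had to move so often.

```python
SECONDS_PER_YEAR = 365.25 * 86400


def overflow_years(bits=32, ticks_per_second=60):
    """Years until an unsigned counter of this width wraps around."""
    seconds = 2 ** bits / ticks_per_second
    return seconds / SECONDS_PER_YEAR


years = overflow_years()   # roughly 2.27
```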
I thought I could validate the epoch by looking at the distribution of
weekdays for the three alternative years (1971 to 1973). Here are the
results.
  wget http://www.tuhs.org/Archive/PDP-11/Distributions/research/1972_stuff/Readme
  for guess in 1971 1972 1973 ; do
    echo $guess
    EPOCH=$(date +'%s' -d "$guess/01/01 00:00 UTC")
    awk '/\/core/,/\/etc\/init/ {
      if ($9) print strftime("%a", '$EPOCH' + $9 / 60)}' Readme |
    sort |
    uniq -c |
    sort -n
  done
1971
1 Sat
6 Mon
8 Thu
8 Tue
17 Fri
21 Wed
34 Sun
1972
1 Sun
6 Tue
8 Fri
8 Wed
17 Sat
21 Thu
34 Mon
1973
1 Tue
6 Thu
8 Fri
8 Sun
17 Mon
21 Sat
34 Wed
As you can see, unless weekends at Bell Labs were highly atypical,
1972 has the most probable distribution of work among the days of the week.
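The three tallies contain the same counts, just rotated, because 1 January fell on a Friday in 1971, a Saturday in 1972 and a Monday in 1973. For anyone without gawk's strftime, a rough Python equivalent of the awk step might look like the following. The function names are mine, and it works in UTC rather than local time, so timestamps close to midnight could land on a different day than in the shell version.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def epoch_weekday(epoch_year):
    """Weekday name of 1 January of the candidate epoch year."""
    return DAYS[datetime(epoch_year, 1, 1).weekday()]


def weekday_counts(ticks, epoch_year):
    """Tally weekdays for timestamps in 1/60 s ticks since the epoch (UTC)."""
    epoch = datetime(epoch_year, 1, 1, tzinfo=timezone.utc)
    return Counter(
        DAYS[(epoch + timedelta(seconds=t / 60)).weekday()] for t in ticks
    )
```

The ticks argument would be the ninth column of the file listing in the Readme, the same field the awk script reads as $9.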
> of course some [of the changes to C in this time period] might have been
> driven by general utility (e.g. the ability to initialize structures).
I was thinking about this, with my memory of the changes refreshed by
re-reading those 'changes to C' notes, and it's interesting how many of them
were ones I personally (and most of the people working with me) didn't use.
Here is a list of the changes described in those 3 documents:
'newstuff':
- Longs
- Unsigneds
- Blocks (locals declared inside a non-top-level {} pair)
- Initialization of structures
- Initialization of automatic variables
- Bit fields
- Macro functions
- Macro conditionals (#ifdef)
- Arguments in registers
- Typedefs
- 'Static' scope
'Changes':
- Multi-line macros
- Undefine
- Conditional expressions (#if)
- Unions
- Casts
- Sizeof() on abstractions
- '=' in initializations
- Change binary operators from trailing to leading
- 'extern' does not allocate storage
(This note also includes unsigneds, blocks, and structure initializations,
from the earlier? note.)
'cchanges':
- Structure assignment and argument/return-value
- Enum
Of these, I never really used quite a few: blocks, automatic initializations,
typedefs, unions, structure assignment/etc, or enum. I'm not sure if I ever
used bit fields, either. Some of these are understandable; e.g. automatic
initializations are just syntactic sugar (as are register arguments, but I did
use those).
Typedef is also effectively syntactic sugar; you can always use a macro and
get almost (entirely?) the same result. In fact, I devised an entire system of
types to make the code I was working on (almost entirely networking code, so
lots of packet headers from other machines, etc) more rigorous - and later it
turned out it had made it much more portable; it all used macros, not typedef.
I don't think I ever used typedef...
(The details of that might be of some interest: instead of int, long, etc we
had things of the form {type}{len}, where {type} was 'int', 'uns', 'bit',
etc and {len} was 'b', 's', 'l', or two other interesting ones 'w' and 'f' -
'w' meant the machine's native word length, and 'f' meant whatever was fastest
on the machine. So 'unsl' meant '32-bit unsigned'. Depending on the machine,
the compiler couldn't always produce them all - e.g. the PDP-11 didn't have
unsl - so sometimes you had to live with non-optimal replacements. There were
also un-typed types, i.e. 'byte', 'swrd', 'lwrd' - 8, 16 and 32 bits - and
'word' - the machine's native length.)
Unions didn't get used much either, in our stuff, although one would think it
would be useful in network code - you get a packet with a pile of bits inside
it, which can be one of N different formats, seems like a perfect application
for a union. The problem is that it tied two different subsystems intimately
together. If you have protocol A and protocol B, if you use a union to define
the header format, the union has to have both A and B in it. Now if you want
to add protocol C, that requires modifying that union definition. It was much
easier to just take a pointer to the outer packet's data area, and assign
(with cast) it to a new pointer variable which was of the correct type for the
header you were trying to process.
Some of the new things were incredibly useful, though - or, in fact, one
couldn't get by without them. Casts were incredibly useful once the compiler
got pickier. Initialization of structures was huge - other than 'bdevsw'-like
hacks, there was just no other way to do that.
Noel
> From: Warren Toomey
> Ritchie, D.M. The UNIX Time Sharing System. MM 71-1273-4.
> which makes me think that the draft version Doug McIlroy found
Not really a response to your question, but I'd looked at that
'UnixEditionZero' and was very taken with this line, early on:
"the most important features of UNIX are its simplicity [and] elegance"
and had been meaning for some time to send in a rant.
The variants of Unix done later by others sure fixed that, didn't they? :-(
On a related note, great as my respect is for Ken and Doug for their work on
early Unix (surely the system with the greatest bang/buck ratio ever), I have
to disagree with them about Multics. In particular, if one is going to have a
system as complex as modern Unices have become, one might as well get the
power of Multics for it. Alas, we have the worst of both worlds - the size,
_without_ the power.
(Of course, Multics made some mistakes - primarily in thinking that the future
of computing lay in large, powerful central machines, but other aspects of
the system - such as the single-level store - clearly were the right
direction. And wouldn't it be nice to have AIM boxes to run our browsers and
mail-readers in - so much for malware!)
Noel
The Second Edition manual has a section titled "User Maintained
Programs" listing the following utilities and games: basic, bc, bj, cal,
chash, cre, das, dli, dpt, moo, ptx, tmg, and ttt. In the Introduction,
the reader is asked to consult their authors for more information.
Does anyone remember whether at the time these were installed in the
system-wide /bin directory, or whether they were only available in their
owners' home directories?
All, I've just got back from a few days away to find 14 new subscription
requests to the TUHS mailing list. Welcome aboard to you all.
Normally I only get one request a month, so I have some concerns about
the legitimacy of all these requests. Please accept my apologies in advance
if there are any off-topic e-mails in the next few days.
P.S. The mail while I was away was very interesting. Noel, you might
also be interested in the B interpreter and Robert Swierczek's B
compiler for PDP-7 Unix. The original B compiler doesn't exist, so
Robert took the 11/20 C compiler and "undid" the code that does types
so that it "became" a B compiler.
https://github.com/DoctorWkt/pdp7-unix/tree/master/src/other
Cheers all
Warren