I've assembled some notes from old manuals and other sources
on the formats used for on-disk file systems through the
Seventh Edition:
http://www.cita.utoronto.ca/~norman/old-unix/old-fs.html
Additional notes, comments on style, and whatnot are welcome.
(It may be sensible to send anything in the last two categories
directly to me, rather than to the whole list.)
Hi,
I successfully made SIMH VAX-11/780 emulator run 32V, 3BSD and 4.0BSD.
Details are on my web site (though rather terse):
http://zazie.tom-yam.or.jp/starunix/
Enjoy!
Naoki Hamada
nao(a)tom-yam.or.jp
> Ken and Dennis and the other guys behind
> the earliest UNIX code were smart guys and good programmers,
> but they were far from perfect; and back in those days we
> were all a lot sloppier.
The observation that exploits may be able to parlay
mundane bugs into security holes was not a commonplace
back then--even in the Unix room. So input buffers were
often made "bigger than ever will be needed" and left
that way on the understanding that crashes are tolerable
on outlandish data. In an idle moment one day, Dennis fed
a huge line of input to most everything in /bin. To the
surprise of nobody, including Dennis, lots of programs
crashed. We WERE surprised a few years later, when a journal
published this fact as a research result. Does anybody
remember who published that deep new insight and/or where?
Doug
So it turns out the 'dcheck' distributed with V6 has two (well, three, but
the third one was only a potential problem for me) bugs in it.
The first was a fence-post error on a table clearing operation; it could
cause the entry for the last inode of the disk in the constructed table of
directory entry counts to start with a non-zero count when a second disk was
scanned. However, it was only triggered in very specific circumstances:
- A larger disk was listed before a smaller one (either in the command line,
or compiled in)
- The inode on the larger disk corresponding to the last inode on the smaller
one was in use
I can understand how they never ran across this one.
The other one, however, which was an un-initialized variable, should have
bitten them anytime they had more than one disk listed! It caused the
constructed table of directory entry counts to be partially or wholly
(depending on the size of the two disks) blank in all disks after the first
one, causing numerous (bogus) error reports.
(It was also amusing to find an un-used procedure in the source; it looks
like dcheck was written starting with the code for 'icheck' - which explains
the second bug; since the logic in icheck is subtly different, that variable
_is_ set properly in icheck.)
How this bug never bit them I cannot understand - unless they saw it, and
couldn't be bothered to find and fix it!
To me, it's completely amazing to find such a serious bug in such a critical
piece of widely-distributed code! A lesson for archaeologists...
Anyway, a fixed version is here:
http://ana-3.lcs.mit.edu/~jnc/tech/unix/ucmd/dcheck.c
if anyone cares/needs it.
Noel
Larry McVoy scripsit:
> I love Rob Pike, he's spot on on a lot of stuff. I'm a big fan of
> "if you think you need threads then your processes are too fat".
Oh, he's a brilliant fellow. I don't know him personally, but I know
people who do, and I don't think I'd love him if I knew him. Humanity has
always found it useful to keep its (demi)gods at arm's length at least.
--
John Cowan http://www.ccil.org/~cowan cowan(a)ccil.org
Barry thirteen gules and argent on a canton azure fifty mullets of five
points of the second, six, five, six, five, six, five, six, five, and six.
--blazoning the U.S. flag
> From: jnc(a)mercury.lcs.mit.edu (Noel Chiappa)
> the second (the un-initialized variable) should have happened every
> time.
OK, so I was wrong! The variable in question was a global static, 'ino' (the
current inode number), so the answer isn't something simple like 'it was an
auto that happened to be cleared for each disk'. But now that I look closely,
I think I see a way it might have worked.
'dcheck' is a two-pass per disk thing: it begins each disk by clearing its
'inode link count' table; then the first pass does a pass over all the inodes,
and for ones that are directories, increments counts for all the entries; the
second pass re-scans all the inodes, and makes sure that the link count in the
inode itself matches the computed count in the table.
'ino' was cleared before the _second_ pass, but not the _first_. So it was
zero for the first pass of the first disk, but non-zero for the first pass on
the second disk.
This looks like the kind of bug that should almost always be fatal, right?
That's what I thought at first... (and I tried the original version on one of
my machines to make sure it did fail). But...
The loop in each pass has two index variables, one of which is 'ino', which it
compares with the maximum inode number for that disk (per the super-block),
and bails if it reaches the max:
for(i=0; ino<nfiles; i =+ NIBLK)
If the first disk is _larger_ than the second, the first pass will never
execute at all for the second disk (producing errors).
However, if the _second_ is larger, then the second disk's first pass will in
fact examine the first (nfiles_second - nfiles_first) inodes of the
second disk to see if they are directories (and if so, count their links).
So if the last nfiles_first inodes of the second disk are empty (which is
often the case with large drives - I had modified 'df' to count the free
inodes as well as disk blocks, and after doing so I noticed that Unix seems to
be quite generous in its default inode allocations), it will in fact work!
The fact that 'ino' is wrong all throughout the first pass of the second disk
(it counts up from nfiles_first to nfiles_second) turns out to be
harmless, because the first pass never uses the current inode number, it only
looks at the inode numbers in the directories.
Note that with two disks of _equal size_, it fails. Only if the second is
larger does it work! (And this generalizes out to N disks - as long as each
one is enough larger than the one before!) So for the config they were
running (rk2, dp0) it probably did in fact work!
Noel
Noel Chiappa:
To me, it's completely amazing to find such a serious bug in such a critical
piece of widely-distributed code! A lesson for archaeologists...
======
To me it's not surprising at all.
On one hand, current examples of widely-distributed critical
code containing serious flaws are legion. What, after all,
were the Heartbleed and OS X goto fail; bugs? What is every
version of Internet Explorer?
On the other hand, Ken and Dennis and the other guys behind
the earliest UNIX code were smart guys and good programmers,
but they were far from perfect; and back in those days we
were all a lot sloppier.
So surprising? No. Interesting? Certainly. All bugs are
interesting.
(To me, anyway. Back in the 1980s, when I was at Bell Labs,
SP&E published a paper by Don Knuth discussing all the many
bugs found in TeX, including some statistical analysis. I
thought it fascinating and revealing and think reading it
made me a better programmer. Rob Pike thought it was terribly
boring and shouldn't have been published. Decidedly different
viewpoints.)
Norman Wilson
Toronto ON
> From: Ronald Natalie <ron(a)ronnatalie.com>
> If I understand what you are saying, it only occurs when you run dcheck
> with multiple volumes at one time?
Right, _both_ bugs have that characteristic. But the first one (the
fence-post) only happens in very particular circumstances; the second (the
un-initialized variable) should have happened every time.
> From: norman(a)oclsc.org (Norman Wilson)
> To me it's not surprising at all.
> On one hand, current examples of widely-distributed critical code
> containing serious flaws are legion.
What astonished me was not that there was a bug (which I can easily believe),
but that it was one that would have happened _every time they ran it_.
'dcheck' has this list of disks compiled into it. (Oh, BTW, my fixed version
now reads a file, /etc/disks; I am running a number of simulated machines,
and the compiled-in table was a pain.)
So I would have thought they must have at least tried that mode of operation
once? And running it that way just once should have shown the bug. Or did
they try it, see the bug, and 'dealt' with it by just never running it that
way?
Noel
> From: asbesto <asbesto(a)freaknet.org>
> We have about 40 disks, with RT-11 on them
Ah. You should definitely try Unix - a much more pleasant computing/etc
environment!
Although without a video editor... though I hope to have one available
'soon', from the MIT V6+ system (I think I have found some backup tapes from
it).
> This PDP-11/34 was used for a medical CAT equipment
Ah, so it probably has the floating point, then. If so, you should be able to
use the Shoppa V6 Unix disk as it is - that has a Unix on it which will
work on an 11/23 (which doesn't have the switch register that V6 normally
requires).
But if not, let me know, and I can provide a V6 Unix for it (I already have
the tweaked version running on a /23 in the simulator).
Noel
PS: For those who downloaded the 'fixed' ctime.c (if anyone :-), it turns out
there was a bug in my fix - in some cases, one variable wasn't initialized
properly. There's a fixed one up there now.
> From: asbesto <asbesto(a)freaknet.org>
> Just in these days we restored a PDP-11/23PLUS here at our Museum! :)
> ...
> CPU is working
That is good to hear! You all seem to have been very resourceful in making
the power supply for it!
> and we're trying to boot from a RL02 unit :)
Are your RL02 drive and RLV11 controller both working? Here are some
interesting pages:
http://www.retrocmp.com/pdp-11/pdp-1144/my-pdp-1144/rl02-disk-trouble
http://www.retrocmp.com/pdp-11/pdp-1144/my-pdp-1144/more-on-rl01rl02
from someone in Germany about getting their RL11 and RL02 to work.
Also, when you say "boot from an RL02", what are you trying to boot? Do you
have an RL02 pack with a working system on it? If so, what kind - a Unix
of some sort, or some DEC operating system?
> From: SPC <spedraja(a)gmail.com>
> I'll keep a reference of this message and try it as soon as possible...
Speaking of getting Unix to run on an 11/23 with an RL02... I just realized
that the hard part of getting a Unix running, for you, will not be getting V6
to run on a machine without a switch register (which is actually pretty easy
- I have worked out a way to do it that involves changing one line in
param.h, and adding two lines of code to main.c).
The hard part is going to be getting the bits onto the disk! If all you have is an
RL02, you are going to have to load bits into the computer over a serial line.
WKT has done this for V7 Unix:
http://www.tuhs.org/Archive/PDP-11/Tools/Tapes/Vtserver/
but V7 really wants a machine with split I/D (which the /23 does not have). I
guess V7 'sort of' works on a machine without I/D, but I'm not a V7 expert,
so I can't say for sure.
It would not be hard to do something similar to the VTServer thing for V6,
though. If you would like to go this way, let me know, I would be very
interested in helping with this.
Also, do you only have one working RL02 drive, or more than one? If you only
have one, you will not be able to do backups (unless you have something else
connected to the machine, e.g. some sort of tape drive, or something).
Noel