I once had to rebuild the root directory using ROM based disk IO and memory peek and poke
commands! At this late date I don’t recall many details, but at least the root block was
trashed during some last minute testing. It is unlikely that it was “rm -rf / something”, as
that would’ve done far more damage.
The situation was this:
1. We had *one* working hardware system[1].
2. At that time we only had a working floppy controller and driver[2].
3. Our devel system was a VAX running 4.1BSD. It had no support for the floppy drive, so
everything had to be downloaded over a 9600 baud serial link.
4. The backup floppy either didn’t exist or wasn’t readable.
5. We had custom stuff that was done for a show and not backed up anywhere.
Luckily, due to earlier debugging sessions I knew what the file system layout was. This
was on a version 7 system, with 16 byte dir entries: a 2 byte inode # followed by a 14
byte name. IIRC inode 0 was not used, inode 1 was used as a temp during fsck based
recovery and inode 2 was for the real root. So I read in the root inode, found the disk
address of its data block, read it in, fixed up the “.”, “..” and “/bin” entries
(usually inode #3 was bin), wrote it back to the floppy and rebooted. I think the unix image
was in /bin. On reboot fsck worked and reconnected all the lost directories, which were
easy to identify and rename.
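
For readers who have not seen the on-disk format, the directory layout described above
can be sketched in a few lines. This is only a modern illustration in Python, not what
was done on the ROM monitor. The function names, the 512 byte block size and the
little-endian byte order (as on a PDP-11) are assumptions; the actual machine may well
have used a different byte order, which is why it is a parameter.

```python
import struct

DIRSIZ = 14    # bytes reserved for the name in each directory entry
ROOT_INO = 2   # root inode number, as recalled in the message above

def pack_entry(ino, name, byteorder="<"):
    """Build one 16 byte entry: 2 byte inode number, then a NUL padded name."""
    raw = name.encode("ascii")
    if len(raw) > DIRSIZ:
        raise ValueError("name longer than 14 bytes")
    return struct.pack(byteorder + "H14s", ino, raw)

def unpack_block(block, byteorder="<"):
    """List (inode, name) pairs in a directory block, skipping free slots."""
    entries = []
    for off in range(0, len(block) - 15, 16):
        ino, raw = struct.unpack_from(byteorder + "H14s", block, off)
        if ino != 0:  # inode 0 marks an unused slot
            entries.append((ino, raw.split(b"\0", 1)[0].decode("ascii")))
    return entries

def rebuild_root_block(bin_ino=3, block_size=512, byteorder="<"):
    """Hypothetical minimal root block: '.', '..' and 'bin', rest zeroed."""
    wanted = [(ROOT_INO, "."), (ROOT_INO, ".."), (bin_ino, "bin")]
    data = b"".join(pack_entry(i, n, byteorder) for i, n in wanted)
    return data.ljust(block_size, b"\0")
```

Everything else in the root is left as empty slots here, on the assumption that fsck
then finds the orphaned directories and reconnects them, as the message describes.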
The back story is that this happened on a Saturday just before the Fall Comdex 1981, where
we were going to show our system for the first time and we had to take a working system to
Las Vegas by Sunday[3]. We were able to fly out on time and the Comdex show was a success,
with thousands of leads generated.
In terms of lessons, I don’t know what we learned. I guess it pays to be intimately
familiar with the system as we were in an extreme bootstrap situation with very little
working reliably. Seems often startups end up doing acrobatics without a net!
— bakul
[1] Two actually, but the second was a wire wrapped prototype. This was at an early Unix
workstation startup; our main app was “word processing”.
[2] Later we hooked up a Western Digital ST506 controller via parallel port and I was very
happy when my disk driver could do 25 KB/s! No DMA so had to do programmed IO.
[3] Rob Warnock (whom some of you may know from his SGI days) was our very hands-on CTO,
and he was/is great at debugging/problem solving. I don’t know why they waited for me
rather than call Rob. a) I was a junior engineer, as that was my first job with C & Unix
(6 months at that time) and b) I had already worked about 80+ hours that week and gone
home to sleep at 9AM on Saturday. Maybe Rob was smart enough to not answer his phone!
On Apr 25, 2017, at 8:28 AM, Dan Cross <crossd(a)gmail.com> wrote:
On Tue, Apr 25, 2017 at 10:18 AM, Clem Cole <clemc(a)ccc.com> wrote:
Problem was /etc has been burned too... so the mknod command is off the table.
Either boot into standalone media like some kind of miniroot (that hopefully has a copy
of mknod) or look for some kind of shell builtin? E.g., if the shell provides some
mechanism to make a raw system call, you can do it. E.g., an escape hatch to syscall() or
indir(). If a copy of `mkdir` survived, then on older systems where directory creation was
done by calling mknod(), one might be able to modify `mkdir` enough to create a device file
for a tape device to launch a restore from. I thought some systems came with a
syscall(1) utility, but it doesn't seem to be current anymore and I can't find
any references to it, so perhaps I'm misremembering.
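
To make the idea concrete: all that is needed is some path to the mknod system call
with the right mode and device number. A minimal sketch follows, with Python's
os.mknod standing in for the raw system call purely for illustration. The helper
names are hypothetical, and any major/minor numbers passed in are placeholders; the
real ones depend on the kernel in question.

```python
import os
import stat

def device_node_args(kind, major, minor, perm=0o600):
    """Return the (mode, dev) pair that mknod expects for a device file."""
    type_bits = {"char": stat.S_IFCHR, "block": stat.S_IFBLK}[kind]
    return type_bits | perm, os.makedev(major, minor)

def make_device(path, kind, major, minor, perm=0o600):
    """Create the device node; this normally requires root privileges."""
    mode, dev = device_node_args(kind, major, minor, perm)
    os.mknod(path, mode, dev)
```

For example, make_device("/dev/rmt0", "char", 12, 0) would create a character device
node for a tape drive, if 12,0 happened to be the right numbers on that system.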
I once messed up a NeXT machine by "mv"'ing the system shared libraries to
an unexpected path. Oops. I had to boot off the CD to fix it, but that's child's
play compared to some of the esoterica you guys are talking about.
- Dan C.
On Tue, Apr 25, 2017 at 10:08 AM, Larry McVoy <lm(a)mcvoy.com> wrote:
Whoever was the genius that put mknod in /etc has my gratitude.
We had other working Masscomp boxen but after I screwed up that
badly nobody would let me near them until I fixed mine :)
And you have to share who it was, I admitted I did it, I think
it's just a thing many people do..... Once :)
On Tue, Apr 25, 2017 at 10:02:26AM -0400, Clem Cole wrote:
Larry,
I had to laugh when I read that because what you don't know is it was part
of my old Unix wizards test which was left over from the day when one of
our hackers (whom I think you would later get to know so I'll not name him)
accidentally typed: rm -rf . as root from his / on his workstation.
Because /bin/rmdir had been lost, he started getting errors when rmdir was
forked. So he hit ^C, but he had already lost: /bin, /dev, /etc, /lib,
most of /usr. He was a developer in the networking group so he was working
on network code which we could not trust would not panic (in fact we
disconnected the node from the ethernet immediately just in case). But we
did have pretty much everything in /usr/bin/[s-z]* -- that is we think it
was deleting files in /usr/bin when he stopped it.
We obviously had another working Masscomp box just like it. And of course
the shell was working on the machine that was in trouble. We recovered
the system as it was. Hint: the key item is you have to start by putting
/dev back together, and the solution to that problem has been discussed
on this list.
Clem
On Mon, Apr 24, 2017 at 7:59 PM, Larry McVoy <lm(a)mcvoy.com> wrote:
> This is gonna seem like I'm tooting my own horn, and I am a little, but
> here's an rm -rf / story.
>
> Clem will be amused because I was a junior or senior in college and a sys
> admin for a Masscomp with a 40MB disk with 20 users. And I did some version
> of rm -rf /, realized part way through that I screwed up, and killed it.
> But /bin and /dev were gone so putting things back together was hard.
>
> But I did it and wrote up this little note for the people who came after
> me, if I was stupid enough to do this someone else would, was my thinking.
> You can get a sense of how scared I was in it if you read it carefully.
> It was a very long night.
>
> For an undergrad, I think it's not bad? Maybe? I dunno, I look at how
> much I needed to have understood to get the system back up, that's a lot
> of reading, playing, experience. Love that Geophysics department, they
> pushed me.
>
> And it was during my (brief) foray into the *roff -me macros (I went
> -ms and never looked back). Roff source on request to anyone who is
> twisted enough to want it.
>
> http://mcvoy.com/lm/masscomp-restore.pdf
>
> Complete with all the typos.
>
> --lm
>
--
---
Larry McVoy    lm at mcvoy.com    http://www.mcvoy.com/lm