TUHS

tuhs@tuhs.org

6 participants
6532 discussions

Is it time to resurrect the original dsw (delete with switches)?

by Jon Steinhart

I recently upgraded my machines to fc34. I just did a stock uncomplicated installation using the defaults and it failed miserably.

Fc34 uses btrfs as the default filesystem so I thought that I'd give it a try. I was especially interested in the automatic checksumming because the majority of my storage is large media files and I worry about bit rot in seldom used files. I have been keeping a separate database of file hashes and in theory btrfs would make that automatic and transparent.

I have 32T of disk on my system, so it took a long time to convert everything over. A few weeks after I did this I went to unload my camera and couldn't because the filesystem that holds my photos was mounted read-only. WTF? I didn't do that.

After a bit of poking around I discovered that btrfs SILENTLY remounted the filesystem because it had errors. Sure, it put something in a log file, but I don't spend all day surfing logs for things that shouldn't be going wrong. Maybe my expectation that filesystems just work is antiquated.

This was on a brand new 16T drive, so I didn't think that it was worth the month that it would take to run the badblocks program which doesn't really scale to modern disk sizes. Besides, SMART said that it was fine.

Although it's been discredited by some, I'm still a believer in "stop and fsck" policing of disk drives. Unmounted the filesystem and ran fsck to discover that btrfs had to do its own thing. No idea why; I guess some think that incompatibility is a good thing.

Ran "btrfs check" which reported errors in the filesystem but was otherwise useless BECAUSE IT DIDN'T FIX ANYTHING. What good is knowing that the filesystem has errors if you can't fix them?

Near the top of the manual page it says:

    Warning
    Do not use --repair unless you are advised to do so by a developer or an experienced user, and then only after having accepted that no fsck successfully repair all types of filesystem corruption. Eg. some other software or hardware bugs can fatally damage a volume.

Whoa! I'm sure that operators are standing by, call 1-800-FIX-BTRFS. Really? Is this a ploy by the developers to form a support business?

Later on, the manual page says:

    DANGEROUS OPTIONS
    --repair
        enable the repair mode and attempt to fix problems where possible

        Note there’s a warning and 10 second delay when this option is run without --force to give users a chance to think twice before running repair, the warnings in documentation have shown to be insufficient

Since when is it dangerous to repair a filesystem? That's a new one to me.

Having no option other than not being able to use the disk, I ran btrfs check with the --repair option. It crashed. Lesson so far is that trusting my data to an unreliable unrepairable filesystem is not a good idea. Since this was one of my media disks I just rebuilt it using ext4.

Last week I was working away and tried to write out a file to discover that /home and /root had become read-only. Charming. Tried rebooting, but couldn't since btrfs filesystems aren't checked and repaired. Plugged in a flash drive with a live version, managed to successfully run --repair, and rebooted. Lasted about 15 minutes before flipping back to read only with the same error.

Time to suck it up and revert. Started a clean reinstall. Got stuck because it crashed during disk setup with anaconda giving me a completely useless big python stack trace. Eventually figured out that it was unable to delete the btrfs filesystem that had errors so it just crashed instead. Wiped it using dd; nice that some reliable tools still survive. Finished the installation and am back up and running.

Any of the rest of you have any experiences with btrfs? I'm sure that it works fine at large companies that can afford a team of disk babysitters. What benefits does btrfs provide that other filesystem formats such as ext4 and ZFS don't? Is it just a continuation of the "we have to do everything ourselves and under no circumstances use anything that came from the BSD world" mentality?

So what's the future for filesystem repair? Does it look like the past? Is Ken's original need for dsw going to rise from the dead?

In my limited experience btrfs is a BiTteR FileSystem to swallow.

Or, as Saturday Night Live might put it: And now, linux, starring the not ready for prime time filesystem. Seems like something that's been under development for around 15 years should be in better shape.

Jon
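
For readers unfamiliar with the "separate database of file hashes" approach mentioned in the post, here is a minimal sketch of the idea. It is not the poster's actual tooling: the choice of SHA-256, the in-memory dictionary used as the manifest, and the names `file_digest`, `build_manifest`, and `verify` are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each regular file under root (as a relative path) to its digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root, manifest):
    """Return the relative paths that are missing or whose digest changed."""
    root = Path(root)
    bad = []
    for rel, digest in manifest.items():
        p = root / rel
        if not p.is_file() or file_digest(p) != digest:
            bad.append(rel)
    return bad
```

A sketch like this cannot tell bit rot from a deliberate edit, so it only suits files that are expected never to change, such as archived media. In practice the manifest would also need to be saved somewhere other than the disk being checked.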

3 years, 10 months

Who said ...

by arnold＠skeeve.com

... DEC Diagnostics would run on a beached whale? Anyone remember and/or know? (It seems to apply to other manufacturers' diagnostics as well, even today.) Thanks, Arnold

3 years, 10 months

A language question

by Richard Salz

I hope that this does not start any kind of language flaming and that if something starts the moderator will shut it down quickly. Where did the name for abort(3) and SIGABRT come from? I believe it was derived from the IBM term ABEND, but would like to know one way or the other.

3 years, 10 months

Re: [TUHS] Who said ...

by norman＠oclsc.org

Clem Cole:

  I believe the line was: "running DEC Diagnostics is like kicking a dead whale down the beach." As for who said it, I'm not sure, but I think it was someone like Rob Kolstad or Henry Spencer.

=====

The nearest I can remember encountering before was a somewhat different quote, attributed to Steve Johnson:

  Running TSO is like kicking a dead whale down the beach.

Since scj is on this list, maybe he can confirm that part.

I don't remember hearing it applied to diagnostics. I can imagine someone saying it, because DEC's hardware diags were written by hardware people, not software people; they required a somewhat arcane configuration language, one that made more sense if you understood how the different pieces of hardware connected together. I learned to work with it and found it no less usable than, say, the clunky verbose command languages of DEC's operating systems; but I have always preferred to think in low levels.

DEC's diags were far from perfect, but they were a hell of a lot better than the largely-nonexistent diags available for modern Intel-architecture systems. I am right now dealing with a system that has an intermittent fault, that causes the OS to crash in the middle of some device driver every so often. Other identical systems don't, so I don't think it's software. Were it a PDP-11 or a VAX I'd fire up the diagnostics for a while, and have at least a chance of spotting the problem; today, memtest is about the only such option, and a solid week of running memtest didn't shake out anything (reasonably enough, who says it's a memory problem?).

Give me XXDP, not just the Blue Screen of Death.

Norman Wilson
Toronto ON

3 years, 10 months

Re: [TUHS] Is it time to resurrect the original dsw (delete with switches)?

by norman＠oclsc.org

Not to get into what is something of a religious war, but this was the paper that convinced me that silent data corruption in storage is worth thinking about:

http://www.cs.toronto.edu/~bianca/papers/fast08.pdf

A key point is that the character of the errors they found suggests it's not just the disks one ought to worry about, but all the hardware and software (much of the latter inside disks and storage controllers and the like) in the storage stack.

I had heard anecdotes long before (e.g. from Andrew Hume) suggesting silent data corruption had become prominent enough to matter, but this paper was the first real study I came across.

I have used ZFS for my home file server for more than a decade; presently on an antique version of Solaris, but I hope to migrate to OpenZFS on a newer OS and hardware. So far as I can tell ZFS in old Solaris is quite stable and reliable. As Ted has said, there are philosophical reasons why some prefer to avoid it, but if you don't subscribe to those it's a fine answer.

I've been hearing anecdotes since forever about sharp edges lurking here and there in BtrFS. It does seem to be eternally unready for production use if you really care about your data. It's all anecdotes so I don't know how seriously to take it, but since I'm comfortable with ZFS I don't worry about it.

Norman Wilson
Toronto ON

PS: Disclosure: I work in the same (large) CS department as Bianca Schroeder, and admire her work in general, though the paper cited above was my first taste of it.

3 years, 10 months

Re: [TUHS] Can't break line warning

by Douglas McIlroy

This may be due to logic similar to that of a classic feature that I always deemed a bug: troff begins a new page when the current page is exactly filled, rather than waiting until forced by content that doesn't fit. If this condition happens at the end of a document, a spurious blank page results. Worse, if the page header happens to change just after the exactly filled page, the old heading will be produced before the new heading is read. Doug
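
The two page-break policies described above can be contrasted with a toy paginator. This is only an illustration and not troff's actual logic: pages are measured in whole lines, and the function names `paginate_eager` and `paginate_lazy` are hypothetical.

```python
def paginate_eager(lines, page_len):
    """Start a new page as soon as the current page is exactly full."""
    pages = [[]]
    for line in lines:
        pages[-1].append(line)
        if len(pages[-1]) == page_len:
            # Break immediately, even if no further content follows.
            pages.append([])
    return pages


def paginate_lazy(lines, page_len):
    """Start a new page only when the next line does not fit."""
    pages = [[]]
    for line in lines:
        if len(pages[-1]) == page_len:
            # Break only because this line needs room.
            pages.append([])
        pages[-1].append(line)
    return pages
```

With four lines and a page length of two, the eager version returns three pages, the last one empty, which corresponds to the spurious blank page at the end of a document. The lazy version returns two pages for the same input.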

3 years, 10 months

Systematic approach to command-line interfaces

by Douglas McIlroy

> fork() is a great model for a single-threaded text processing pipeline to do
> automated typesetting. (More generally, anything that is a straightforward
> composition of filter/transform stages.) Which is, y'know, what Unix is *for*.
> It's not so great for a responsive GUI in front of a multi-function interactive program.

"Single-threaded" is not a term I would apply to multiple processes in a pipeline. If you mean a single track of data flow, fine, but the fact that that's a prevalent configuration of cooperating processes in Unix is an artifact of shell syntax, not an inherent property of pipe-style IPC. The cooperating processes in Rob Pike's 20th century window systems and screen editors, for example, worked smoothly without interrupts or events - only stream connections. I see no abstract distinction between these programs and "stuff people play with on their phones."

It bears repeating, too, that stream connections are much easier to reason about than asynchronous communication. Thus code built on streams is far less vulnerable to timing bugs.

At last a prince has come to awaken the sleeping beauty of stream connections. In Go (Pike again) we have a widely accepted programming language that can fully exploit them, "[w]hich is, y'know, what Unix is 'for'."

(If you wish, you may read "process" above to include threads, but I'll stay out of that.)

Doug
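
As a rough illustration of cooperating stages joined only by stream connections, here is a small sketch. It is written in Python rather than Go, so threads and `queue.Queue` stand in for processes and pipes (or goroutines and channels); the helper names `stage` and `run_pipeline` and the `DONE` sentinel are assumptions made for this example, not anything from the post.

```python
import queue
import threading

DONE = object()  # sentinel that marks the end of a stream


def stage(func, inbox, outbox):
    """Start a thread that applies func to each item read from inbox."""
    def run():
        while True:
            item = inbox.get()
            if item is DONE:
                outbox.put(DONE)  # pass end-of-stream downstream
                return
            outbox.put(func(item))

    t = threading.Thread(target=run)
    t.start()
    return t


def run_pipeline(items, funcs):
    """Send items through funcs, with each function in its own thread."""
    first = queue.Queue()
    src = first
    threads = []
    for f in funcs:
        dst = queue.Queue()
        threads.append(stage(f, src, dst))
        src = dst
    for item in items:
        first.put(item)
    first.put(DONE)
    results = []
    while True:
        item = src.get()  # blocks until the last stage produces output
        if item is DONE:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results
```

Each stage blocks on its input stream and shares no other state with its neighbours, so the result order does not depend on how the threads happen to be scheduled. For example, `run_pipeline(["a b", "c"], [str.upper, str.split])` sends each string through an upper-casing stage and then a splitting stage.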

3 years, 10 months

Who is that in the framed photograph near the ceiling?

by norman＠oclsc.org

Steve Simon:

  once again i am taken aback at the good taste of the residents of the unix room.

As a whilom denizen of that esteemed playroom, I question both the accuracy and the relevance of that metric.

Besides, what happened to the sheep shelf? Was it scrubbed away after I left? And, Ken, whatever happened to Dolly the Sheep (after she was hidden to avoid upsetting visitors)?

Norman Wilson
Toronto ON
No longer a subscriber to sheep! magazine

3 years, 11 months

Who is that in the framed photograph near the ceiling?

by Steve Simon

great picture. once again i am taken aback at the good taste of the residents of the unix room. -Steve

3 years, 11 months

Who is that in the framed photograph near the ceiling?

by William Cheswick

> I don't think anyone knows. Nobody relevant, I believe.
>
> -rob

I understand that Dave Presotto bought that photo at a garage sale for $1. The photo hung in the Unix Room for years, at one point labeled “Peter Weinberger.” One day I removed it from its careful mounting and scanned in the photo. It bore the label “what, no steak?”

The photo was stolen from a wall sometime after I left. The scanned image is at https://cheswick.com/ches/tmp/whatnosteak.jpeg

ches

3 years, 11 months
