[TUHS] Is it time to resurrect the original dsw (delete with switches)?
krewat at kilonet.net
Tue Aug 31 02:46:59 AEST 2021
On 8/30/2021 9:06 AM, Norman Wilson wrote:
> A key point is that the character of the errors they
> found suggests it's not just the disks one ought to worry
> about, but all the hardware and software (much of the latter
> inside disks and storage controllers and the like) in the
> storage stack.
I had a pair of Dell MD1000's, full of SATA drives (28 total), with the
SATA/SAS interposers on the back of the drive. Was getting checksum
errors in ZFS on a handful of the drives. Took the time to build a new
array, on a Supermicro backplane, and no more errors with the exact same
drives.
I'm theorizing it was either the interposers, or the SAS
backplane/controllers in the MD1000. Without ZFS, who knows how
swiss-cheesy my data would be.
Not to mention the time I set up a Solaris x86 cluster zoned to a
Compellent and periodically would get one or two checksum errors in ZFS.
This was the only cluster out of a handful that had issues, and only on
that one filesystem. Of course, it was a production PeopleSoft Oracle
database. I guess moving to a VMware Linux guest and XFS just swept the
problem under the rug, but the hardware is not being reused so there's that.
> I had heard anecdotes long before (e.g. from Andrew Hume)
> suggesting silent data corruption had become prominent
> enough to matter, but this paper was the first real study
> I came across.
> I have used ZFS for my home file server for more than a
> decade; presently on an antique version of Solaris, but
> I hope to migrate to OpenZFS on a newer OS and hardware.
> So far as I can tell ZFS in old Solaris is quite stable
> and reliable. As Ted has said, there are philosophical
> reasons why some prefer to avoid it, but if you don't
> subscribe to those it's a fine answer.
Been running Solaris 11.3 and ZFS for quite a few years now, at home.
Before that, Solaris 10. I set up a home Red Hat 8 server w/ZoL (0.8)
earlier this year - so far, no issues, with 40+TB online. I have
various test servers with ZoL 2.0 on them, too.
I have so much online data that I use as the "live copy" - going back to
the early 80's copies of my TOPS-10 stuff. Even though I have copious
amounts of LTO tape copies of this data, I won't go back to the "out of
sight out of mind" mentality.
Trying to get customers to buy into that idea is another story.
PS: I refuse to use a workstation that doesn't use ECC RAM, either. I
like swiss-cheese on a sandwich. I don't like my (or my customers') data
that way.