If you want to argue against ZFS, I think you'd do better by comparing the probability of an error being detected by ZFS to other risks. I myself don't believe that ZFS addresses the most dangerous risks. In my opinion the argument would not only be more convincing but also more constructive, because it would give your readers alternative solutions they can implement instead of ZFS. I'm not saying you haven't done any of that, but doing more would be more productive than claiming that every report of non-massive errors is wrong or something.
I'm not arguing against ZFS as a whole, just the error checking. Even then, I'm not saying the error checking is completely useless, just that it is useless in most of the situations its proponents claim. You say it yourself: ZFS doesn't address the most dangerous risks. ZFS only does error checking where there is already massive error checking in place. It doesn't do anything in the areas where data is actually likely to get corrupted. It's a false sense of security because it puts error checking in the wrong place. If you are really paranoid about data integrity, the file-system is the wrong place to do it.
I have given examples of better alternatives, such as PAR. There is also ICE ECC, but it's Windows only. Some archivers like RAR also have the option of built-in error recovery.
And we've never handled so much data. The manufacturers of many components state error rates as a function of the amount of data handled.
Your argument could be used against any kind of progress.
Take a look at those error rates. Note they are all un-correctable, not un-detectable. Again, if your HD throws an error, ZFS can do nothing except report it, and can only recover if it has another good copy. Exactly the same as the RAID systems we have all been using for years. The only thing ZFS can do that a conventional RAID could not is map around a bad sector and keep all your data mirrored, but why would you want to continue using a bad disk?
Edit: I went and looked up the numbers to be sure. Worst case for ZFS is a 128K block with a 256-bit checksum. For 128K stored on a conventional hard drive, it would have 12.8K of ECC, or 102400 bits. You really think ZFS is doing a better job? Even in the best case, a 512-byte block, ZFS still has 256 bits compared to 2500 on a hard drive. If your hard drive somehow "misses" an error here and there, there is at least an order of magnitude greater chance that ZFS would miss one.
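To make the comparison above easy to check, here is a small sketch that just reproduces the arithmetic. The figures are the ones quoted in this post, not independent measurements, and the helper name `ecc_to_checksum_ratio` is made up for illustration. It only compares bit counts; it does not model actual detection probability.

```python
# Figures are the ones quoted in the post, not measured values.
ZFS_CHECKSUM_BITS = 256          # checksum size per ZFS block, as stated in the post

def ecc_to_checksum_ratio(drive_ecc_bits: int,
                          checksum_bits: int = ZFS_CHECKSUM_BITS) -> float:
    """How many drive ECC bits there are for every ZFS checksum bit."""
    return drive_ecc_bits / checksum_bits

# 12.8K of ECC expressed in bits (K taken as 1000, which matches the 102400 figure).
ecc_bits_128k = 12_800 * 8

# Figure quoted for a 512-byte block.
ecc_bits_512 = 2_500

ratio_128k = ecc_to_checksum_ratio(ecc_bits_128k)
ratio_512 = ecc_to_checksum_ratio(ecc_bits_512)

print(f"128K block:     {ecc_bits_128k} ECC bits vs 256 checksum bits -> {ratio_128k:.1f}x")
print(f"512-byte block: {ecc_bits_512} ECC bits vs 256 checksum bits -> {ratio_512:.1f}x")
```

With these inputs the ratio is 400x for the 128K block and a little under 10x for the 512-byte block, which is where the "order of magnitude" figure comes from.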
Progress would be improving those error rates to match the increasing volume of data so we don't lose data so often. Arguing against a redundant band-aid solution in the wrong place is not arguing against progress.
ZFS is not recommended for systems without ECC RAM to begin with, right?
But even RAM errors could potentially be detected by ZFS as you point out yourself (even if you understate the probability). Such a detection would be valuable information.
I think you misunderstood my example. With a 1-bit error in the checksum, ZFS would report a failure in good data. In effect ZFS would actually be denying you access to your correct data. It's an extremely unlikely situation though. Much more likely is that it will put a correct checksum on bad data, i.e. not report an error on corrupt data.
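The two cases can be shown with any checksum scheme. The sketch below is a generic illustration, not ZFS's actual code path: SHA-256 is assumed as a stand-in for a 256-bit checksum, and `flip_bit` and `verify` are hypothetical helpers that simulate a memory error and a read-time check.

```python
import hashlib

def checksum(data: bytes) -> bytes:
    """256-bit checksum of a block (SHA-256 used as a stand-in)."""
    return hashlib.sha256(data).digest()

def flip_bit(buf: bytes, bit_index: int) -> bytes:
    """Return a copy of buf with one bit inverted, simulating a memory error."""
    out = bytearray(buf)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)

def verify(data: bytes, stored_checksum: bytes) -> bool:
    """Read-time check: does the data match the checksum stored alongside it?"""
    return checksum(data) == stored_checksum

good = b"important data " * 100

# Case 1: the data is fine, but the stored checksum takes a 1-bit hit.
# Verification fails, so correct data gets reported as bad.
damaged_checksum = flip_bit(checksum(good), 5)
case1_passes = verify(good, damaged_checksum)

# Case 2: the data is corrupted in memory BEFORE the checksum is computed.
# The checksum is a correct checksum of bad data, so verification passes.
corrupted = flip_bit(good, 42)
checksum_of_corrupted = checksum(corrupted)
case2_passes = verify(corrupted, checksum_of_corrupted)

print("good data, damaged checksum passes verification:", case1_passes)
print("corrupt data, checksum computed afterwards passes:", case2_passes)
```

The first check fails and the second passes, even though only the second block actually holds bad data.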
Yes, ECC is recommended, but that doesn't mean ZFS will always be used on such systems. Its claimed feature set makes it very tempting for home / low end users. The other major failing of ZFS that would most likely affect low end systems is its massive RAM use. Do you know what ZFS does when it runs out of memory? It crashes your system, hard. No orderly shutdown or flushing of buffers, just a kernel panic. That combined with the lack of recovery tools is a very scary situation.