please critique my home ZFS build

Quiet Mind
*Lifetime Patron*
Posts: 21
Joined: Thu Oct 28, 2010 4:49 pm

Re: please critique my home ZFS build

Post by Quiet Mind » Tue Nov 02, 2010 4:12 pm

washu wrote:The advantage of file based checksums/PAR files over something like ZFS is that they are file-system independent. You can copy your files along with the PARs to anything and still verify their integrity. Once ZFS does its mostly useless checksum verification and hands off the data to the OS it is no longer protected. Also in the case of PAR, files can actually be repaired if they are damaged, assuming enough redundancy is in the PARs.
Yeah, I'm starting to grok this. The higher up you go in the abstraction layer, the more leverage you get from redundancy. So instead of just redundant drives, I'll now be using redundant servers, and I'm also adding heterogeneity in the OS and file systems. I'm feeling a twinge of guilt about not using heterogeneous hardware too, but I'm not going to worry about that.

I'm curious about why you think ZFS's checksum verification is mostly useless. Surely your process with PARs could be automated, and should be, at the file system level. I imagine that was something ZFS was striving for. How did it fail?

MikeC
Site Admin
Posts: 12285
Joined: Sun Aug 11, 2002 3:26 pm
Location: Vancouver, BC, Canada
Contact:

Re: please critique my home ZFS build

Post by MikeC » Tue Nov 02, 2010 4:13 pm

Quiet Mind wrote:
Jay_S wrote:As long as you don't need to grow beyond 4 drives, the HP Proliant Microserver might work for you.
WOW! That's perfect! A whole system that supports ECC RAM for $300! And it's cooled with a 120mm fan! I just ordered two of them; thank you!
Not exactly just one 120mm fan. There's also a wee 40mm fan for the PSU. It's not silent even w/o a HDD, and if you get the same Seagate 7200.12 160gb drive that our sample has, it'll be kind of noisy with a distinct high pitched whine. But definitely a very good deal, and very sturdily, intelligently built, in a modular format.

Quiet Mind
*Lifetime Patron*
Posts: 21
Joined: Thu Oct 28, 2010 4:49 pm

Re: please critique my home ZFS build

Post by Quiet Mind » Tue Nov 02, 2010 4:27 pm

HFat wrote:Why not simply buy servers instead of NAS boxes or desktops? You want to use them as servers, right? You could buy used servers to save money but low-end servers similar to the build outlined in your first post are not that expensive compared to the cost of 16G of RAM (for example). If you bought an identical pair you could diagnose hardware issues more easily or move your drives from the one that's not working to the one that works without any worries about drivers and so on.

The Microserver is not as powerful as you originally wanted, by a large margin. And it's not hugely cheaper than a ML110 (for instance). It's a neat box but it's more of an office server than a basement server. CentOS works fine on it but it doesn't have the drivers to take advantage of the hardware's features.
I chose the Supermicro motherboard because it was the least powerful motherboard with ECC support that I knew would support Nexenta. Finding good hardware for Nexenta is tricky. I didn't need that much computing power.

The MicroServer is a neat box and I've fallen in love with it, so you're too late. :-) I like the easy access to drives and that it's got nothing more than what I need. I'm buying an identical pair as you suggested.
HFat wrote:By the way, make sure you have a good gigE switch or at least a 100M switch that has one or two gigE ports (cheaper and easier to get fanless). You might also want to set up a dedicated link between your servers.
Good idea, thank you. I ordered the optional "MicroServer Remote Access Card Kit" along with the MicroServer. I can't find ANY information about it online, but I'm hoping it's something like IPMI. It appears to include its own ethernet port, so if it's gigE and I use it intelligently maybe I can set up a dedicated link between the servers and not need to buy a switch.
HFat wrote:How is this Synology contraption and their proprietary schemes you might not know how to recover data from any better than bog-standard Linux on a server?
You're right. I played around with mdadm and LVM on Linux for an hour and convinced myself that I can configure my drives better than Synology can. I bought a second MicroServer to install Linux on instead of buying an OTS NAS like the Synology.
HFat wrote:... you obviously like overkill so have fun!
Thanks. :-)

HFat
Posts: 1753
Joined: Thu Jul 03, 2008 4:27 am
Location: Switzerland

Re: please critique my home ZFS build

Post by HFat » Tue Nov 02, 2010 4:49 pm

Quiet Mind wrote:I ordered with the MicroServer the optional "Micro Server Remote Access Card Kit". I can't find ANY information about it online, but I'm hoping it's something like IPMI. It appears to include its own ethernet port, so if it's gigE and I use it intelligently maybe I can set up a dedicated link between the servers and not need to buy a switch.
You don't need a switch for a dedicated link. But you'd need 2 NICs per server or at least two NICs in your primary server and I doubt the remote management NIC can be used for anything else than... remote management.
Do tell what features the card has and how well they work when you get it, please. I asked about it on the Microserver thread. I didn't order one because I didn't want to spend money on an unknown quantity.
Quiet Mind wrote:Surely your process with PARs could be automated, and should be, at the file system level. I imagine that was something ZFS was striving for.
So far as I know, in typical configurations, ZFS does the same thing. It's somewhat brittle but it's got the features.
The thing is, no filesystem can do what creating archives (standard or with parity) does. The filesystem's job is to do what the OS tells it to do and that's that. But if an application creates hashes or parity, no user error, system malfunction or malicious action can corrupt data without deliberately recreating the hash or parity for what has just been corrupted... an unlikely occurrence. Well, the data can still get corrupted of course, but you'd be able to detect it and possibly correct it. No filesystem can do that, precisely because of its automation: it would recreate the hash or parity on each write. Automation has its downsides...
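
To make that concrete, here's a rough sketch of the "application creates hashes" idea in Python (my own illustration, nothing to do with ZFS or PAR internals; the MANIFEST.sha256 name is just made up). The hashes live in an ordinary file that travels with the data, so nothing below the application will silently regenerate them on a write:

Code:
import hashlib, os, sys

def sha256_of(path, bufsize=1 << 20):
    # Hash a file in chunks so large files don't need to fit in RAM.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bufsize)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()

def create_manifest(root, manifest="MANIFEST.sha256"):
    # Record a hash for every file under root, except the manifest itself.
    with open(manifest, "w") as out:
        for dirpath, _, files in os.walk(root):
            for name in sorted(files):
                if name == os.path.basename(manifest):
                    continue
                path = os.path.join(dirpath, name)
                out.write("%s  %s\n" % (sha256_of(path), path))

def verify_manifest(manifest="MANIFEST.sha256"):
    # Re-hash each file and compare; report anything changed or unreadable.
    bad = 0
    with open(manifest) as f:
        for line in f:
            expected, path = line.rstrip("\n").split("  ", 1)
            try:
                actual = sha256_of(path)
            except OSError as e:
                print("UNREADABLE %s (%s)" % (path, e))
                bad += 1
                continue
            if actual != expected:
                print("MISMATCH %s" % path)
                bad += 1
    return bad

if __name__ == "__main__":
    if sys.argv[1:2] == ["create"]:
        create_manifest(sys.argv[2])
    else:
        sys.exit(1 if verify_manifest() else 0)

The same manifest can be checked on any OS and any filesystem, which is washu's portability point. PAR goes further by storing parity as well, so some damage can be repaired rather than just detected.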

I don't put much stock in heterogeneous software myself, but you can still get heterogeneous hardware where it matters by getting different brands and models of drives (within reason), by the way.

washu
Posts: 571
Joined: Thu Nov 19, 2009 10:20 am
Location: Ottawa

Re: please critique my home ZFS build

Post by washu » Tue Nov 02, 2010 5:25 pm

Quiet Mind wrote: I'm curious about why you think ZFS's checksum verification is mostly useless. Surely your process with PARs could be automated, and should be, at the file system level. I imagine that was something ZFS was striving for. How did it fail?
Well, the main difference between a checksum as ZFS uses it and PARs is that checksums can only detect errors, they cannot repair them. When ZFS gets a checksum failure it has to resort to the mirror copy. In a typical home setup or low-end server without multiple redundant parts, it's very likely that whatever caused the checksum failure in the first place affected both copies and they are both corrupt. ZFS at that point cannot give you the correct data.
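
To illustrate the detect-versus-repair distinction with a toy example (my own sketch; real PAR2 uses Reed-Solomon codes, not a single XOR parity block): a checksum only tells you which block is bad, while parity lets you rebuild one known-bad block from the survivors.

Code:
import hashlib
from functools import reduce

def xor_blocks(blocks):
    # Bytewise XOR of equal-length blocks.
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data = [b"AAAA", b"BBBB", b"CCCC"]           # three data blocks
parity = xor_blocks(data)                    # one parity block
checks = [hashlib.sha256(b).hexdigest() for b in data]

# Simulate silent corruption of block 1.
data[1] = b"BBBX"

# Detection: the stored checksum no longer matches.
bad = [i for i, b in enumerate(data)
       if hashlib.sha256(b).hexdigest() != checks[i]]
print("corrupt blocks:", bad)                # -> [1]

# Repair: XOR the parity block with the remaining good blocks.
survivors = [b for i, b in enumerate(data) if i not in bad]
data[bad[0]] = xor_blocks(survivors + [parity])
print(data[1])                               # -> b'BBBB'

ZFS's checksum gives you the detection half; without a second good copy (mirror or RAID-Z) there is nothing to rebuild from.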

It's not that the error detection in ZFS fails, it's that most of the errors it is claimed to detect are already caught by the HDs themselves. It just doesn't provide any additional benefit in most situations.

I know I'm going to catch some flack for this, but when was the last time you ever had a modern HD return the wrong data, when you can guarantee the correct data was written in the first place? I'm not talking about bad sectors or other failures that cause the HD to fail to return data, but actually returning something different than what was successfully written, without throwing an error. I'm sure lots of people think this has happened to them, but it's almost certain that it was a different hardware or OS failure which caused incorrect data to be written in the first place. There was a RAID level, RAID 2, designed for when HDs actually were this unreliable. It was basically the same idea as ECC, but striped across several drives. It fell out of use years ago because it isn't needed.

So, your HD fails and throws you a bad sector. ZFS will restore from your mirror. A normal RAID1 will restore from your mirror. Same goes for ZFS RAID-Z or RAID5/6. So ZFS has no real advantage in this case, despite claims of "bitrot". And while ZFS might let you work a bit longer with a drive that has bad sectors, why would you want to keep using a failing drive?


What about corruption originating elsewhere in the system?
- Disk electronics or cable failure? If your drives work at all in this situation they would throw so many errors to the OS as to be practically useless.
- Disk controller and/or bus failure? If your storage system even works, it would likely duplicate write errors across drives. If you actually had separate disk controllers and/or buses ZFS error detection might help, but that would be a rare situation in a home server.
- Memory failure? Would likely duplicate write errors so again the error detection doesn't help much. There is a good chance the calculated checksums would match the corrupt data so ZFS would never show an error in this case. This is the most common case in a low end PC without ECC. PAR actually does a memory check before it does its calculations to prevent this.
- OS error? ZFS will happily calculate the checksums of whatever wrong data the OS hands to it.

The error detection in ZFS is mainly useful in one rather specific situation most likely found in very high end servers.

Quiet Mind
*Lifetime Patron*
Posts: 21
Joined: Thu Oct 28, 2010 4:49 pm

Re: please critique my home ZFS build

Post by Quiet Mind » Tue Nov 02, 2010 7:27 pm

morgue wrote:I will be following this build. Have you considered btrfs?
I'm very curious about Btrfs. I'm tempted to use it instead of ext4 but I can already feel washu slapping my wrist. I suppose we should wait for the fsck tool to be completed?
morgue wrote:EDIT: Link aggregation (sometimes called 802.3ad or LACP) is pretty sweet if you can afford it and need the speed. Your speed to the www will not improve but locally might make a huge difference. This is if you have the others connected with higher speed as well and/or several of them connecting at the same time. I just thought of this because that Supermicro has two gigabit ports which could net 2Gbit/s to the switch.
Cool, RAID for networking! I'll see if I can find a way to rationalize needing this. :-)

Quiet Mind
*Lifetime Patron*
Posts: 21
Joined: Thu Oct 28, 2010 4:49 pm

Re: please critique my home ZFS build

Post by Quiet Mind » Tue Nov 02, 2010 7:30 pm

MikeC wrote:Not exactly just one 120mm fan. There's also a wee 40mm fan for the PSU. It's not silent even w/o a HDD, and if you get the same Seagate 7200.12 160gb drive that our sample has, it'll be kind of noisy with a distinct high pitched whine. But definitely a very good deal, and very sturdily, intelligently built, in a modular format.
Oh, so we're not at nirvana yet. That's OK, I'm going to throw them into my basement anyway. I'm glad to hear they're well built.

HFat
Posts: 1753
Joined: Thu Jul 03, 2008 4:27 am
Location: Switzerland

Re: please critique my home ZFS build

Post by HFat » Wed Nov 03, 2010 9:29 am

washu wrote:I know I'm going to catch some flack for this, but when was the last time you ever had a modern HD return the wrong data, when you can guarantee the correct data was written in the first place? I'm not talking about bad sectors or other failures that cause the HD to fail to return data, but actually returning something different than what was successfully written without throwing an error. I'm sure lots of people think this has happened to them, but it's almost certain that it was a different hardware or OS failure which caused incorrect data to be written in the first place.
ZFS (if it works well) should detect any error originating not only in the drive itself but also anywhere between the RAM and the drive. Does it matter where it originated? What matters is that there are errors.
We don't know what's written on our drives so we can't answer your question, but the last time I stumbled on hardware which would happily write and read back bad data was in 2007. This was a case of massive failure so it was easy to detect, but I might not have noticed before it was too late if I hadn't been careful. This is not the case ZFS evangelists like to talk about, but I would rather have a filesystem which detects this automatically (unless I want the best possible performance, which I usually don't).
Since you agree that such massive errors happen, why do you believe similar errors can't happen in a less spectacular and harder-to-detect fashion? There are a lot of components besides the surface of a hard drive which could introduce errors. These errors would be hard to detect if they only happened on one bit out of ten trillion or something, and I never went out of my way to try to detect them, but that's because I don't routinely handle terabytes of data. The people who do say they're detecting a small number of errors. Some of these errors may be traced back to RAM errors, RAID controllers and the like, but I would very much want to be able to detect such errors if I were in such a situation, regardless of the cause. ZFS should be able to detect some of these errors (probably most of them on decent servers, even low-end ones).

washu
Posts: 571
Joined: Thu Nov 19, 2009 10:20 am
Location: Ottawa

Re: please critique my home ZFS build

Post by washu » Wed Nov 03, 2010 11:02 am

HFat wrote: ZFS (if it works well) should detect any error originating not only in the drive itself but between the RAM and the drive. Does it matter where it originated? What matters is that there are errors.
My point was that any such error will either be massive and immediately noticeable (like your case below) or very likely will corrupt all copies of the data making ZFS unable to repair. PAR could deal with the second case in some situations.

The data over interface cables (SATA or PATA after UDMA) is error checked. PCI express has a CRC, even PCI has parity. The controller is the most likely suspect, but again it would either be a massive failure, or affect all drives equally making recovery impossible.
HFat wrote: We don't know what's written on our drives so we can't answer your question, but the last time I stumbled on hardware which would happily write and read back bad data was in 2007. This was a case of massive failure so it was easy to detect, but I might not have noticed before it was too late if I hadn't been careful. This is not the case ZFS evangelists like to talk about, but I would rather have a filesystem which detects this automatically (unless I want the best possible performance, which I usually don't).
We've been using file-systems since their invention without built-in error checking of this type because it isn't needed and it's the wrong place for it. Hard drives already have error detection and error correction far stronger than ZFS implements.
HFat wrote: Since you agree that such massive errors happen why do you believe similar errors can't happen in a less spectacular and harder to detect fashion? There are a lot of components besides the surface of a hard drive which could introduce errors. These errors would be hard to detect if they only happened on one bit out of ten trillions or something and I never went out of my way to try to detect them but that's because I don't routinely handle terabytes of data. The people who do say they're detecting a small amount of errors. Some of these errors may be traced back to RAM errors, RAID controllers and the like but I would very much want to be able to detect such errors if I was in such a situation regardless of the cause. ZFS should be able to detect some of these errors (probably most of them on decent servers, even low-end ones).
I understand what you are saying, but I really don't think it's a possibility with modern hard drives. Pretty much everything that could fail on a modern HD is either error checked or would cause a massive unusable failure. If this was actually a problem, someone would have noticed and made a solution long before ZFS. The only one was RAID 2 and it was dropped after HDs started implementing ECC.

Even the most common case of data corruption, RAM errors, isn't handled by ZFS. Say you have 1 bad bit in your RAM somewhere. Since ZFS is such a RAM hog, the most likely case is that the bad bit is in the disk cache. If ZFS then hands that data to your OS, it isn't checked and is now corrupt. If it is in a block to be written then ZFS will checksum it, bad bit and all, and write it out. Now your disk has corrupt data with a correct checksum. If by rare chance the bad bit is in the checksum area you will get a wrong checksum written to all your disks, so none of them will report the "correct" data and you have still lost. If the bad bit is in the code area then all bets are off on what will happen.
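
A trivial way to see that a checksum only protects data from the moment it is computed (a toy Python illustration of my own, nothing ZFS-specific):

Code:
import hashlib

block = bytearray(b"important payload")

# A bad RAM bit flips the data *before* any checksum exists.
block[3] ^= 0x01

# The filesystem (or anything else) now checksums the already-corrupt buffer...
checksum = hashlib.sha256(block).hexdigest()

# ...so later verification passes even though the data is wrong.
assert hashlib.sha256(bytes(block)).hexdigest() == checksum
print("checksum verifies, data is still corrupt")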

Quiet Mind
*Lifetime Patron*
Posts: 21
Joined: Thu Oct 28, 2010 4:49 pm

Re: please critique my home ZFS build

Post by Quiet Mind » Wed Nov 03, 2010 9:34 pm

Wow, washu, well-articulated and thought-provoking, thank you.

Does this mean that the ZFS "scrub" operation is useless?

You write that error checking at the file system level is the wrong place for it, implying that the right place is at the hard drive level. However, you're also encouraging the use of PAR, which is error checking higher than the file system level. Hmm. I suppose we should have error checking in three places: at the lowest levels (digital circuits vs. analog), at the highest level (PAR), and wherever there are still likely to be errors. Perhaps, even with all of the error checking mechanisms in place lower than the file system, there's an argument to be made that there are still errors worth checking for at the file system level, and thus ZFS has utility.

In other news, I just added a UPS to my build to protect my two servers: APC Smart-UPS 750VA LCD 120V

HFat
Posts: 1753
Joined: Thu Jul 03, 2008 4:27 am
Location: Switzerland

Re: please critique my home ZFS build

Post by HFat » Thu Nov 04, 2010 6:08 am

washu wrote:My point was that any such error will either be massive and immediately noticeable (like your case below) or very likely will corrupt all copies of the data making ZFS unable to repair.
How do you know that?
Unlike ECC, the main value of ZFS lies in error detection anyway.

The fact of the matter is that people have been reporting non-massive failures in the real world. Some were detected by ZFS and some were detected by testing on systems without ZFS (or similar). In some of these cases, ZFS should have been able to detect these errors if it had been deployed on the systems tested. The people who reported these occurrences could be mistaken or they could be lying, of course, but some of these people have credibility. You don't, since you're anonymous. You might be right but your arguments are empty. Do you have a solid argument based on logic or evidence? None of the error detection (and correction) technologies you have talked about guarantee that there will be no corruption that ZFS can detect.

If you want to argue against ZFS, I think you'd do better by comparing the probability of an error being detected by ZFS to other risks. I don't believe that ZFS addresses the most dangerous risks myself. Not only would the argument be more convincing in my opinion but it would also be more constructive because it would give your readers alternative solutions they can implement instead of implementing ZFS. I'm not saying you haven't done any of that but doing more would be more productive than claiming that every report of non-massive errors is wrong or something.
washu wrote:We've be using file-systems since their invention without built in error checking of this type because it isn't needed
And we've never handled so much data. The manufacturers of many components state error rates as a function of the amount of data handled.
Your argument could be used against any kind of progress.
washu wrote:Even the most common case of data corruption, RAM errors, isn't handled by ZFS.
ZFS is not recommended for systems without ECC RAM to begin with, right?
But even RAM errors could potentially be detected by ZFS as you point out yourself (even if you understate the probability). Such a detection would be valuable information.

washu
Posts: 571
Joined: Thu Nov 19, 2009 10:20 am
Location: Ottawa

Re: please critique my home ZFS build

Post by washu » Thu Nov 04, 2010 8:14 am

HFat wrote:If you want to argue against ZFS, I think you'd do better by comparing the probability of an error being detected by ZFS to other risks. I don't believe that ZFS addresses the most dangerous risks myself. Not only would the argument be more convincing in my opinion but it would also be more constructive because it would give your readers alternative solutions they can implement instead of implementing ZFS. I'm not saying you haven't done any of that but doing more would be more productive than claiming that every report of non-massive errors is wrong or something.
I'm not arguing against ZFS as a whole, just the error checking. Even then, I'm not saying the error checking is completely useless, just in most of the situations the proponents claim. You say it yourself, ZFS doesn't address the most dangerous risks. ZFS only does error checking where there is already massive error checking in place. It doesn't do anything in the areas where data is actually likely to get corrupted. It's a false sense of security because it puts error checking in the wrong place. If you are really paranoid about data integrity, the file-system is the wrong place to do it.

I have given examples of better alternatives, such as PAR. There is also ICE ECC, but it's Windows only. Some archivers like RAR also have the option of built in error recovery.
HFat wrote: And we've never handled so much data. The manufacturers of many components state error rates as a function of the amount of data handled.
Your argument could be used against any kind of progress.
Take a look at those error rates. Note they are all un-correctable, not un-detectable. Again, if your HD throws an error, ZFS can do nothing except report it, and can only recover if it has another good copy. Exactly the same as the RAID systems we have all been using for years. The only thing ZFS can do that a conventional RAID could not is map around a bad sector and keep all your data mirrored, but why would you want to continue using a bad disk?

Edit: I went and looked up the numbers to be sure. Worst case for ZFS is a 128K block with a 256-bit checksum. For 128K stored on a conventional hard drive, it would have 12.8K of ECC, or 102400 bits. You really think ZFS is doing a better job? Even in the best case, a 512-byte block, ZFS still has 256 bits compared to 2500 on a hard drive. If your hard drive somehow "misses" an error here and there, there is at least an order of magnitude greater chance that ZFS would miss one.

Progress would be improving those error rates to match the increasing volume of data so we don't lose so often. Arguing against a redundant band-aid solution in the wrong place is not arguing against progress.
HFat wrote: ZFS is not recommended for systems without ECC RAM to begin with, right?
But even RAM errors could potentially be detected by ZFS as you point out yourself (even if you understate the probability). Such a detection would be valuable information.
I think you misunderstood my example. With a 1-bit error in the checksum, ZFS would report a failure in good data. In effect ZFS would actually be denying you access to your correct data. It's an extremely unlikely situation though. Much more likely is that it will put a correct checksum on bad data, i.e. not reporting an error on corrupt data.

Yes ECC is recommended, but that doesn't mean ZFS will always be used on such systems. Its claimed feature set makes it very tempting for home / low end users. The other major failing of ZFS that would most likely affect low end systems is its massive RAM use. Do you know what ZFS does when it runs out of memory? It crashes your system, hard. No orderly shutdown or flushing of buffers, just a kernel panic. That combined with the lack of recovery tools is a very scary situation.
Last edited by washu on Thu Nov 04, 2010 8:42 am, edited 2 times in total.

washu
Posts: 571
Joined: Thu Nov 19, 2009 10:20 am
Location: Ottawa

Re: please critique my home ZFS build

Post by washu » Thu Nov 04, 2010 8:30 am

Quiet Mind wrote: Does this mean that the ZFS "scrub" operation is useless?
It's not useless, just not in any way unique. Lots of conventional RAID implementations have a "verify" or something similar to scrub which would do the same thing. Even on a single drive, you could use chkdsk or whatever tool to scan for unreadable data/bad sectors and at least know what data was gone.
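
For a single drive with no RAID or ZFS at all, even something as dumb as re-reading every file will make the drive surface unreadable sectors (a rough sketch of my own; unlike a proper verify or a SMART long test it only touches sectors that currently belong to files):

Code:
import os, sys

def crude_scrub(root, bufsize=1 << 20):
    # Read every byte of every file so the drive has to return (or fail on) it.
    errors = 0
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    while f.read(bufsize):
                        pass
            except OSError as e:
                errors += 1
                print("READ ERROR %s (%s)" % (path, e))
    return errors

if __name__ == "__main__":
    sys.exit(1 if crude_scrub(sys.argv[1]) else 0)
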
Quiet Mind wrote: You write that error checking at the file system level is the wrong place for it, implying that the right place is at the hard drive level. However, you're also encouraging the use of PAR, which is error checking higher than the file system level. Hmm. I suppose we should have error checking in three places: at the lowest levels (digital circuits vs. analog), at the highest level (PAR), and wherever there are still likely to be errors. Perhaps, even with all of the error checking mechanisms in place lower than the file system, there's an argument to be made that there are still errors worth checking for at the file system level, and thus ZFS has utility.
I say that error checking in the file-system is the wrong place for two reasons. 1. It's redundant when the hard drive already has it. 2. It gives a false sense of security because most data corruption errors happen elsewhere.

PAR is error checking at the data level. As long as the PARs go along with your files then you can always check and potentially recover your data. If, for example, you copy your data from ZFS to a FAT thumbdrive, ZFS is done once it hands the data to the OS, long before (in computer terms) it even touches the FAT file-system. If you copy PARs along with your data files then you can check them at any point, even on a different computer that has no concept of ZFS. PAR doesn't run on everything, but runs on way more OSes than ZFS.
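
For anyone who hasn't tried it, this is roughly what the PAR workflow looks like when driven from Python (a sketch assuming the par2cmdline tool is installed and on the PATH; the filenames are made up, and -r sets the redundancy percentage):

Code:
import subprocess

files = ["photos-2010.tar"]

# Create parity data, roughly 10% redundancy, stored alongside the original file(s).
subprocess.run(["par2", "create", "-r10", "photos-2010.par2"] + files, check=True)

# Later, on any machine and any filesystem, check the copies
# (check=True raises CalledProcessError if damage is found).
subprocess.run(["par2", "verify", "photos-2010.par2"], check=True)

# ...and, if something is damaged and enough parity blocks survive, repair it.
subprocess.run(["par2", "repair", "photos-2010.par2"], check=True)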

HFat
Posts: 1753
Joined: Thu Jul 03, 2008 4:27 am
Location: Switzerland

Re: please critique my home ZFS build

Post by HFat » Thu Nov 04, 2010 10:35 am

I agree about the false sense of security. And about warning people against using ZFS (or other high-end filesystems like xfs for that matter) on dubious hardware or if they don't have adequate skill, good backup procedures and so on.

I also agree ideal systems would be preferable to band-aid solutions. But ideal systems are not available and bandaids are. With that logic, you could argue against all RAID except RAID0. We should improve drive reliability instead!

Yes, Parchive is obviously superior to ZFS for archiving. There is a management burden if you want to use it for live but fairly static data but it's more secure. But using Parchive is impractical or impossible when you're talking about dynamic data.
Unless you want to patch or rewrite applications that assume block devices are reliable, you sometimes have a use for a feature like ZFS's at the filesystem level or at a lower level. Do you have any solutions which are better than ZFS? I'm not using ZFS myself so I'd be interested. Scrubbing through RAID (for instance) is a partial solution, yes. But ZFS sometimes replaces RAID!

You can't have it both ways: either ZFS is detecting errors that slip past the drives' error correction, which means that error correction is unreliable, or (more likely) ZFS detects problems which do not come from interpreting what's on the platters, in which case it's not redundant (or at least not made redundant by the drives' ECC feature). The errors are real or a lot of people are lying.
Doing error detection/correction twice is better than once if it's not done well enough the first time. ECC on drives is needed to correct a lot more errors than ZFS's feature is, so you can't compare bit counts like that. It doesn't matter if ZFS would miss what the drives' ECC catches, because it never sees what is caught upstream.

In many cases, I would want ZFS to detect an error in good data because my RAM subsystem is defective. I did not misunderstand. I need to know if there is a problem even if it's not the drives which caused it!

Quiet Mind
*Lifetime Patron*
Posts: 21
Joined: Thu Oct 28, 2010 4:49 pm

Re: please critique my home ZFS build

Post by Quiet Mind » Thu Nov 04, 2010 10:55 am

washu wrote:Do you know what ZFS does when it runs out of memory? It crashes your system, hard. No orderly shutdown or flushing of buffers, just a kernel panic. That combined with the lack of recovery tools is a very scary situation.
WHAT?! :shock:

I just read through the FreeBSD mailing list archive and got the impression that this was true in FreeBSD 7 but I don't see any evidence that it's still true in FreeBSD 8. Can you confirm that?

washu
Posts: 571
Joined: Thu Nov 19, 2009 10:20 am
Location: Ottawa

Re: please critique my home ZFS build

Post by washu » Thu Nov 04, 2010 11:12 am

HFat wrote:I also agree ideal systems would be preferable to band-aid solutions. But ideal systems are not available and bandaids are. With that logic, you could argue against all RAID except RAID0. We should improve drive reliability instead!
That's not what I meant at all. Of course nothing is ideal, but a redundant solution is redundant. We already have a good solution to the problem of hard drive reliability, it's called RAID (except 0 of course) and backups. ZFS adds nothing here except complexity, more failure modes and a performance hit (admittedly minor).
HFat wrote: Do you have any solutions which are better than ZFS? I'm not using ZFS myself so I'd be interested.
As of right now, yes. The better solutions are well-tested file-systems that have been around for years, combined with good hardware. Take your pick depending on your OS and needs. Even if I am wrong and the checksumming in ZFS actually is needed, I would still say it is far too new and untested to use for critical data. As I've said before, it has pretty much no repair tools and still will crash hard if it runs out of memory. There are plenty of horror stories about ZFS losing all your data, not just a small part. ZFS is a huge, complicated bit of code, far bigger and more complex than other file-systems. It can and does have more bugs than simpler systems. Sometimes simpler is better. Now this may change in the future as ZFS improves; that's why I said right now. However, given that it is now in Oracle's hands, I wouldn't hold my breath for improvements quickly, if at all.
HFat wrote: The errors are real or a lot of people are lying.
I don't think people are lying, just not investigating enough. I would say a large portion of the errors ZFS throws would have been caught anyway. ZFS gives a checksum error and no one checks the block failure on the HD in dmesg. We also have to take into account systems without ECC and ZFS itself screwing up.
HFat wrote: In many cases, I would want ZFS to detect an error in good data because my RAM subsystem is defective. I did not misunderstand. I need to know if there is a problem even if it's not the drives which caused it!
Fair enough, but as I said this particular failure mode would be extremely rare. Much more likely for ZFS to not notice and you continue unknowingly using defective RAM.

washu
Posts: 571
Joined: Thu Nov 19, 2009 10:20 am
Location: Ottawa

Re: please critique my home ZFS build

Post by washu » Thu Nov 04, 2010 11:18 am

Quiet Mind wrote: WHAT?! :shock:

I just read through the FreeBSD mailing list archive and got the impression that this was true in FreeBSD 7 but I don't see any evidence that it's still true in FreeBSD 8. Can you confirm that?
It's much much better in FreeBSD 8 than in 7, but it's still possible. ZFS is completely in the kernel and cannot use swap. If the kernel runs out of memory or ZFS just screws up, you get a panic and your system is dead. Even if it's not ZFS that caused the crash, it reacts very badly to improper shutdowns.

HFat
Posts: 1753
Joined: Thu Jul 03, 2008 4:27 am
Location: Switzerland

Re: please critique my home ZFS build

Post by HFat » Thu Nov 04, 2010 1:53 pm

ZFS isn't the only thing which crashes if you run it without having enough memory. But the OP plans to use 8G I think... is that not a lot more than enough?
washu wrote:but a redundant solution is redundant. We already have a good solution to the problem of hard drive reliability, it's called RAID (except 0 of course) and backups.
You have not shown that ZFS's error detection is redundant in the real world in which people are getting bad data from their block devices.
I would rather have something like ZFS than RAID. I agree ZFS has issues, some of which are not likely to go away anytime soon but I like the concept. Hopefully we'll get a different implementation someday and I'll be glad to get rid of RAID. But maybe you're right and the filesystem shouldn't replace the volume manager. LVM has some of RAID and ZFS's features already. But LVM has issues too (which may have been fixed since the last time I looked).
washu wrote:The better solutions are well tested file-systems that have been around for years combined with good hardware.
Which well-tested FS has the features of ZFS? How do we know what hardware will not return bad data once in a blue moon?
washu wrote:I would say a large portion of the errors ZFS throws would have been caught anyway. ZFS gives a checksum error and no one checks the block failure on the HD in dmesg. We also have to take into account systems without ECC and ZFS itself screwing up.
There are apparently systems with ECC and without ZFS whose block devices return a tiny amount of bad data (detected by testing). Even systems which have cost a nontrivial amount of money...

washu
Posts: 571
Joined: Thu Nov 19, 2009 10:20 am
Location: Ottawa

Re: please critique my home ZFS build

Post by washu » Thu Nov 04, 2010 2:23 pm

HFat wrote:You have not shown that ZFS's error detection is redundant in the real world in which people are getting bad data from their block devices.
And you haven't shown any proven examples of it happening.

If this really was a problem, someone would have made a solution years ago. Wait, they did, and dropped it when it became unnecessary.

Before you pull out the "more data now" card again, remember that huge storage has been around for a while in SANs and other large arrays. If you pull 1 TB of data from a single drive, or from 100 10 GB drives because that's all you had, you're still pulling 1 TB. Did they use a checksumming file-system on those old SANs? No, because it wasn't needed. And there were SANs much bigger than 1 TB when a 10 GB drive was big. In fact, those old 10 GB drives (more likely 9.1 GB SCSI or FC) had worse error rates than today's cheap 1 TB drives. Yes, I did check the specs on a 9.1 GB FC drive vs a 1 TB WD Green.
HFat wrote: Which well-tested FS has the features of ZFS? How do we know what hardware will not return bad data once in a blue moon?
So you are saying you would rather have the new file-system with the most features over simpler ones with proven track records?
HFat wrote: There are apparently systems with ECC and without ZFS whose block devices return a tiny amout of bad data (detected by testing). Even systems which have cost an untrivial amount of money...
So show some real examples. And show how those errors were detected and how a normal non-ZFS system would have missed them.

HFat
Posts: 1753
Joined: Thu Jul 03, 2008 4:27 am
Location: Switzerland

Re: please critique my home ZFS build

Post by HFat » Fri Nov 05, 2010 4:26 pm

We really have more data now (arguing this point is silly, frankly) and not all problems have easy solutions.
Perhaps you'll have an easier time looking at examples in cases where ZFS is less safe than simple filesystems. We know there are drives which lie about committing data to their platters (or equivalent). People have been losing data and availability over this for years. Why hasn't this been fixed? Why could you easily lose data with something as pricey as the X25-M? These are well-known problems. The MicroServers the OP has ordered disable write caches by default. This won't work with all drives, but it will probably make most drives less prone to causing data loss, and it doesn't require admins or software to be clever about the limitations of real-world block devices. But what about the performance hit? There are tradeoffs with performance and tradeoffs with price. There are vendors who cripple their products to enforce market segmentation, vendors who lie, and so on. By and large, people seem to understand these imperfections and limitations, and therefore the point of not trusting hardware, of double-checking and redundancy.

In order to avoid repeating tedious discussions which happened elsewhere already, why don't we look at the examples cited by ZFS proponents like the CERN study? There are ready-made arguments everyone can find on the web for why it's wrong to use this as an argument for ZFS and so on so there's no point in repeating them here.
Some of the errors detected at CERN were hard to pin down but, if they are to be believed, they mainly had problems with some WD drives in their hardware RAID arrays (where have we heard that before?) and they were occasionally getting bad data from them (or perhaps from a bizarre bug in their RAID controllers that was only triggered by a particular WD firmware)... not enough that it would be obvious but enough that it could be tracked down.
Like Parchive (or just about any archiving software really), ZFS would obviously have been able to detect such a hardware problem (if configured to do so anyway). There's software that blindly trusts block devices and software that verifies. ZFS is by no means the only software that verifies but it's a nice filesystem (or volume manager) feature to have if you can live with the performance hit because it means all the data gets verified automatically. What's so complicated to understand about that?

edit: to answer the question, as I said already, I don't use ZFS so no, I won't necessarily go for the fancier filesystem. But in some cases I would. And I've been burned by fancy filesystems already, mind you. Even a well-tested one... not well-tested on my hardware, however, but I didn't think about that! It's not so simple.

kamina
Posts: 147
Joined: Mon Jun 02, 2003 4:14 am
Location: Finland
Contact:

Re: please critique my home ZFS build

Post by kamina » Sat Nov 27, 2010 1:46 pm

MikeC wrote:
Quiet Mind wrote:
Jay_S wrote:As long as you don't need to grow beyond 4 drives, the HP Proliant Microserver might work for you.
WOW! That's perfect! A whole system that supports ECC RAM for $300! And it's cooled with a 120mm fan! I just ordered two of them; thank you!
Not exactly just one 120mm fan. There's also a wee 40mm fan for the PSU. It's not silent even w/o a HDD, and if you get the same Seagate 7200.12 160gb drive that our sample has, it'll be kind of noisy with a distinct high pitched whine. But definitely a very good deal, and very sturdily, intelligently built, in a modular format.
Are you planning on doing a review? How noisy is the 40mm fan? Do you think it could be made silent?

MikeC
Site Admin
Posts: 12285
Joined: Sun Aug 11, 2002 3:26 pm
Location: Vancouver, BC, Canada
Contact:

Re: please critique my home ZFS build

Post by MikeC » Sat Nov 27, 2010 1:58 pm

kamina wrote:Are you planning on doing a review? How noisy is the 40mm fan? Do you think it could be made silent?
Yes.

Have not isolated just that fan yet. Measured only whole system so far.

Probably can be made effectively silent -- or very quiet. It's a very low power CPU/PC so...

kamina
Posts: 147
Joined: Mon Jun 02, 2003 4:14 am
Location: Finland
Contact:

Re: please critique my home ZFS build

Post by kamina » Sun Nov 28, 2010 12:03 am

MikeC wrote:
kamina wrote:Are you planning on doing a review? How noisy is the 40mm fan? Do you think it could be made silent?
Yes.

Have not isolated just that fan yet. Measured only whole system so far.

Probably can be made effectively silent -- or very quiet. It's a very low power CPU/PC so...
Thanks, any ETA on when you'll publish the review?

nerdmagic
Posts: 5
Joined: Sun Dec 05, 2010 5:23 pm

Re: please critique my home ZFS build

Post by nerdmagic » Sun Dec 05, 2010 7:25 pm

Quiet Mind, how's your server implementation going? I seriously considered one of those little HP microservers but wanted something a little faster. I'm also rebuilding my home quiet/small fileserver now and will post some info and questions separately.

Unfortunately it's possibly too late for me to offer any insight but I was an early ZFS adopter (in production in 2006 on zpool version 5 in Solaris 10/Sparc) and I also have some experience with ZFS and Macs, having used it on fileservers for my wife's video design studio. Their Macs mount via SMB directly from the fileservers.

HFat
Posts: 1753
Joined: Thu Jul 03, 2008 4:27 am
Location: Switzerland

Re: please critique my home ZFS build

Post by HFat » Sun Jan 23, 2011 2:13 pm

HFat wrote:the last time I stumbled on hardware which would happily write and read back bad data was in 2007.
update: I just had a bunch of mismatches on fairly unimportant archived data.
I have two copies but I don't know which file (if any) is good in all cases because I don't have hashes for everything (yeah, I should know better). I do have some hashes so I'll be able to make an educated guess.
I would probably have caught this earlier if I had been using a checksumming filesystem. As explained above, it's not the best solution but it protects the careless...

Quiet Mind
*Lifetime Patron*
Posts: 21
Joined: Thu Oct 28, 2010 4:49 pm

Re: please critique my home ZFS build

Post by Quiet Mind » Mon Jan 24, 2011 4:30 pm

nerdmagic wrote:Quiet Mind, how's your server implementation going? I seriously considered one of those little HP microservers but wanted something a little faster. I'm also rebuilding my home quiet/small fileserver now and will post some info and questions separately.

Unfortunately it's possibly too late for me to offer any insight but I was an early ZFS adopter (in production in 2006 on zpool version 5 in Solaris 10/Sparc) and I also have some experience with ZFS and Macs, having used it on fileservers for my wife's video design studio. Their Macs mount via SMB directly from the fileservers.
nerdmagic, I'm sorry I didn't see your post earlier. I'm normally notified of responses by email, so I'm not sure how I missed it.

I'm very happy with my implementation. I plan to document the whole thing and post it here, but haven't put the finishing touches on yet because I've also been in the middle of moving to another state, so I've been unusually busy.

Why did you use SMB instead of AFP? Wouldn't that have been more Mac friendly?
