HFat wrote:
You want to store 4T on their servers for $5 a month? That'd be some hardcore leeching. Did you think about how much that would cost them? Maybe they can make it work if they have enough customers who pay $5 to upload 10G but you'd be operating on their sufferance. The day they decide they're losing too much money because of people like you, they'll crack down.
I have no desire to leech. I just sent them an email asking them this question directly, so we'll see what they think.
HFat wrote:
Do you realize it would take you over a year to upload 4T over your link at max speed (which you'll never reach) 24/7? Even if they agreed to allow you to send them a couple of drives in the mail and to copy that to their system, there's no way you can back up large changes in anything close to 15 minutes. If you changed 0.1% of your capacity, uploading that would require at least 10 hours.
Yes, but that's not my use case. I already have about 300 GB in their system, and I generally don't generate more than 1 MB/s of data at a time. It would be interesting to measure my degree of data "staticness" in MB/s somehow.
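If I set up ZFS, one rough way to measure it (a sketch, with assumed pool/dataset names) would be to take two snapshots a day apart and let a dry-run incremental send estimate how much changed:
Code:
$ zfs snapshot online/volume@day1
# ...wait 24 hours...
$ zfs snapshot online/volume@day2
# -n is a dry run; -v prints the estimated size of the incremental stream
$ zfs send -nv -i online/volume@day1 online/volume@day2
Dividing the estimated stream size by 86,400 seconds would give the average churn in MB/s.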
HFat wrote:
If it wasn't rigged, why wouldn't they let you upload using your own software, making their service compatible with all platforms? Don't take "unlimited" offers literally. They can only work as long as people don't use much of the resource and there are ways to enforce that.
In other words, your remote backups are not complete or reliable and you need more frequent local backups before you can talk about "utmost protection".
The benefit of their software is ease of use, and they'd face broader support costs if they let people use their own software. That's a different market. Although I like using my own software (especially when it comes to encryption), I'm drawn to CrashPlan's fantastic ease of use. I'm still waiting for their answer to my question, but elsewhere they've written, "We do reserve the right to introduce a limit, however if we do, we'll give you fair warning. Our guess is, we can add disk faster than you can create versions."
HFat wrote:
Which is why I'm saying you should start by dropping triple-mirroring and keeping drives out of your array where they are somewhat protected (not as well as if they were disconnected). The only thing you need to change in order to achieve that is to split the extra drives from the array after resilvering instead of before. You might also resilver daily or something. And it would be prudent to physically hotswap the drives more often (if you only have one server).
Let me be clear about the triple-mirroring. I'm not suggesting triple-mirroring because I think it's necessary to protect my data. I believe a normal mirror is fine for that. I'm suggesting triple-mirroring only to save myself a step every month when I split the mirrors:
This is what I believe you're suggesting:
Run two-way mirrors and once a month, do this:
1. Attach extra drives to create three-way mirrors.
2. Wait for resilvering.
3. Split mirrors.
4. Remove drives.
This means I'll have to wait for the resilvering process to complete, whereas in my plan I won't.
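For comparison, a sketch of both procedures (the pool and device names are placeholders, not my actual layout):
Code:
# Your suggestion: attach, wait for the resilver, then split
$ zpool attach online c0t0d0 c0t2d0    # grow the mirror to a third disk
$ zpool status online                  # wait here until the resilver completes
$ zpool split online offline c0t2d0    # peel off the fresh disk as pool "offline"

# My plan: keep a standing three-way mirror and split immediately
$ zpool split online offline c0t2d0    # no waiting; "online" remains a two-way mirror
$ zpool attach online c0t0d0 c0t2d0    # reattach a disk later; the resilver runs in the background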
Regarding resilvering daily: the routine verification pass is called "scrubbing" in ZFS vernacular (resilvering specifically means rebuilding a new or reattached disk), and I plan to scrub weekly.
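Automating that is trivial; a hypothetical crontab entry for a Sunday 3am scrub would be:
Code:
# Scrub the "online" pool every Sunday at 3am
0 3 * * 0 /usr/sbin/zpool scrub online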
HFat wrote:
Alternatively, you might do automatic remote incremental backups overnight (for instance). What an incremental would involve is to make a file containing all the changes since the last version you disconnected from your server and to upload that file. This is (a lot) more reliable than the way you plan to use Crashplan and would be quicker to restore as well. If ZFS doesn't have a facility to do that (with the help of a snapshot), it can be done on a second server or by keeping the same version of your data you have offline on your third set of drives (in which case you wouldn't have a use for hotplugging and racks anymore because you'd do offline backups from the third set of drives to an external dock over eSATA or something). Or you could find backup software which can keep an inventory of the files (with hashes) in the last version you took offline and dispense with the need to keep an additional copy of that version online at the cost of larger uploads. Possibly Crashplan can do this (I have no idea).
Your thinking has influenced me, and I've decided to add another CrashPlan backup destination on the computer in my office (CrashPlan's software lets me do this for free). I'll still be limited by the speed of my home uplink, but it will be another copy with full version histories, and if I need to restore a lot of data at once, all I'll have to do is drive to my office.
ZFS does have a snapshot facility, but as far as I can tell it doesn't provide the ease of use that CrashPlan does, with automatic snapshots every 15 minutes, encryption, and offsite transfer.
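The snapshot cadence itself would be easy enough to approximate with cron (a sketch; the auto- naming scheme is my own invention, and cron requires % to be escaped), but encryption and offsite transfer would still be left to me:
Code:
# Hypothetical crontab entry: a timestamped snapshot every 15 minutes
*/15 * * * * /usr/sbin/zfs snapshot online/volume@auto-$(date +\%Y\%m\%d\%H\%M)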
HFat wrote:
Splitting your array will not affect performance much but resilvering will. And resilvering times will also be affected by regular usage of the array. But that's not much of an issue if your gear is overkill for your needs to begin with, of course. edit: Note that splitting while the array is in use might result in an inconsistent split copy.
I believe the performance impact of resilvering and scrubbing will be mitigated by my overkill gear. Splitting the array while it's in use is a mild concern that's nagging at me. The Solaris ZFS Administration Guide admonishes, "Data and application operations should be quiesced before attempting a zpool split operation," which I don't plan on doing. I notice it also advises, "A good way to keep your data redundant during a split operation is to split a mirrored storage pool that is composed of three disks so that the original pool is comprised of two mirrored disks after the split operation," which validates my choice of a triple mirror.
Another way to achieve a split using ZFS snapshots that might be safer would be like this:
Code:
$ zpool import offline
$ zfs snapshot online/volume@2012.03
$ zfs send -I online/volume@2012.02 online/volume@2012.03 | zfs receive -F offline/volume
$ zpool export offline
This would send only the incremental changes made during the last two months to the offline drive. Hmm. This would keep monthly snapshots on the running copy, require fewer writes than a resilver, and wouldn't need a triple mirror. I might do this instead.
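One caveat: as I understand it, an incremental receive only applies if the offline pool already holds the base snapshot, so the very first transfer would have to be a full send (same assumed names as above):
Code:
# One-time initial seed of the offline pool
$ zfs send online/volume@2012.02 | zfs receive offline/volume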
HFat wrote:
ZFS will not protect you against all sorts of bit errors. That's a misconception. Your Mac Mini is of course a potential cause of errors but there are other potential sources of errors (see my previous posts). End-to-end verification with hashes created and verified by the client can handle that but no server-side technology can possibly do it. Manual creation of such hashes is best but automatic verification is fine. ZFS takes care of errors introduced by drives which is valuable but it's not a panacea.
Yes, of course. Perhaps I shouldn't have used the word "utmost" in my original post. I'm looking for value, not a panacea.
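For the end-to-end hashes you describe, a manual sketch on the Mac (the paths are hypothetical) might look like:
Code:
# Build a manifest of SHA-256 hashes before backing up
$ cd /Volumes/data && find . -type f -exec shasum -a 256 {} + > ~/manifest.sha256
# After a test restore, verify every file against the manifest
$ cd /Volumes/restore && shasum -a 256 -c ~/manifest.sha256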
HFat wrote:
This is why I'm saying that ZFS is harder to justify for static data. There are simpler and better ways to protect the integrity of static data. Which is not to say that ZFS is useless of course, only that it might not be worth the trouble (depending on the data and the resources available to protect it).
What are they? Will they work for quasi-static data (my use case)?
HFat wrote:
edit: nobody mentioned it so far but ZFS does not like to lose power so it would be prudent to have a battery unless your power is rock-solid.
Good catch; I'm adding a UPS to my shopping list.