backup to guard against bit rot.

Nicias
Posts: 80
Joined: Fri Apr 08, 2011 11:14 am

backup to guard against bit rot.

Post by Nicias » Fri Dec 12, 2014 10:37 am

Not related to silent stuff at all, but I thought some of you might have good advice anyway, so I'll ask here.

I'm not sure if this is the best forum for this, so feel free to move this topic.

I'm looking to revamp my backup system.

Currently I back up the following items to a pair of disks on a monthly basis (one disk each month):

A) Work and personal files (~6 GB), stored in git repos. All repos are on at least two machines.
B) System and config files (~500 GB) for three machines.

One of my two backup disks is 2 TB; on that disk I also put:

C) Rips of DVDs (~1.5 TB), mostly just inconvenient to replace. (Every two months.)

All of these backups are being done with rsync.

So a couple of things have made me reconsider this plan.
1) I'm pretty close to full on the 2TB drive.
2) I'm getting concerned that I'm not protecting against bit-rot.

The solution to 1) is clear: purchase a new drive, or delete files.
2) is not as clear. One of the machines does a "git gc" on each repo on a monthly basis. Does that provide any protection against bit rot?
I'm clear that I'm doing nothing to protect against bit rot on C), and I'm not too concerned about it on B).

So, I guess my questions are these:

I'm also filling up the drives on the system that has the DVD rips and git repos, so I'm getting a new drive for that as well. That gives me the opportunity to change filesystems or whatever. (All machines are Linux except one Mac. Data is on a variety of filesystems; most data is on ext4.)

What can I do to protect against bit rot? This seems like a case for maybe using btrfs, but I'm not sure.

Any advice?

cerbie
Posts: 46
Joined: Sat Dec 28, 2013 3:21 pm

Re: backup to guard against bit rot.

Post by cerbie » Fri Dec 12, 2014 12:58 pm

Git will not protect you from bit rot, but it can let you detect it if the file checksums and the data no longer agree (say, a checksum has changed but the data has not). If the data has also changed to match, there's no way to tell corruption like that so simply. You also have to be looking for it. With BTRFS or ZFS, you may find it, but you still then have to find the good data, too.

You want both RAID and either ZFS or BTRFS, and to make sure they get scrubbed regularly. They can detect corruption that may slip past other file systems, but the mirror or parity from the RAID is needed to correct it, which is what provides the protection. In other words, I think a very good step would be a NAS, either FreeBSD-based or custom Linux with BTRFS or ZFS-on-Linux (ZoL) RAID. For light use, you do not need a million GB of RAM, ten SSD cache devices, and an 8-core Xeon, either. All that stuff is based on concurrent multi-user business use cases (though an SSD read cache would be very helpful if editing files on the NAS).
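For reference, the periodic scrub itself is only a command or two on either system; the pool name and mount point below are just placeholders:

Code:

# ZFS: scrub the whole pool, then check the result
zpool scrub tank
zpool status tank

# BTRFS: scrub the mounted filesystem (runs in the background by default)
btrfs scrub start /mnt/data
btrfs scrub status /mnt/data
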

BTRFS for the Linux boxes would be good, but you are still a guinea pig with that FS. Definitely, absolutely, positively make a swap partition if you're using swap, and make a separate ext4 /boot of reasonable size. BTRFS /boot, or /boot under a BTRFS /, still isn't 100% there.

washu
Posts: 571
Joined: Thu Nov 19, 2009 10:20 am
Location: Ottawa

Re: backup to guard against bit rot.

Post by washu » Fri Dec 12, 2014 2:05 pm

RAID and ZFS or BTRFS are not backups. They can be used as part of a backup system but by themselves they are not backups, just high availability.

ZFS and BTRFS protect only against storage bit-rot, which is good but actually quite unlikely compared to other sources. Bad RAM is the most likely cause, with network errors next on the list. Your data is far more likely to be corrupted in a memory buffer or while being transferred over a network than sitting on a disk. If you don't have ECC RAM then the protections offered by ZFS/BTRFS are pretty much useless as they cannot be trusted.

If you are really concerned about bit rot then you need something filesystem-agnostic. Most archive formats (ZIP/RAR/7zip/etc.) have checksums built in which will tell you if your files got corrupted; RAR even has error correction as an option. However, keeping your files in archives is inconvenient. Look at using something like PAR files for your mostly static data. PAR creates recovery files that go along with your data. As long as you copy the PAR files along with your data, you can check for corruption and possibly fix it, and it doesn't matter what filesystem you are using.
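For example, the whole cycle with par2cmdline looks roughly like this (the filename and the 5% redundancy figure are just placeholders):

Code:

# Create recovery data alongside the file (~5% redundancy)
par2 create -r5 movie.iso.par2 movie.iso

# Later: verify the file against its recovery data
par2 verify movie.iso.par2

# If verification fails, attempt a repair from the recovery blocks
par2 repair movie.iso.par2
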

HFat
Posts: 1753
Joined: Thu Jul 03, 2008 4:27 am
Location: Switzerland

Re: backup to guard against bit rot.

Post by HFat » Sat Dec 13, 2014 3:03 am

In my experience and according to my reading, other causes of bit rot, such as crappy controllers (it might be their RAM, but most people would think of system RAM when you say "RAM") and buggy software (uncommon firmware especially), seem to be leading causes.
Although it seems most things (RAM included) have become more reliable, if you don't have ECC you might want to run memtest overnight occasionally or something.

I don't think bitrot ought to be much of a concern to the OP anyway.
What ought to be higher-priority concerns, I think, are bad sectors (possibly that's what the OP meant by "bit rot") and disk failure. It's not clear the OP is set up to detect bad sectors.
If your data is either in a self-verifiable format (like the archive formats washu mentioned) or if you manually generate checksums for static data, you can detect both bit rot and bad sectors by verifying the checksums periodically (there's no need to do it often). But you can also detect bad sectors simply by reading the data without trying to verify it. Anyone can do this and no preparation is required. You only need a controller which raises an error when you hit a bad sector. All controllers should do this, but it can be a problem with some USB drives (possibly only old ones). If you have honest drives, SMART should also tell you if your drives have ever had bad sectors (if they have, it's more likely they will have more problems going forward). Again, SMART is (or at least was) not always available over USB.
As to disk failure, (most) drives have gotten better, but an extra backup drive wouldn't hurt. And it looks like the OP needs to buy a drive anyway. The older drives could be kept around both to archive old versions of the higher-value files and repos (in case data loss caused by a bug or user error is propagated to the regular backups before it's detected) and as insurance against multiple drive failures (highly unlikely, yes, but then so is bit rot in sane setups).

The OP ought also to consider disaster recovery (extreme domestic strife, theft, fire, whatever). One of the backup drives could simply be stored at a different location, for instance.

EDIT: after a quick look at discussion threads, it looks like git gc will normally not verify all your data (not by default, anyway). You should call git fsck explicitly with the appropriate options. I'm not knowledgeable about git, so don't take my word for it; maybe git defaults changed recently, for instance.
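Something along these lines, run from cron, would do a full check of each repo (the repo paths below are only placeholders; check the git docs for the exact options you want):

Code:

#!/bin/bash
# Full integrity check of each repository's object database
for repo in "$HOME"/repos/*/.git "$HOME"/work/*.git ; do
        git --git-dir="$repo" fsck --full || echo "git fsck FAILED in $repo"
done
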

Nicias
Posts: 80
Joined: Fri Apr 08, 2011 11:14 am

Re: backup to guard against bit rot.

Post by Nicias » Sat Dec 13, 2014 7:07 am

Thanks for all of the input.

I don't think I want to invest the required money and labor for the multiple-drive solution you are suggesting. As I said, I don't have anything irreplaceable, just inconvenient.

Right now I'm leaning towards the following:

For category A) adding git fsck to the weekly git gc run.
For category B) doing nothing more
For category C) adding weekly par2 checks

I also do SMART self-tests regularly on all the drives involved.
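(For reference, the smartmontools side of that is roughly the following; /dev/sdb is just a placeholder for whichever drive is being tested.)

Code:

# Start an extended (long) self-test; the drive reads its whole surface
smartctl -t long /dev/sdb

# Afterwards: review the self-test log and the overall health verdict
smartctl -l selftest /dev/sdb
smartctl -H /dev/sdb
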

The backup drives do live off-site, in my office; not a safe deposit box, but still off-site. So that covers "domestic strife," theft, or fire. Something that would destroy my office (fourth floor of a concrete building) would have to be a significant disaster.

I've put a bash script together to do the par2 checks. It would be wrapped in a cron job and have the logfile emailed to me.

Code:

#!/bin/bash
# Check (or create) par2 recovery data for every large file under the
# directories given as arguments. Meant to run from cron; results go to
# ${LOGFILE} and any failures are also recorded in ${FAILS}.

# Run at idle I/O priority so the checks don't get in the way of normal use.
/usr/bin/ionice -c3 -p$$

PARPREFIX="."                      # par2 files are hidden: .<name>.par2
BASEDIR=/tmp/parcheck
LOGFILE=${BASEDIR}/partest-log.txt
FAILS=${BASEDIR}/par-fails.txt
LIMIT=3                            # test limit on files per run; raise or remove for real runs

/bin/mkdir -p "${BASEDIR}"

LogString () {
        /bin/echo "$@" >> "${LOGFILE}"
}

LogString "$(/bin/date): Checking Par Files"

COUNT=0
# find -print0 / read -d '' so filenames with spaces survive the pipeline;
# the .par2 files themselves are excluded from the list.
while IFS= read -r -d '' FILE ; do
        [ "${COUNT}" -ge "${LIMIT}" ] && break
        COUNT=$((COUNT + 1))

        FULLPATH=$(/usr/bin/readlink -f "${FILE}")
        DIR=$(dirname "${FULLPATH}")
        NAME=$(basename "${FULLPATH}")
        PARTARGET="${PARPREFIX}${NAME}.par2"

        LogString "checking file ${NAME} in ${DIR}"
        if [ ! -e "${DIR}/${PARTARGET}" ]; then
                # No recovery data yet: create a single 1% recovery volume.
                LogString "${PARTARGET} missing, creating"
                ( cd "${DIR}" && /usr/bin/par2 c -n1 -m1024 -r1 -qq "${PARTARGET}" "${NAME}" )
        else
                LogString "${PARTARGET} exists, checking"
                if ( cd "${DIR}" && /usr/bin/par2 v -m1024 -qq "${PARTARGET}" ) ; then
                        LogString "par checks"
                else
                        LogString "PAR FAILS!!!"
                        /bin/echo "$(/bin/date) ${FULLPATH}" >> "${FAILS}"
                fi
        fi
done < <(/usr/bin/find "$@" -type f -size +10M ! -name '*.par2' -print0 | /usr/bin/sort -z)

if [ -e "${FAILS}" ]; then
        LogString "$(/usr/bin/wc -l < "${FAILS}") failure(s) recorded so far"
fi
LogString "$(/bin/date): Done Checking Par Files"
Any thoughts?

HFat
Posts: 1753
Joined: Thu Jul 03, 2008 4:27 am
Location: Switzerland

Re: backup to guard against bit rot.

Post by HFat » Sat Dec 13, 2014 7:56 am

I think weekly fsck/par2 checks are way overkill. You might start out doing weekly checks, but if you don't find any issues for months on end (as expected), consider making such checks a lot less frequent. Then again, maybe git fsck would find problems with git itself rather than storage problems, so perhaps that part is worthwhile. But weekly par2 checks?
Also, if you have important static files in your category B, you could manually add hashes to them and then have your periodic script verify any hashes it finds.
Finally, maybe your SMART tests already cover that (do you know what they actually do?), but you could simply read all your data from time to time as well. It's less resource-intensive than verifying and would cover all your categories, B included. This would be useful if you have files that never change and that you never check in B, as you might never touch the parts of the disk where they reside unless you read everything. Assuming reasonable controllers, you could use dd (with of=/dev/null and a reasonable block size) on the whole device, actually, which would be even faster and more straightforward to script.
And, finally-for-real-this-time, note that depending on your drive firmware (if you've got WD Greens, for instance), the symptom of a drive problem might be an operation that (virtually) never finishes rather than an error. So you might want to take that into account in your script, depending on how you're using it.
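A minimal sketch of that whole-device read, assuming the disk shows up as /dev/sdb:

Code:

# Read every sector once and throw the data away; an unreadable sector
# will surface as an I/O error here (and in dmesg / the SMART counters)
dd if=/dev/sdb of=/dev/null bs=1M
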

washu
Posts: 571
Joined: Thu Nov 19, 2009 10:20 am
Location: Ottawa

Re: backup to guard against bit rot.

Post by washu » Sat Dec 13, 2014 9:19 am

HFat is spot on that you don't need to check your files that often. I have files with pars that are over 10 years old that still check out fine. I have ZIP files nearly 20 years old that still extract with no CRC errors. I check my files when I do a major storage change/migration with occasional spot checks. I have caught a faulty network card this way, but I have never had a file corrupt itself just sitting on disk.

Nicias
Posts: 80
Joined: Fri Apr 08, 2011 11:14 am

Re: backup to guard against bit rot.

Post by Nicias » Sat Dec 13, 2014 9:27 am

OK, so changing the checks to monthly, and adding in a dd to /dev/null.
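Roughly like this in the crontab (the script path, device, and mail address below are just placeholders):

Code:

MAILTO=me@example.com
# min hour dom mon dow  command
# 1st of the month, 3 am: par2 checks on the DVD rips
0 3 1 * *   /home/nicias/bin/parcheck.sh /data/dvds
# 15th of the month, 3 am: read the whole data disk to surface bad sectors
0 3 15 * *  dd if=/dev/sdb of=/dev/null bs=1M
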

cerbie
Posts: 46
Joined: Sat Dec 28, 2013 3:21 pm

Re: backup to guard against bit rot.

Post by cerbie » Sat Dec 13, 2014 11:13 am

If it's just a little extra work to replace the data, rather than a ton of work or impossible (such as with files that change often), then sure: just archive them, par them, and check the pars every now and then. If redoing the Linux boxes, I'd go with BTRFS, though, provided you're using a distro with a fairly new kernel.
