Friday, August 15, 2014

Backup thoughts, hard won from recent experience.

Time Machine is cool when it works.

When it does not, though, it is not so cool.

Same goes for Windows Backup and rdiff-backup to another disk.

But! If the backup disk (or sparsebundle) is directly mounted on the machine that is doing the backup... what if there is some hiccup in your filesystem code while the backup disk is mounted? BOOM! goes your system disk AND backup.

This is exactly what happened recently with my wife's MacBook Pro. It appears that the machine crashed while going to sleep during a backup to the Netatalk share on my home Linux server. The system partition was irrecoverably corrupted, and so was the sparsebundle containing the Time Machine backup. After cloning the system drive with dd, I ended up having to reinstall OS X and restore the profiles from a backup of the server, then recover some of the more recent files from the system drive image using DiskWarrior.

I am rebuilding the home server real soon now (the new HDD is backordered and I am on call until next weekend). I was already planning to set up a complex set of btrfs subvolumes, so now the plan is to make the Time Machine Netatalk shares subvolumes with frequent snapshots, so that previous snapshots are out of reach of the machine doing the backing up. Likewise with the SMB shares that two Windows machines back up onto.

So the server's subvolumes will look like this:

root (and snapshots)
home (and snapshots)
mytimemachine AFP (and snapshots)
wifestimemachine AFP (and snapshots)
aperturevault AFP (and snapshots)
testWin7Box SMB (and snapshots)
worklaptop SMB (and snapshots)
persistantshare SMB (and snapshots)
volatileshare SMB (NO Snapshots)
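
To make the snapshot idea concrete, here is a rough sketch of how the snapshot commands for that list could be generated. It only builds the command lines and prints them; nothing is executed. The pool path, the snapshot directory, and the `name@timestamp` naming are all assumptions of mine, not settled parts of the plan. The `-r` flag asks btrfs for a read-only snapshot.

```python
from datetime import datetime

# Subvolume names from the list above; True means "take snapshots".
SUBVOLUMES = {
    "root": True,
    "home": True,
    "mytimemachine": True,
    "wifestimemachine": True,
    "aperturevault": True,
    "testWin7Box": True,
    "worklaptop": True,
    "persistantshare": True,
    "volatileshare": False,  # deliberately never snapshotted
}


def snapshot_commands(pool, snap_dir, now):
    """Build one read-only 'btrfs subvolume snapshot' command per subvolume.

    pool and snap_dir are hypothetical paths; adjust to the real layout.
    """
    stamp = now.strftime("%Y%m%d-%H%M%S")
    cmds = []
    for name, wanted in SUBVOLUMES.items():
        if not wanted:
            continue
        src = f"{pool}/{name}"
        dst = f"{snap_dir}/{name}@{stamp}"
        cmds.append(["btrfs", "subvolume", "snapshot", "-r", src, dst])
    return cmds


if __name__ == "__main__":
    # Dry run: print the commands instead of running them.
    for cmd in snapshot_commands("/mnt/pool", "/mnt/pool/.snapshots", datetime.now()):
        print(" ".join(cmd))
```

Run from cron on the server (not on the clients), something like this would mean the Mac or Windows box can scribble all over its live share without ever touching the older snapshots.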


And after several steps, the interim backup solution for the server itself will be one 2TB disk in an eSATA cradle with one main subvolume (and its snapshots), to which the current version of each of those (except the volatileshare) is rsync'd.
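
As a sketch of that interim step: rsync each subvolume into the single backup subvolume, then snapshot the backup subvolume once everything has been copied. Again this only builds the commands. The mount points and the `backup@timestamp` snapshot name are hypothetical, and `-a --delete` is just one reasonable choice of rsync options, not necessarily what I will end up using.

```python
def backup_plan(sources, backup_subvol, snap_dir, stamp, skip=("volatileshare",)):
    """Build the rsync commands plus one final read-only snapshot command.

    sources maps a subvolume name to its (hypothetical) mounted path.
    """
    cmds = []
    for name, path in sources.items():
        if name in skip:
            continue
        # The trailing slash on the source copies its contents, not the directory itself.
        cmds.append(["rsync", "-a", "--delete", f"{path}/", f"{backup_subvol}/{name}/"])
    # Snapshot only after every rsync command, so the snapshot covers all shares.
    cmds.append(
        ["btrfs", "subvolume", "snapshot", "-r", backup_subvol, f"{snap_dir}/backup@{stamp}"]
    )
    return cmds


if __name__ == "__main__":
    example = {
        "home": "/mnt/pool/home",
        "mytimemachine": "/mnt/pool/mytimemachine",
        "volatileshare": "/mnt/pool/volatileshare",
    }
    for cmd in backup_plan(example, "/mnt/esata/main", "/mnt/esata/.snapshots", "20140815"):
        print(" ".join(cmd))
```

The `--delete` makes the main subvolume mirror the current state exactly; anything deleted on the server still lives on in the earlier snapshots of the backup disk.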

Eventually the plan is to score a gratis small desktop with room for up to 4 drives that can live in another building (we already have powerline Ethernet working to it), so it can wake on schedule daily, receive the backup via rsync, snapshot the backup, and shut itself down when complete. Thus the server will not have write access to its own past backups either. The uptime of the second server, and thus its power consumption, should also be minimal (the first backup will be done in the house over gigabit LAN).
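
A very rough sketch of what the second box might do once the rsync has finished: take a read-only snapshot, then power off with a wake alarm set for the next day. I have not decided how the scheduled wake will actually work; using `rtcwake -m off -t <epoch>` from util-linux is an assumption here (a BIOS alarm or wake-on-LAN would do as well), and the paths and the 03:00 wake hour are made up. As before, the commands are only built, not run.

```python
from datetime import datetime, timedelta


def next_wake(now, hour):
    """Return the next occurrence of hour:00 strictly after 'now'."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def finish_commands(backup_subvol, snap_dir, now, wake_hour=3):
    """Commands to run on the backup box after the rsync has completed."""
    stamp = now.strftime("%Y%m%d-%H%M%S")
    # Naive datetimes are treated as local time by timestamp().
    wake_epoch = int(next_wake(now, wake_hour).timestamp())
    return [
        # The snapshot is taken by the receiver, so the sender never writes to it.
        ["btrfs", "subvolume", "snapshot", "-r", backup_subvol, f"{snap_dir}/backup@{stamp}"],
        # Assumed wake mechanism: power off and set the RTC alarm.
        ["rtcwake", "-m", "off", "-t", str(wake_epoch)],
    ]


if __name__ == "__main__":
    for cmd in finish_commands("/srv/backup/main", "/srv/backup/.snapshots", datetime.now()):
        print(" ".join(cmd))
```

The point of doing the snapshot on the receiving side is the same as with the client shares: the machine pushing the data only ever sees the live copy.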