Tuesday, March 04, 2008

Data/Documents backup script

I used to always create a directory under my home called data, where I'd store the files I need for work along with my working files. Then this would be the only directory I had to synchronize between machines with Unison (http://www.cis.upenn.edu/~bcpierce/unison/). Since Fedora now includes a directory called Documents, which serves a similar purpose (as does Mac OS X), I've simply renamed this special directory from data to Documents.
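For reference, synchronizing a directory like this with Unison takes only a small profile. The sketch below is roughly what mine looks like; the machine name and paths are made up:

```
# ~/.unison/docs.prf  (hypothetical machine name and paths)
root = /home/joe/Documents
root = ssh://desktop//home/joe/Documents

# Propagate non-conflicting changes without asking
auto = true
```

Running "unison docs" then reconciles the two copies.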



That brings me to the topic of this article: what do I do if I mess up a file in this directory and then synchronize the mistake to the other mirrors? That would mean none of the computers I work on has a good copy anymore. The answer is rdiff-backup (http://www.nongnu.org/rdiff-backup/). I used to just have a per-user cron job call rdiff-backup and update an incremental backup repository on my behalf, but I wanted this to be more mechanical (what if I add a new user, etc.). So I wrote the following script, which replaces an older script of mine that only made a mirror with rsync.
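At its core, rdiff-backup is a two-command workflow: one command to back up, one to restore. The sketch below uses throwaway temporary directories in place of my real paths, and only runs the commands if rdiff-backup happens to be installed:

```shell
# Sketch of the basic rdiff-backup workflow, using throwaway temp
# directories in place of real paths; skipped if rdiff-backup is absent.
SRC=`mktemp -d`      # stands in for ~/Documents
DEST=`mktemp -d`     # stands in for the backup repository
echo "draft" > "$SRC/report.txt"

if command -v rdiff-backup >/dev/null 2>&1; then
    # First run creates the repository (and its rdiff-backup-data dir)
    rdiff-backup --print-statistics "$SRC" "$DEST"
    # Restore the current version of one file out of the repository;
    # a time spec like 3D (three days ago) would pull an older increment
    rdiff-backup -r now "$DEST/report.txt" "$SRC/report-restored.txt"
    STATUS=ran
else
    STATUS=skipped
fi
echo "$STATUS"

# Clean up the scratch directories
rm -rf "$SRC" "$DEST"
```

The -r option takes the same time format the script's MAX_LIFE setting uses.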



Note that, in my configuration, the incremental backup repository is on the same device as the original files, but both are mirrored in an overnight backup job. This means I have (at least) four mirrors of the same files, which is somewhat wasteful, but I don't really worry about it since it's only about 200 MB per user. Ideally, you would keep just the local mirror on the disk and the rdiff-backup repository on the backup device (or on a system over the network). The downside is that if the backup device (or remote system) crashes, you lose the ability to restore a previous increment.
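If I ever move the repository off the local disk, rdiff-backup can write to a remote repository over SSH using its host::path syntax. A sketch, with a made-up host name and paths:

```
# Hypothetical host and paths: the local mirror stays on disk, while
# the repository lives on a backup host reached over SSH.
rdiff-backup --print-statistics /home/joe/Documents \
    backupbox::/srv/rdiff/joe/Documents

# Restoring a week-old increment works over the network, too
rdiff-backup -r 1W backupbox::/srv/rdiff/joe/Documents/report.txt \
    ./report-last-week.txt
```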



Following is the script that maintains incremental backup repositories with rdiff-backup for all users. Caution: the backup path should be as read-only as possible, but it is not if a user can log in to the system. In that case, you'll have to count on users not to modify the files, or keep the files on a filesystem they cannot mount. For simple remote access over Samba (the kind of access in my setup), you would configure a per-user share (so Joe can't see Susan's backup, for instance) and make each one read-only (so Joe cannot corrupt his repository by changing the files there).
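As a sketch, a per-user read-only Samba share along those lines might look like the smb.conf fragment below. The share name is made up; %U expands to the connecting user's name, so each user sees only their own subdirectory:

```
[backup]
    comment = Read-only view of each user's backup repository
    path = /opt/backup/%U
    valid users = %U
    read only = yes
    browseable = no
```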




#!/bin/sh
#
# Make incremental backups of 'Documents' directories for users
#
# Ryan Helinski

LOGFILE=/var/log/rdiff-backup.log

# Close stdout and stderr
1>&-; 2>&-;

# Direct stdout and stderr to logfile
exec 1>>"$LOGFILE"; exec 2>>"$LOGFILE";

echo Begin $0, `date`;

# For debugging
#set -x

# Script-specific configuration
HOMESPATH=/home
BACKUPPATH=/opt/backup
RDIFF_OPTIONS="--print-statistics"
DOCSDIR=Documents

# Max increment life
# The time interval is an integer followed by the character s, m, h, D, W,
# M, or Y, indicating seconds, minutes, hours, days, weeks, months, or
# years respectively, or a number of these con-catenated.
MAX_LIFE=4W

for USERHOME in "$HOMESPATH"/* ; do

    # USERDIR=`basename $USERHOME`;
    USER=`stat -c "%U" "$USERHOME"`;

    if [ -d "$USERHOME/$DOCSDIR" ] ; then
        echo "Updating $BACKUPPATH/$USER/$DOCSDIR";

        # Create backup repo if it doesn't exist
        if [ ! -d "$BACKUPPATH/$USER/$DOCSDIR" ] ; then
            echo "Creating rdiff-backup repo";
            mkdir -p "$BACKUPPATH/$USER/$DOCSDIR";
            chown "$USER" "$BACKUPPATH/$USER";
            chown "$USER" "$BACKUPPATH/$USER/$DOCSDIR";
        fi

        # Update repo
        rdiff-backup $RDIFF_OPTIONS "$USERHOME/$DOCSDIR" \
            "$BACKUPPATH/$USER/$DOCSDIR" ;

        # Clean up repo
        #rdiff-backup --remove-older-than $MAX_LIFE --force \
        #    "$BACKUPPATH/$USER/$DOCSDIR" ;

    fi

done

echo End $0, `date`;
echo

exit


To actually remove old backups, uncomment the code in the "Clean up repo" section.
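Before enabling that, you can check what would be deleted by listing the increments in a repository by hand. A sketch (the path is hypothetical, and the commands only run where the tool and repository actually exist):

```shell
# Inspect and prune one user's repository by hand (hypothetical path)
REPO=/opt/backup/joe/Documents

if command -v rdiff-backup >/dev/null 2>&1 && [ -d "$REPO" ]; then
    # Show every increment currently stored in the repository
    rdiff-backup --list-increments "$REPO"

    # Drop increments older than four weeks, as the cron job would;
    # --force is needed when more than one increment will be removed
    rdiff-backup --remove-older-than 4W --force "$REPO"
fi
echo "pruning check complete"
```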