
So I want a backup solution

I look after a lot of systems, and most of them need identical, simple backups of their filesystems. Currently I use backup2l, which works but suffers from a couple of minor issues.

In short I want to take a full filesystem backup (i.e. back up "/"), excluding only a few directories and mounted filesystems.

So my configuration looks like this:

# List of directories to make backups of.
# All paths MUST be absolute and start with a '/'!
SRCLIST=(  / /boot -xdev )

# The following expression specifies the files not to be archived.
SKIPCOND=( -path '/var/backups/localhost' -o -path '/var/run/' -o \
    -path '/tmp'  -o -path '/var/tmp' \
    -o -path '/dev' -o -path '/spam' \
    -o -path '/var/spool/' )

The only surprising thing here is that I abuse the internals of backup2l because I know that it uses "find" to build up a list of files - so I sneakily add in "-xdev" to the first argument. This means I don't accidentally back up any mounted gluster filesystem, mounted MySQL binary/log mounts, etc.
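To illustrate the effect, here is a rough sketch of the kind of "find" invocation this produces. It is an assumption about the general shape only, not backup2l's actual command line, and the demo tree and its file names are made up:

```shell
#!/bin/sh
# Hypothetical demo tree standing in for "/".
root=$(mktemp -d)
mkdir -p "$root/etc" "$root/tmp" "$root/var/tmp"
echo a > "$root/etc/hosts"
echo b > "$root/tmp/scratch"
echo c > "$root/var/tmp/junk"

# -xdev keeps find on the filesystem it started on, so anything
# mounted below the starting point is never descended into.
# -prune drops the excluded directories and everything beneath them.
files=$(find "$root" -xdev \
    \( -path "$root/tmp" -o -path "$root/var/tmp" \) -prune \
    -o -type f -print)

echo "$files"
rm -rf "$root"
```

Only the file under etc/ is listed; the two excluded directories and their contents are skipped.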

backup2l then goes and does its job. It allows me to define things to run before and after the backup via code like this:

# This user-defined bash function is executed before a backup is made
PRE_BACKUP () {
   if [ -d /etc/backup2l/pre.d/ ]; then
      run-parts /etc/backup2l/pre.d/
   fi
}

So what is my gripe? Well I get a daily email, per-system, which shows lots of detail - but the key thing. The important thing. The thing I care about more than anything else, the actual "success" or "fail" result is only discoverable by reading the mail.

If the backup fails, e.g. because the disk is full, I won't know unless I read the middle of the mail.

If the pre/post-steps fail I won't know unless I examine the output.

As I said to a colleague today, in my view the success or failure of the backup is the combination of three distinct steps:

  • pre-backup jobs.
  • the backup itself.
  • post-backup jobs.

If any of the three fail I want to know. If they succeed then ideally I don't want a mail at all - but if I get one it should have:

Subject: Backup Success - $(hostname) - $(date)
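A minimal sketch of a wrapper with that behaviour, assuming each of the three stages can be invoked as a single command. The function name run_backup and the stand-in stage commands are hypothetical; nothing here is part of backup2l:

```shell
#!/bin/sh
# Run the three stages in order, stop at the first failure, and
# print a Subject line that states the overall result.
# Arguments are command strings for: pre-backup, backup, post-backup.
run_backup() {
    pre=$1
    main=$2
    post=$3
    failed=""
    if ! sh -c "$pre"; then
        failed="pre-backup"
    elif ! sh -c "$main"; then
        failed="backup"
    elif ! sh -c "$post"; then
        failed="post-backup"
    fi

    if [ -z "$failed" ]; then
        printf 'Subject: Backup Success - %s - %s\n' "$(hostname)" "$(date)"
        return 0
    fi
    printf 'Subject: Backup Failure (%s) - %s - %s\n' \
        "$failed" "$(hostname)" "$(date)"
    return 1
}

# Stand-ins for the real commands: first all stages pass,
# then the backup stage itself fails.
rc_ok=0
out_ok=$(run_backup true true true) || rc_ok=$?
rc_bad=0
out_bad=$(run_backup true false true) || rc_bad=$?
echo "$out_ok"
echo "$out_bad"
```

Stopping at the first failure is a design choice; a real wrapper might still want to run the post-backup jobs for cleanup even when the backup stage fails.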

So I've looked around at programs such as backup-ninja, backup-manager and they seem similar. It is a shame as I mostly like backup2l, but in short I want to do the same thing on about 50-250 hosts:

  • Dump mysql, optionally.
  • Dump postgresql, optionally.
  • Dump the filesystem. Incrementals are great, but full copies are probably tolerable.
  • Rsync those local filesystem backups to a remote location.

In my case it is usually the rsync step that fails, which is horrific if you don't notice (quota exceeded, connection reset by peer, etc.). The local backups are good enough for 95% of recoveries, but if the hardware is fried then having the remote backups available, albeit slowly, is required.
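One way to make a failed transfer impossible to miss is to reduce the sync step to a single status line. The sketch below is an assumption about how that could look: sync_offsite is a hypothetical helper, and the commands passed to it are stand-ins for a real rsync invocation. Treating exit status 24 (which rsync documents as a partial transfer due to vanished source files) as acceptable is a judgment call, not a rule:

```shell
#!/bin/sh
# Run the given sync command and summarise its exit status in one line.
sync_offsite() {
    rc=0
    "$@" || rc=$?
    case $rc in
        0)  echo "rsync: OK" ;;
        24) echo "rsync: OK (some source files vanished during transfer)" ;;
        *)  echo "rsync: FAILED with exit status $rc"
            return "$rc" ;;
    esac
    return 0
}

# Stand-ins for real rsync runs: one success, one failure.
msg_ok=$(sync_offsite true)
msg_fail=$(sync_offsite sh -c 'exit 12') || true
echo "$msg_ok"
echo "$msg_fail"
```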

Using GNU Tar incrementally is trivial. If backup2l weren't such a messy program I'd probably be inclined to hack on it - but in 2012 I can't believe I need to.
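For reference, a small sketch of incremental backups with GNU tar's --listed-incremental option. It assumes GNU tar is installed; the directory and file names are made up for the demonstration:

```shell
#!/bin/sh
# Assumes GNU tar: --listed-incremental is a GNU extension.
work=$(mktemp -d)
mkdir "$work/data"
echo "first" > "$work/data/old.txt"
sleep 1   # keep old.txt's timestamp clearly before the first run

# Level 0: the snapshot file does not exist yet, so everything is
# archived and tar records the state of the tree in state.snar.
tar -C "$work" --create --file="$work/level0.tar" \
    --listed-incremental="$work/state.snar" data

echo "second" > "$work/data/new.txt"

# Level 1: work on a copy of the snapshot so the level-0 state is
# kept; only changes since level 0 are archived.
cp "$work/state.snar" "$work/state.1.snar"
tar -C "$work" --create --file="$work/level1.tar" \
    --listed-incremental="$work/state.1.snar" data

level0=$(tar --list --file="$work/level0.tar")
level1=$(tar --list --file="$work/level1.tar")
echo "$level1"
rm -rf "$work"
```

The level-1 archive contains the directory entry and the new file, but not the unchanged old one.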

(Yes, backuppc rocks. So does duplicity. So does amanda. But they're not appropriate here. Sadly.)

ObQuote: "Oh, I get it. I see now. You've been training for two years to take me out, and now here I am. Whew! " - Blade II - An example of a rare breed, a sequel that doesn't suck. No pun intended.

Comments On This Entry

  1. Kint

    Obnam takes care of most of your requirements, and is written by a Debian dev. Syncing to a remote filesystem is implemented via sftp with keys; GPG can be used to encrypt backups, and you can ignore directories and mounted filesystems. The only thing that it won't do by itself is run arbitrary shell commands like mysqldump or whatever you need to run to dump/backup your DBs. That can be worked around by calling obnam from a script.

    As for logging, obnam will log either to a file or to syslog; you could always use logwatch to alert you when an obnam backup fails.

    My nightly backups currently look like this:

    obnam backup --repository s --quiet --keep 90d --compress-with gzip --log syslog --exclude '/home/somedir' /

  2. [author] Steve Kemp

    Thanks for the tip - I had forgotten about obnam, despite reading about its development.

    Syncing to off-site locations using rsync is a key requirement. (Because I have a number of remote backup servers which only present themselves via rsync, with nice sensible ACLs and quotas in place.) I accept that SFTP is perhaps more generally useful, and rsync over ssh perhaps a good runner-up, but for me I need rsync.

  3. [author] Steve Kemp

    Interestingly I have now discovered that I can solve the failure of the pre/post sections with the addition of the "--exit-on-error" flag to run-parts.

    skx@precious:~$ mkdir pre.d
    skx@precious:~$ echo -e '#!/bin/sh\nfalse' > pre.d/false
    skx@precious:~$ echo -e '#!/bin/sh\ntrue' > pre.d/true
    skx@precious:~$ chmod 755 pre.d/*
    skx@precious:~$ if ( ! run-parts --exit-on-error ./pre.d/ ); then echo "FAIL"; fi
    run-parts: ./pre.d//false exited with return code 1
    skx@precious:~$ rm pre.d/false 
    skx@precious:~$ if ( ! run-parts --exit-on-error ./pre.d/ ); then echo "FAIL"; fi

    That actually solves most of my problem. I just need to work out how to detect the actual backup process failing. Which I could do by looking for a TMP.* file in the output directory ..?

  4. JFS

    What about rsnapshot ?

    It also fulfils all of your requirements (rsync, only useful logs, etc.) and is highly configurable.

    Just one more tip :-)

  5. Carsten Aulbert

    Well, I second rsnapshot, but more importantly, with as many machines as you are aiming for, don't rely on email. Simply don't.

    If you set it up so that it won't send an email on success, you will never know if it failed because the mail never left the system. If you let it send an email on success, you have to weed through all the emails daily to find the broken ones and still have to detect if there's a machine missing.

    My advice would be to set up a system (take Nagios/icinga or write a small shell/perl/python/whatever script) which will parse the results from your backup solution (or the email) and put the results somewhere. From these, you generate a web page or text summary where nothing is mentioned except failures or hosts showing no sign of life for, say, 48 hours.

    Works great for us with close to 2000 machines (well only ~ 100 are deeply monitored for various problems, go figure), but invest a little time and thinking now and be glad later on ;)

  6. [author] Steve Kemp

    Carsten - point on email definitely noted. I'd also be wanting to run a hook to alert via another means on success/failure.

    With a consistent mail though I'd file each one into a folder of its own - so we'd ideally have a list of mails:

    Success ...
    Success ...
    Failure ..
    Success ..

    Assuming the mail arrived we'd notice the failures easily. If the mail failed because the host was down we'd hopefully have already noticed via other monitoring.

  7. Carsten Aulbert

    Steve - trust me, Murphy will take care of the missing bits of 'n' nines (i.e. 100% - 99.9% ) ;)

  8. Steven Chamberlain

    Just wondered what makes AMANDA not appropriate?

    Since it is based on gnutar incrementals it supports --one-file-system and --exclude patterns.

    Before you run amdump to commence a backup, MySQL databases can be mysqlhotcopy'd to your main filesystem (for example /var/backups/mysql/). This ensures you're backing up a consistent snapshot.

    Likewise pg_dump for all Postgres databases.

    If the backup data must be stored locally and then rsync'd, the backup data can be written to local 'virtual tape' directories first (don't forget to exclude them!).

    The email report will have FAIL (or sometimes STRANGE) quite prominently in the subject line if something went wrong; or a prior run of 'amcheck -m -w' may even alert you to a problem before the backup run starts.

    What I described here actually means configuring amanda-server on each host. More conventionally they would have each been set up as an amanda-client with a centralised amanda-server controlling them, but that is less flexible.

  9. Wouter Verhelst

    I think you should look at bacula:
    - Remote backup is the rule, not the exception (to the extent that it might be an issue if you need to pay for the bandwidth, in which case I retract my recommendation)
    - The subject contains "success" or "failure" of the backup job
    - Supports a "base" backup which is ideal if you have many almost-but-not-quite similar systems (you make a "base" backup once; bacula then makes its "full" backups be incrementals based on the "base" backup, by checking file checksums etc).
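
The summary suggested in comment 5 (mention nothing except failures and hosts with no sign of life) could be sketched roughly as follows. It assumes each host drops a status file into a shared directory after every run; the directory layout, the "Success"/"Failure" first-line convention, and the report_problems function are all hypothetical, and the demo relies on GNU find and GNU touch:

```shell
#!/bin/sh
# Print one line per problem host; print nothing when all is well.
# $1 is a directory holding one status file per host, whose first
# line is either "Success" or "Failure" (a made-up convention).
report_problems() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        host=$(basename "$f")
        # Not updated for more than 48 hours (2880 minutes).
        if [ -n "$(find "$f" -mmin +2880)" ]; then
            echo "$host: no report for over 48 hours"
        elif [ "$(head -n 1 "$f")" != "Success" ]; then
            echo "$host: last backup failed"
        fi
    done
}

# Demo with three made-up hosts: one fine, one failed, one stale.
status_dir=$(mktemp -d)
echo "Success" > "$status_dir/alpha"
echo "Failure" > "$status_dir/beta"
echo "Success" > "$status_dir/gamma"
touch -d '3 days ago' "$status_dir/gamma"

report=$(report_problems "$status_dir")
echo "$report"
rm -rf "$status_dir"
```

The healthy host produces no output at all, so an empty report means every host checked in recently and succeeded.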