So about that off-site encrypted backup idea ..

I'm just back from having spent a week in Helsinki. Despite some minor irritations (the light-switches were always too damn low) it was a lovely trip.

There is a lot to be said for a place, and a culture, where shrugging and grunting counts as communication.

Now I'm back, catching up on things, and mostly plotting and planning how to handle my backups going forward.

Filesystem backups I generally take using backup2l, creating local incremental backup archives and then shipping them offsite using rsync. For my personal stuff I have a bunch of space on a number of hosts and I just use rsync to literally copy my ~/Images, ~/Videos, etc.

In the near future I'm going to have access to a backup server which will run rsync, and pretty much nothing else. I want to decide how to archive my content to that - securely.

The biggest issue is that my images (.CR2 + .JPG) will want to be encrypted remotely, but not locally. So I guess if I re-encrypt transient copies and rsync them, every encrypted file will look completely new to rsync and I'll end up having to send "full" changes each time. Clearly that will waste bandwidth.
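
To make the bandwidth worry concrete, here is a minimal Python sketch. It is an illustration under loud assumptions: the cipher is a toy built from SHA-256 with a fresh random nonce per run (not GPG, and not something to use on real data), and the block comparison is a crude stand-in for rsync's delta algorithm that only looks at aligned 64-byte blocks. The names toy_encrypt and shared_blocks are hypothetical.

```python
import hashlib
import os


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode); for illustration only."""
    nonce = os.urandom(16)  # fresh random nonce on every run
    out = bytearray(nonce)
    for offset in range(0, len(plaintext), 32):
        pad = hashlib.sha256(key + nonce + offset.to_bytes(8, "big")).digest()
        chunk = plaintext[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def shared_blocks(old: bytes, new: bytes, size: int = 64) -> int:
    """Count full blocks of `new` that also appear as aligned blocks in `old`."""
    def blocks(data: bytes) -> list:
        return [data[i:i + size] for i in range(0, len(data) - size + 1, size)]

    known = set(blocks(old))
    return sum(1 for block in blocks(new) if block in known)


key = b"not a real key"
original = bytes(i % 251 for i in range(4096))  # 64 distinct 64-byte blocks
changed = bytearray(original)
changed[100] ^= 0xFF                            # edit a single byte
edited = bytes(changed)

plain_shared = shared_blocks(original, edited)
cipher_shared = shared_blocks(toy_encrypt(key, original), toy_encrypt(key, edited))
print(f"plaintext blocks reusable:  {plain_shared} of 64")
print(f"ciphertext blocks reusable: {cipher_shared} of 64")
```

A one-byte edit leaves nearly every plaintext block reusable, while the two ciphertexts should have no blocks in common, so a delta transfer has nothing to work with.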

So my alternative is to use incrementals, as I do elsewhere, then GPG-encrypt the tar files that are produced - simple to do with backup2l - and use rsync. That seems like the best plan, but requires that I have more space available locally, since:

  • I need the local .tar files.
  • I then need the .tar.gz.asc/.tar.gz.gpg files too.

I guess I will ponder. It isn't horrific to require local duplication, but it strikes me as something I'd rather avoid - especially given that we're talking about rsync over a home broadband connection, which will take weeks at best for the initial copy.

Comments On This Entry

  1. [gravitar] Wussy

    "There is a lot to be said for a place, and a culture, where shrugging and grunting counts as communication."

    Well, you've been in Helsinki. ;)

    I've never liked the place, though (surprise, surprise) I've lived my whole life in Finland. You must visit the inland too. And the North. :)

  2. [gravitar] oscar

    have a look at Rsyncrypto http://rsyncrypto.lingnu.com

  3. [author] Steve Kemp

    Thanks for the pointer; rsyncrypto looks ideal for my needs:

    • It encrypts the contents remotely.
    • It doesn't require extra space locally to create .tar files, etc.

  4. [gravitar] Steven C

    This is handled quite well within the AMANDA backup system. GNU tar's --listed-incremental mode is used to stream out only new/changed files; this is piped through your compression program, and aespipe or GPG before going out to the remote AMANDA server without having to write anything to a local disk.

    A schedule and heuristics decide whether to do an incremental, or full backup of each directory on that run; usually if a large directory is being dumped in full, all others will be incremental on that run, so as to balance the amount of data transfer on each run.

    The remote AMANDA service doesn't have to run as a listening inetd TCP/UDP service, and can instead be invoked in an SSH session in exactly the same way that rsync+ssh works.

    Or if you ran the AMANDA server locally with 'virtual tapes' as files on a local disk, you can efficiently rsync those to some off-site storage. That still means there is duplication on-disk, but the amount of data would be kept about as small as can be.

    Or maybe you could imitate AMANDA's pipeline with something of your own, e.g. incremental tar|gzip|gpg, storing each output file only temporarily while you rsync it out. Remember that gpg compresses by default, so use -z0 to save CPU time if you are piping already-compressed data into it. Have some other script remove old backup files on the remote side, by rsyncing an empty directory with --delete and specifying the filenames with the --include flag.
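
A rough Python sketch of that do-it-yourself pipeline might look like the following. It assumes GNU tar, gzip, gpg and rsync are on the PATH. The function names, paths and recipient address are made up for illustration, and none of this is AMANDA's actual implementation.

```python
import shlex
import subprocess


def pipeline_commands(source_dir, snapshot_file, recipient):
    """Return the argv lists for tar | gzip | gpg, ciphertext on stdout."""
    return [
        # GNU tar records state in snapshot_file and emits only new/changed files.
        ["tar", "--create", "--file=-",
         f"--listed-incremental={snapshot_file}", source_dir],
        ["gzip", "-c"],
        # -z 0 turns off gpg's own compression; the data is already gzipped.
        ["gpg", "--encrypt", "--recipient", recipient, "-z", "0"],
    ]


def run_pipeline(commands, output_path):
    """Chain the commands with pipes; the last one writes to output_path."""
    with open(output_path, "wb") as sink:
        procs = []
        upstream = None
        for index, argv in enumerate(commands):
            last = index == len(commands) - 1
            proc = subprocess.Popen(argv, stdin=upstream,
                                    stdout=sink if last else subprocess.PIPE)
            if upstream is not None:
                upstream.close()  # so earlier stages get SIGPIPE if a later one dies
            upstream = proc.stdout
            procs.append(proc)
        for proc in procs:
            if proc.wait() != 0:
                raise RuntimeError(f"{proc.args[0]} exited with {proc.returncode}")


def prune_command(empty_dir, remote, old_names):
    """rsync an empty directory with --delete, naming only the files to drop."""
    includes = [f"--include={name}" for name in old_names]
    # Everything not explicitly included is excluded, and so left alone remotely.
    return ["rsync", "-r", "--delete", *includes, "--exclude=*",
            f"{empty_dir}/", remote]


# Hypothetical paths and key, printed rather than executed.
for argv in pipeline_commands("/home/steve/Images", "/var/tmp/images.snar",
                              "backup@example.org"):
    print(shlex.join(argv))
```

Calling run_pipeline(...) on the result and then rsyncing the output file keeps only one encrypted increment on local disk at a time, which is the space saving the suggestion is after.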

  5. [gravitar] Steven C

    Uhhh at first glance rsyncrypto sounds bad. Phrases like "industry standard" set alarm bells ringing, and "slightly modified version" reminds me of the unfortunate Debian OpenSSL fiasco.

    It sounds like Cipher Block Chaining without the Chaining part, so what you are left with may be more like OFB mode or a stream cipher.

    And for it to incrementally transfer changes in encrypted files, it must also be re-using the same key+IVs and thus weakening the encryption with each modification.

    And unless it encrypts filenames and metadata, it might be plainly obvious what a file's contents are. That too could reveal information about the key (known plaintext).

  6. [gravitar] John Eikenberry

    Store them in an ecryptfs filesystem, then just rsync up the encrypted versions?

  7. [author] Steve Kemp

    Storing things encrypted locally would be nice, but it would be more of a pain. At the moment some of my sensitive materials are in an encrypted LUKS volume, but I only do full-disk encryption on my toy-laptop, not on my desktop.

    Looking over things more carefully I see that rsyncrypto isn't as perfect a fit as I'd thought - it only creates an encrypted tree locally, rather than remotely as I'd first believed. Still it seems the closest match at the moment - trading potential weaknesses for a doubling in local disk space.

    Ideally of course there would be the option to run files through a filter in rsync - where I could transparently do the crypto...

  8. [gravitar] RichiH

    git-annex?

  9. [author] Steve Kemp

    I'm not sure that solves the problem:

    • Data will be duplicated in .git, which I can tolerate.
    • Git doesn't support pushing via rsync.
    • Git doesn't encrypt the contents, even if I could just rsync .git.

    Am I missing something?

  10. [gravitar] MJD

    If you use encfs, it has a reverse mode. Instead of encrypting the files and storing the result, it creates encrypted versions of the files on-the-fly. You can then rsync that tree to the server. If you need to access the files, just run encfs in its normal mode pointed at those files and it will let you access them like normal.
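
As a hedged sketch, the reverse-mode workflow could be scripted along these lines in Python. The directory names and the rsync target are hypothetical, the helper name is invented, and it assumes encfs, rsync and fusermount are installed; encfs wants absolute paths for both directories.

```python
import shlex


def reverse_encfs_commands(plain_dir, encrypted_view, rsync_target):
    """Commands to expose an encrypted view of plain_dir and ship it offsite."""
    return [
        # Mount an on-the-fly encrypted view; nothing extra is written to disk.
        ["encfs", "--reverse", plain_dir, encrypted_view],
        # Copy the encrypted view to the backup host.
        ["rsync", "-a", "--delete", f"{encrypted_view}/", rsync_target],
        # Unmount the FUSE view once the transfer is done.
        ["fusermount", "-u", encrypted_view],
    ]


# Hypothetical locations, printed rather than executed.
for argv in reverse_encfs_commands("/home/steve/Images",
                                   "/home/steve/.images-encrypted",
                                   "backup.example.org::steve/Images/"):
    print(shlex.join(argv))
```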

  11. [gravitar] Mike Lowe

    Duplicity.nongnu.org

  12. [gravitar] kb

    Have you looked at obnam? It seems it will do what you want, and it's by Debian hacker Lars Wirzenius.

    http://liw.fi/obnam/

  13. [gravitar] Damien

    You might want to have a look at obnam (http://liw.fi/obnam/). I am currently testing it and it looks very promising:
    * incremental backups, but without the need of complete backups for roll-over. Every backup is seen like a complete snapshot
    * data deduplication
    * data encryption using gpg

    D.

  14. [author] Steve Kemp

    So after much wailing and gnashing of teeth I've resigned myself to the fact that I'll need at least one local duplicate of my archive(s).

    So on that basis I've configured obnam to back up:

    src ~/Images    dst /home/backups/Images
    src ~/Videos    dst /home/backups/Videos

    I've configured a suitable GPG-key and I'm currently waiting for 500GB+ to be encrypted, archived, and deduplicated. I'm hoping that, because I'm using common files, the headers/meta-data/random blocks will collide, resulting in a net reduction of space, but time will tell.

    Once I've got this local backup in ~backups/ I'll use rsync to ship it offsite and hope for the best from there, though it will take weeks to upload.
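
For illustration only, the two steps could be driven from Python roughly like this. The key ID, host name and rsync module are placeholders rather than the real setup, and the helper name is invented. The obnam options used are --repository and --encrypt-with; the double colon in the target is rsync's syntax for talking to an rsync daemon instead of going over ssh.

```python
import shlex


def obnam_then_rsync(source_dir, local_repo, gpg_key_id, daemon_target):
    """Commands for an encrypted local obnam backup, then an rsyncd upload."""
    return [
        # Step 1: encrypted, deduplicated backup into a local repository.
        ["obnam", "backup", f"--repository={local_repo}",
         f"--encrypt-with={gpg_key_id}", source_dir],
        # Step 2: ship the repository offsite; --partial keeps interrupted
        # transfers so a slow upload can resume.
        ["rsync", "-a", "--partial", f"{local_repo}/", daemon_target],
    ]


# Placeholder key ID and host, printed rather than executed.
for argv in obnam_then_rsync("/home/steve/Images", "/home/backups/Images",
                             "DEADBEEF", "backup.example.org::steve/Images/"):
    print(shlex.join(argv))
```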


  15. [gravitar] kb

    I don't understand why you need to make that local copy. Normally obnam takes care of the encrypting + push to remote stuff for you. You will get checkpoints thrown in for free.

  16. [author] Steve Kemp

    I don't love the local-copy, but need it because I will only be pushing to the remote host using rsync. (Not rsync over ssh, just rsyncd)

    obnam seems to require the use of sftp for its remote pushing.

  17. [gravitar] Helmut Grohne

    Yes, you are missing pieces to git annex.

    > Data will be duplicated in .git, which I can tolerate.

    Correct for git, but this is precisely what git annex avoids.

    > Git doesn't support pushing via rsync.

    Correct for git, but git annex does support rsync remotes.

    > Git doesn't encrypt the contents, even if I could just rsync .git.

    Correct for git, but git annex does support encryption before transfer.

    That said, switching a workflow to git annex is a pile of work. Just storing all the git annex (encrypted) blobs remotely is not enough. You also need to back up the git (non-annex) tree, and here git annex does not help you. A rough estimate of the overhead is about 0.05% for raw image data. So if you have 500G of images, you will have an additional .git folder which weighs about 250M (excluding .git/annex/objects). You would have to come up with a different solution for backing up these essential 250M, because if you lose them, your encrypted rsync backup is moot. Also, working with your tree will be more difficult, because you cannot just edit your images in gimp. You need to manually "unlock" them. The full git annex solution may become more interesting if you encounter additional problems like low disk space or the need to synchronize your folder with multiple machines, which git annex happens to solve as well.

    If you ever try using git annex for your images, I would like to see an experience report.

  18. [gravitar] RichiH

    git-annex uses git to distribute log data and the existence of files.

    It manages the actual data outside of git.

    > Data will be duplicated in .git, which I can tolerate.

    To be exact, it will live in .git/annex/objects and your normal files will soft-link to them.

    > Git doesn't support pushing via rsync.

    git-annex uses rsync as its default data synchronization mechanism.

    > Git doesn't encrypt the contents, even if I could just rsync .git.

    git-annex can encrypt certain remotes while keeping others unencrypted. Your normal machines will have unencrypted data, all remotes in untrusted locations will have encrypted data, only.

    It fits your requirements exactly; all you need to do is read up on it ;)
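
A minimal sketch of what that setup might involve, written as a Python helper that only builds the command lines. The remote name, rsync URL and commit message are hypothetical, and the helper itself is invented for illustration. With encryption=shared the key is kept inside the git repository, which is why the git tree itself must also be backed up somewhere safe.

```python
import shlex


def annex_offsite_commands(remote_name, rsync_url, paths):
    """Commands to annex some files and copy them, encrypted, to an rsync remote."""
    return [
        ["git", "init"],
        ["git", "annex", "init"],
        # Move file contents into .git/annex/objects, leaving symlinks behind.
        ["git", "annex", "add", *paths],
        ["git", "commit", "-m", "Add files to the annex"],
        # Define an rsync special remote whose contents are encrypted.
        ["git", "annex", "initremote", remote_name, "type=rsync",
         f"rsyncurl={rsync_url}", "encryption=shared"],
        # Upload the (encrypted) file contents to that remote.
        ["git", "annex", "copy", "--to", remote_name, *paths],
    ]


# Hypothetical remote, printed rather than executed.
for argv in annex_offsite_commands("offsite",
                                   "rsync://backup.example.org/steve", ["."]):
    print(shlex.join(argv))
```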

  19. [gravitar] Dale King

    brackup is good too

    http://search.cpan.org/~bradfitz/Brackup/lib/Brackup/Manual/Overview.pod