Encrypted and Version Controlled File Sync with git-annex(1)

October 21, 2023
Linux, sync, git-annex

git-annex(1) is a versatile, cross-platform tool built on top of git. It can sync, back up, and archive files, and it provides many useful primitives for building customized workflows and storage systems. For example, combining git-annex with gcrypt makes it possible to fully encrypt the data stored on a remote.

Partly because of this versatility, it has a steeper learning curve than some other tools in this field, and it took me some time to figure out how to make it work for me. This quick guide documents my journey.

Prerequisite and Installation

git-annex and git-remote-gcrypt are available from many package managers. To install them on Debian:

# apt-get install git-annex git-remote-gcrypt

git-annex supports multiple encryption modes. I will be going with the default hybrid mode, since it allows more keys to be added in the future. In this mode, data is encrypted with gpg using a symmetric key generated during remote initialization. That symmetric key is in turn encrypted with the gpg public key specified during initremote, and the result is checked into the git repository. This is useful when multiple users wish to access the same encrypted repository, but doing so is outside the scope of this post. For that and other advanced operations, read git-annex’s gcrypt guide.

I opt to create a new key for this use case, but any gpg key will do.
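
For reference, creating a dedicated key and grabbing its fingerprint could look roughly like the sketch below. It assumes GnuPG 2.1 or newer (for --quick-generate-key). The function names and the user ID label are placeholders chosen for illustration, not part of git-annex or gpg.

```shell
# Extract the first fingerprint from `gpg --with-colons` output on stdin.
# In that format the fingerprint is field 10 of the first "fpr" record.
first_fingerprint() {
    awk -F: '$1 == "fpr" { print $10; exit }'
}

# Create a key for the user ID given as $1 (gpg prompts for a passphrase),
# then print its fingerprint so it can be used as KEYID later.
# Default algorithm and usage, no expiry: adjust to taste.
make_annex_key() {
    gpg --quick-generate-key "$1" default default never >&2 || return 1
    gpg --list-keys --with-colons "$1" | first_fingerprint
}
```

With these defined, KEYID=$(make_annex_key "myrepo (annex)") would leave the new key's fingerprint in KEYID, ready to be passed to initremote.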

Setup Local Repository

The first step is to create a local repository as a base, which will then be synced to remotes:

laptop$ git init myrepo
laptop$ cd myrepo
laptop$ git annex init

To check in and commit a file:

laptop$ touch example
laptop$ git annex add .
laptop$ git commit -a -m 'test'

Setup Encrypted Remote

First, create a bare repository on the server. It will hold the encrypted data later:

server$ git init --bare myrepo-remote

Then, on the local machine, add the newly created repository on the server as an encrypted remote. It is good practice to give it a descriptive name:

(To find the KEYID, run gpg --list-keys)

laptop$ git annex initremote homeserver type=gcrypt gitrepo=rsync://server_hostname/path/to/myrepo-remote keyid=$KEYID
gcrypt: Repository not found: rsync://server_hostname/path/to/myrepo-remote
gcrypt: Setting up new repository
gcrypt: Remote ID is :id:XXX
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Compressing objects: 100% (3/3), done.
Total 5 (delta 0), reused 0 (delta 0), pack-reused 0
gcrypt: Encrypting to:  -r XXX
gcrypt: Requesting manifest signature
To gcrypt::rsync://server_hostname/path/to/myrepo-remote
 * [new branch]      git-annex -> git-annex
ok
(recording state in git...)

With this done, it should now be possible to sync the local repository to the remote:

laptop$ git annex sync --content
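
To double-check that the content actually made it to the remote, git-annex's location tracking can be queried. The sketch below wraps git annex find in a small helper. The function name is made up for illustration, and the check reflects what git-annex has recorded locally rather than a fresh scan of the server.

```shell
# List annexed files whose content is NOT recorded as present on the
# remote named by $1. Using a matching option such as --in overrides
# find's default of listing only locally present files.
missing_on_remote() {
    git annex find --not --in="$1"
}
```

Running missing_on_remote homeserver inside the repository should print nothing once every file has been uploaded.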

Work with Multiple Local Machines

To access this encrypted repository from another machine (e.g. a desktop PC), first set up the gpg key on that machine, then clone and decrypt the repository:

desktop$ git clone gcrypt::rsync://server_hostname/path/to/myrepo-remote myrepo
Cloning into 'myrepo'...
gcrypt: Decrypting manifest
gpg: Good signature from "omnirepo (annex)" [unknown]
gcrypt: Remote ID is :id:XXX
Receiving objects: 100% (5/5), done.

The sync command also works on the new machine, sending modified files to the remote:

desktop$ git annex sync --content
commit 
[master cec51a4] git-annex in XXX
 1 file changed, 1 insertion(+)
ok
pull origin 
gcrypt: Decrypting manifest
ok
push origin 
gcrypt: Decrypting manifest
Enumerating objects: 6, done.
Counting objects: 100% (6/6), done.
Compressing objects: 100% (4/4), done.
Total 6 (delta 0), reused 0 (delta 0), pack-reused 0
gcrypt: Encrypting to: --throw-keyids --default-recipient-self
gcrypt: Requesting manifest signature
To gcrypt::rsync://server_hostname/path/to/myrepo-remote
   bbed528..cec51a4  master -> synced/master
   c387409..575869a  git-annex -> synced/git-annex
ok

Troubleshooting

Cannot write to annex file

Annexed files are set read-only (locked) to prevent accidental modification. Run git annex unlock locked_file to unlock the file first.
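
The whole edit cycle can be sketched as a small helper: unlock, modify, add the new version back, commit. The function name and the commit message are placeholders of mine. The sequence follows the unlock/add/commit pattern from the git-annex walkthrough, so treat it as a starting point rather than the only way to do it.

```shell
# edit_annexed FILE COMMAND [ARGS...]
# Unlock FILE, run COMMAND on it (an editor, for example), then record
# the new version in the annex and commit it.
edit_annexed() {
    file=$1
    shift
    git annex unlock "$file" || return 1   # make the file writable
    "$@" "$file" || return 1               # run the editing command on it
    git annex add "$file" || return 1      # store the modified content
    git commit -m "update $file" -- "$file"
}
```

For example, edit_annexed example vi opens the file in vi and commits the result. git annex lock can switch the file back to its read-only form afterwards.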

Remove Unwanted Remote

git-annex manages its remotes via git. To delete a remote, run git remote remove oldremote.
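
If the remote's data is gone for good, it may also be worth telling git-annex so that it stops counting copies there. A minimal sketch, assuming the remote really is retired permanently (the function name is made up for illustration):

```shell
# retire_remote NAME
# Mark the repository behind remote NAME as lost in git-annex's records,
# then drop the git remote itself. The order matters: the name must
# still resolve to a remote when `git annex dead` runs.
retire_remote() {
    git annex dead "$1" || return 1
    git remote remove "$1"
}
```

For a remote that still exists and is merely unused on this machine, plain git remote remove is enough and git annex dead should be skipped.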