Restic: Effective Backup from Stdin

I’ve previously discussed Restic in the article “Backup storage for thousands of virtual machines using free tools,” and it has remained my favorite backup tool ever since.
Today I will describe a ready-made recipe for setting up an effective backup from Stdin, with deduplication and automatic cleanup of old copies from the repository.
Although Restic is great at backing up entire data directories, this article focuses on on-the-fly backup from Stdin: typically virtual machine images, database dumps, and other large files that can be read sequentially and sent straight to the backup system.
Here are some rules I try to follow:
1. Do not use a monorepository for backups.
Always distribute entities across different repositories. For example, create a repository for each database or virtual machine. This is advisable for several reasons:
- Deduplication across entities gains you little, since the data of different databases or virtual machines rarely overlaps.
- Locks are taken at the repository level, so a shared repository cannot serve concurrent operations.
- In case of metadata corruption, you lose all data in the repository, not just individual copies.
- You can set a separate password for each repository.
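For illustration, the split might look like this. The repository prefix, database names, and password file paths here are made-up examples, and the script falls back to printing the commands when restic is not installed:

```shell
#!/bin/sh
# Sketch: one restic repository per entity, each with its own password.
# REPO_PREFIX, the database names, and the password paths are assumptions.
REPO_PREFIX="s3:s3.example.com/backups"
DRY=""
command -v restic >/dev/null || DRY=echo   # dry run if restic is absent
for db in app1 app2 billing; do
  # RESTIC_PASSWORD_FILE lets each repository use a separate password
  RESTIC_PASSWORD_FILE="/etc/restic/$db.pass" \
    $DRY restic -r "$REPO_PREFIX/$db" init --repository-version 2
done
```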
2. Use Restic’s built-in compression.
Compressing with gzip can turn even a minor source change into an entirely different byte stream, which makes Restic’s deduplication ineffective. Backing up without compression isn’t ideal either: uncompressed backups can be tens to hundreds of times larger than compressed ones. Previously, a workaround was gzip with the --rsyncable flag, as in mysqldump | gzip --rsyncable | restic. But starting with version 0.14.0, Restic can efficiently compress data on its own and does so by default for all new repositories. To use this feature, simply create the repository with restic init --repository-version 2; compression will then be enabled by default for all backups in it.
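A quick local experiment shows the effect (file names are arbitrary): changing only the first line of the input yields compressed streams that diverge from the point of the change onward, so chunk-based deduplication finds almost nothing to share between the two.

```shell
# Demo: a one-line change at the start of the input produces gzip
# output that diverges from that point on, defeating chunk-based dedup.
tmp=$(mktemp -d)
seq 1 50000 > "$tmp/a.txt"
{ echo changed; seq 2 50000; } > "$tmp/b.txt"   # only line 1 differs
gzip -c "$tmp/a.txt" > "$tmp/a.gz"
gzip -c "$tmp/b.txt" > "$tmp/b.gz"
cmp -s "$tmp/a.gz" "$tmp/b.gz" || echo "compressed streams differ"
```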
3. Always ensure that the backup is fully created.
When backing up through a pipe, it’s not always obvious whether the backup actually completed, so let’s clarify:
If you’re backing up from Stdin:
<your_command> | restic backup --stdin
Your command might terminate unexpectedly with an error, and restic won’t know about it, saving a partially created backup in the repository.
One solution is to use tags:
set -e
set -o pipefail
JOB_ID="job-$(uuidgen|cut -f1 -d-)"
<your_command> | restic backup --tag "$JOB_ID" --stdin
restic tag --tag "$JOB_ID" --set "completed"
restic forget --keep-tag "completed"
- set -o pipefail makes the pipeline return a non-zero exit code when your command fails, even though it is not the last command in the pipe; by itself, though, that is not enough.
- set -e stops script execution as soon as any error occurs. Alternatively, you can check the exit codes of your pipe manually.
- The JOB_ID variable marks the snapshot being created with a temporary tag.
- After the backup completes successfully, the snapshot’s tag is changed to completed, and all snapshots that don’t carry it are deleted.
This method protects against partially completed backups and keeps you informed of any issues.
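The manual check mentioned above can be done with bash’s PIPESTATUS array (bash-specific; plain sh does not provide it). A minimal sketch with placeholder commands standing in for the real pipe:

```shell
# Check each stage of a pipeline individually via PIPESTATUS (bash only).
false | cat >/dev/null            # stand-ins: producer fails, consumer succeeds
rc=("${PIPESTATUS[@]}")           # copy immediately: any command resets it
echo "producer=${rc[0]} consumer=${rc[1]}"
[ "${rc[0]}" -eq 0 ] || echo "backup command failed, snapshot must not be tagged"
```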
4. Do not restore directly to Stdout.
In other words, avoid doing this:
restic dump latest dump.sql | mysql
Due to how Restic is implemented, restoring directly to Stdout is highly inefficient: it runs single-threaded and is therefore very slow.
Instead, restore the backup to an intermediate directory:
restic restore latest --target /tmp/
mysql < /tmp/dump.sql
This approach will significantly speed up the recovery operation.
As promised, here’s a ready-made script that performs backups for a set of PostgreSQL databases, reading them sequentially through Stdin:
#!/bin/bash
# bash is used here because 'set -o pipefail' is not guaranteed in plain sh
set -e
set -o pipefail
# Generate a temporary tag for the job
JOB_ID="job-$(uuidgen|cut -f1 -d-)"
# Get the list of databases to backup and shuffle it
DB_LIST=$(psql -Atq -c 'SELECT datname FROM pg_catalog.pg_database;' | grep -v '^\(postgres\|app\|template.*\)$')
DB_LIST=$(echo "$DB_LIST" | shuf)
echo "Job ID: $JOB_ID"
echo "Target repo: $REPO_PREFIX"
echo "Cleanup strategy: $CLEANUP_STRATEGY"
echo "Start backup for:"
echo "$DB_LIST"
echo
echo "Backup started at `date +%Y-%m-%d\ %H:%M:%S`"
for db in $DB_LIST; do
(
set -x
# Create a repository if not created
restic -r "s3:${REPO_PREFIX}/$db" cat config >/dev/null 2>&1 || \
restic -r "s3:${REPO_PREFIX}/$db" init --repository-version 2
# In my case, Kubernetes ensures job exclusivity, so I confidently remove any potential locks
restic -r "s3:${REPO_PREFIX}/$db" unlock --remove-all >/dev/null 2>&1 || true
# Create the backup (remember to specify the file name and its extension for clarity on the data type in the backup)
pg_dump -Z0 -Ft -d "$db" | \
restic -r "s3:${REPO_PREFIX}/$db" backup --tag "$JOB_ID" --stdin --stdin-filename dump.tar
# Mark the backup as 'completed'
restic -r "s3:${REPO_PREFIX}/$db" tag --tag "$JOB_ID" --set "completed"
)
done
echo "Backup finished at `date +%Y-%m-%d\ %H:%M:%S`"
echo
echo "Run cleanup:"
echo
echo "Cleanup started at `date +%Y-%m-%d\ %H:%M:%S`"
for db in $DB_LIST; do
(
set -x
# Delete all snapshots without the 'completed' tag
restic forget -r "s3:${REPO_PREFIX}/$db" --group-by=tags --keep-tag "completed"
# Remove snapshots that don't match our cleanup strategy
restic forget -r "s3:${REPO_PREFIX}/$db" --group-by=tags $CLEANUP_STRATEGY
# Initiate the operation to delete data from the repository
restic prune -r "s3:${REPO_PREFIX}/$db"
)
done
echo "Cleanup finished at `date +%Y-%m-%d\ %H:%M:%S`"
This script runs in Kubernetes as a CronJob with concurrencyPolicy: Forbid
, and I pass the following environment variables to it:
# A set of keys for repository cleanup
- name: CLEANUP_STRATEGY
  value: "--keep-last=3 --keep-daily=3 --keep-within-weekly=1m"
# S3 bucket address
- name: REPO_PREFIX
  value: "s3.myprovider.com/backups/postgres-backups"
# Repository password
- name: RESTIC_PASSWORD
  valueFrom:
    secretKeyRef:
      name: mydatabase-backup
      key: resticPassword
# S3 bucket credentials
- name: AWS_ACCESS_KEY_ID
  valueFrom:
    secretKeyRef:
      name: mydatabase-backup
      key: s3AccessKey
- name: AWS_SECRET_ACCESS_KEY
  valueFrom:
    secretKeyRef:
      name: mydatabase-backup
      key: s3SecretKey
- name: AWS_DEFAULT_REGION
  value: us-east-1
# Database access details
- name: PGUSER
  valueFrom:
    secretKeyRef:
      name: mydatabase-superuser
      key: username
- name: PGPASSWORD
  valueFrom:
    secretKeyRef:
      name: mydatabase-superuser
      key: password
- name: PGHOST
  value: mydatabase-service
- name: PGPORT
  value: "5432"
- name: PGDATABASE
  value: postgres
Important note: When adapting this script to other platforms, pay special attention to locks. If your platform does not guarantee exclusive execution, do not use restic unlock --remove-all.
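For completeness, the surrounding CronJob might be sketched like this. The name, image, schedule, and script path are assumptions for illustration; the essential part is concurrencyPolicy: Forbid:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup            # hypothetical name
spec:
  schedule: "0 3 * * *"            # hypothetical schedule
  concurrencyPolicy: Forbid        # ensures job exclusivity
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: backup
              image: registry.example.com/restic-backup:latest  # assumption
              command: ["/bin/bash", "/scripts/backup.sh"]
              env:
                # ...the environment variables listed above...
```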