Backing up TimescaleDB takes advantage of the reliable functionality already available through PostgreSQL. There are several ways to accomplish this: physical backups with `pg_basebackup` or another tool, or logical backups with `pg_dump` and `pg_restore`. Physical backups may also be used with Write-Ahead Log (WAL) archiving to achieve an ongoing backup.
For full-instance physical backups (which are especially useful for starting up new replicas), `pg_basebackup` works with all TimescaleDB installations. You can also use any of several external backup and restore managers such as `pgBackRest`, `barman`, or `wal-e`. These allow you to take online, hot physical backups of your entire instance, and many offer incremental backups and other automation options.
In this section, we cover how to back up and restore an entire database or individual hypertables using the native PostgreSQL `pg_dump` and `pg_restore` commands.
:TIP: Upgrades between different versions of TimescaleDB can be done in place; you don't need to back up and restore your data. Follow these updating instructions.
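As a sketch of such an in-place upgrade (assuming a database named `tutorial`; `ALTER EXTENSION ... UPDATE` is the standard PostgreSQL mechanism for upgrading an installed extension):

```shell
# Connect with -X so .psqlrc settings cannot interfere with the upgrade.
# Assumes the timescaledb extension is already installed in the database.
psql -X -d tutorial -c "ALTER EXTENSION timescaledb UPDATE;"

# Confirm the new version afterwards.
psql -X -d tutorial -c \
  "SELECT extversion FROM pg_extension WHERE extname = 'timescaledb';"
```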
To back up a database named `tutorial`, run from the command line:
pg_dump -Fc -f tutorial.bak tutorial
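Before relying on a backup, it can help to sanity-check the archive. A minimal sketch (`pg_restore -l` lists the archive's table of contents without restoring anything):

```shell
# List the contents of the custom-format archive; this fails
# with an error if the file is truncated or corrupt.
pg_restore -l tutorial.bak | head
```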
Restoring data from a backup currently requires some additional procedures, which need to be run from `psql`:
CREATE DATABASE tutorial;
\c tutorial  -- connect to the database where we'll perform the restore
CREATE EXTENSION IF NOT EXISTS timescaledb;
SELECT timescaledb_pre_restore();
-- execute the restore (this can also be run from a shell prompt)
\! pg_restore -Fc -d tutorial tutorial.bak
SELECT timescaledb_post_restore();
:WARNING: Using the `-j` flag with `pg_restore` to restore using multiple workers is not supported and might generate errors. You can use it if you first restore the entire schema and the `_timescaledb_catalog` schema without the `-j` option, for example:
pg_restore -Fc -s -d tutorial tutorial.bak
pg_restore -Fc -a -n _timescaledb_catalog -d tutorial tutorial.bak
And then restore the rest of the database with the `-j` option, using the `-N` option to exclude the already-restored catalog schema, for example:
pg_restore -Fc -a -N _timescaledb_catalog -j4 -d tutorial tutorial.bak
:WARNING: PostgreSQL's `pg_dump` does not currently record the version of the extension in its backup, which leads to problems if you are restoring into a database instance with a more recent extension version installed. (In particular, the backup could be from some version 1.6, but then the `CREATE EXTENSION timescaledb` command simply installs the latest (say, 1.7), which therefore does not have the opportunity to run our upgrade scripts.) The workaround is to restore the backup into a PostgreSQL instance with the same extension version installed, and then upgrade the version.
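To see which extension version a target instance would install versus what is currently installed, you can query the standard PostgreSQL catalogs (a sketch; assumes a database named `tutorial`):

```shell
# Version that CREATE EXTENSION would install by default on this instance:
psql -d tutorial -c \
  "SELECT default_version FROM pg_available_extensions WHERE name = 'timescaledb';"

# Version currently installed in this database (if any):
psql -d tutorial -c \
  "SELECT extversion FROM pg_extension WHERE extname = 'timescaledb';"
```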
:WARNING: These instructions do not work if you use flags to selectively choose tables (`-t`) or schemas (`--schema`), and so cannot be used to back up only an individual hypertable. In particular, even if you explicitly specify both the hypertable and all of its constituent chunks, the dump would still lack necessary information that TimescaleDB stores in the database catalog about the relation between these tables. You can, however, explicitly exclude tables from this whole-database backup (`-T`), and you can still selectively back up plain (i.e., non-hypertable) tables.
Below is the procedure for performing a backup and restore of a hypertable named `conditions`.
Back up the hypertable schema:
pg_dump -s -d old_db --table conditions -N _timescaledb_internal | \
grep -v _timescaledb_internal > schema.sql
Back up the hypertable data to a CSV file:
psql -d old_db \
-c "\COPY (SELECT * FROM conditions) TO data.csv DELIMITER ',' CSV"
Restore the schema:
psql -d new_db < schema.sql
Recreate the hypertables:
psql -d new_db -c "SELECT create_hypertable('conditions', 'time')"
:TIP: The parameters to `create_hypertable` do not need to be the same as in the old database, so this is a good way to re-organize your hypertables (e.g., change the partitioning key, the number of partitions, or the chunk interval sizes).
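For instance, a hypothetical restore that changes the chunk interval and adds a space partition might look like this (the parameter names follow the `create_hypertable` API; the `location` column is an assumption about the table's schema):

```shell
# Recreate the hypertable with 1-day chunks and 4 space partitions
# on a hypothetical "location" column.
psql -d new_db -c \
  "SELECT create_hypertable('conditions', 'time', \
     chunk_time_interval => INTERVAL '1 day', \
     partitioning_column => 'location', \
     number_partitions => 4);"
```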
Restore the data:
psql -d new_db -c "\COPY conditions FROM data.csv CSV"
:TIP: The standard `COPY` command in PostgreSQL is single-threaded. To speed up importing larger amounts of data, we recommend using our parallel importer instead.
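If the parallel importer is not an option, a rough way to parallelize the load itself is to split the CSV and run several `\COPY` sessions concurrently (a sketch; `split -n l/4` is GNU coreutils syntax, and the file and database names are assumptions carried over from the steps above):

```shell
# Split data.csv into 4 pieces without breaking lines mid-row (GNU split).
split -n l/4 data.csv chunk_

# Load the pieces with up to 4 concurrent psql sessions.
ls chunk_* | xargs -n1 -P4 -I{} \
  psql -d new_db -c "\COPY conditions FROM {} CSV"
```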
When using TimescaleDB in a containerized environment, it is possible to do continuous archiving using a WAL-E "sidecar" container (i.e., a container that runs alongside the main container). For this purpose, we provide a WAL-E sidecar image that works with TimescaleDB as well as regular PostgreSQL. In the following example, we set up archiving to the local filesystem using one main TimescaleDB container called `timescaledb` and one WAL-E sidecar called `wale`. For production deployments, this example can be adapted to archive against, e.g., AWS S3, and to run in an orchestration framework like Kubernetes.
To make TimescaleDB use the WAL-E sidecar for archiving, the two containers need to share a network. Create a Docker network like so:
docker network create timescaledb-net
Then launch TimescaleDB with archiving turned on, using the newly created network:
docker run \
--name timescaledb \
--network timescaledb-net \
-e POSTGRES_PASSWORD=insecure \
-e POSTGRES_INITDB_WALDIR=/var/lib/postgresql/data/pg_wal \
-e PGDATA=/var/lib/postgresql/data/pg_data \
timescale/timescaledb:latest-pg10 postgres \
-cwal_level=archive \
-carchive_mode=on \
-carchive_command="/usr/bin/wget wale/wal-push/%f -O -" \
-carchive_timeout=600 \
-ccheckpoint_timeout=700 \
-cmax_wal_senders=1
We explicitly set the location of the write-ahead log (`POSTGRES_INITDB_WALDIR`) and the data directory (`PGDATA`) so that we can share these with the WAL-E sidecar. Both must reside in a Docker volume (a volume is created for `/var/lib/postgresql/data` by default).
It is now possible to log into the database and create tables and data:
docker exec -it timescaledb psql -U postgres
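Once archiving is running, you can check that WAL segments are actually being shipped by inspecting PostgreSQL's built-in archiver statistics (a sketch; `pg_stat_archiver` is a standard PostgreSQL view):

```shell
# archived_count should grow over time; failed_count should stay at 0.
docker exec timescaledb psql -U postgres -c \
  "SELECT archived_count, failed_count, last_archived_wal FROM pg_stat_archiver;"
```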
Our WAL-E Docker image runs a small web endpoint that accepts WAL-E commands via an HTTP API. This allows PostgreSQL to communicate with the WAL-E sidecar over the internal network to trigger archiving. It is, of course, also possible to use the container to invoke WAL-E directly. The Docker image accepts the standard WAL-E environment variables to configure the archiving backend (e.g., AWS S3) and more. See WAL-E's documentation for more details.
To enable the WAL-E docker image to perform archiving, it needs to use the network and data volume(s) of the TimescaleDB container. It also needs to know the location of the write-ahead log and data directories. Thus, launch the WAL-E sidecar as follows:
docker run \
--name wale \
--network timescaledb-net \
--volumes-from timescaledb \
-v ~/backups:/backups \
-e WALE_LOG_DESTINATION=stderr \
-e PGWAL=/var/lib/postgresql/data/pg_wal \
-e PGDATA=/var/lib/postgresql/data/pg_data \
-e PGHOST=timescaledb \
-e PGPASSWORD=insecure \
-e PGUSER=postgres \
-e WALE_FILE_PREFIX=file://localhost/backups \
timescale/timescaledb-wale:latest
This makes the WAL-E image listen for commands on port 80 on the `timescaledb-net` internal network and write backups to `~/backups` on the Docker host.
To do the initial base backup, execute the following command in the running WAL-E container (named `wale` above):
docker exec wale wal-e backup-push /var/lib/postgresql/data/pg_data
Alternatively, do it via the sidecar's HTTP endpoint (this requires exposing the sidecar's port 80 on the Docker host by mapping it to, e.g., port 8080):
curl http://localhost:8080/backup-push
Base backups should be done at regular intervals (e.g., every day) to minimize the amount of WAL replay, making recoveries faster. To make new base backups, simply re-trigger a base backup as shown above, either manually or on a schedule (e.g., via a CRON job). If you run TimescaleDB on Kubernetes, there is built-in support for scheduling cron jobs that can invoke base backups via, e.g., the WAL-E container's HTTP API.
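As one example of scheduling, a crontab entry on the Docker host could trigger a nightly base backup through the sidecar's HTTP endpoint (assumes the port mapping to 8080 described above):

```shell
# m h dom mon dow  command  -- run a base backup every night at 03:00
0 3 * * * curl -s http://localhost:8080/backup-push
```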
To recover the database instance from the backup archive, create a new TimescaleDB container:
docker create \
--name timescaledb-recovered \
--network timescaledb-net \
-e POSTGRES_PASSWORD=insecure \
-e POSTGRES_INITDB_WALDIR=/var/lib/postgresql/data/pg_wal \
-e PGDATA=/var/lib/postgresql/data/pg_data \
timescale/timescaledb:latest-pg10 postgres
Now restore the database files from the base backup:
docker run -it --rm \
-v ~/backups:/backups \
--volumes-from timescaledb-recovered \
-e WALE_LOG_DESTINATION=stderr \
-e WALE_FILE_PREFIX=file://localhost/backups \
timescale/timescaledb-wale:latest \
wal-e \
backup-fetch /var/lib/postgresql/data/pg_data LATEST
Recreate some configuration files (normally, these are backed up configuration files from the old database instance):
docker run -it --rm \
--volumes-from timescaledb-recovered \
timescale/timescaledb:latest-pg10 \
cp /usr/local/share/postgresql/pg_ident.conf.sample /var/lib/postgresql/data/pg_data/pg_ident.conf
docker run -it --rm \
--volumes-from timescaledb-recovered \
timescale/timescaledb:latest-pg10 \
cp /usr/local/share/postgresql/postgresql.conf.sample /var/lib/postgresql/data/pg_data/postgresql.conf
docker run -it --rm \
--volumes-from timescaledb-recovered \
timescale/timescaledb:latest-pg10 \
sh -c 'echo "local all postgres trust" > /var/lib/postgresql/data/pg_data/pg_hba.conf'
Now create a `recovery.conf` that tells PostgreSQL how to recover:
docker run -it --rm \
--volumes-from timescaledb-recovered \
timescale/timescaledb:latest-pg10 \
sh -c 'echo "restore_command='\''/usr/bin/wget wale/wal-fetch/%f -O -'\''" > /var/lib/postgresql/data/pg_data/recovery.conf'
Then run the WAL-E sidecar again (you may have to remove the old one first). It will be used to replay the last WAL segments that may not be reflected in the base backup:
docker run \
--name wale \
--network timescaledb-net \
-v ~/backups:/backups \
--volumes-from timescaledb-recovered \
-e WALE_LOG_DESTINATION=stderr \
-e PGWAL=/var/lib/postgresql/data/pg_wal \
-e PGDATA=/var/lib/postgresql/data/pg_data \
-e PGHOST=timescaledb \
-e PGPASSWORD=insecure \
-e PGUSER=postgres \
-e WALE_FILE_PREFIX=file://localhost/backups \
timescale/timescaledb-wale:latest
Finally, launch the TimescaleDB docker container:
docker start timescaledb-recovered
Verify that the database started up and recovered successfully:
docker logs timescaledb-recovered
Note that it is normal to see some archive recovery "errors" at the end: recovery is complete when no further files can be found in the archive. See the PostgreSQL documentation on continuous archiving for more information.
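Once the logs look clean, you can confirm from inside the database that recovery has finished (`pg_is_in_recovery()` is a standard PostgreSQL function; it returns `f` once the instance has left recovery and accepts writes):

```shell
# Should print "f" after a successful recovery.
docker exec timescaledb-recovered psql -U postgres \
  -c "SELECT pg_is_in_recovery();"
```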
TimescaleDB natively supports restore points in both single-node and multi-node environments. A restore point can later be used as a recovery target on a single node, or on each node of an entire multi-node cluster, ensuring that the cluster can be restored to a consistent state.
To create a restore point on a single node, use the `pg_create_restore_point` function. On a multi-node cluster, use the `create_distributed_restore_point` function instead, executed on the access node. To achieve synchronization across the multi-node cluster, this function ensures that the restore point created on the access node is synchronized with each data node. Take care that the data nodes have no locally-running user transactions executed without the access node's involvement when creating the distributed restore point (i.e., transactions run by an external user connecting directly to a data node).
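As a sketch, creating and later targeting a restore point on a single node might look like this (the restore point name is an assumption; `recovery_target_name` is the standard PostgreSQL setting that consumes a named restore point during recovery):

```shell
# Create a named restore point; returns the WAL location of the point.
psql -U postgres -c "SELECT pg_create_restore_point('before_migration');"

# Later, during recovery, point PostgreSQL at it, e.g. in recovery.conf:
#   recovery_target_name = 'before_migration'
```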
Third-party backup and restore managers such as `pgBackRest`, `barman`, or `wal-e` can be used and configured to use these restore points.