PostgreSQL relies on replication for high availability, failover, and balancing read loads across multiple nodes. Replication ensures that data written to the primary PostgreSQL database is mirrored on one or more nodes. By virtue of having multiple nodes with an exact copy of the primary database available, the primary database can be replaced with a replica node in the event of a failure or outage on the primary server. Replica nodes can also be used as read only databases (sometimes called "read replicas"), allowing reads to be horizontally scaled by spreading the read query volume across multiple nodes.
TimescaleDB supports replication using PostgreSQL's built-in streaming replication. Using logical replication with TimescaleDB is not recommended, as it requires schema synchronization between the primary and replica nodes and replicating partition root tables, which are not currently supported.
This tutorial will outline the basic configuration needed to set up streaming replication on one or more replicas, covering both synchronous and asynchronous options. It assumes you have at least two separate instances of TimescaleDB running. If you're using our Docker Image, we recommend using a PostgreSQL entrypoint script to run the configuration. For our sample Docker configuration and run scripts, check out our Streaming Replication Docker Repository.
:TIP: PostgreSQL achieves streaming replication by having replicas continuously stream the WAL from the primary database. See the official replication documentation for details. For more information about how PostgreSQL implements Write-Ahead Logging, see their WAL Documentation.
Create a PostgreSQL user with a role that allows it to initialize streaming
replication. This will be the user each replica uses to stream from the primary
database. Run the command as the postgres
user, or another user that is
configured with superuser privileges on the database you're working with.
SET password_encryption = 'scram-sha-256';
CREATE ROLE repuser WITH REPLICATION PASSWORD 'password' LOGIN;
:WARNING: scram-sha-256 is PostgreSQL's most secure password based authentication, but it is only available in PostgreSQL 10 and above. If you are using an earlier version, consider using
md5
authentication by replacing the first line in the above SQL withSET password_encryption = true;
and changing theAUTH_METHOD
inpg_hba
(see Configure Host Based Authentication) tomd5
.
There are several replication settings that must be added to postgresql.conf
(if you're unsure of where PostgreSQL is reading postgresql.conf
from, just
execute show config_file;
in a psql
shell). You can either comment out the
existing settings in postgresql.conf
and add the desired value, or you can
simply append the desired settings to the postgresql.conf
.
synchronous_commit
has a number of settings that strongly impact data
consistency and performance. For this tutorial, we'll focus on the common
setting of turning synchronous_commit
off. For more detail on the different
modes, see Replication Modes
max_wal_senders
- The total number of concurrent connections from replicas or backup clients. At the very least, this should equal the number of replicas you intend to have.wal_level
- The amount of information written to the PostgreSQL Write-Ahead Log (WAL). For replication to work, there needs to be enough data in the WAL to support archiving and replication. The default level ofreplica
covers this, but it bears mentioning here since it is an absolute requirement for streaming replication.max_replication_slots
- The total number of replication slots the primary database can support. See below for more information about replication slots.listen_address
- Since remote replicas will be connecting to the primary to stream the WAL, we'll need to make sure that the primary is not just listening on the local loopback.
The most common streaming replication use case is asynchronous replication with one or more replicas. We'll use that as that as our sample configuration.
In cases where you need stronger consistency on the replicas or where your query load is heavy enough to cause significant lag between the primary and replica nodes in asyncronous mode, you may want to consider one of the synchronous replication configurations.
listen_addresses = '*'
wal_level = replica
max_wal_senders = 1
max_replication_slots = 1
synchronous_commit = off
In this example, the WAL will be streamed to the replica, but the primary server will not wait for confirmation that the WAL has been written to disk on either the primary or the replica. This is the most performant replication configuration, but it does carry the risk of a small amount of data loss in the event of a system crash. It also makes no guarantees that the replica will be fully up to date with the primary, which could cause inconsistencies between read queries on the primary and the replica.
For replication settings to apply, you must restart PostgreSQL, not just reload the configuration file. This needs to be done before creating replication slots in the next step.
After configuring postgresql.conf
and restarting PostgreSQL, create a
replication slot for each replica. Replication slots
ensure that the primary does not delete segments from the WAL until they have
been received by the replicas. This is crucial for cases where a replica goes
down for extended periods of time -- without verifying that a WAL segment has
already been consumed by a replica, the primary may delete data needed for
replication. To some extent, you can achieve this using
archiving, but replication slots provide the strongest
protection of WAL data for streaming replication. The name of the slot is
arbitrary -- we'll call the slot for this replica replica_1_slot
.
SELECT * FROM pg_create_physical_replication_slot('replica_1_slot');
Configure the pg_hba.conf
file (run show hba_file;
in a psql
shell if
you're unsure of its location) to accept connections from the replication user
on the host of each replica.
# TYPE DATABASE USER ADDRESS METHOD AUTH_METHOD
host replication repuser <REPLICATION_HOST_IP>/32 scram-sha-256
:TIP: The above settings will restrict replication connections to traffic coming from
REPLICATION_HOST_IP
as the PostgreSQL userrepuser
with a valid password.REPLICATION_HOST_IP
will be able to initiate streaming replication from that machine without additional credentials. You may want to change theaddress
andmethod
values to match your security and network settings. Read more aboutpg_hba.conf
in the official documentation.
Replicas work by streaming the primary server's WAL log and replaying its transactions in what PostgreSQL calls "recovery mode". Before this can happen, the replica needs to be in a state where it can replay the log. This is achieved by restoring the replica from a base backup of the primary instance.
Stop PostgreSQL. If the replica's PostgreSQL database already has data, you will
need to remove it prior to running the backup. This can be done by removing the
contents of the PostgreSQL data directory. To determine the location of the
data directory, run show data_directory;
in a psql
shell.
rm -rf <DATA_DIRECTORY>/*
Now run the pg_basebackup
command using the IP address of the primary database
along with the replication username.
pg_basebackup -h <PRIMARY_IP> -D <DATA_DIRECTORY> -U repuser -vP -W
:WARNING: The -W flag will prompt you for a password on the command line. This may cause problems for automated setups. If you are using password based authentication in an automated setup, you may need to make use of a pgpass file.
When the backup finishes, create a recovery.conf file
in your data directory, ensuring it has the proper permissions. When
PostgreSQL finds a recovery.conf
file in its data directory, it knows to start
up in recovery mode and begin streaming the WAL through the replication
protocol.
touch <DATA_DIRECTORY>/recovery.conf
chmod 0600 <DATA_DIRECTORY>/recovery.conf
Add settings for communicating with the primary server to recovery.conf
. In
streaming replication, the application_name
in primary_conninfo
should be
the same as the name used in the primary's synchronous_standby_names
settings.
standby_mode = on # Ensures that the replica continues to fetch WAL records from the primary
primary_conninfo = 'host=<PRIMARY_IP> port=5432 user=repuser password=<POSTGRES_USER_PASSWORD> application_name=r1'
primary_slot_name = 'replica_1_slot' # Name of the replication slot we created on the master
Next, update the postgresql.conf
file to mirror the configuration of the
primary database. For asynchronous replication, this would look like:
hot_standby = on
wal_level = replica
max_wal_senders = 2
max_replication_slots = 2
synchronous_commit = off
:WARNING: In order to allow reads on the replica,
hot_standby
must be set toon
. This allows read-only queries on the replica. By default, this setting is set toon
in PostgreSQL 10, but in earlier versions it defaults tooff
.
Finally, restart PostgreSQL. At this point, the replica should be fully synchronized with the primary database and prepared to stream from it. The logs on the replica should look something like this:
LOG: database system was shut down in recovery at 2018-03-09 18:36:23 UTC
LOG: entering standby mode
LOG: redo starts at 0/2000028
LOG: consistent recovery state reached at 0/3000000
LOG: database system is ready to accept read only connections
LOG: started streaming WAL from primary at 0/3000000 on timeline 1
Any clients will be able to perform reads on the replica. Verify this by running inserts, updates, or other modifications to your data on the primary and querying the replica to ensure they have been properly copied over. This is fully compatible with TimescaleDB's functionality, provided you set up TimescaleDB on the primary database.
This walkthrough gets asynchronous streaming replication working, but
in many cases stronger consistency between the primary and replicas is
required. Under heavy workloads, replicas can lag far behind the primary,
providing stale data to clients reading from the replicas. Moreover, in cases
where any data loss is fatal, asynchronous replication may not provide enough
of a durability guarantee. Luckily synchronous_commit
has several options
with varying consistency/performance tradeoffs:
:WARNING: If
synchronous_standby_names
is empty, the settingson
,remote_apply
,remote_write
andlocal
all provide the same synchronization level: transaction commits only wait for local flush to disk.
on
- Default value. The server will not return "success" until the WAL transaction has been written to disk on the primary and any replicas.off
- The server will return "success" when the WAL transaction has been sent to the operating system to write to the WAL on disk on the primary, but will not wait for the operating system to actually write it. This can cause a small amount of data loss if the server crashes when some data has not been written, but it will not result in data corruption. Turningsynchronous_commit
off is a well known PostgreSQL optimization for workloads that can withstand some data loss in the event of a system crash.local
- Enforceson
behavior only on the primary server.remote_write
- The database will return "success" to a client when the WAL record has been sent to the operating system for writing to the WAL on the replicas, but before confirmation that the record has actually been persisted to disk. This is basically asynchronous commit except it waits for the replicas as well as the primary. In practice, the extra wait time incurred waiting for the replicas significantly decreases replication lag.remote_apply
- Requires confirmation that the WAL records have been written to the WAL and applied to the databases on all replicas. This provides the strongest consistency of any of thesynchronous_commit
options. In this mode, replicas will always reflect the latest state of the primary, and the concept of replication lag (see Replication Diagnostics) is basically non-existent.
This matrix visualizes the level of consistency each mode provides:
Mode | WAL Sent to OS (Primary) | WAL Persisted (Primary) | WAL Sent to OS (Primary + Replicas) | WAL Persisted (Primary + Replicas) | Transaction Applied (Primary + Replicas) |
---|---|---|---|---|---|
Off | X | ||||
Local | X | X | |||
Remote Write | X | X | X | ||
On | X | X | X | X | |
Remote Apply | X | X | X | X | X |
An important complementary setting to synchronous_commit
is
synchronous_standby_names
. This setting lists the names of all replicas the
primary database will support for synchronous replication, and configures how
the primary database will wait for them. The setting supports several
different formats:
FIRST num_sync (replica_name_1, replica_name_2)
- This will wait for confirmation from the firstnum_sync
replicas before returning "success". The list of replica_names determines the relative priority of the replicas. Replica names are determined by theapplication_name
setting on the replicas.ANY num_sync (replica_name_1, replica_name_2)
- This will wait for confirmation fromnum_sync
replicas in the provided list, regardless of their priority/position in the list. This is essentially a quorum function.
:WARNING: Any synchronous replication mode will force the primary to wait until all required replicas have written the WAL or applied the database transaction, depending on the
synchronous_commit
level. This could cause the primary to hang indefinitely if a required replica crashes. When the replica reconnects, it will replay any of the WAL it needs to catch up. Only then will the primary be able to resume writes. To mitigate this, provision more than the amount of nodes required under thesynchronous_standby_names
setting and list them in theFIRST
orANY
clauses. This will allow the primary to move forward as long as a quorum of replicas have written the most recent WAL transaction. Replicas that were out of service will be able to reconnect and replay the missed WAL transactions asynchronously.
PostgreSQL provides a valuable view for getting information about each replica
-- pg_stat_replication. Run select * from pg_stat_replication;
from the primary database to view this data. The output
looks like this:
-[ RECORD 1 ]----+------------------------------
pid | 52343
usesysid | 16384
usename | repuser
application_name | r2
client_addr | 10.0.13.6
client_hostname |
client_port | 59610
backend_start | 2018-02-07 19:07:15.261213+00
backend_xmin |
state | streaming
sent_lsn | 16B/43DB36A8
write_lsn | 16B/43DB36A8
flush_lsn | 16B/43DB36A8
replay_lsn | 16B/43107C28
write_lag | 00:00:00.009966
flush_lag | 00:00:00.03208
replay_lag | 00:00:00.43537
sync_priority | 2
sync_state | sync
-[ RECORD 2 ]----+------------------------------
pid | 54498
usesysid | 16384
usename | repuser
application_name | r1
client_addr | 10.0.13.5
client_hostname |
client_port | 43402
backend_start | 2018-02-07 19:45:41.410929+00
backend_xmin |
state | streaming
sent_lsn | 16B/43DB36A8
write_lsn | 16B/43DB36A8
flush_lsn | 16B/43DB36A8
replay_lsn | 16B/42C3B9C8
write_lag | 00:00:00.019736
flush_lag | 00:00:00.044073
replay_lag | 00:00:00.644004
sync_priority | 1
sync_state | sync
This view is particularly useful for calculating replication lag, which
measures how far behind the primary the current state of the replica is. The
replay_lag
field gives a measure of the seconds between the most recent WAL
transaction on the primary and the last reported database commit on the replica.
Coupled with write_lag
and flush_lag
, this provides insight into how far
behind the replica is. The *_lsn
fields also come in handy, allowing you to
compare WAL locations between the primary and the replicas. Finally, the
state
field is useful for determining exactly what each replica is currently
doing (available modes are startup
, catchup
, streaming
, backup
, and
stopping
).
PostgreSQL offers failover functionality (i.e., promoting the replica to the
primary in the event of a failure on the primary) through pg_ctl
or the trigger_file
, but it does not provide out-of-the-box support for
automatic failover. Read more in the PostgreSQL failover
documentation). patroni offers a configurable
high availability solution with automatic failover functionality.