Database Reference
In-Depth Information
You may also wish to increase
max_wal_senders
, so that it will be possible to reconnect
even before a dropped connection is noted; this allows a manual restart to re-establish
connections more easily. If you do this, then also increase the connection limit for the
replication user.
Data transfer may stop because the connection drops or the Standby server or the Standby
system is shutdown. If replication data transfer stops for any reason, it will attempt to restart
from the point of last transfer.
For streaming replication, the Master keeps a number of files that is at least
wal_keep_
segments
. If the Standby database server has been down for long enough, the Master will have
moved on and will no longer have the data for the last point of transfer. If that should occur, then
the Standby needs to be re-configured using the same procedure with which we started.
Note that the Standby database server will not be streaming during the initial base backup,
so if the base backup is long enough, we might end up with a situation where replication will
never start because the desired starting point is no longer available on the Master. This is the
error that you'll get:
FATAL: requested WAL segment 000000010000000000000002 has already
been removed
It's very annoying, and there's no way out of that. You need to start over. So, start with a very
high
wal_keep_segments
. If you still get that error, then we need to increase
wal_keep_
segments
and try again, possibly also using techniques to speed up the base backup,
discussed in the
Backup
chapter. If you can't set
wal_keep_segments
high enough, then we
must move to a configuration where the archive is on a third server with increased disk storage
capacity. The Master will need to have an
archive_command
that places files on the archive
server, rather than the dummy command shown in the preceding procedure, in addition to
parameter settings to allow streaming to take place. The Standby will need to retrieve files from
the archive using
restore_command
, as well as streaming using
primary_conninfo
. Thus,
both Master and Standby have two modes for sending and receiving, and can switch between
them should failures occur. This is the typical configuration for large databases. Note that this
means that WAL data will be copied twice: once to the archive and once directly to the Standby.
Two copies are more expensive, but also more robust.
The reason for setting
archive_mode
=
on
in the preceding procedure is that altering that
parameter requires a restart, so you may as well set it on just in case you need it later. All
we need to do is use a dummy
archive_command
to ensure everything still works OK; by
dummy command, I mean a command that will do nothing and then return
rc
=0.
One thing that is a possibility is to set
archive_command
only until the end of the catch
up period. After that you can reset it to the dummy value ("cd") and then continue just with
streaming replication. Data is only transferred from the Master to the Standby once that data
has been written (or more precisely, fsynced) to disk. So setting
synchronous_commit
=
off
will not improve the replication delay, even if that improves performance on the Master.
Once WAL data is received by the Standby, the WAL data is fsynced to disk on the Standby to
ensure that it is not lost if the Standby system restarts.