Creating a PostgreSQL hot replica using Salt Project
PostgreSQL replicas are great for continuous backups and for reducing load on the primary server. Salt Project handles automation, infrastructure management, data-driven orchestration, remote execution, configuration management and much more, so it is a natural choice for creating a PostgreSQL hot replica. In this article we'll ensure the PostgreSQL server and client packages are installed, create a secure .pgpass file for access to the primary, write a Bourne shell script that builds a replica with pg_basebackup, call the script to make it happen, then check that everything is working as expected. We'll be using Salt 3003 and PostgreSQL 14 on FreeBSD 13, but the approach applies to almost any combination of versions of Salt, PostgreSQL and operating system.
Installing the apps
Salt Project can look after configuration so let's create a state file to do this.
states/postgres.sls
postgres14-client:
  pkg.installed

postgres14-server:
  pkg.installed

/home/postgres:
  file.directory:
    - user: postgres
    - group: postgres
    - mode: 750
    - makedirs: True

/root/.pgpass:
  file.managed:
    - user: postgres
    - group: postgres
    - mode: 600
    - template: jinja2
    - source: salt://postgres/.pgpass
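With the state saved on the salt master, it can be applied to the replica host in the usual way. Assuming the file is reachable as postgres in your file_roots, something like this should do it:

salt 'replica.example.com' state.apply postgres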
Creating a .pgpass file and keeping secrets
Salt Project can insert many types of variables into configuration files, using Jinja2 templating to render them at runtime.
Keeping passwords secret is essential, and Salt Project provides pillar for storing sensitive data.
Here is a sample pillar to hold secrets:
pillar/secrets.sls
secrets:
  postgresql:
    host: primary.example.com
  replica:
    name: replica
    password: secret
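For the minion to see these values, the pillar must also be assigned in the pillar top file. A minimal sketch, assuming secrets.sls sits alongside top.sls in your pillar_roots:

pillar/top.sls
base:
  'replica.example.com':
    - secrets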
Here is the .pgpass template referred to in states/postgres.sls. It pulls the username and password from pillar/secrets.sls using normal Jinja2 mustache variables.
states/postgres/.pgpass
*:*:*:{{ salt['pillar.get']('secrets:replica:name') }}:{{ salt['pillar.get']('secrets:replica:password') }}
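With the sample pillar above, the file rendered on the replica follows the standard hostname:port:database:username:password format and would look something like:

*:*:*:replica:secret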
Create a Bourne shell script to generate a replica
This script will be run as root because this Salt Project minion runs as the salt user; change the runas user as required if your minion runs as another user. To look up the replication user and password, the .pgpass file created earlier must exist and be readable, which the script checks before doing anything destructive.
states/postgres/pg_make_replica.sh
#!/bin/sh

# Connection and filesystem settings for the replica
USER=replica
REMOTE_HOST=primary.example.com
PGDATA=/var/db/pg_data
PGOWNER=postgres
PGGROUP=postgres
PG_PASS=/root/.pgpass

do_check_root_dot_pgpass () {
    if [ ! -f "$PG_PASS" ]
    then
        echo "Please create a $PG_PASS file for the postgresql user"
        exit 1
    else
        PG_DOT_PASS_OWNER=$(stat -f "%Su" "$PG_PASS")
        if [ "$PG_DOT_PASS_OWNER" = "${PGOWNER}" ]
        then
            echo "$PG_PASS is ok"
        else
            echo "$PG_PASS must be owned by ${PGOWNER} to enable replication"
            exit 1
        fi
    fi
}

do_pg_make_replica() {
    if [ "$(id -u)" != 0 ]
    then
        echo "This must be run as root"
        exit 1
    fi

    # Check .pgpass exists and has the expected owner
    do_check_root_dot_pgpass

    if service postgresql status 2> /dev/null
    then
        echo "PostgreSQL is running"
        echo "Stopping the postgresql service before continuing"
        service postgresql stop
    fi
    if service postgresql status 2> /dev/null
    then
        echo "PostgreSQL is still running"
        echo "Unable to continue"
        exit 1
    fi

    # Remove any existing data directory contents before taking the base backup
    if [ -d "${PGDATA}" ]
    then
        echo "Deleting existing data"
        find "${PGDATA}" -delete
        mkdir "${PGDATA}"
        chmod 700 "${PGDATA}"
    fi

    echo "Starting base backup"

    # Use a slot name derived from this host so each replica has its own slot
    SLOT_HOST=$(hostname -s)
    SLOT_NAME="node_${SLOT_HOST}_slot"
    SLOT_EXISTS=$(psql --username="${USER}" --tuples-only -h "${REMOTE_HOST}" -c "SELECT count(slot_name) as Slots FROM pg_replication_slots WHERE slot_name = '${SLOT_NAME}';")

    # Only ask pg_basebackup to create the slot if it does not already exist.
    # The default value repeats a harmless flag so the quoted variable is never empty.
    MAKE_SLOT=--write-recovery-conf
    if [ "$SLOT_EXISTS" -eq 0 ]
    then
        MAKE_SLOT=--create-slot
    fi

    pg_basebackup --pgdata="${PGDATA}" --host="${REMOTE_HOST}" --username="${USER}" --no-password --wal-method=stream --write-recovery-conf --slot="${SLOT_NAME}" "${MAKE_SLOT}"

    if [ -d "${PGDATA}" ]
    then
        echo "Setting permissions"
        chmod 700 "${PGDATA}"
        chown -R "${PGOWNER}":"${PGGROUP}" "${PGDATA}"
    fi

    # Start the service
    service postgresql start
    if service postgresql status 2> /dev/null
    then
        echo "PostgreSQL successfully replicated and started"
    else
        echo "Something went wrong"
        exit 1
    fi
}

do_pg_make_replica
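On PostgreSQL 12 and later, the --write-recovery-conf option makes pg_basebackup create a standby.signal file in the new data directory and append the connection settings to postgresql.auto.conf, so after a successful run you should find entries roughly like the following there (the exact connection string will vary):

primary_conninfo = 'host=primary.example.com port=5432 user=replica ...'
primary_slot_name = 'node_replica_slot'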
Call the pg_make_replica.sh script
As creating a replica is a rare event, Salt Project can simply fetch a script from the salt master and run it locally on the minion.
salt --async 'replica.example.com' cmd.script salt://postgres/pg_make_replica.sh runas=root
Output is captured and can be reviewed after the script has completed with "salt-run jobs.lookup_jid jid", passing the Job ID returned by the salt --async command.
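For example, if the async call returned a JID of 20230101123456789012 (yours will differ), the output can be fetched with:

salt-run jobs.lookup_jid 20230101123456789012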
Check the status of replicas
By including a host specific slot name when calling pg_basebackup, it's easy to check the status of a replica using pg_replication_slots.
example=# SELECT slot_name, slot_type, active, restart_lsn
example-# FROM pg_replication_slots;
     slot_name      | slot_type | active | restart_lsn
--------------------+-----------+--------+--------------
 node_replica_slot  | physical  | t      | 2CE/53C15448
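The primary's pg_stat_replication view is a useful cross-check, showing the standbys that are currently streaming; for example:

example=# SELECT application_name, client_addr, state, sent_lsn, replay_lsn
example-# FROM pg_stat_replication;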
There are many monitoring tools, such as Nagios, that can be configured to check that replicas are up to date.
The following query might be a useful basis for an alert when replicas are offline:
"SELECT count(*) FROM pg_replication_slots WHERE active = false".
YMMV if your replicas are regularly disconnected.
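A minimal form of such a check, run on the primary as the postgres user (adjust to suit your monitoring tool), could be:

psql --tuples-only --no-align -c "SELECT count(*) FROM pg_replication_slots WHERE active = false"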
Decommission replicas when no longer required
Replication slots provide an automated way to ensure that the primary does not remove WAL segments until they have been received by all standbys, and that the primary does not remove rows which could cause a recovery conflict even when the standby is disconnected.
These WAL files can take up considerable space, so unless you need them as an archive it's essential to remove replication slots when a replica is decommissioned, otherwise the primary keeps retaining WAL indefinitely.
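To see roughly how much WAL each slot is currently holding back on the primary, a query along these lines can help decide which slots are safe to drop:

SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;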
Replication slots are managed using Admin functions.
For example, to create a replication slot for a physical replica:
SELECT pg_create_physical_replication_slot('node_a_slot');
To drop a replication slot when it's no longer required:
SELECT pg_drop_replication_slot('node_a_slot');