Creating a PostgreSQL hot replica using Salt Project
PostgreSQL replicas are great for continuous backup and for reducing load on a primary server. Salt Project handles automation, infrastructure management, data-driven orchestration, remote execution, configuration management and much more, so it's a natural choice for creating a PostgreSQL hot replica. In this article we'll make sure the PostgreSQL server and client apps are installed, create a secure .pgpass file for access to the primary, write a Bourne shell script that builds a replica with pg_basebackup, call the script to make it happen, then check that everything works as expected. We'll be using Salt 3003 and PostgreSQL 14 on FreeBSD 13, but the approach applies to almost any combination of versions of Salt, PostgreSQL and operating system.
Installing the apps
Salt Project can look after package installation and configuration, so let's create a state file to do this.
states/postgres.sls

postgres14-client:
  pkg.installed

postgres14-server:
  pkg.installed

/home/postgres:
  file.directory:
    - user: postgres
    - group: postgres
    - mode: 750
    - makedirs: True

/root/.pgpass:
  file.managed:
    - user: postgres
    - group: postgres
    - mode: 600
    - template: jinja2
    - source: salt://postgres/.pgpass
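With the state file in place, it can be applied to the replica. A quick sketch, assuming the minion id replica.example.com and that states/ is the top of your file_roots so the state is reachable as postgres:

salt 'replica.example.com' state.apply postgres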
Creating a .pgpass file and keeping secrets
Salt Project can inject many kinds of variables into configuration files, using Jinja2 templating to render them at runtime.
Keeping passwords secret is essential, and Salt Project provides Pillar for storing sensitive data.
Here is a sample pillar to hold secrets:
pillar/secrets.sls

secrets:
  postgresql:
    host: primary.example.com
  replica:
    name: replica
    password: secret
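Pillar data only reaches minions it is targeted at, so remember to list secrets in your pillar top file. A minimal sketch, assuming the minion id replica.example.com; you can then verify the lookup from the master:

pillar/top.sls

base:
  'replica.example.com':
    - secrets

salt 'replica.example.com' pillar.get secrets:replica:name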
Here is the .pgpass template referred to in states/postgres.sls. It pulls the username and password from pillar/secrets.sls using normal Jinja2 moustache variables.
states/postgres/.pgpass

*:*:*:{{ salt['pillar.get']('secrets:replica:name') }}:{{ salt['pillar.get']('secrets:replica:password') }}
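pg_basebackup will log in as this user, so a matching role with the REPLICATION privilege must already exist on the primary, and pg_hba.conf there must allow replication connections from the replica. A sketch, assuming you run this as the postgres superuser on primary.example.com:

psql -c "CREATE ROLE replica WITH REPLICATION LOGIN PASSWORD 'secret';"

# and in pg_hba.conf on the primary, a line along these lines:
# host  replication  replica  replica.example.com  scram-sha-256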
Create a Bourne shell script to generate a replica
This script will be run as root via cmd.script's runas option; change the user as required if your minion setup differs. To look up the replication user and password, the .pgpass file must be readable by the postgres server, which is why the script checks its ownership before continuing.
states/postgres/pg_make_replica.sh

#!/bin/sh

# Connection settings for the primary and the local data directory
USER=replica
REMOTE_HOST=primary.example.com
PGDATA=/var/db/pg_data
PGOWNER=postgres
PGGROUP=postgres
PG_PASS=/root/.pgpass

# The .pgpass file must exist and be owned by the postgres user
do_check_root_dot_pgpass () {
    if [ ! -f "$PG_PASS" ]
    then
        echo "Please create a $PG_PASS file for the postgresql user"
        exit 1
    else
        PG_DOT_PASS_OWNER=$(stat -f "%Su" "$PG_PASS")
        if [ "$PG_DOT_PASS_OWNER" = "${PGOWNER}" ]
        then
            echo "$PG_PASS is ok"
        else
            echo "$PG_PASS must be owned by ${PGOWNER} to enable replication"
            exit 1
        fi
    fi
}

do_pg_make_replica() {
    if [ "$(id -u)" != 0 ]
    then
        echo "This must be run as root"
        exit 1
    fi

    # Check .pgpass permissions
    do_check_root_dot_pgpass

    # Stop the local server; pg_basebackup needs an empty data directory
    if service postgresql status 2> /dev/null
    then
        echo "PostgreSQL is running"
        echo "Stopping the postgresql service before continuing"
        service postgresql stop
    fi
    if service postgresql status 2> /dev/null
    then
        echo "PostgreSQL is still running"
        echo "Unable to continue"
        exit 1
    fi

    if [ -d "${PGDATA}" ]
    then
        echo "Deleting existing data"
        find "${PGDATA}" -delete
        mkdir "${PGDATA}"
        chmod 700 "${PGDATA}"
    fi

    echo "Starting base backup"

    # Build a host-specific slot name and check whether it already
    # exists on the primary; connect to the postgres database, as the
    # replication role has no database of its own
    SLOT_HOST=$(hostname -s)
    SLOT_NAME="node_${SLOT_HOST}_slot"
    SLOT_EXISTS=$(psql --username="${USER}" --dbname=postgres --tuples-only \
        -h "${REMOTE_HOST}" \
        -c "SELECT count(slot_name) as Slots FROM pg_replication_slots WHERE slot_name = '${SLOT_NAME}';")

    # Default to a harmless duplicate of --write-recovery-conf so the
    # variable is never empty; only create the slot if it is missing
    MAKE_SLOT=--write-recovery-conf
    if [ "$SLOT_EXISTS" -eq 0 ]
    then
        MAKE_SLOT=--create-slot
    fi

    pg_basebackup --pgdata="${PGDATA}" --host="${REMOTE_HOST}" \
        --username="${USER}" --no-password --wal-method=stream \
        --write-recovery-conf --slot="${SLOT_NAME}" "${MAKE_SLOT}"

    if [ -d "${PGDATA}" ]
    then
        echo "Setting permissions"
        chmod 700 "${PGDATA}"
        chown -R "${PGOWNER}":"${PGGROUP}" "${PGDATA}"
    fi

    # Start the service and confirm it is up
    service postgresql start
    if service postgresql status 2> /dev/null
    then
        echo "PostgreSQL successfully replicated and started"
    else
        echo "Something went wrong"
        exit 1
    fi
}

do_pg_make_replica
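One FreeBSD-specific note: service postgresql start and stop only work if the service is enabled in rc.conf, and the rc script needs to know about the non-default data directory. Assuming the stock postgresql14-server rc script, something like:

sysrc postgresql_enable=YES
sysrc postgresql_data=/var/db/pg_data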
Call the pg_make_replica.sh script
As creating a replica is a rare event there's no need for a dedicated state; Salt Project can fetch the script from the salt master and run it locally.
salt --async 'replica.example.com' cmd.script salt://postgres/pg_make_replica.sh runas=root
Output is captured and can be reviewed after the script has completed using "salt-run jobs.lookup_jid jid" with the Job ID returned by the salt --async command.
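For example (the Job ID shown is illustrative):

# salt --async 'replica.example.com' cmd.script salt://postgres/pg_make_replica.sh runas=root
Executed command with job ID: 20220101120000000000
# salt-run jobs.lookup_jid 20220101120000000000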
Check the status of replicas
By including a host-specific slot name when calling pg_basebackup, it's easy to check the status of a replica using the pg_replication_slots view.
example=# SELECT slot_name, slot_type, active, restart_lsn
example-# FROM pg_replication_slots;
     slot_name     | slot_type | active | restart_lsn
-------------------+-----------+--------+--------------
 node_replica_slot | physical  | t      | 2CE/53C15448
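You can also confirm things from the replica's side: pg_is_in_recovery() returns true while a server is acting as a standby.

psql --username=postgres -c "SELECT pg_is_in_recovery();"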
There are many monitoring tools, such as Nagios, that can be configured to check that replicas are up to date.
The following query might be a useful alert for when replicas are offline:
SELECT count(*) FROM pg_replication_slots WHERE active = false;
YMMV if your replicas are regularly disconnected.
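A minimal Nagios-style wrapper around that query might look like the following sketch (the exit codes follow Nagios conventions; connection details are assumptions):

#!/bin/sh
# check_replication_slots.sh - raise a critical alert when any slot is inactive
INACTIVE=$(psql --username=postgres --tuples-only --no-align \
    -c "SELECT count(*) FROM pg_replication_slots WHERE active = false;")
if [ "$INACTIVE" -gt 0 ]
then
    echo "CRITICAL: $INACTIVE inactive replication slot(s)"
    exit 2
fi
echo "OK: all replication slots active"
exit 0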
Decommission replicas when no longer required
Replication slots provide an automated way to ensure that the primary does not remove WAL segments until they have been received by all standbys, and that the primary does not remove rows which could cause a recovery conflict even when the standby is disconnected.
Retained WAL files can take up considerable space, so it's essential to remove a replication slot when its replica is decommissioned, unless you need the WAL files kept as an archive.
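To see how much WAL each slot is holding back on the primary, compare restart_lsn with the current WAL position; pg_current_wal_lsn() and pg_wal_lsn_diff() are standard functions in PostgreSQL 10 and later:

psql -c "SELECT slot_name, active, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal FROM pg_replication_slots;"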
Replication slots are managed using PostgreSQL's replication management functions.
For example, to create a replication slot for a physical replica:
SELECT pg_create_physical_replication_slot('node_a_slot');
To drop a replication slot when it's no longer required:
SELECT pg_drop_replication_slot('node_a_slot');
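Putting it together, a decommission sketch using the slot name from the earlier example: stop PostgreSQL on the replica first so the slot becomes inactive, then drop the slot on the primary.

# on the replica
service postgresql stop

# on the primary
psql --username=postgres -c "SELECT pg_drop_replication_slot('node_replica_slot');"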