Creating a PostgreSQL hot replica using Salt Project

PostgreSQL replicas are great for continuous backups and for reducing load on a primary server. Salt Project handles automation, infrastructure management, data-driven orchestration, remote execution, configuration management and more, so it's a natural choice for creating a PostgreSQL hot replica. In this article we'll ensure the PostgreSQL server and client apps are installed, create a secure .pgpass file for access to the primary, write a Bourne shell script that creates a replica using pg_basebackup, call the script to make it happen, then check that all is working as expected. We'll be using Salt 3003 and PostgreSQL 14 on FreeBSD 13, but the approach applies to almost any combination of Salt release, PostgreSQL version, and operating system.

Installing the apps

Salt Project can look after package installation and configuration, so let's create a state file to do both.

states/postgres.sls
postgresql14-client:
    pkg.installed

postgresql14-server:
    pkg.installed

/home/postgres:
    file.directory:
        - user: postgres
        - group: postgres
        - mode: 750
        - makedirs: True

/root/.pgpass:
    file.managed:
        - user: postgres
        - group: postgres
        - mode: 600
        - template: jinja2
        - source: salt://postgres/.pgpass
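
With the state saved on the master, it can be applied to the replica minion; assuming states/ is at the top of the master's file roots, something like this should do it:

salt 'replica.example.com' state.apply postgres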

Creating a .pgpass file and keeping secrets

Salt Project can include many types of variables in configuration files, using Jinja2 templating to render them at runtime.

Keeping passwords secret is essential, and Salt Project provides pillar for storing sensitive data.

Here is a sample pillar to hold secrets:

pillar/secrets.sls
secrets:
    postgresql:
        host: primary.example.com
    replica:
        name: replica
        password: secret
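
For the minion to see these values, the pillar needs an entry in the pillar top file; a minimal sketch, assuming the default base environment:

pillar/top.sls
base:
    'replica.example.com':
        - secrets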

Here is the .pgpass template referred to in states/postgres.sls. It pulls the replication username and password from pillar/secrets.sls using normal Jinja2 moustache variables.

states/postgres/.pgpass
*:*:*:{{ salt['pillar.get']('secrets:replica:name') }}:{{ salt['pillar.get']('secrets:replica:password') }}
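
With the sample pillar above, the rendered file on the minion would contain:

*:*:*:replica:secret

The .pgpass format is hostname:port:database:username:password, so the leading wildcards match any host, port and database.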

Create a Bourne shell script to generate a replica

This script will be run as root; change the user as required if your Salt Project minion runs as another user. For psql and pg_basebackup to look up the replication user and password, the .pgpass file must be readable by the user running the script.
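
The script also assumes the primary already accepts replication connections from this host. If yours doesn't yet, something along these lines is needed on primary.example.com (the role name and password match the pillar; the address is illustrative):

CREATE ROLE replica WITH REPLICATION LOGIN PASSWORD 'secret';

and in pg_hba.conf:

host    replication    replica    203.0.113.10/32    scram-sha-256
host    postgres       replica    203.0.113.10/32    scram-sha-256

The second line lets the script's slot check connect to the postgres database with psql.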

states/postgres/pg_make_replica.sh
#!/bin/sh

USER=replica
REMOTE_HOST=primary.example.com
PGDATA=/var/db/pg_data
PGOWNER=postgres
PGGROUP=postgres
PG_PASS=/root/.pgpass

do_check_root_dot_pgpass () {
    if [ ! -f "$PG_PASS" ]
    then
        echo "Please create a $PG_PASS file for the postgresql user"
        exit 1
    else
        PG_DOT_PASS_OWNER=$(stat -f "%Su" "$PG_PASS")
        if [ "$PG_DOT_PASS_OWNER" = "${PGOWNER}" ]
        then
            echo "$PG_PASS is ok"
        else
            echo "$PG_PASS must be owned by ${PGOWNER} to enable replication"
            exit 1
        fi
    fi
}

do_pg_make_replica() {
    if [ "$(id -u )" != 0 ]
    then
        echo "This must be run as root"
        exit 1
    fi
    # Check .pgpass permissions
    do_check_root_dot_pgpass

    if service postgresql status > /dev/null 2>&1
    then
        echo "PostgreSQL is running"
        echo "Stopping the postgresql service before continuing"
        service postgresql stop
    fi
    if service postgresql status > /dev/null 2>&1
    then
        echo "PostgreSQL is still running"
        echo "Unable to continue"
        exit 1
    fi
    
    if [ -d "${PGDATA}" ]
    then
        echo "Deleting existing data"
        # find -delete also removes ${PGDATA} itself, so recreate and lock it down
        find "${PGDATA}" -delete
        mkdir "${PGDATA}"
        chmod 700 "${PGDATA}"
    fi
    echo "Starting base backup"
    # Use a slot name based on this host so each replica gets its own slot
    SLOT_HOST=$(hostname -s)
    SLOT_NAME="node_${SLOT_HOST}_slot"
    # Ask the primary whether the slot already exists; --no-align keeps the result a bare 0 or 1
    SLOT_EXISTS=$(psql --username="${USER}" --dbname=postgres --host="${REMOTE_HOST}" --tuples-only --no-align -c "SELECT count(*) FROM pg_replication_slots WHERE slot_name = '${SLOT_NAME}';")
    # Only ask pg_basebackup to create the slot when it doesn't already exist
    MAKE_SLOT=""
    if [ "$SLOT_EXISTS" -eq 0 ]
    then
        MAKE_SLOT="--create-slot"
    fi
    # ${MAKE_SLOT} is deliberately unquoted so an empty value expands to nothing
    if ! pg_basebackup --pgdata="${PGDATA}" --host="${REMOTE_HOST}" --username="${USER}" --no-password --wal-method=stream --write-recovery-conf --slot="${SLOT_NAME}" ${MAKE_SLOT}
    then
        echo "pg_basebackup failed"
        exit 1
    fi
    if [ -d "${PGDATA}" ]
    then
        echo "Setting permissions"
        chmod 700 "${PGDATA}"
        chown -R "${PGOWNER}":"${PGGROUP}" "${PGDATA}"
    fi

    # Start the service
    service postgresql start
    if service postgresql status > /dev/null 2>&1
    then
        echo "PostgreSQL successfully replicated and started"
    else
        echo "Something went wrong"
        exit 1
    fi
}

do_pg_make_replica

Call the pg_make_replica.sh script

As creating a replica is a rare event, there's no need for a dedicated state; Salt Project can fetch the script from the salt master and run it on the minion.

salt --async 'replica.example.com' cmd.script salt://postgres/pg_make_replica.sh runas=root
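
The command returns immediately with something like the following (the Job ID is illustrative):

Executed command with job ID: 20230901123456789012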

Output is captured and can be reviewed after the script has completed with "salt-run jobs.lookup_jid jid", using the Job ID returned by the salt --async command.
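
For example, using the Job ID from above:

salt-run jobs.lookup_jid 20230901123456789012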

Check the status of replicas

By including a host specific slot name when calling pg_basebackup, it's easy to check the status of a replica using pg_replication_slots.

example=# SELECT slot_name, slot_type, active, restart_lsn
example-# FROM pg_replication_slots;
     slot_name     | slot_type | active | restart_lsn
-------------------+-----------+--------+--------------
 node_replica_slot | physical  | t      | 2CE/53C15448
(1 row)
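
On the replica itself, pg_stat_wal_receiver gives the same confirmation from the other side; status should be streaming and sender_host should name the primary:

example=# SELECT status, sender_host FROM pg_stat_wal_receiver;
  status   |     sender_host
-----------+---------------------
 streaming | primary.example.com
(1 row)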


There are many monitoring tools, such as Nagios, that can be configured to check that replicas are up to date.

The following query might be useful for alerting when replicas are offline:

"SELECT count(*) FROM pg_replication_slots WHERE active = false".

YMMV if your replicas are regularly disconnected.
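
Wrapped in a one-liner, it can feed a monitoring check; a non-zero count means at least one slot is inactive (connection details as in the script above):

psql -h primary.example.com -U replica -d postgres -t -A -c "SELECT count(*) FROM pg_replication_slots WHERE active = false"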

Decommission replicas when no longer required

Replication slots provide an automated way to ensure that the primary does not remove WAL segments until they have been received by all standbys, and that the primary does not remove rows which could cause a recovery conflict even when the standby is disconnected.

These WAL files can take up considerable space, so unless you need them as an archive, it's essential to remove replication slots when a replica is decommissioned to avoid retaining unwanted WAL files.

Replication slots are managed using administration functions.

For example, to create a replication slot for a physical replica:

SELECT pg_create_physical_replication_slot('node_a_slot');

To drop a replication slot when it's no longer required:

SELECT pg_drop_replication_slot('node_a_slot');