PostgreSQL Physical Backup Guide with pg_basebackup
PostgreSQL Physical Backups with pg_basebackup
pg_basebackup takes a physical, binary-level snapshot of a running PostgreSQL cluster. Unlike pg_dump, which produces a logical export of one database at a time, pg_basebackup captures the entire data directory — all databases, configuration files, and enough WAL to make the backup consistent. The result is suitable for two critical operations: initializing a streaming replication standby, and serving as the base for point-in-time recovery (PITR).
The connection used by pg_basebackup requires the REPLICATION privilege. On most production clusters, a dedicated replication role handles this. The backup process runs against a live cluster without requiring downtime or exclusive locks.
Sample Code
1# Compressed tar format — suitable for archiving and PITR
2pg_basebackup \
3 --host=localhost \
4 --port=5432 \
5 --username=replicator \
6 --pgdata=/var/backups/pgbase_$(date +%Y%m%d) \
7 --format=tar \
8 --gzip \
9 --wal-method=stream \
10 --checkpoint=fast \
11 --progress \
12 --verbose
13
14# Plain directory format — suitable for initializing a standby
15pg_basebackup \
16 --host=localhost \
17 --port=5432 \
18 --username=replicator \
19 --pgdata=/var/lib/postgresql/standby \
20 --format=plain \
21 --wal-method=stream \
22 --checkpoint=fast \
23 --progress
Notes: Requires PostgreSQL 9.1 or later. The replication user must have the REPLICATION attribute (CREATE ROLE replicator REPLICATION LOGIN). pg_hba.conf must allow a replication connection from the backup host. --wal-method=stream requires one extra replication slot or connection above max_wal_senders.
Code Breakdown
--pgdata/-D— output directory; must not exist or must be empty--format=tar— producesbase.tar.gzplus one tar per tablespace; safe for archiving--format=plain— copies files directly into--pgdata; the standby can start from this directory as-is--gzip— compress tar output; only valid with--format=tar--wal-method=stream— streams WAL concurrently during the backup, ensuring the backup is self-contained; preferred overfetchfor PITR safety--wal-method=fetch— copies WAL at the end; simpler but risks WAL being recycled before the backup completes on busy clusters--checkpoint=fast— triggers an immediate checkpoint at backup start; reduces backup duration at the cost of a brief I/O spike; usespreadon I/O-sensitive systems--progress— prints transfer progress to stderr--verbose— logs each file as it is copied--max-rate— optional; limits transfer rate in kB/s to reduce I/O impact on the primary
Key Points
- The backup directory produced by
--format=plainis a complete, ready-to-start PostgreSQL data directory — add astandby.signalfile and apostgresql.confwithprimary_conninfoto turn it into a streaming standby --wal-method=streamrequiresmax_wal_sendersto be set high enough to accommodate the extra connection; a common production value ismax_wal_senders = 10- Always verify the backup completed without errors before relying on it; the
--verify-checksumsflag (PostgreSQL 11+) validates data page checksums during the backup - Backup duration depends on cluster size and disk throughput; use
--max-rateto avoid saturating I/O on the primary during business hours - For PITR, archive WAL continuously with
archive_mode = onandarchive_command; the base backup is the starting point, archived WAL fills the gap to the recovery target
Insights and Explanations
To monitor a pg_basebackup in progress from another session (PostgreSQL 13+), query pg_stat_progress_basebackup:
1SELECT
2 pid,
3 phase,
4 backup_total,
5 backup_streamed,
6 CASE backup_total
7 WHEN 0 THEN 0
8 ELSE ROUND(
9 100.0 * backup_streamed / backup_total, 1
10 )
11 END AS pct_complete
12FROM pg_stat_progress_basebackup;
To create the replication role used by pg_basebackup:
1CREATE ROLE replicator
2 REPLICATION
3 LOGIN
4 PASSWORD 'strong_password_here';
Then add a line to pg_hba.conf on the primary:
1host replication replicator <backup_host_ip>/32 scram-sha-256
After adding the standby or restoring a PITR backup, verify the cluster came up clean by checking pg_is_in_recovery():
1SELECT pg_is_in_recovery();
Returns true on a standby or during PITR recovery, false on a primary that has reached the recovery target and promoted.
Additional Considerations
- Permissions: the connecting role must have
REPLICATIONattribute;pg_hba.confmust allow areplication-type connection wal_level: must bereplicaorlogicalon the primary; the default since PostgreSQL 10 isreplica- Tablespaces: if the cluster uses tablespaces,
--format=tarproduces a separate tar per tablespace; plain format recreates the symlink structure automatically - Verification: use
pg_verifybackup(PostgreSQL 13+) to verify the backup manifest after the backup completes:pg_verifybackup /var/backups/pgbase_<date>