Monitor PostgreSQL Replication Lag with pg_stat_replication

Monitor PostgreSQL Replication Lag with pg_stat_replication

A standby lagging more than a few seconds is rarely a transient hiccup — it is a query backlog on the standby, a network ceiling between the nodes, a misconfigured hot_standby_feedback, or a single-threaded apply storm. pg_stat_replication joined to pg_wal_lsn_diff() exposes each replica's byte lag and time lag in one row, on the primary, no SSH required.

Purpose and Overview

PostgreSQL streaming replication is asynchronous by default. The primary commits a transaction, writes the WAL record locally, the WAL sender forwards it to the standby, the standby writes it to its own WAL, flushes the WAL to disk, and finally replays it into the data files. Each stage runs on a separate thread of execution, and each stage has its own latency. Under load they diverge: write may be near-instant, flush a few milliseconds, and replay seconds or minutes behind on a busy standby.

pg_stat_replication exposes one row per connected standby, with the LSNs (log sequence numbers) tracking each stage — sent_lsn, write_lsn, flush_lsn, replay_lsn — plus the wall-clock intervals from primary commit until each acknowledgment. Subtracting the standby's replay_lsn from the primary's pg_current_wal_lsn() gives byte lag in WAL bytes; the pg_wal_lsn_diff function makes the subtraction safe across LSN boundaries.

Lag is operationally significant for three reasons: read-after-write queries against a lagging standby return stale data; synchronous-commit replicas that lag block primary commits; and failover to a lagging standby loses recent transactions. Measuring all three from a single query on the primary is what this view is for.

Sample Code

 1SELECT
 2    pid,
 3    application_name,
 4    client_addr,
 5    state,
 6    sync_state,
 7    pg_wal_lsn_diff(
 8        pg_current_wal_lsn(),
 9        replay_lsn
10    )                              AS replay_lag_bytes,
11    write_lag,
12    flush_lag,
13    replay_lag,
14    backend_start,
15    NOW() - backend_start          AS connection_age
16FROM pg_stat_replication
17ORDER BY replay_lag_bytes DESC NULLS LAST;

Notes: Requires PostgreSQL 10+ for the write_lag / flush_lag / replay_lag interval columns; the LSN columns and pg_wal_lsn_diff work on 9.4+ (replace _wal_ with _xlog_ on pre-10 servers). Run on the primarypg_stat_replication is empty on a standby. Requires pg_monitor or superuser to see all rows.

Code Breakdown

The query is a per-standby snapshot. Each column either identifies the standby, classifies its replication mode, or quantifies a specific stage of lag.

pg_stat_replication on the Primary

The view is populated by WAL-sender processes on the primary — one row per active standby connection. On a standby the view is empty, which is also a useful signal: a primary with no rows here has no standbys connected, regardless of what pg_hba.conf and primary_conninfo suggest is configured.

pg_wal_lsn_diff for Byte Lag

pg_wal_lsn_diff(a, b) returns the byte distance between two LSNs as a numeric. Positive when a is ahead of b. The standard byte-lag formula is pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) — bytes of WAL produced on the primary that the standby has not yet applied.

state and sync_state Columns

state is the WAL sender's status: startup during initial handshake, catchup while a recently-disconnected standby is streaming the backlog, streaming during normal operation, backup during a pg_basebackup run. A streaming row is the healthy steady state; a catchup row is temporary and expected after reconnection or for a brand-new replica.

sync_state is the synchronous-replication classification: async (no commit gating), sync (commits gate on this standby's flush), potential (next in line if a sync replica disconnects), quorum (one of a quorum-N group).

The Three Interval-Typed Lag Columns

write_lag, flush_lag, and replay_lag are wall-clock intervals between the primary's commit and the corresponding standby acknowledgment. Under steady load, expect write_lag < flush_lag < replay_lag; under unusual load patterns (synchronous_commit storms, parallel-bulk loads), the ordering can invert briefly.

Key Replication LSN Stages

sent_lsn (Primary Has Sent)

The LSN the WAL sender has handed off to the network socket. If sent_lsn lags pg_current_wal_lsn() significantly, the WAL sender on the primary is the bottleneck — usually CPU-bound encryption or compression, or the primary's network output buffer is saturating.

write_lsn (Standby Has Received and Written)

The LSN the standby has written to its own WAL buffers. Lag between sent_lsn and write_lsn indicates network latency or standby write-side saturation.

flush_lsn (Standby Has fsync'd)

The LSN durably persisted on the standby's disk. Synchronous replication with synchronous_commit = on blocks the primary commit until this LSN advances past the transaction's commit LSN. Standby disk I/O is the typical bottleneck here.

replay_lsn (Standby Has Applied)

The LSN actually visible to queries on the standby. Replay is single-threaded per database, so a heavy write workload on the primary can outpace replay even when the standby is idle of client queries. This is the LSN that matters for "is the read consistent" questions.

Practical Applications

Detecting a Standby Falling Behind in catchup State

A row with state = 'catchup' and replay_lag_bytes shrinking over successive query runs is a healthy reconnect. A row with state = 'catchup' and lag growing means the primary is outpacing what the standby can apply — escalate via larger wal_buffers, faster standby I/O, or synchronous_commit = off on the primary side until the standby catches up.

Synchronous Replication Health Checks

For a sync_state = 'sync' standby, monitor flush_lag rather than replay_lag — flush is what gates primary commits, and a high flush_lag with low replay_lag means the standby is acknowledging fsyncs but falling behind in apply. Both matter, but they signal different problems.

Logical Replication Subscriber Lag Verification

Logical replication subscribers also appear in pg_stat_replication with application_name matching the subscription name. Pair with the standby-side pg_stat_subscription and pg_stat_subscription_stats (PostgreSQL 15+) for apply errors. Divergence between the two views — primary says lag is bounded, subscriber reports apply errors — points at conflicts the apply worker is silently retrying.

1-- Run on the STANDBY
2SELECT
3    pg_is_in_recovery()                            AS in_recovery,
4    pg_last_wal_receive_lsn()                      AS last_received,
5    pg_last_wal_replay_lsn()                       AS last_replayed,
6    pg_last_xact_replay_timestamp()                AS last_xact_replayed,
7    NOW() - pg_last_xact_replay_timestamp()        AS time_lag;

Sizing WAL Retention with Replication Slots

If a standby disconnects long enough for the primary to recycle WAL it needs, the standby cannot resume from streaming and must be re-bootstrapped from a fresh base backup. pg_replication_slots lets the primary reserve WAL for a specific standby; wal_keep_size (PostgreSQL 13+) reserves unconditionally. Pair pg_stat_replication with pg_replication_slots to confirm both ends agree on what's being retained.

Version Compatibility

pg_stat_replication itself has existed since streaming replication landed in PostgreSQL 9.0, but the modern column set materialized over several releases. The write_lag, flush_lag, and replay_lag interval columns were added in PostgreSQL 10. The LSN columns and pg_wal_lsn_diff function work on PostgreSQL 9.4+; on 9.4 through 9.6 the function and the columns are named with xlog instead of wal (e.g. pg_current_xlog_location(), pg_xlog_location_diff).

pg_replication_slots was added in 9.4 alongside the physical replication slot feature; logical replication slots followed in 9.4 as well. wal_keep_size (replacing the integer wal_keep_segments) is PostgreSQL 13+ and accepts a human-readable size like '1GB'.

hot_standby_feedback interacts with replication: enabling it makes the standby send its oldest snapshot back to the primary so the primary holds off on vacuuming tuples the standby still needs. Without it, long-running queries on the standby can be cancelled by replay; with it, those queries can cause bloat on the primary. The trade-off is workload-specific.

Best Practices

  • Always query on the primarypg_stat_replication on a standby returns no rows; only the primary owns the WAL-sender process records
  • Pair byte lag with time lag — bytes scale with WAL volume, time scales with apply speed; both matter
  • Use replication slots for critical standbys — guarantees the primary retains the WAL the standby needs, at the cost of unbounded pg_wal growth if the standby disconnects forever
  • Alert on growing lag, not threshold crossings — a standby that has been 500 MB behind for an hour is more concerning than a brief 2 GB spike during a bulk load
  • Monitor pg_stat_subscription on logical subscribers — apply errors there are invisible to pg_stat_replication on the publisher
  • Audit sync_state periodically — a config drift that flips a sync standby to async is silent unless you watch for it

References

Posts in this series