Monitor PostgreSQL WAL Archiver Status with pg_stat_archiver
pg_stat_archiver is the canonical place to check whether PostgreSQL's continuous archiving is keeping up — a single-row view that exposes last_archived_wal, last_failed_wal, and the rolling failure count without parsing log files. On a healthy cluster the row is uninteresting; on a stalled cluster it is the difference between catching a problem in minutes and finding a full WAL volume on Saturday night.
Purpose and Overview
A PostgreSQL cluster configured for continuous archiving runs the archive_command (or an archive_library module on PostgreSQL 15+) once per completed WAL segment. The command's job is to copy the segment somewhere durable (typically S3, NFS, or a backup server) so the WAL stream can later feed point-in-time recovery (PITR) or warm-standby replay. When the archive command fails (target unreachable, disk full, permission revoked, network partition), PostgreSQL keeps retrying the same segment, pg_wal accumulates segments, and the WAL volume eventually fills. At that point the server shuts down with a PANIC and stays offline until space is freed.
pg_stat_archiver is the operational pulse of this subsystem. It exposes counts of archived and failed segments, the names of the last segment archived and the last that failed, and the timestamps of each — enough to alert on a stalled archiver before the WAL volume becomes an incident. The view returns exactly one row; the row updates as the archiver processes segments and never accumulates history beyond the "last success" and "last failure" pair.
Because the view is single-row and counter-based, the operational metric is not the raw value but the trend: a growing failed_count while archived_count stays flat is the unmistakable signature of a wedged archiver, and a last_failed_time newer than last_archived_time is the cleanest "broken right now" signal.
Sample Code
    SELECT
        archived_count,
        last_archived_wal,
        last_archived_time,
        failed_count,
        last_failed_wal,
        last_failed_time,
        stats_reset,
        NOW() - last_archived_time AS since_last_archive,
        CASE
            WHEN last_failed_time IS NULL
                THEN 'archiver healthy'
            WHEN last_archived_time IS NULL
                THEN 'archiver has never succeeded'
            WHEN last_failed_time > last_archived_time
                THEN 'archiver currently failing'
            ELSE 'archiver healthy'
        END AS status
    FROM pg_stat_archiver;
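Notes: The view exists in every supported PostgreSQL version and always returns exactly one row, regardless of archive_mode; with archive_mode = off the counters simply do not advance. It is readable by any role, so no superuser is required for the view itself. Changing archive_mode requires a server restart, while a new archive_command takes effect on a configuration reload; both are set in postgresql.conf or, by a superuser, with ALTER SYSTEM.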
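To see how archiving is configured on a given server, a query against pg_settings along these lines may help. It is a minimal sketch: the context column distinguishes parameters that need a restart (postmaster) from those that only need a reload (sighup), and archive_library simply will not appear on versions before PostgreSQL 15.

```sql
-- Sketch: list the archiving parameters and how each one takes effect.
SELECT name,
       setting,
       context,          -- 'postmaster' = restart needed, 'sighup' = reload is enough
       pending_restart   -- true if a changed value is still waiting for a restart (9.5+)
FROM pg_settings
WHERE name IN ('archive_mode', 'archive_command',
               'archive_library', 'archive_timeout')
ORDER BY name;
```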
Code Breakdown
The query is a single-row SELECT with two derived columns. Every counter and timestamp matters; the derived columns translate raw values into the metrics that drive an alert.
The pg_stat_archiver Single-Row View
pg_stat_archiver returns exactly one row per server. Counters accumulate since the last statistics reset, whose time is reported in stats_reset; they survive a clean restart and can be reset explicitly with SELECT pg_stat_reset_shared('archiver'). The view is global to the cluster (not per-database), so the same row is visible from every database; archiving is configured at the cluster level.
Success Counters
archived_count is the running total of WAL segments successfully archived. last_archived_wal is the filename of the most recent success (e.g. 000000010000000100000027). last_archived_time is the wall-clock timestamp of that success. On a busy cluster the success counters should advance every few seconds to minutes; on a low-write cluster with archive_timeout set, advancement is bounded by that timeout.
Failure Counters
failed_count, last_failed_wal, and last_failed_time are the parallel trio for failures. failed_count rises by one for every retry of an unsuccessful command — a steady increase while archived_count is flat means the archiver is wedged on a single segment.
The Derived since_last_archive and status Columns
since_last_archive is NOW() - last_archived_time, the interval since the last success. Combined with the cluster's normal WAL generation cadence, it is the primary alerting metric. The status CASE collapses the timestamps into a single verdict: healthy when no failure has occurred or the last failure predates the last success; currently failing when the last failure is newer than the last success; never succeeded when archiving started but no segment has yet shipped.
Key Archiver Health Signals
Time Since Last Archive
The most direct signal. Set an alerting threshold matched to your cluster's WAL generation rate — alert if no archive in 15 minutes on a system that normally archives every minute, or 24 hours on a low-write system protected by archive_timeout.
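As a rough illustration of such a threshold, the query below returns a boolean that a monitoring agent could poll. The 15-minute interval is the example value from the paragraph above, not a recommendation, and the should_alert column name is made up for this sketch.

```sql
-- Sketch: flag a stalled archiver, assuming a 15-minute threshold.
SELECT current_setting('archive_mode')       AS archive_mode,
       now() - last_archived_time            AS since_last_archive,
       current_setting('archive_mode') <> 'off'
         AND (last_archived_time IS NULL     -- nothing archived since the last reset
              OR now() - last_archived_time > interval '15 minutes')
                                             AS should_alert
FROM pg_stat_archiver;
```

Note that the NULL branch also fires on a freshly initialized cluster (or right after a statistics reset) that has not yet completed its first segment, so an alert rule built on this may want a grace period.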
Counter Divergence (failed_count Climbing Without archived_count)
A growing failed_count while archived_count is flat is the unmistakable wedged-archiver signature. PostgreSQL retries the same segment until it succeeds, so the count grows linearly with retry rate.
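Because the view keeps no history, spotting divergence requires sampling it over time. One possible approach, sketched here with a hypothetical archiver_samples table populated by a scheduled job on the primary, is to store periodic snapshots and compare consecutive rows.

```sql
-- Hypothetical history table; the name and layout are assumptions for this sketch.
CREATE TABLE IF NOT EXISTS archiver_samples (
    sampled_at     timestamptz NOT NULL DEFAULT now(),
    archived_count bigint      NOT NULL,
    failed_count   bigint      NOT NULL
);

-- Run this on a schedule (cron or similar) to take one snapshot.
INSERT INTO archiver_samples (archived_count, failed_count)
SELECT archived_count, failed_count
FROM pg_stat_archiver;

-- Per-interval deltas: failed_delta > 0 with archived_delta = 0 suggests a wedged archiver.
SELECT sampled_at,
       archived_count - lag(archived_count) OVER w AS archived_delta,
       failed_count   - lag(failed_count)   OVER w AS failed_delta
FROM archiver_samples
WINDOW w AS (ORDER BY sampled_at)
ORDER BY sampled_at DESC
LIMIT 10;
```

A negative delta indicates that the counters were reset between two samples and that pair should be ignored.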
pg_wal Directory Growth
The downstream symptom. A failing archiver means WAL segments accumulate in pg_wal because PostgreSQL will not recycle a segment until it has been archived (and replication slots, if present, have also moved past it).
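The size of pg_wal can be checked from SQL without shell access. This sketch assumes PostgreSQL 10 or later, where pg_ls_waldir() is available; by default it can be called by superusers and members of pg_monitor.

```sql
-- Sketch: number of files and total bytes currently held in pg_wal.
SELECT count(*)                  AS wal_files,
       pg_size_pretty(sum(size)) AS wal_size
FROM pg_ls_waldir();
```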
ready-vs-done Status File Backlog
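Each WAL segment that is awaiting archiving or has been archived has a status file in pg_wal/archive_status whose name ends in .ready (waiting to be archived) or .done (successfully archived). The .ready count is the most precise queue-depth signal; .done files are removed as their segments are recycled, so their count says little on its own.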
    -- pg_ls_dir is restricted to superusers by default.
    SELECT
        COUNT(*) FILTER (WHERE name LIKE '%.ready') AS ready_to_archive,
        COUNT(*) FILTER (WHERE name LIKE '%.done') AS already_archived
    FROM pg_ls_dir('pg_wal/archive_status') AS files(name);
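On PostgreSQL 12 and later, pg_ls_archive_statusdir() exposes the same directory along with file modification times, which makes it possible to estimate how long the oldest segment has been waiting. The following is a sketch under that version assumption; the column aliases are illustrative.

```sql
-- Sketch: queue depth and age of the oldest pending segment (PostgreSQL 12+).
SELECT count(*)                  AS ready_files,
       min(modification)         AS oldest_ready,
       now() - min(modification) AS oldest_ready_age
FROM pg_ls_archive_statusdir()
WHERE name LIKE '%.ready';
```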
Standby Archiving with archive_mode = always
archive_mode = always lets a standby archive too, which is useful when the standby is the durable copy you want shipping WAL to long-term storage. On a standby the archiver ships segments only after the standby has received them through streaming replication or restored them from the archive, so since_last_archive trails the primary by that delivery delay.
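When the same health check runs on both primaries and standbys, it can help to return the server's role alongside the archiver state so the result is read in context. A minimal sketch:

```sql
-- Sketch: archiver state together with the server's role.
SELECT pg_is_in_recovery()                      AS is_standby,
       current_setting('archive_mode')          AS archive_mode,
       last_archived_wal,
       now() - last_archived_time               AS since_last_archive,
       now() - pg_last_xact_replay_timestamp()  AS time_since_last_replayed_commit
FROM pg_stat_archiver;
```

On a primary the last column is NULL. On a standby it also grows while the primary is idle, so it is only a rough indicator of replay delay.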
Practical Applications
Early Alerting on a Stalled Archiver
Wire the since_last_archive value into your monitoring system with a threshold tuned to your normal cadence. The metric is cheap to sample (single-row view), updates the moment archiving stalls, and gives you minutes-to-hours of lead time before pg_wal saturates the volume.
PITR Readiness Verification
Before relying on PITR for a recovery scenario, verify the archive is current. A cluster whose last_failed_time is newer than its last_archived_time has an archive that stops at last_archived_wal; recovery from the archive alone cannot reach any target beyond that segment, and you would discover this at recovery time, not when you can still fix it.
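One way to judge how current the archive is, sketched below, is to compare the last archived file name with the segment the primary is currently writing. This assumes PostgreSQL 10+ function names and must run on the primary, since pg_walfile_name() is not available during recovery.

```sql
-- Sketch: how far is the archive behind the segment being written right now?
SELECT last_archived_wal,
       pg_walfile_name(pg_current_wal_lsn()) AS current_wal,
       last_archived_time
FROM pg_stat_archiver;
```

On a healthy, busy primary the two names are typically one segment apart. Keep in mind that last_archived_wal can also hold the name of a timeline history or backup history file, so a direct comparison is not always meaningful.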
Sizing the WAL Volume Against archive_timeout
archive_timeout forces a WAL segment switch even on idle clusters, bounding the recovery-point objective (RPO) but multiplying the segment count under low write load. Size the pg_wal volume for the worst case: peak WAL generation rate plus headroom for wal_keep_size, plus the full backlog you can tolerate during an extended archiver outage.
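The sizing rule above is simple arithmetic and can be worked through in SQL. Every input in this sketch (5 MB/s peak WAL rate, an 8-hour tolerated outage, 2GB of wal_keep_size) is an assumed placeholder to be replaced with measured values.

```sql
-- Sketch: worst-case pg_wal size from assumed inputs.
SELECT pg_size_pretty(
           peak_mb_per_s * 1024 * 1024   -- bytes of WAL per second at peak
           * outage_hours * 3600         -- seconds of tolerated archiver outage
           + pg_size_bytes(wal_keep)     -- wal_keep_size headroom
       ) AS suggested_minimum
FROM (VALUES (5::bigint, 8::bigint, '2GB'::text))
       AS assumed(peak_mb_per_s, outage_hours, wal_keep);

-- Upper bound on forced segment switches per day implied by archive_timeout
-- (NULL when archive_timeout is 0, i.e. disabled).
SELECT 86400 / NULLIF(setting::int, 0) AS max_forced_switches_per_day
FROM pg_settings
WHERE name = 'archive_timeout';
```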
Diagnosing Silent archive_command Failure
A custom archive_command that returns success after a partial copy (truncated S3 upload, half-written NFS file) silently corrupts the archive. PostgreSQL trusts the exit code; once a segment is marked archived, the segment in pg_wal is recyclable. Defensive practice: write to a temporary path, fsync, atomic-rename to the final path, then return success. Standard tools (pgBackRest, barman, wal-g) already implement this; hand-rolled scripts often do not.
Version Compatibility
pg_stat_archiver was added in PostgreSQL 9.4, and the columns shown above have been stable since then. archive_mode = always (allowing standbys to archive) was added in 9.5. PostgreSQL 15 introduced archive_library, a loadable-module alternative to the shell archive_command, and the same pg_stat_archiver view summarizes its activity identically.
pg_stat_reset_shared('archiver') has been available since 9.4 and affects only the archiver counters; it does not touch the WAL segments or interrupt archiving. Resetting after fixing a misconfiguration brings dashboards back to a clean baseline without restarting the cluster.
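A reset followed by a quick check might look like the sketch below. By default pg_stat_reset_shared can only be executed by a superuser unless EXECUTE has been granted to another role.

```sql
-- Reset only the archiver statistics.
SELECT pg_stat_reset_shared('archiver');

-- Confirm the new baseline: counters restart from zero and stats_reset moves forward.
SELECT archived_count, failed_count, stats_reset
FROM pg_stat_archiver;
```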
PostgreSQL 13 introduced wal_keep_size (replacing wal_keep_segments), which sets a minimum amount of WAL retained in pg_wal for standbys, independent of what the archiver and replication slots hold back. Account for all three when sizing the WAL volume.
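To see how much WAL replication slots are holding back, a query like the following can be run on the primary. It is a sketch that assumes PostgreSQL 10+ function names; the retained_wal alias is illustrative.

```sql
-- Sketch: WAL retained on behalf of each replication slot (run on the primary).
SELECT slot_name,
       slot_type,
       active,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots
ORDER BY restart_lsn;
```

An inactive slot with a large retained_wal value can fill pg_wal just as a stalled archiver does, so it is worth checking both when the directory grows.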
Best Practices
- Alert on since_last_archive, not raw counter values: counters reset on pg_stat_reset_shared, intervals are robust
- Use a battle-tested archive tool (pgBackRest, barman, wal-g): they implement atomic-rename, retries, and verification correctly
- Set archive_timeout to your RPO: forces a segment switch on idle clusters; never higher than the data loss you can tolerate
- Monitor pg_wal/archive_status ready-file count: independent of pg_stat_archiver counters; catches misconfigurations that the counter view alone can miss
- Test PITR end-to-end periodically: a never-tested archive is functionally untrustworthy regardless of what the counters say
- Size pg_wal volume for the worst case: peak WAL rate × tolerated archiver outage + wal_keep_size headroom