Monitor PostgreSQL ANALYZE Progress with pg_stat_progress_analyze
Monitor PostgreSQL ANALYZE Progress with pg_stat_progress_analyze
Running ANALYZE on a multi-hundred-gigabyte table and wondering whether it is halfway done or barely started is a common operational question with, until recently, no good answer. pg_stat_progress_analyze answers it: a live view that reports the current phase of every running ANALYZE and how many sample blocks it has scanned.
Purpose and Overview
ANALYZE collects column statistics that the query planner depends on to choose good plans. On a small table it finishes in milliseconds and nobody thinks about it. On a very large table, or one with many child partitions, it can run for minutes, and during that window two questions matter: is it actually making progress, and how long until it is done. A manual ANALYZE issued before a big read-heavy job, or an autovacuum-launched analyze competing for I/O, both raise the same need for visibility.
pg_stat_progress_analyze provides that visibility. It exposes one row per backend currently running ANALYZE, naming the relation, the current phase, and counters that quantify how far the work has gone. The headline pair is sample_blks_total and sample_blks_scanned — the number of heap blocks ANALYZE plans to sample and the number it has read so far. Their ratio is a real completion percentage for the dominant phase of the operation.
For partitioned tables the view adds child_tables_total and child_tables_done, so you can see ANALYZE working through the partitions of an inheritance or declarative-partitioning hierarchy. Because the view is a snapshot of in-memory progress state, querying it imposes negligible overhead and can be polled as often as you like.
Sample Code
1SELECT
2 a.pid,
3 a.datname,
4 c.relname,
5 a.phase,
6 a.sample_blks_scanned,
7 a.sample_blks_total,
8 round(100.0 * a.sample_blks_scanned
9 / NULLIF(a.sample_blks_total, 0), 1) AS pct_sampled,
10 a.child_tables_done,
11 a.child_tables_total
12FROM
13 pg_stat_progress_analyze a
14 JOIN pg_class c ON c.oid = a.relid;
Notes: Requires PostgreSQL 13 or later, where ANALYZE progress reporting was added. The join to pg_class turns the relid OID into a readable table name. NULLIF(sample_blks_total, 0) avoids a division-by-zero while ANALYZE is still in its initializing phase and the total has not yet been set. An empty result simply means no ANALYZE is running right now.
Code Breakdown
The query joins the progress view to pg_class for human-readable names and derives a single completion percentage. Everything else is a direct column read.
Resolving the Relation Name
1JOIN pg_class c ON c.oid = a.relid
The progress view identifies its target table by OID in relid. Joining to pg_class on that OID yields relname, which is what an operator actually wants to see. For a fully schema-qualified name you would add a join to pg_namespace; for a quick check the bare relation name is usually enough.
The pct_sampled Expression
1round(100.0 * sample_blks_scanned / NULLIF(sample_blks_total, 0), 1)
sample_blks_scanned over sample_blks_total is the fraction of the planned sample that has been read. 100.0 forces real division, NULLIF(..., 0) protects against the brief window where the total is still zero, and rounding to one decimal keeps the output readable. This percentage tracks the acquiring sample rows phase, which is where large ANALYZE runs spend the bulk of their time.
Child-Table Counters
1child_tables_done, child_tables_total
For a partitioned or inheritance parent, ANALYZE also gathers statistics across child tables. These two counters expose that outer loop, so on a partitioned table you watch progress at two levels: blocks within the current child, and children within the parent.
Key ANALYZE Progress Phases
initializing
ANALYZE is preparing to scan and has not yet committed to a block total. This phase is expected to be very brief; if a query catches it here, sample_blks_total may still read zero, which is exactly why the percentage expression guards against division by zero.
acquiring sample rows
The main event. ANALYZE reads a randomized sample of heap blocks to estimate column distributions, and sample_blks_scanned climbs toward sample_blks_total throughout. On a large table this phase dominates wall-clock time, so the sampled percentage is the most useful single progress number.
computing statistics and finalizing analyze
After sampling, ANALYZE computes the statistics (including a separate computing extended statistics phase when extended statistics objects exist) and then writes results into pg_class and the statistics catalog during finalizing analyze. These phases are typically short relative to sampling, but seeing the phase advance past acquiring sample rows confirms the scan is finished and the run is nearly complete.
Practical Applications
Estimating Completion Before a Critical Job
When a reporting job or migration depends on fresh statistics, run ANALYZE first and poll the view to estimate when it will finish — instead of guessing or cancelling prematurely. The sampled percentage gives a defensible estimate.
Deciding Whether to Wait or Cancel
A long-running manual ANALYZE that is 90% sampled is worth waiting out; one stuck near zero with a huge sample_blks_total may warrant rescheduling to a quieter window. The view turns that decision from a guess into a measurement.
Watching Autovacuum-Launched Analyze
Autovacuum issues analyze automatically, and on big tables those runs show up here too. Correlating a progress row with an I/O spike confirms that a background analyze, not a runaway query, is consuming bandwidth. Because the view reports the backend's process ID in pid, you can join it against the activity views to see exactly which session — manual or autovacuum-launched — owns the run, and how long that session has been active.
Partition-by-Partition Tracking
On declaratively partitioned tables, the child-table counters reveal whether ANALYZE is steadily working through partitions or has stalled on one, which is useful when statistics freshness varies across a large partition set.
Version Compatibility
Progress reporting for ANALYZE arrived in PostgreSQL 13. Earlier releases offered progress views for other long operations — pg_stat_progress_vacuum shipped in PostgreSQL 9.6, and views for CREATE INDEX, CLUSTER, and base backups followed — but ANALYZE had no equivalent until 13. On PostgreSQL 12 and earlier the only signals are the VERBOSE option's log output and indirect observation of I/O.
The column set has been stable from 13 onward, with the delay_time column added in a later release to expose time spent sleeping under cost-based delay. Across PostgreSQL 13 through 18 the phases and the core sample-block counters behave consistently, so a monitoring query written against 13 continues to work on current versions.
Best Practices
- Poll, do not block — the view is cheap to read; sample it every few seconds rather than committing to a single snapshot when tracking a long run.
- Join to pg_class (and pg_namespace) for names — raw OIDs in
relidare hard to act on; resolve them so alerts name the actual table. - Guard the percentage against zero —
sample_blks_totalis zero duringinitializing; always wrap the division inNULLIFto avoid an error. - Watch both loops on partitioned tables — combine block-level and child-table counters so a stall on one partition is visible.
- Pair with statistics freshness checks — progress tells you ANALYZE is running;
last_analyzein the table statistics views tells you whether it ran recently enough.
References
- PostgreSQL Documentation — Progress Reporting — defines
pg_stat_progress_analyze, its phases, and every counter column. - PostgreSQL Documentation — ANALYZE — the command whose execution this view reports on, including sampling behavior.
- PostgreSQL Documentation — Monitoring Statistics — the cumulative statistics views that complement live progress reporting.
- EDB Blog — PostgreSQL VACUUM Guide and Best Practices — operational guidance on when and how often to run vacuum and analyze.