Estimate PostgreSQL Table Bloat with SQL

How to Estimate Table Bloat in PostgreSQL

When you update or delete rows in PostgreSQL, the old row versions are not removed immediately. They stay in the table as dead tuples until VACUUM cleans them up. Over time, if VACUUM does not keep up, these dead rows pile up and the table grows larger than it needs to be. This is called table bloat.

Bloat wastes disk space and slows down queries because PostgreSQL has to read through more pages to find live data. This query estimates how much of each table is wasted space, so you can decide which tables need attention.

Sample Code

 1SELECT
 2    current_database(),
 3    schemaname,
 4    tblname,
 5    bs * tblpages AS real_size,
 6    (tblpages - est_tblpages) * bs AS extra_size,
 7    CASE WHEN tblpages - est_tblpages > 0
 8        THEN 100 * (tblpages - est_tblpages) / tblpages::float
 9        ELSE 0
10    END AS extra_pct,
11    fillfactor,
12    CASE WHEN tblpages - est_tblpages_ff > 0
13        THEN (tblpages - est_tblpages_ff) * bs
14        ELSE 0
15    END AS bloat_size,
16    CASE WHEN tblpages - est_tblpages_ff > 0
17        THEN 100 * (tblpages - est_tblpages_ff) / tblpages::float
18        ELSE 0
19    END AS bloat_pct,
20    is_na
21FROM (
22    SELECT
23        ceil(reltuples / ((bs - page_hdr) / tpl_size)) + ceil(toasttuples / 4) AS est_tblpages,
24        ceil(reltuples / ((bs - page_hdr) * fillfactor / (tpl_size * 100))) + ceil(toasttuples / 4) AS est_tblpages_ff,
25        tblpages, fillfactor, bs, tblid, schemaname, tblname, heappages, toastpages, is_na
26    FROM (
27        SELECT
28            (4 + tpl_hdr_size + tpl_data_size + (2 * ma)
29                - CASE WHEN tpl_hdr_size % ma = 0 THEN ma ELSE tpl_hdr_size % ma END
30                - CASE WHEN ceil(tpl_data_size)::int % ma = 0 THEN ma ELSE ceil(tpl_data_size)::int % ma END
31            ) AS tpl_size,
32            bs - page_hdr AS size_per_block,
33            (heappages + toastpages) AS tblpages, heappages,
34            toastpages, reltuples, toasttuples, bs, page_hdr, tblid, schemaname, tblname, fillfactor, is_na
35        FROM (
36            SELECT
37                tbl.oid AS tblid, ns.nspname AS schemaname, tbl.relname AS tblname, tbl.reltuples,
38                tbl.relpages AS heappages, coalesce(toast.relpages, 0) AS toastpages,
39                coalesce(toast.reltuples, 0) AS toasttuples,
40                coalesce(substring(
41                    array_to_string(tbl.reloptions, ' ')
42                    FROM 'fillfactor=([0-9]+)')::smallint, 100) AS fillfactor,
43                current_setting('block_size')::numeric AS bs,
44                CASE WHEN version() ~ 'mingw32' OR version() ~ '64-bit|x86_64|ppc64|ia64|amd64' THEN 8 ELSE 4 END AS ma,
45                24 AS page_hdr,
46                23 + CASE WHEN MAX(coalesce(s.null_frac, 0)) > 0 THEN (7 + count(s.attname)) / 8 ELSE 0::int END
47                   + CASE WHEN bool_or(att.attname = 'oid' AND att.attnum < 0) THEN 4 ELSE 0 END AS tpl_hdr_size,
48                sum((1 - coalesce(s.null_frac, 0)) * coalesce(s.avg_width, 0)) AS tpl_data_size,
49                bool_or(att.atttypid = 'pg_catalog.name'::regtype)
50                    OR sum(CASE WHEN att.attnum > 0 THEN 1 ELSE 0 END) <> count(s.attname) AS is_na
51            FROM pg_attribute AS att
52                JOIN pg_class AS tbl ON att.attrelid = tbl.oid
53                JOIN pg_namespace AS ns ON ns.oid = tbl.relnamespace
54                LEFT JOIN pg_stats AS s
55                    ON s.schemaname = ns.nspname
56                    AND s.tablename = tbl.relname
57                    AND s.inherited = false
58                    AND s.attname = att.attname
59                LEFT JOIN pg_class AS toast ON tbl.reltoastrelid = toast.oid
60            WHERE NOT att.attisdropped
61              AND tbl.relname NOT LIKE '%pg%'
62              AND tbl.relkind IN ('r', 'm')
63            GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
64            ORDER BY 2, 3
65        ) AS s
66    ) AS s2
67) AS s3
68ORDER BY bloat_pct DESC;

Notes: Compatible with PostgreSQL 9.0 and later. Results are estimates based on table statistics. Run ANALYZE first for more accurate results. The is_na column flags rows where the estimate may not be reliable.

Code Breakdown

  • real_size — The actual disk space the table is using right now, in bytes.
  • extra_size — The extra space beyond what the live data actually needs. This is the overhead from dead rows and alignment padding.
  • extra_pct — Extra space as a percentage of the total table size.
  • fillfactor — The percentage of each page PostgreSQL fills with live data (default: 100). A lower fillfactor intentionally leaves free space for updates.
  • bloat_size — The space that is truly wasted after accounting for the fillfactor. This is the amount you could reclaim.
  • bloat_pct — Bloat as a percentage of the total table size. This is the key metric to watch.
  • is_na — True when the estimate is not reliable, usually because column statistics are incomplete. Re-run ANALYZE on those tables.
  • tpl_size — The estimated size of one row, including alignment overhead.
  • est_tblpages_ff — The estimated number of pages the table would need if it were compacted to its fillfactor.

Key Points

  • A bloat_pct above 20–30% is generally worth investigating.
  • Results are sorted by bloat_pct descending so the worst tables appear first.
  • Tables with is_na = true should be analyzed with ANALYZE your_table before trusting the estimate.
  • The estimate is based on statistics from pg_stats and pg_class. It is not 100% exact, but it is accurate enough to guide decisions.

Insights and Explanations

Once you identify bloated tables, you have a few options:

Regular VACUUM: Marks dead rows as reusable but does not shrink the table file on disk. Best for ongoing maintenance.

1VACUUM your_table_name;

VACUUM FULL: Rewrites the table completely, reclaiming all dead space. It locks the table exclusively during the operation, so do this during a maintenance window.

1VACUUM FULL your_table_name;

pg_repack: A third-party tool that reclaims bloat without locking the table. It is the preferred option for large, busy tables in production.

For tables with a bloat percentage consistently above 30%, consider tuning autovacuum to run more aggressively on those specific tables:

1ALTER TABLE your_table SET (
2    autovacuum_vacuum_scale_factor = 0.01,
3    autovacuum_analyze_scale_factor = 0.005
4);

This tells autovacuum to trigger a vacuum after just 1% of rows change, rather than the default 20%.

Additional Considerations

  • Permissions: Running this query as a non-superuser will only show tables you have permission to read.
  • Statistics freshness: The bloat estimate depends on pg_stats. If ANALYZE has not run recently, the estimates may be off. Run ANALYZE before using this in a capacity planning decision.
  • Write-heavy tables: Tables that receive many updates and deletes will naturally accumulate more bloat. Tune autovacuum settings for those tables individually.

References

Posts in this series