Find PostgreSQL Index Bloat and Wasted Space

How to Find Index Bloat and Wasted Space in PostgreSQL

Just like tables, PostgreSQL indexes can become bloated over time. When rows are updated or deleted, the old index entries are not removed immediately. They stay in the index as dead entries, wasting space and slowing down index scans.

B-tree indexes are the most common type in PostgreSQL and the most prone to bloat. This query estimates how much space is wasted in each B-tree index, so you can decide which ones need to be rebuilt.

Sample Code

 1WITH btree_index_atts AS (
 2    SELECT nspname, relname, reltuples, relpages, indrelid, relam,
 3        regexp_split_to_table(indkey::text, ' ')::smallint AS attnum,
 4        indexrelid AS index_oid
 5    FROM pg_index
 6    JOIN pg_class ON pg_class.oid = pg_index.indexrelid
 7    JOIN pg_namespace ON pg_namespace.oid = pg_class.relnamespace
 8    JOIN pg_am ON pg_class.relam = pg_am.oid
 9    WHERE pg_am.amname = 'btree'
10),
11index_item_sizes AS (
12    SELECT
13        i.nspname, i.relname, i.reltuples, i.relpages, i.relam,
14        s.starelid, a.attrelid AS table_oid, index_oid,
15        current_setting('block_size')::numeric AS bs,
16        CASE
17            WHEN version() ~ 'mingw32' OR version() ~ '64-bit' THEN 8
18            ELSE 4
19        END AS maxalign,
20        24 AS pagehdr,
21        CASE WHEN max(coalesce(s.stanullfrac, 0)) = 0
22            THEN 2
23            ELSE 6
24        END AS index_tuple_hdr,
25        sum((1 - coalesce(s.stanullfrac, 0)) * coalesce(s.stawidth, 2048)) AS nulldatawidth
26    FROM pg_attribute AS a
27    JOIN pg_stats pgs
28    JOIN btree_index_atts AS i ON i.indrelid = a.attrelid AND a.attnum = i.attnum
29    WHERE a.attnum > 0
30      AND pgs.schemaname = 'public'
31    GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9
32),
33index_aligned AS (
34    SELECT maxalign, bs, nspname, relname AS index_name, reltuples,
35        relpages, relam, table_oid, index_oid,
36        (2 +
37            maxalign - CASE
38                WHEN index_tuple_hdr % maxalign = 0 THEN maxalign
39                ELSE index_tuple_hdr % maxalign
40            END
41            + nulldatawidth + maxalign - CASE
42                WHEN nulldatawidth::integer % maxalign = 0 THEN maxalign
43                ELSE nulldatawidth::integer % maxalign
44            END
45        )::numeric AS nulldatahdrwidth,
46        pagehdr
47    FROM index_item_sizes AS s1
48),
49otta_calc AS (
50    SELECT bs, nspname, table_oid, index_oid, index_name, relpages,
51        coalesce(
52            ceil((reltuples * (4 + nulldatahdrwidth)) / (bs - pagehdr::float)) +
53            CASE WHEN am.amname IN ('hash', 'btree') THEN 1 ELSE 0 END, 0
54        ) AS otta
55    FROM index_aligned AS s2
56    LEFT JOIN pg_am am ON s2.relam = am.oid
57),
58raw_bloat AS (
59    SELECT
60        current_database() AS dbname, nspname,
61        c.relname AS table_name, index_name,
62        bs * (sub.relpages)::bigint AS totalbytes,
63        CASE
64            WHEN sub.relpages <= otta THEN 0
65            ELSE bs * (sub.relpages - otta)::bigint
66        END AS wastedbytes,
67        CASE
68            WHEN sub.relpages <= otta THEN 0
69            ELSE bs * (sub.relpages - otta)::bigint * 100 / (bs * (sub.relpages)::bigint)
70        END AS realbloat,
71        pg_relation_size(sub.table_oid) AS table_bytes,
72        stat.idx_scan AS index_scans
73    FROM otta_calc AS sub
74    JOIN pg_class AS c ON c.oid = sub.table_oid
75    JOIN pg_stat_user_indexes AS stat ON sub.index_oid = stat.indexrelid
76)
77SELECT
78    dbname AS database_name,
79    nspname AS schema_name,
80    table_name,
81    index_name,
82    round(realbloat, 1) AS bloat_pct,
83    wastedbytes AS bloat_bytes,
84    pg_size_pretty(wastedbytes::bigint) AS bloat_size,
85    totalbytes AS index_bytes,
86    pg_size_pretty(totalbytes::bigint) AS index_size,
87    table_bytes,
88    pg_size_pretty(table_bytes) AS table_size,
89    index_scans
90FROM raw_bloat
91ORDER BY wastedbytes DESC
92LIMIT 10;

Notes: Works on PostgreSQL 9.0 and later. Covers B-tree indexes only, which are the most common index type. Change pgs.schemaname = 'public' to match your schema. Results are estimates based on table statistics; run ANALYZE first for best accuracy.

Code Breakdown

  • btree_index_atts CTE — Collects all B-tree indexes, their column numbers, and table relationships. Filters to amname = 'btree' to skip other index types like GIN and GiST.
  • index_item_sizes CTE — Calculates the estimated size of each index entry based on column statistics from pg_stats. stanullfrac and stawidth are the null fraction and average column width.
  • index_aligned CTE — Adds memory alignment padding to each entry size. PostgreSQL aligns data to 4 or 8 byte boundaries depending on the platform.
  • otta_calc CTE — Calculates the optimal number of pages the index should use (otta) based on the actual data. Anything above this is potential bloat.
  • raw_bloat CTE — Compares actual pages (relpages) to optimal pages (otta). The difference, multiplied by block size, is the wasted bytes.
  • wastedbytes — The raw number of bytes being wasted in the index.
  • bloat_pct — Wasted space as a percentage of total index size.
  • index_scans — How many times the index has been used. An index with high bloat and low scans is a candidate for removal, not just rebuilding.

Key Points

  • Results are sorted by wastedbytes so the largest absolute waste appears first.
  • A bloat_pct over 30% is a common threshold for considering a rebuild.
  • The index_scans column helps you decide: a bloated index that is never used may be worth dropping instead of rebuilding.
  • This query is limited to the top 10 results. Remove the LIMIT 10 clause to see all indexes.

Insights and Explanations

Once you identify bloated indexes, you can rebuild them. The simplest approach is REINDEX:

1REINDEX INDEX your_index_name;

This locks the index (and blocks reads on that index) while it rebuilds. For production use, prefer the concurrent version:

1REINDEX INDEX CONCURRENTLY your_index_name;

The concurrent option takes longer but does not block queries. It was added in PostgreSQL 12.

If you want to rebuild all indexes on a table at once:

1REINDEX TABLE CONCURRENTLY your_table_name;

For indexes on very busy tables, pg_repack is another option. It rebuilds the index online without any locking.

Index bloat builds up faster on tables with many updates and deletes. If an index keeps bloating quickly after rebuilding, check whether autovacuum is keeping up with the table it belongs to.

Additional Considerations

  • Permissions: This query reads from pg_stats, pg_index, pg_class, and pg_stat_user_indexes. Superuser or the table owner can see all indexes. Regular users may see fewer results.
  • GIN and GiST indexes: This query does not cover GIN or GiST indexes, which have different internal structures. Use pgstattuple for those.
  • Statistics accuracy: If ANALYZE has not run recently, pg_stats may be stale and the bloat estimate will be off.

References

Posts in this series