Detect Soft Delete Patterns in PostgreSQL
Detecting Soft Delete Patterns in PostgreSQL
Soft deletes are a common application pattern. Instead of removing a row with a DELETE statement, the application marks it as deleted by setting a column — typically deleted_at — to a non-null timestamp. The row stays in the table forever. The application simply filters it out with a WHERE deleted_at IS NULL clause.
This pattern is useful for audit trails, undo functionality, and referential integrity. It is also a significant source of problems for PostgreSQL DBAs who do not know it is in use.
When rows are deleted with DELETE, PostgreSQL marks them as dead tuples. Autovacuum reclaims that space. With soft deletes, those rows are never actually deleted. They accumulate silently. Tables grow large. Indexes balloon. Query performance degrades. And unless you know the pattern is there, the root cause is hard to diagnose.
The script covered in this article scans information_schema.columns to find every table in the database that contains a column named deleted_at. That is a reliable heuristic for identifying which tables use soft deletes.
SQL Script
1SELECT
2 table_schema,
3 table_name,
4 column_name,
5 data_type
6FROM
7 information_schema.columns
8WHERE
9 column_name = 'deleted_at';
Notes: Run as any role with access to information_schema.columns. The query scans all schemas visible to the connected user. Adjust the WHERE clause to add other common soft-delete column names such as is_deleted or deleted.
Code Breakdown
SELECT Clause
The query returns four columns: table_schema, table_name, column_name, and data_type. The schema is included because the same table name can exist in multiple schemas. The data_type tells you whether deleted_at is a timestamp with time zone, timestamp without time zone, boolean, or something else. The data type reveals how the application implements the pattern.
FROM information_schema.columns
information_schema.columns is a standard SQL catalog view. It contains one row for every column in every table, view, and materialized view that the current role has access to. It covers all schemas unless you add a table_schema filter.
WHERE column_name = 'deleted_at'
This is the heuristic filter. deleted_at is the most common name for a soft-delete timestamp column, popularised by Rails ActiveRecord's paranoia gem and many other ORM-level soft-delete libraries. The filter is case-sensitive in PostgreSQL catalog lookups, so lowercase deleted_at matches the overwhelming majority of real-world schemas.
To cast a wider net, extend the filter:
1WHERE column_name IN ('deleted_at', 'is_deleted', 'deleted', 'removed_at', 'archived_at')
Key Points
- The query identifies tables that likely use soft deletes by looking for a
deleted_atcolumn. - Soft-delete tables accumulate rows permanently. Dead-tuple pressure from
DELETEis absent, but the tables grow without bound. - Large soft-delete tables cause index bloat, slower sequential scans, and inflated row estimates that mislead the query planner.
- Autovacuum has no dead tuples to clean from soft-delete tables, so its default thresholds often mean it visits these tables infrequently even as they grow large.
- Finding these tables is the first step. The next step is understanding how large they have become and what fraction of rows are logically deleted.
Insights and Explanations
PostgreSQL's query planner sees all rows as live. When 80% of a table's rows have deleted_at IS NOT NULL, the planner still counts all of them in its row estimates. It may choose a sequential scan when an index scan would be faster for the 20% of live rows. The estimates are off by a factor of five before the query even starts.
Autovacuum triggers on dead tuple count and table age. Soft-delete tables generate no dead tuples from the marking operation itself — the old row version from an UPDATE does create a dead tuple, but autovacuum cleans that normally. The problem is that the logically-deleted rows themselves are never removed. Autovacuum works correctly at the mechanical level but cannot solve the accumulation problem.
Every index on a soft-delete table includes entries for logically-deleted rows. A B-tree index on user_id for a users table where half the users are soft-deleted carries entries for all users. Index scans touch more pages, cache efficiency drops, and the index consumes more disk space. Creating a partial index on (user_id) WHERE deleted_at IS NULL is the standard mitigation, but only helps if the application consistently uses that filter.
Once you have a list of soft-delete tables, follow up with these queries for each one:
1-- Row count split by deletion status
2SELECT
3 deleted_at IS NOT NULL AS is_deleted,
4 count(*)
5FROM your_schema.your_table
6GROUP BY 1;
1-- Table and index size
2SELECT
3 pg_size_pretty(pg_total_relation_size('your_schema.your_table')) AS total_size,
4 pg_size_pretty(pg_relation_size('your_schema.your_table')) AS table_size;
Additional Considerations
- Extending the heuristic:
deleted_atis a convention, not a standard. Some codebases useis_deleted(boolean),removed_at, orarchived_at. A thorough audit should query for all likely names. - Remediation options: Add partial indexes with
WHERE deleted_at IS NULL, schedule periodic hard deletes beyond a retention window, or partition large tables by time so old deleted data can be dropped by partition detach. - Involve the application team: Soft deletes are an application contract. Physical deletion of logically-deleted rows requires agreement on a retention policy before any DBA action.
- Behavioural signal: Tables where
n_tup_delinpg_stat_user_tablesis zero but the table is large and growing are another signal that rows are never physically deleted.
References
Posts in this series
- How Many Connections Can Your PostgreSQL Database Handle?
- PostgreSQL Backend Connections via pg_stat_database
- Identifying Blocking PostgreSQL Queries using pg_stat_activity
- List PostgreSQL Databases by Size with Access Check
- Assess PostgreSQL Database Sizes Quickly and Easily
- Unveiling Your PostgreSQL Server - A Diagnostic Powerhouse
- Keep Your PostgreSQL Database Clean, Identify Idle Connections
- Query the PostgreSQL Configuration
- PostgreSQL Recovery Monitoring: Essential SQL Insights
- Restart All PostgreSQL Sequences with ALTER SEQUENCE
- Monitor Running Queries in PostgreSQL using pg_stat_activity
- Monitor PostgreSQL Active Sessions with pg_stat_activity
- PostgreSQL Error Handling Settings via pg_settings
- PostgreSQL File Location Settings Query via pg_settings
- PostgreSQL Lock Management Settings via pg_settings
- PostgreSQL Logging Configuration Query via pg_settings
- Monitor PostgreSQL Memory Settings with pg_settings
- PostgreSQL Table Row Count Estimates with SQL
- List PostgreSQL Tables by Size with SQL
- PostgreSQL WAL Settings Query Guide
- PostgreSQL SSL Settings Query Guide
- PostgreSQL Statistics Settings Query Guide
- PostgreSQL Resource Settings Query Guide
- PostgreSQL Replication Settings Query Guide
- PostgreSQL Query Planning Settings Query Guide
- PostgreSQL Preset Options Settings Query Guide
- PostgreSQL Miscellaneous Settings Query Guide
- Count PostgreSQL Sessions by State with SQL
- Kill Idle PostgreSQL Sessions with SQL
- Grant SELECT on All Tables in PostgreSQL
- Identify Insert-Only Tables in PostgreSQL
- Detect Soft Delete Patterns in PostgreSQL
- List PostgreSQL Object Comments with SQL
- List Foreign Key Constraints in PostgreSQL
- List PostgreSQL Enum Types and Their Values with SQL
- List All Views in a PostgreSQL Database with SQL
- Find PostgreSQL Tables Without a Primary Key
- List PostgreSQL Partitioned Tables with SQL