PostgreSQL query to find columns containing only NULL values
Identifying Null-Only Columns in PostgreSQL
This article explores a valuable PostgreSQL query for database administrators. It helps identify columns within your database tables that exclusively contain NULL values. Uncovering these "null-only" columns can be a key step in optimizing storage usage and streamlining your database schema design.
Sample Code from Command Line
1SELECT
2 nspname,
3 relname,
4 attname,
5 typname,
6 (
7 stanullfrac*100
8 )
9 ::int AS null_percent
10FROM
11 pg_class c
12 JOIN
13 pg_namespace ns
14 ON (ns.oid = relnamespace)
15 JOIN
16 pg_attribute
17 ON (c.oid = attrelid)
18 JOIN
19 pg_type t
20 ON (t.oid = atttypid)
21 JOIN
22 pg_statistic
23 ON (c.oid = starelid
24 AND staattnum = attnum)
25WHERE
26 (
27 stanullfrac*100
28 )
29 ::int = 100
30 AND nspname NOT LIKE E'pg\\_%'
31 AND nspname != 'information_schema'
32 AND relkind = 'r'
33 AND NOT attisdropped
34 AND attstattarget != 0 -- AND
35 --reltuples >= 100
36ORDER BY
37 nspname,
38 relname,
39 attname;
Notes: Finds PostgreSQL columns that contain only NULLs. Tested on PostgreSQL 8.4, 9.x, 10.x, 11.x, 12.x, 13.0.
Cleaning Up Unused Space: Identifying Null-Only Columns in PostgreSQL
This article explores a valuable PostgreSQL query for database administrators. It helps identify columns within your database tables that exclusively contain NULL values. Uncovering these "null-only" columns can be a key step in optimizing storage usage and streamlining your database schema design.
Understanding the Code:
The provided PostgreSQL code leverages system views and functions to achieve its purpose:
- System Views:
pg_class (c)
: Provides information about database tables (relations).pg_namespace (ns)
: Stores details on database namespaces (schemas).pg_attribute (att)
: Contains details regarding database table columns (attributes).pg_type (t)
: Holds information on available data types within the database.pg_statistic (stats)
: Offers statistics for database tables, including the percentage of NULL values in each column.
- Joins: Combine data from multiple system views based on specified relationships.
- WHERE Clause: Filters the results based on specific conditions:
- Columns containing only NULL values (100% null fraction).
- Exclusion of system schemas (starting with "pg_").
- Exclusion of the "information_schema" schema.
- Inclusion of only base tables (
relkind = 'r'
). - Exclusion of dropped columns (
NOT attisdropped
). - Inclusion of columns with statistics gathering enabled (
attstattarget != 0
).
- Selection: Chooses specific columns for each identified null-only column:
nspname
: Schema name of the table.relname
: Name of the table.attname
: Name of the column.typname
: Data type of the column.null_percent
: Percentage of NULL values in the column (converted to an integer).
- Ordering: Organizes the results by schema name, table name, and then column name.
Key Points:
- Data Retrieval: The query gathers data from various system views using joins.
- Filtering: It meticulously filters the retrieved data to identify null-only columns.
- Insights: The results provide valuable information for database optimization:
- Identifying null-only columns can reveal potentially unused or redundant data.
- Removing null-only columns can free up storage space, potentially improving efficiency.
- Analyzing identified columns can lead to a better understanding of data models and data cleansing strategies.
Important Considerations:
- The commented-out
reltuples >= 100
clause can be used to focus on larger tables where null-only columns might have a more significant storage impact (minimum table row count). - Exercise caution and ensure proper backups before making any data schema modifications, especially when considering column removal. Unused columns might still be crucial for specific functionalities.
Conclusion:
By utilizing this PostgreSQL query and carefully analyzing the results, database administrators can streamline database schema design and potentially reclaim valuable storage space. Remember to exercise caution and perform proper backups before modifying your database schema.
References:
PostgreSQL Documentation on pg_class: https://www.postgresql.org/docs/current/catalog-pg-class.html
PostgreSQL Documentation on pg_namespace: https://www.postgresql.org/docs/current/catalog-pg-namespace.html
PostgreSQL Documentation on pg_attribute: https://www.postgresql.org/docs/current/catalog-pg-attribute.html
PostgreSQL Documentation on pg_type: https://www.postgresql.org/docs/current/catalog-pg-type.html
PostgreSQL Documentation on pg_statistic: https://www.postgresql.org/docs/current/catalog-pg-statistic.html
Source:
https://github.com/HariSekhon/SQL-scripts/blob/master/postgres_columns_null.sql