PostgreSQL Performance Tuning Boost Efficiency with Cache Hit Ratio Analysis
PostgreSQL Performance Tuning: Analyze Cache Hit Ratio with pg_stat_statements
Purpose
This PostgreSQL query empowers you to gain critical insights into the efficiency of your query workload by measuring the cache hit ratio from the pg_stat_statements extension. Cache hit ratio is a crucial metric for evaluating how effectively your database utilizes its memory buffers to serve query results. By identifying queries with low hit ratios, you can pinpoint potential areas for optimization, such as improving indexing strategies or adjusting memory allocation.
Sample Code from Command Line
1SELECT
2 calls,
3 rows,
4 shared_blks_hit,
5 shared_blks_read,
6 -- using greatest() to avoid divide by zero error, by ensuring we divide by at least 1
7 shared_blks_hit /
8 GREATEST(shared_blks_hit + shared_blks_read, 1)::float AS shared_blks_hit_ratio,
9 -- casting divisor to float to avoid getting integer maths returning zeros instead of fractional ratios
10 local_blks_hit,
11 local_blks_read,
12 local_blks_hit /
13 GREATEST(local_blks_hit + local_blks_read, 1)::float AS local_blks_hit_ratio,
14 query
15FROM
16 pg_stat_statements
17--ORDER BY rows DESC
18ORDER BY
19 shared_blks_hit_ratio DESC,
20 local_blks_hit_ratio DESC,
21 rows DESC
22LIMIT 100;
Code Breakdown:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
:- This line ensures the
pg_stat_statements
extension is installed in your database if it isn't already. This extension is essential for tracking detailed statistics about query execution, including cache hits and misses.
- This line ensures the
SELECT calls, rows, shared_blks_hit, shared_blks_read, ...
:- Selects a range of relevant columns from the
pg_stat_statements
view:calls
: The number of times the query has been executed.rows
: The total number of rows processed or returned by the query.shared_blks_hit
: The number of times the query found needed data in the shared buffer cache.shared_blks_read
: The number of times the query had to read data from disk because it wasn't in the shared buffer cache.shared_blks_hit_ratio
: Calculates the ratio of shared buffer cache hits to total shared buffer accesses (hits + reads). UsesGREATEST()
to prevent division by zero errors.local_blks_hit
andlocal_blks_read
: Similar to shared blocks, but track hits and reads within a temporary buffer used for specific operations like sorts or hash joins.local_blks_hit_ratio
: Calculates the hit ratio for the local buffer.query
: The actual SQL text of the query.
- Selects a range of relevant columns from the
FROM pg_stat_statements
:- Queries the
pg_stat_statements
view to access aggregated query statistics.
- Queries the
ORDER BY shared_blks_hit_ratio DESC, local_blks_hit_ratio DESC, rows DESC LIMIT 100;
:- Sorts the results:
- Primarily by descending
shared_blks_hit_ratio
, showing queries with the lowest shared buffer cache hit ratios first. - Secondarily by descending
local_blks_hit_ratio
. - Finally, by descending
rows
to prioritize queries processing more data.
- Primarily by descending
- Limits the output to the top 100 queries.
- Sorts the results:
Key Points:
- Cache Hit Ratio Focus: The query centers on evaluating how often queries can retrieve data from memory (cache) versus requiring disk reads, a key performance indicator.
- Shared and Local Buffers: It considers both shared and local buffer cache hit ratios for a comprehensive analysis.
- Prioritization: The sorting and limiting ensure you quickly see the queries most likely to benefit from optimization.
Insights:
- Performance Bottleneck Identification: Queries with low cache hit ratios might indicate inefficient queries or insufficient memory allocation.
- Indexing Opportunities: Analyzing these queries can highlight tables that would benefit from additional indexes.
- Memory Management: A consistently low overall cache hit ratio could suggest the need for increased shared buffers or other memory adjustments.
Explanations:
pg_stat_statements
: This PostgreSQL extension tracks and aggregates key execution statistics for all SQL statements run by the server.shared_buffers
: A PostgreSQL configuration parameter defining the amount of memory dedicated to the shared buffer cache.GREATEST()
: Used to ensure the denominator in the hit ratio calculations is at least 1, preventing division by zero.
References:
- PostgreSQL Documentation on pg_stat_statements: https://www.postgresql.org/docs/current/pgstatstatements.html
- PostgreSQL Documentation on shared_buffers: https://www.postgresql.org/docs/current/runtime-config-resource.html
- Reference Code: https://github.com/HariSekhon/SQL-scripts/blob/master/postgres_query_cache_hit_ratio.sql