PostgreSQL Performance Tuning Boost Efficiency with Cache Hit Ratio Analysis

PostgreSQL Performance Tuning: Analyze Cache Hit Ratio with pg_stat_statements

Purpose

This PostgreSQL query empowers you to gain critical insights into the efficiency of your query workload by measuring the cache hit ratio from the pg_stat_statements extension. Cache hit ratio is a crucial metric for evaluating how effectively your database utilizes its memory buffers to serve query results. By identifying queries with low hit ratios, you can pinpoint potential areas for optimization, such as improving indexing strategies or adjusting memory allocation.

Sample Code from Command Line

 1SELECT
 2  calls,
 3  rows,
 4  shared_blks_hit,
 5  shared_blks_read,
 6  -- using greatest() to avoid divide by zero error, by ensuring we divide by at least 1
 7    shared_blks_hit /
 8    GREATEST(shared_blks_hit + shared_blks_read, 1)::float AS shared_blks_hit_ratio,
 9    -- casting divisor to float to avoid getting integer maths returning zeros instead of fractional ratios
10  local_blks_hit,
11  local_blks_read,
12    local_blks_hit /
13    GREATEST(local_blks_hit + local_blks_read, 1)::float AS local_blks_hit_ratio,
14  query
15FROM
16  pg_stat_statements
17--ORDER BY rows DESC
18ORDER BY
19  shared_blks_hit_ratio DESC,
20  local_blks_hit_ratio DESC,
21  rows DESC
22LIMIT 100;

Code Breakdown:

  1. CREATE EXTENSION IF NOT EXISTS pg_stat_statements;:
    • This line ensures the pg_stat_statements extension is installed in your database if it isn't already. This extension is essential for tracking detailed statistics about query execution, including cache hits and misses.
  2. SELECT calls, rows, shared_blks_hit, shared_blks_read, ...:
    • Selects a range of relevant columns from the pg_stat_statements view:
      • calls: The number of times the query has been executed.
      • rows: The total number of rows processed or returned by the query.
      • shared_blks_hit: The number of times the query found needed data in the shared buffer cache.
      • shared_blks_read: The number of times the query had to read data from disk because it wasn't in the shared buffer cache.
      • shared_blks_hit_ratio: Calculates the ratio of shared buffer cache hits to total shared buffer accesses (hits + reads). Uses GREATEST() to prevent division by zero errors.
      • local_blks_hit and local_blks_read: Similar to shared blocks, but track hits and reads within a temporary buffer used for specific operations like sorts or hash joins.
      • local_blks_hit_ratio: Calculates the hit ratio for the local buffer.
      • query: The actual SQL text of the query.
  3. FROM pg_stat_statements:
    • Queries the pg_stat_statements view to access aggregated query statistics.
  4. ORDER BY shared_blks_hit_ratio DESC, local_blks_hit_ratio DESC, rows DESC LIMIT 100;:
    • Sorts the results:
      • Primarily by descending shared_blks_hit_ratio, showing queries with the lowest shared buffer cache hit ratios first.
      • Secondarily by descending local_blks_hit_ratio.
      • Finally, by descending rows to prioritize queries processing more data.
    • Limits the output to the top 100 queries.

Key Points:

  • Cache Hit Ratio Focus: The query centers on evaluating how often queries can retrieve data from memory (cache) versus requiring disk reads, a key performance indicator.
  • Shared and Local Buffers: It considers both shared and local buffer cache hit ratios for a comprehensive analysis.
  • Prioritization: The sorting and limiting ensure you quickly see the queries most likely to benefit from optimization.

Insights:

  • Performance Bottleneck Identification: Queries with low cache hit ratios might indicate inefficient queries or insufficient memory allocation.
  • Indexing Opportunities: Analyzing these queries can highlight tables that would benefit from additional indexes.
  • Memory Management: A consistently low overall cache hit ratio could suggest the need for increased shared buffers or other memory adjustments.

Explanations:

  • pg_stat_statements: This PostgreSQL extension tracks and aggregates key execution statistics for all SQL statements run by the server.
  • shared_buffers: A PostgreSQL configuration parameter defining the amount of memory dedicated to the shared buffer cache.
  • GREATEST(): Used to ensure the denominator in the hit ratio calculations is at least 1, preventing division by zero.

References: