
Finding Missing Indexes in PostgreSQL

For better query performance we need to analyze indexing in our PostgreSQL tables. Postgres keeps statistics that help us find tables that are probably missing an index.

This query finds likely missing indexes in PostgreSQL:

SELECT
  relname,
  seq_scan - coalesce(idx_scan, 0) AS too_much_seq,
  CASE
    WHEN seq_scan - coalesce(idx_scan, 0) > 0 THEN 'Missing Index?'
    ELSE 'OK'
  END AS index_status,
  pg_relation_size(relname::regclass) AS rel_size,
  seq_scan,
  idx_scan
FROM pg_stat_all_tables
WHERE schemaname = 'public'
  AND pg_relation_size(relname::regclass) > 80000
ORDER BY too_much_seq DESC;
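
If a table shows up here with a large too_much_seq value, the usual fix is to index the column(s) your slow queries filter or join on. A minimal sketch, assuming a hypothetical orders table that is frequently filtered by customer_id:

-- Hypothetical example: index a column that appears in frequent WHERE clauses
CREATE INDEX index_orders_on_customer_id ON orders (customer_id);

-- Check that the planner now uses the index instead of a sequential scan
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;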

Another helpful query to find which tables may need indexing:

SELECT
  x1.table_in_trouble,
  pg_relation_size(x1.table_in_trouble::regclass) AS size_in_bytes,
  x1.seq_scan,
  x1.idx_scan,
  x1.n_live_tup AS approx_row_count,
  x1.priority
FROM
  (
    SELECT
      (schemaname::text || '.'::text) || relname::text AS table_in_trouble,
      seq_scan,
      idx_scan,
      n_live_tup,
      CASE
        WHEN (seq_scan - idx_scan) < 500
          THEN 'Minor Problem'::text
        WHEN (seq_scan - idx_scan) >= 500 AND (seq_scan - idx_scan) < 2500
          THEN 'Major Problem'::text
        WHEN (seq_scan - idx_scan) >= 2500
          THEN 'Extreme Problem'::text
        ELSE NULL::text
      END AS priority
    FROM
      pg_stat_all_tables
    WHERE
      seq_scan > idx_scan
      AND schemaname != 'pg_catalog'::name
      AND seq_scan > 100
  ) x1
ORDER BY
  x1.priority DESC,
  x1.seq_scan;
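
Note that seq_scan and idx_scan accumulate since the last statistics reset, so after adding indexes it can help to reset the counters and watch whether sequential scans keep growing. A small sketch (pg_stat_reset normally requires superuser privileges):

-- Reset statistics counters for the current database
SELECT pg_stat_reset();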

Finding Slow Queries in PostgreSQL

As data engineers we sometimes need to find which queries are taking too much time. PostgreSQL has a module called pg_stat_statements which provides very good statistics about slow queries. This module tracks execution statistics of SQL statements and helps find slow-performing queries.

To enable this, first we need to run:

CREATE EXTENSION pg_stat_statements;
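
On most installations the module also has to be loaded at server start via shared_preload_libraries, otherwise querying the view will fail. A sketch of one way to set that (assuming superuser access; a server restart is required afterwards, and any existing preload libraries should be kept in the list):

ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
-- restart PostgreSQL for this setting to take effect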

Columns and definitions for pg_stat_statements:

Name | Type | References | Description
userid | oid | pg_authid.oid | OID of user who executed the statement
dbid | oid | pg_database.oid | OID of database in which the statement was executed
queryid | bigint | | Internal hash code, computed from the statement’s parse tree
query | text | | Text of a representative statement
calls | bigint | | Number of times executed
total_time | double precision | | Total time spent in the statement, in milliseconds
min_time | double precision | | Minimum time spent in the statement, in milliseconds
max_time | double precision | | Maximum time spent in the statement, in milliseconds
mean_time | double precision | | Mean time spent in the statement, in milliseconds
stddev_time | double precision | | Population standard deviation of time spent in the statement, in milliseconds
rows | bigint | | Total number of rows retrieved or affected by the statement
shared_blks_hit | bigint | | Total number of shared block cache hits by the statement
shared_blks_read | bigint | | Total number of shared blocks read by the statement
shared_blks_dirtied | bigint | | Total number of shared blocks dirtied by the statement
shared_blks_written | bigint | | Total number of shared blocks written by the statement
local_blks_hit | bigint | | Total number of local block cache hits by the statement
local_blks_read | bigint | | Total number of local blocks read by the statement
local_blks_dirtied | bigint | | Total number of local blocks dirtied by the statement
local_blks_written | bigint | | Total number of local blocks written by the statement
temp_blks_read | bigint | | Total number of temp blocks read by the statement
temp_blks_written | bigint | | Total number of temp blocks written by the statement
blk_read_time | double precision | | Total time the statement spent reading blocks, in milliseconds (if track_io_timing is enabled, otherwise zero)
blk_write_time | double precision | | Total time the statement spent writing blocks, in milliseconds (if track_io_timing is enabled, otherwise zero)

To find slow queries, you can run the SQL command below:

SELECT * FROM pg_stat_statements ORDER BY total_time DESC;
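
A variation I find useful is ranking by the average time per call instead of the total. A sketch using the columns above (on PostgreSQL 13 and later these columns are renamed total_exec_time and mean_exec_time):

SELECT query,
       calls,
       total_time,
       mean_time,
       rows
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT 20;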

Adding a radius search to your Ruby on Rails 5 application with PostgreSQL

Recently one of my projects needed location-based search. I chose PostgreSQL as the database because I wanted to use some of its built-in power.

I wanted to send three parameters to PostgreSQL and have the database return the locations within a given radius. In other words, I wanted to find which locations are within X metres of a pair of coordinates.

PostgreSQL makes this a lot easier.

First you need to enable two extensions in PostgreSQL:

  CREATE EXTENSION cube;
  CREATE EXTENSION earthdistance;

Now you can run a query like:

SELECT \"franchises\".* FROM \"franchises\" WHERE (earth_box(ll_to_earth(40.810649, -73.598605), 2000) @> ll_to_earth(latitude, longitude))

You can also create a scope in your Rails model:

scope :within_radius, lambda {|latitude, longitude, metres| where("earth_box(ll_to_earth(?, ?), ?) @> ll_to_earth(latitude, longitude)", latitude, longitude, metres) }

And that’s it. You can call

  Franchise.within_radius(43.6422125, -79.3744256, 5000) 

and you’ll have a list of franchises within 5 km of those coordinates.

Happy coding 🙂