Complex Index for Postgres JSONB


PostgreSql takes on no-sql approach as hybrid using JSONB data type is quite impressive, it provides the versatility of RDBMS and nosql. I will not discuss further on it, for more information you can read at: https://www.postgresql.org/docs/9.5/functions-json.html. This page focus on one difficulty that developer often encounter: now that I have a complex json inside the jsonb column, how do we index it? and how do we index the sub-object inside json array ? 

Suppose we have Postgres table name sample with:

       
CREATE TABLE IF NOT EXISTS public.sample
(
    id integer NOT NULL DEFAULT nextval('sample_id_seq'::regclass),
    name character varying(100) COLLATE pg_catalog."default" NOT NULL,
    more_info jsonb,
    CONSTRAINT sample_pkey PRIMARY KEY (id)
)
       

And the following is the structure of json data inside the more_info column:

       
'{
   "info":{
      "address":[
         {
            "addressType":"PPOR",
            "addressDetails":{
               "streetNo":10,
               "streetName":"Bourke Street"
            }
         }
      ],
      "customerPhone":{
         "primary":12345
      },
      "customerType":[
         "REG"
      ]
   }
}'
       

And we need to query customer based on addressType, Customer Type and Customer Phone, therefore our query will look like this:

       

 select id, name, more_info
 from sample
 where more_info->'info'->'customerType' @> '["REG"]'::jsonb
 and more_info->'info'->'address' @> '[{"addressType":"PPOR"}]'::jsonb
 and more_info->'info'->'customerPhone'->>'primary' = '12345'

       

Which would work fine for small dataset, however when we start talking about millions of dataset, we will need to establish a precise jsonb index. Indexing JSONB fields are rather simple and straight forward (note you can have a read about GIN index here: https://www.postgresql.org/docs/9.5/gin-intro.html) :

       

  CREATE INDEX complexQueryIndex ON sample USING gin (
	(more_info->'info'->'customerType') jsonb_path_ops,
	(more_info->'info'->'customerPhone')
  );

However  while this index works fine, this is not precise and effcient enough. As we can see in the screenshot below (sample table of 20k rows), it is more costly (cost range from 8 to 36, time from 0.07 to 0.547):


This is simply because the GIN index we created previously could not handle the Address json array well. In order to create an efficient index for json array and @> operation, we need a function index which involves 3 steps:

1. Create the function

       

CREATE OR REPLACE FUNCTION addressArray(jsonb)
  RETURNS text AS
$func$
	SELECT (jsonb_array_elements($1) #>> '{addressType}')::text
$func$ LANGUAGE sql IMMUTABLE; 

2. Create the function index

       
CREATE INDEX complexQueryIndex ON sample USING gin (
	(more_info->'info'->'customerType') jsonb_path_ops,
	(more_info->'info'->'customerPhone'),
	(addressArray(more_info))
);
       

3. Update the Query to use the function

Note this is important, if your query does not use the same function, the postgres engine will not use the function index, or might use the index still, but inefficiently

       
select id, name, more_info
from sample
where more_info->'info'->'customerType' @> '["REG"]'::jsonb
and addressArray(more_info) @> '{PPOR}'::text[]
and more_info->'info'->'customerPhone'->>'primary' = '12345'
 

As you can see below, the EXPLAIN ANALYSE is much faster ( cost range from 12 to 16, time from 0.013 to 0.013)


 

 

Comments

Popular posts from this blog

Spring Boot 2: Parallelism with Spring WebFlux

Cucumber + Junit5 + Localstack