PostgreSQL: optimize performance of a query that contains window functions with a CTE

Question

Here the columns amenity_categories and parent_path are JSONB columns with values like ["Tv","Air Condition"] and ["20000","20100","203"] respectively. The other columns are ordinary varchar and numeric types. I have around 2.5M rows, with an indexed primary key on id. The initial CTE is the part that takes time when the rp.parent_path condition matches multiple rows.

Sample dataset:

Current query:

WITH CTE AS (
  SELECT id,
         property_name,
         property_type_category,
         review_score,
         amenity_category.name,
         count(*) AS cnt
  FROM   table_name rp,
         jsonb_array_elements_text(rp.amenity_categories) amenity_category(name)
  WHERE  rp.parent_path ? '203'
  AND    number_of_review >= 1
  GROUP  BY amenity_category.name, id
),
CTE2 AS (
  SELECT id, property_name, property_type_category, name,
         ROW_NUMBER() OVER (PARTITION BY property_type_category, name
                            ORDER BY review_score DESC),
         COUNT(id) OVER (PARTITION BY property_type_category, name
                         ORDER BY name DESC)
  FROM   CTE
)
SELECT id, property_name, property_type_category, name, count
FROM   CTE2
WHERE  row_number = 1;

Current Output:

So my basic question is: is there any other way I can rewrite this query or optimize the current query?

@GordonLinoff sir, I am trying to get the count for each combination of property_type_category and amenity (one window per combination), and to get the top id and property_name in each window along with it. — A l w a y s S u n n y, Jan 02 '21 at 15:27
For example, if you look at the output, **Category: Cabin and Amenity: Tv has a count of 273, and from those 273 I am grabbing the `id, property_name` values of the TOP ROW** — A l w a y s S u n n y, Jan 02 '21 at 15:34

Erwin Brandstetter · Accepted Answer · 2021-01-03T01:24:53.190

If it's safe to assume that array elements in amenity_categories are distinct (no duplicate array elements), we can radically simplify to:

SELECT DISTINCT ON (property_type_category, ac.name)
       id, property_name, property_type_category, ac.name
     , COUNT(*) OVER (PARTITION BY property_type_category, ac.name) AS count
FROM   table_name rp, jsonb_array_elements_text(rp.amenity_categories) ac(name)
WHERE  parent_path ? '203'
AND    number_of_review >= 1
ORDER  BY property_type_category, ac.name, review_score DESC;

If review_score can be NULL, make that:

...
ORDER  BY property_type_category, ac.name, review_score DESC NULLS LAST;

This works because DISTINCT ON is applied as the last step (after window functions).
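
As a minimal illustration of that ordering, here is a toy query over a hypothetical inline VALUES list (the aliases t, category, score and cnt are made up for this sketch and are not part of the question's schema). The window count is computed over all rows of each partition first, and DISTINCT ON then keeps only the top-scored row per category:

```sql
-- Toy data, assumed for illustration only: (id, category, score)
SELECT DISTINCT ON (category)
       id, category, score
     , count(*) OVER (PARTITION BY category) AS cnt  -- computed before DISTINCT ON
FROM  (VALUES (1, 'Cabin', 4.5)
            , (2, 'Cabin', 4.9)
            , (3, 'Cabin', 3.0)
            , (4, 'Villa', 4.0)) t(id, category, score)
ORDER  BY category, score DESC;  -- leading ORDER BY items must match DISTINCT ON

-- Result:
--  id | category | score | cnt
--   2 | Cabin    |   4.9 |   3
--   4 | Villa    |   4.0 |   1
```

Note that cnt for Cabin is 3, not 1: the count reflects the whole partition even though only one row per category survives.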

parent_path and number_of_review should probably be indexed. Whether that helps depends on data distribution and the selectivity of the WHERE conditions, which you didn't disclose.
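
A possible sketch of such indexes, not a tested recommendation: the index names below are hypothetical, and which variant (if any) pays off depends on the selectivity mentioned above. The jsonb existence operator ? is supported by a GIN index with the default jsonb_ops operator class (jsonb_path_ops does not support it).

```sql
-- Option A (assumed names): GIN index for the "?" test, plus a plain
-- B-tree index for the range condition on number_of_review.
CREATE INDEX table_name_parent_path_gin_idx
    ON table_name USING gin (parent_path);

CREATE INDEX table_name_number_of_review_idx
    ON table_name (number_of_review);

-- Option B (assumed name): a partial GIN index, only worthwhile if
-- "number_of_review >= 1" excludes a substantial share of the rows.
CREATE INDEX table_name_parent_path_reviewed_gin_idx
    ON table_name USING gin (parent_path)
    WHERE number_of_review >= 1;

-- Check whether the planner actually uses an index for the filter.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM   table_name
WHERE  parent_path ? '203'
AND    number_of_review >= 1;
```

If the WHERE conditions match a large fraction of the 2.5M rows, a sequential scan may still be the cheaper plan and the indexes would go unused.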

About DISTINCT ON:

Select first row in each GROUP BY group?

Assuming id is NOT NULL, count(*) is faster and equivalent to count(id).
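
The NOT NULL assumption matters because count(id) skips NULL values while count(*) counts every row. A small sketch with a made-up VALUES list (the alias t is hypothetical) shows where the two diverge:

```sql
-- count(*) counts rows; count(id) counts only rows where id IS NOT NULL.
SELECT count(*)  AS count_star
     , count(id) AS count_id
FROM  (VALUES (1), (2), (NULL)) t(id);

-- Result:
--  count_star | count_id
--           3 |        2
```

With id as the primary key, as in the question, no NULL can occur, so both forms return the same number.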

Thanks for the answer sir, I'll get back to you while checking this optimization. Really appreciate your answer. — A l w a y s S u n n y, Jan 03 '21 at 00:42
It works, and its plan and execution time are better than the previous one. — A l w a y s S u n n y, Jan 04 '21 at 14:56
