I have a simple two-stage SQL query that operators on two tables A and B, where I use a sub-select to retrieve a number of IDs of table A that are stored as foreign keys in B, using a (possibly complex) query on table B (and possibly other joined tables). Then, I want to simply return the first x IDs of A. I tried using a query like this:
SELECT sq.id
FROM (
SELECT a_id AS id, created_at
FROM B
WHERE ...
ORDER BY created_at DESC
) sq
GROUP BY sq.id
ORDER BY max(sq.created_at) DESC
LIMIT 10;
which is quite slow as Postgres seems to perform the GROUP BY / DISTINCT operation on the whole result set before limiting it. If I LIMIT the sub-query (e.g. to 100), the performance is just fine (as I'd expect), but of course it's no longer guaranteed that there will be at least 10 distinct a_id values in the resulting rows of sq.
Similarly, the query
SELECT a_id AS id
FROM B
WHERE ...
GROUP BY id
ORDER BY max(created_at) DESC
LIMIT 10
is quite slow as Postgres seems to perform a sequential scan on B instead of using an (existing) index. If I remove the GROUP BY clause it uses the index just fine.
The data in table B is such that most rows contain different a_ids, hence even without the GROUP BY most of the returned IDs will be different. The goal I pursue with the grouping is to assure that the result set always contains a given number of entries from A.
Is there a way to perform an "incremental DISTINCT / GROUP BY"? In my naive thinking it would suffice for Postgres to produce result rows and group them incrementally until it reaches the number specified by LIMIT, which in most cases should be nearly instantaneous as most a_id values are different. I tried various ways to query the data but so far I didn't find anything that works reliably.
The Postgres version is 9.6, the data schema as follows:
Table "public.a"
Column | Type | Modifiers
--------+-------------------+------------------------------------------------
id | bigint | not null default nextval('a_id_seq'::regclass)
bar | character varying |
Indexes:
"a_pkey" PRIMARY KEY, btree (id)
"ix_a_bar" btree (bar)
Referenced by:
TABLE "b" CONSTRAINT "b_a_id_fkey" FOREIGN KEY (a_id) REFERENCES a(id)
Table "public.b"
Column | Type | Modifiers
------------+-----------------------------+--------------------------------------------------
id | bigint | not null default nextval('b_id_seq'::regclass)
foo | character varying |
a_id | bigint | not null
created_at | timestamp without time zone |
Indexes:
"b_pkey" PRIMARY KEY, btree (id)
"ix_b_created_at" btree (created_at)
"ix_b_foo" btree (foo)
Foreign-key constraints:
"b_a_id_fkey" FOREIGN KEY (a_id) REFERENCES a(id)