Before measuring performance and comparing explains from different queries it usually worth to try and mimic your production setup first or you may (and almost certainly will) get a misleading EXPLAIN path (e.g. seq scan is used instead of index scan on small tables as sequential will be faster than random IO in such case)
I tried to tackle it in this way:
First - generate 200K conversations within the past 30 days
insert into conversations(id, name, created_at)
select
generate_series,
'CONVERSATION_'||generate_series,
NOW() - (random() * (interval '30 days'))
from generate_series(1, 200000);
Second - generate 2M messages randomly distributed among 200K conversations
and then also intentionally created 5 more "most recent" messages for conversation with ID=999, so that the conversation 999 has to always appear on top of query result.
insert into messages(id, content, conversation_id, created_at)
select msg.id, content, conversation_id, created_at + (random() * (interval '7 days')) from (
select distinct
generate_series as id,
'Message content ' || generate_series as content,
1 + trunc(random() * 200000) as conversation_id
from generate_series(1, 2000000)
) msg
join conversations c on c.id = conversation_id;
insert into messages(id, content, conversation_id, created_at)
select
generate_series as id,
'Message content ' || generate_series as content,
999 as conversation_id,
now() + interval '7 day' + (random() * (interval '7 days'))
from generate_series(2000001, 2000006);
And now you can try and compare (now with a little bit more confidence) those EXPLAINs to see which query works better.
Assuming you added the proposed index
CREATE INDEX idx1 ON messages(conversation_id, created_at desc)
- Both @GoonerForLife and @asinkxcoswt answers are pretty good, though the result is moderate because of window function usage
with cost=250000 on average and 2 to 3 seconds execution time on my machine
- @SalmanA and @ESG answers are twice as fast even though the
lateral join will force query planner to choose sequential scan (this is inevitable as the join is on TRUE, so no index could be used)
with cost ~150000 on average and 1-1.5 seconds execution time on my machine
It may not be obvious at first, but the @ElapsedSoul's answers with NOT EXISTS is quite close to ideal (though it still needs a couple of tweaks):
(1) It lacks the order by and limit:
select
conversations.id,
m.content AS last_message_content,
m.created_at AS last_message_at
from conversations,messages m
where conversations.id = m.conversation_id and not exists (
select 1 from messages n
where m.conversation_id = n.conversation_id and m.created_at < n.created_at
) order by last_message_at desc
limit 15
And (2) Since there is date-to-date comparison inside the NOT EXISTS subquery - we have to add another index on massages table
CREATE INDEX ix2 ON messages(created_at desc);
After that we should get a decent performance gain.
For example on my machine it resulted in 0.036ms execution time and 20.07 cost