I have a PostgreSQL table (headlines) with the following relevant columns:
url
title
created_at
There can be many rows with identical URLs but different titles. Here are some sample rows:
www.nytimes.com | The New York Times | 2016-01-01 00:00:00
www.wsj.com | The Wall Street Journal | 2016-01-03 15:32:13
www.nytimes.com | The New York Times Online | 2016-01-06 07:19:08
I'm trying to obtain an output that lists the following fields:
1) url
2) the title from the row with the latest created_at
3) the count of all titles for that URL
For the sample above, the output rows would look like this:
www.nytimes.com | The New York Times Online | 2
www.wsj.com | The Wall Street Journal | 1
Based on the numerous SO posts I've read on similar questions, my best option for obtaining the first two fields (url and latest title) seems to be DISTINCT ON:
select distinct on (url) url, title from headlines order by url, created_at desc
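SQLite has no DISTINCT ON, so this self-contained sketch uses Python's built-in sqlite3 module and substitutes a ROW_NUMBER() window function to get the same "latest title per url" result from the sample rows. The in-memory table and the names SAMPLE_ROWS and latest_titles are illustrative assumptions. The sketch assumes SQLite 3.25 or newer, which is required for window functions. The same subquery should also run unchanged on PostgreSQL.

```python
import sqlite3

# Illustrative in-memory copy of the sample rows from the question.
SAMPLE_ROWS = [
    ("www.nytimes.com", "The New York Times", "2016-01-01 00:00:00"),
    ("www.wsj.com", "The Wall Street Journal", "2016-01-03 15:32:13"),
    ("www.nytimes.com", "The New York Times Online", "2016-01-06 07:19:08"),
]


def latest_titles(conn):
    # SQLite lacks DISTINCT ON, so rank the rows within each url by
    # created_at (newest first) and keep only the top-ranked row.
    sql = """
        select url, title
        from (
            select url, title,
                   row_number() over (
                       partition by url order by created_at desc
                   ) as rn
            from headlines
        ) ranked
        where rn = 1
        order by url
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table headlines (url text, title text, created_at text)")
conn.executemany("insert into headlines values (?, ?, ?)", SAMPLE_ROWS)
print(latest_titles(conn))
# → [('www.nytimes.com', 'The New York Times Online'), ('www.wsj.com', 'The Wall Street Journal')]
```

The timestamps are stored as ISO-formatted text here, which sorts in the same order as the real timestamps.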
Likewise, to obtain the first and third fields (url and count of all titles), I could simply use GROUP BY:
select url, count(title) from headlines group by url
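Here is a matching sketch of the GROUP BY query on the same sample rows. It again uses an in-memory SQLite table as a stand-in for the PostgreSQL one, and title_counts is a hypothetical helper name. The ORDER BY is added only to make the output order deterministic. Note that count(title) skips rows whose title is NULL, while count(*) would count every row.

```python
import sqlite3

# Illustrative in-memory copy of the sample rows from the question.
SAMPLE_ROWS = [
    ("www.nytimes.com", "The New York Times", "2016-01-01 00:00:00"),
    ("www.wsj.com", "The Wall Street Journal", "2016-01-03 15:32:13"),
    ("www.nytimes.com", "The New York Times Online", "2016-01-06 07:19:08"),
]


def title_counts(conn):
    # One output row per url, with the number of non-NULL titles for it.
    sql = """
        select url, count(title)
        from headlines
        group by url
        order by url
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table headlines (url text, title text, created_at text)")
conn.executemany("insert into headlines values (?, ?, ?)", SAMPLE_ROWS)
print(title_counts(conn))
# → [('www.nytimes.com', 2), ('www.wsj.com', 1)]
```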
What I can't figure out is how to combine the two approaches to get all three values in a single query.
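One possible way to combine the two queries is sketched here under the same assumptions: an in-memory SQLite stand-in and hypothetical helper names. The per-url count is computed as a window aggregate alongside the ROW_NUMBER() ranking, and only the newest row for each url is kept. PostgreSQL evaluates window functions before DISTINCT ON, so adding count(title) over (partition by url) to the original DISTINCT ON query should give the same result there. That PostgreSQL variant is not exercised here.

```python
import sqlite3

# Illustrative in-memory copy of the sample rows from the question.
SAMPLE_ROWS = [
    ("www.nytimes.com", "The New York Times", "2016-01-01 00:00:00"),
    ("www.wsj.com", "The Wall Street Journal", "2016-01-03 15:32:13"),
    ("www.nytimes.com", "The New York Times Online", "2016-01-06 07:19:08"),
]


def latest_title_with_count(conn):
    # The window count sees every row in the url's partition before the
    # outer WHERE filters down to the newest row, so the count survives.
    sql = """
        select url, title, title_count
        from (
            select url, title,
                   count(title) over (partition by url) as title_count,
                   row_number() over (
                       partition by url order by created_at desc
                   ) as rn
            from headlines
        ) ranked
        where rn = 1
        order by url
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("create table headlines (url text, title text, created_at text)")
conn.executemany("insert into headlines values (?, ?, ?)", SAMPLE_ROWS)
print(latest_title_with_count(conn))
# → [('www.nytimes.com', 'The New York Times Online', 2), ('www.wsj.com', 'The Wall Street Journal', 1)]
```

A window aggregate is used instead of joining the GROUP BY result back to the table, so the table is scanned only once.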