I need to cache the first, last, and second to last time a thing happened per user. The history table I'm querying has hundreds of millions of rows (we're caching so we can truncate it), and the table I'm updating has dozens of millions.
Currently I'm doing it in batches of 1000 to avoid locking the tables. The query is like so:
with ranked as (
  select
      user_id,
      rank() over (partition by user_id order by created_at desc) as ranked_desc,
      rank() over (partition by user_id order by created_at asc) as ranked_asc,
      created_at
  from history
  where type = 'SomeType' and
        user_id between $1 and $2
)
update
  users u
set
  latest_at = (
    select created_at
    from ranked
    where ranked.ranked_desc = 1 and ranked.user_id = u.id
  ),
  previous_at = (
    select created_at
    from ranked
    where ranked.ranked_desc = 2 and ranked.user_id = u.id
  ),
  first_at = (
    select created_at
    from ranked
    where ranked.ranked_asc = 1 and ranked.user_id = u.id
  )
from ranked
where u.id = ranked.user_id
Relevant indexes on history are these. They are all btree indexes.
- (created_at)
- (user_id, created_at)
- (user_id, type)
- (type, created_at)
Can this be optimized? I feel this can be done without the subqueries.
 
    