I have a PostgreSQL table with a bunch of events, each containing a timestamp, id, and action. I expect a few million events per day, and expect the table to grow to several billion rows eventually.
I want to query the most recent events for each item (a few hundred thousand distinct ids) before a given date, but currently the query is extremely slow, taking 1-2 hours minimum (the table currently has around 100 million rows). Is there a way to speed up this query?
SELECT a.* FROM (
    with events as (
        SELECT
          ROW_NUMBER() OVER (PARTITION BY item ORDER BY time_stamp DESC) AS rn,
          *
        FROM event_updates
        WHERE time_stamp < '2023-05-01'
    )
    SELECT * FROM events WHERE rn=1 ORDER BY item
) a;
DDL for table
CREATE TABLE "event_updates" (
  "id" int4 NOT NULL DEFAULT nextval("event_updates"::regclass),
  "time_stamp" timestamptz(6) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  "item" varchar(32) COLLATE "pg_catalog"."default" NOT NULL,
  "event_type" int2,
)
PARTITION BY ();
ALTER TABLE "event_updates" OWNER TO "owners";
 
     
    