I have a table with approx 8 million rows that I need time slice data from.
I'm using PostgreSQL 9.1.
I need to query this table for a sum of the max(start_time) associated 'data' values for each 'object_id' (given a list), on a per day basis. (In other words, the closest record to the end of each day, for each object_id in a specific list).
Here's the basic table structure:
CREATE TABLE checks (
  id SERIAL PRIMARY KEY,
  object_id INTEGER,
  state INTEGER,
  start_time TIMESTAMP,
  data TEXT
);
data is a TEXT field, but has numeric values (I can't change this aspect, but can convert using casts).
And here's the query I'm working with so far:
WITH object_ids AS ( 
    SELECT object_id FROM objects WHERE object_id in (14845,12504,12451,12452)
),
records AS (
    SELECT
        data,
        start_time,
        MAX(start_time) OVER (PARTITION BY object_id)
    FROM checks
    WHERE
        object_id IN (SELECT object_id FROM object_ids) AND
        state = 0 AND
        start_time BETWEEN '2013-05-01 00:00:00' AND '2013-05-02 00:00:00'
)   
SELECT 
    SUM(data::bigint) 
FROM   
    records
WHERE 
    max = start_time
I'll run this query for each day of the month to give me a set of chart data points.
I would love to modify this query so I don't have to run separate queries per day, but one query to return a set of per day values
start_time          | sum
---------------------------
2013-05-01 00:00:00 | 39118
2013-05-02 00:00:00 | 98387
2013-05-03 00:00:00 | 8384
I've been researching time slice questions, and they are very helpful (I owe the fact that I'm using windowing functions to StackOverflow!), but I just can't make the leap to solving this problem.
 
     
     
    