I have a big table of timestamps in Postgres 9.4.5:
CREATE TABLE vessel_position (
  posid serial NOT NULL,
  mmsi integer NOT NULL,
  "timestamp" timestamp with time zone,
  the_geom geometry(PointZ,4326),
  CONSTRAINT "PK_posid_mmsi" PRIMARY KEY (posid, mmsi)
);
Additional index:
CREATE INDEX vessel_position_timestamp_idx ON vessel_position ("timestamp");
I want to extract every row where the timestamp is at least x minutes after the previous row. I've tried a few different SELECT statements using LAG() which all kind of worked, but didn't give me the exact result I require. The below functions gives me what I need, but I feel it could be quicker:
CREATE OR REPLACE FUNCTION _getVesslTrackWithInterval(mmsi integer, startTime character varying (25) ,endTime character varying (25), interval_min integer)
RETURNS SETOF vessel_position AS
$func$
DECLARE
    count integer DEFAULT 0;
    posids varchar DEFAULT '';
    tbl CURSOR FOR
    SELECT
      posID
      ,EXTRACT(EPOCH FROM (timestamp -  lag(timestamp) OVER (ORDER BY posid asc)))::int as diff
    FROM vessel_position vp WHERE vp.mmsi = $1  AND vp.timestamp BETWEEN $2::timestamp AND $3::timestamp;
BEGIN
FOR row IN tbl
LOOP
    count := coalesce(row.diff,0) + count;
    IF count >= $4*60 OR count = 0 THEN
            posids:= posids || row.posid || ',';
            count:= 0;
     END IF;
END LOOP;
RETURN QUERY EXECUTE 'SELECT * from vessel_position where posid in (' || TRIM(TRAILING ',' FROM posids) || ')';
END
$func$ LANGUAGE plpgsql;
I can't help thinking getting all the posids as a string and then selecting them all again at the very end is slowing things down.
Within the IF statement, I already have access to each row I want to keep, so could potentially store them in a temp table and then return temp table at the end of the loop.
Can this function be optimised - to improve performance in particular?
 
    