I manage a message based system in which a sequence of unique integer ids will be entirely represented at the end of the day, though they will not necessarily arrive in order.
I am looking for help in finding missing ids in this series using SQL. If my column values are something like the below, how can I find which ids I am missing in this sequence, in this case 6?
The sequence will begin and end at an arbitrary point each day, so min and max would differ upon each run. Coming from a Perl background I through some regex in there.
ids
1
2
3
5
4
7
9
8
10
Help would be much appreciated.
Edit: We run oracle
Edit2: Thanks all. I'll be running through your solutions next week in the office.
Edit3: I settled for the time being on something like the below, with ORIG_ID being the original id column and MY_TABLE being the source table. In looking closer at my data, there are a variety of cases beyond just number data in a string. In some cases there is a prefix or suffix of non-numeric characters. In others, there are dashes or spaces intermixed into the numeric id. Beyond this, ids periodically appear multiple times, so I included distinct.
I would appreciate any further input, specifically in regard to the best route of stripping out non-numeric characters.
SELECT
CASE
WHEN NUMERIC_ID + 1 = NEXT_ID - 1
THEN TO_CHAR( NUMERIC_ID + 1 )
ELSE TO_CHAR( NUMERIC_ID + 1 ) || '-' || TO_CHAR( NEXT_ID - 1 )
END
MISSING_SEQUENCES
FROM
(
SELECT
NUMERIC_ID,
LEAD (NUMERIC_ID, 1, NULL)
OVER
(
ORDER BY
NUMERIC_ID
ASC
)
AS NEXT_ID
FROM
(
SELECT
DISTINCT TO_NUMBER( REGEXP_REPLACE(ORIG_ID,'[^[:digit:]]','') )
AS NUMERIC_ID
FROM MY_TABLE
)
) WHERE NEXT_ID != NUMERIC_ID + 1