I want an efficient data structure that represents a set of integers from the range 0..k, where k is very small, let's say less than 1024.
I have come up with an implementation with the following complexities:
initialize: O(1)
size: O(1)
contains: O(1)
insert: O(1)
remove: O(1)
clear: O(1)
union: O(k)
However, I suspect my O(1)s might really by O(log(k)) since we have to read O(log(k)) digits somewhere.
The memory requirement is O(k).
I have not implemented equals, intersection or difference because I don't need them, but they all would be O(k). My implementation does support iteration.
This seems like it could come up in many different problems, so I'm wondering has my implementation been done before?
And more importantly: Can we do better?
Obviously better is a little subjective, but something with only O(1) and O(log(k)) everywhere might be interesting.
I suspect bitsets could be more practical when k is a little bigger, but I'm not sure about for such small k.
Here is pseudocode of what I have. It works by allocating an array and a map (implemented as another array) of lengths k and k+1, but only caring about the first m elements of the array, where m is the current size of the set.
class IntSet:
m: int
arr: array[int]
map: array[int]
init(k):
arr: int[k] # allocate, but don't initialize
map: int[k+1] # allocate, but don't initialize
m = 0
size(): m
contains(x: int): map[x] < m and arr[map[x]] == x
insert(x: int):
if not contains(x):
arr[m] = x
map[x] = m
m += 1
remove(x: int):
if contains(x):
m -= 1
arr[map[x]] = arr[m] # move x off the end
map[arr[m]] = map[x]
clear(): m = 0
union(s: IntSet):
for i in 0..s.m:
if not contains(s.arr[i]):
insert(s.arr[i])
items():
for i in 0..m:
yield arr[i]
EDIT:
Based on my understanding, I imagine a bitset implementation would look like:
initialize: O(k)
size: O(popcount of k bits) O(1)
contains: O(1)
insert: O(1)
remove: O(1)
clear: O(k)
union: O(bitwise OR of k bits)
If we use a bitset with 1024 bits I'm not sure if this is better.