Consider this interface:
public interface CoordinateSet {
boolean contains(int x, int y);
default boolean contains(Coordinate coord) {
return contains(coord.x, coord.y);
}
}
It represents a set of 2-dimensional integer coordinates, and each possible coordinate may be either inside the set (contains returns true) or outside (contains returns false).
There are many ways we can implement such an interface. The most computationally efficient one would be the implementation backed up by an array:
public class ArrayCoordinateSet implements CoordinateSet {
private final boolean[][] coords = new boolean[SIZE][SIZE];
// ...
@Override
public boolean contains(int x, int y) {
return coords[x][y];
}
public void add(int x, int y) {
coords[x][y] = true;
}
// ...
}
However, if SIZE is something large, say, 1000, and there are only, say, 4 cordinates that belong to the set, right in the four angles of a 1000×10000 rectangle, that means the absolute majority of cells space is consumed by false values. For such a sparse CoordinateSet we'd better be using a HashSet-based CoordinateSet:
public final class Coordinate {
public final int x;
public final int y;
public Coordinate(int x, int y) {
this.x = x;
this.y = y;
}
// .equals() and hashCode()
}
public class HashBasedCoordinateSet implements CoordinateSet {
private final Set<Coordinate> coords = new HashSet<>();
@Override
public boolean contains(int x, int y) {
return coords.contains(new Coordinate(x, y));
}
@Override
public boolean contains(Coordinate coord) {
return coords.contains(coord);
}
public void add(Coordinate coord) {
coords.add(coord);
}
}
However, with the HashBasedCoordinateSet we have such an issue:
for (int x=0; x<1000; x++) {
for (int y=0; y<1000; y++) {
hashBasedCoordinateSet.contains(x, y);
}
}
When we have values x and y and want to check if hashBasedCoordinateSet.contains(x, y), then that would require creating a new object at each method call (since we always need an object to search in a HashSet, it is not enough to just have object's data). And that would be a real waste of CPU time (it'd need to create all those Coordinate objects and then grabage-collect them, since seemngly no escape-analysis optimisation can be performed on this code).
So finally, my question is:
What would be the data structure to store a sparse set of coordinates that:
- Has O(1)
contains(int x, int y)operation; - Efficiently uses space (unlike the array-based implementation );
- Does not have to create extra objects during
contains(int x, int y)?