I have a list of sets called groups. groups[i] is a set of labels (integers) that are part of the same group as i. For example,
# idx: 0 1 2 3 4 5 6 7
groups = [{0}, {1}, {2,5,6}, {3}, {4,7}, {2,5,6}, {2,5,6}, {4,7}]
indicates that
0belongs to the group containing just01belongs to the group containing just12,5, and6each belong to the same group, namely the group containing2,5and63belongs to the group containing just14and7each belong to the same group, namely the group containing4and7
Sometimes it turns out that two groups are the same. For example, say we found out that 4 and 5 belong to the same group. Then we have to merge group {2,5,6} with group {4,7}. After this merge, groups should look like:
# idx: 0 1 2 3 4 5 6 7
groups = [{0}, {1}, {2,4,5,6,7}, {3}, {2,4,5,6,7}, {2,4,5,6,7}, {2,4,5,6,7}, {2,4,5,6,7}]
Naively, this would require updating groups[2], groups[4], groups[5], groups[6] and groups[7] or, in general, when merging groupA with groupB, it would require len(groupA) + len(groupB) updates.
A more efficient approach, would be if instead of having multiple copies of the same set in groups, we had a reference to the same set
# idx: 0 1 2 3 4 5 6 7
groups = [{0}, {1}, groupA, {3}, groupB, groupA, groupA, groupB]
where groupA == {2,5,6} and groupB == {4,7}
Then, to merge groupA and groupB would require at most min(len(groupA), len(groupB)) updates:
if len(groupA) < len(groupB):
groupA, groupB = groupB, groupA
# len(groupA) >= len(groupB)
groupA |= groupB
for label in groupB:
groups[label] = groupA
which would result in
# idx: 0 1 2 3 4 5 6 7
groups = [{0}, {1}, groupA, {3}, groupA, groupA, groupA, groupA]
where groupA == {2,4,5,6,7}
Is there a more efficient way to merge groups? Is Union-Find (Disjoin Set Union) data structure the way?
