I have two lists of tuples, list1 = [(1.332, 3.23344, 3.22), (2.122, 2.11, 2.33), ... (1, 2, 3)] and list2 = [(4.23, 12.2, 3.333), (1.234, 3.21, 4.342), ... (1.1, 2.2, 3.3)]. Both lists are very long, on the order of millions of entries each. For context, each data point is some measure of position, and the two lists come from two different datasets. I want to match each entry in list1 to an entry in list2 if the two are "close enough", meaning the distance between the positions is less than some threshold value (say 0.1). My initial thought was to use the min function for each entry in list1, as follows:
import numpy as np
import random

def dist(pt1, pt2):
    # Euclidean distance between two 3D points
    return np.sqrt(((pt2[0] - pt1[0]) ** 2) + ((pt2[1] - pt1[1]) ** 2) + ((pt2[2] - pt1[2]) ** 2))

# Small random test data standing in for the real (much longer) lists
list1 = [(random.random(), random.random(), random.random()) for _ in range(25)]
list2 = [(random.random(), random.random(), random.random()) for _ in range(20)]
threshold = .5

linker = []
for i, entry in enumerate(list1):
    # Brute force: scan all of list2 for the point nearest to this entry
    m = min(list2, key=lambda x: dist(entry, x))
    if dist(entry, m) < threshold:
        # Store the pair (index in list1, index in list2)
        linker.append((i, list2.index(m)))
So this would link each index in list1 to an index in list2. But I feel there must be an existing algorithm for this specific task that is much faster. Is there one?
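For reference, one candidate that appears to fit this kind of task is a nearest-neighbour query on a k-d tree. The following is only a minimal sketch, assuming SciPy is installed and that scipy.spatial.cKDTree is suitable for the data; the random arrays stand in for the real lists, and the names tree, dists and idx are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

# Small random test data standing in for the real lists (assumption: 3D points)
rng = np.random.default_rng(0)
list1 = rng.random((25, 3))
list2 = rng.random((20, 3))
threshold = 0.5

# Build the tree once over list2, then query it with every point of list1
tree = cKDTree(list2)

# For each point in list1, look up its nearest neighbour in list2.
# Points with no neighbour inside the threshold are reported with
# an infinite distance and an index equal to len(list2).
dists, idx = tree.query(list1, k=1, distance_upper_bound=threshold)

# Keep only the pairs for which a neighbour was actually found
linker = [(i, int(j)) for i, j in enumerate(idx) if j < len(list2)]
```

The tree is built once and each lookup then avoids scanning all of list2, which is what should make this much cheaper than the min loop for long lists. Like the loop above, it links each entry of list1 to its single nearest entry in list2, so several entries of list1 may map to the same index in list2.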