I am trying to find, for each customer, the distance to the nearest store. My data currently has ~1,500 stores and ~670K customers, so I have to calculate the geographic distance for 670K customers x 1,500 stores (roughly 1 billion pairs) and then take the minimum for each customer.
I have created the haversine function below:
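```python
import numpy as np

def haversine_np(lon1, lat1, lon2, lat2):
    # Convert all coordinates from degrees to radians
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine formula: c is the central angle between the two points
    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2
    c = 2 * np.arcsin(np.sqrt(a))
    # Earth radius of 6367 km, converted to miles (1 mile ≈ 1.609 km)
    miles = 6367 * c/1.609
    return miles
```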
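Because `haversine_np` uses only NumPy ufuncs, it accepts arrays as well as scalars and follows NumPy broadcasting rules. The sketch below checks one known value (1 degree of longitude on the equator) and shows that an (n, 1) column of customer coordinates against a (1, m) row of store coordinates yields an (n, m) matrix of pairwise distances. The coordinates are the made-up sample data from this question; the `cust_*`/`store_*` array names are illustrative.

```python
import numpy as np

def haversine_np(lon1, lat1, lon2, lat2):
    # Same function as above: inputs in degrees, result in miles
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    c = 2 * np.arcsin(np.sqrt(a))
    return 6367 * c / 1.609

# Scalar sanity check: 1 degree of longitude along the equator
one_degree = haversine_np(0.0, 0.0, 1.0, 0.0)
# On the equator the central angle equals the longitude difference
expected = 6367 * np.radians(1.0) / 1.609

# Broadcasting: customers as an (n, 1) column, stores as a (1, m) row
cust_lon = np.array([-40.800, -41.759, -77.348])[:, None]
cust_lat = np.array([39.342, 38.978, 36.237])[:, None]
store_lon = np.array([-60.800, -71.759, -87.348])[None, :]
store_lat = np.array([59.342, 28.978, 56.237])[None, :]
dist = haversine_np(cust_lon, cust_lat, store_lon, store_lat)
print(dist.shape)  # (3, 3): one row per customer, one column per store
```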
My data set consists of two data frames: one for the customers (`cst_geo`) and one for the stores (`store_geo`). The numbers below are made up because I can't share a snippet of the real data:

| Customer ID | Latitude | Longitude | 
|---|---|---|
| A123 | 39.342 | -40.800 | 
| B456 | 38.978 | -41.759 | 
| C789 | 36.237 | -77.348 |

| Store ID | Latitude | Longitude |
|---|---|---|
| S1 | 59.342 | -60.800 | 
| S2 | 28.978 | -71.759 | 
| S3 | 56.237 | -87.348 | 
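
To make the problem reproducible, the sample tables can be rebuilt as pandas DataFrames. This is a sketch: the lowercase column names (`customer_id`, `store_id`, `latitude`, `longitude`) are an assumption, chosen to match the `cst_geo.longitude` / `cst_geo.latitude` attribute access in the loop.

```python
import pandas as pd

# Hypothetical reconstruction of the made-up sample data.
# Column names are assumed lowercase to match the attribute access in the loop.
cst_geo = pd.DataFrame({
    "customer_id": ["A123", "B456", "C789"],
    "latitude": [39.342, 38.978, 36.237],
    "longitude": [-40.800, -41.759, -77.348],
})
store_geo = pd.DataFrame({
    "store_id": ["S1", "S2", "S3"],
    "latitude": [59.342, 28.978, 56.237],
    "longitude": [-60.800, -71.759, -87.348],
})
print(cst_geo)
print(store_geo)
```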
I wrote the for loop below to do this calculation, but it took more than 8 hours to run. I also tried the deco library, but it didn't make the loop any faster.
```python
mindist = []
for i in cst_geo.index:
    # Distances from customer i to every store
    dist = []
    for j in store_geo.index:
        dist.append(haversine_np(cst_geo.longitude[i], cst_geo.latitude[i],
                                 store_geo.longitude[j], store_geo.latitude[j]))
    # Keep only the distance to the nearest store
    mindist.append(min(dist))
```
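One commonly suggested alternative, sketched here under assumptions rather than tested on the full data set: since `haversine_np` already works element-wise on NumPy arrays, the inner loop over stores can be replaced by broadcasting, and customers can be processed in chunks. Chunking matters because a full 670K x 1,500 float64 matrix would need about 8 GB. The helper `min_dist_chunked` and its `chunk_size` parameter are hypothetical names introduced for this sketch.

```python
import numpy as np


def haversine_np(lon1, lat1, lon2, lat2):
    # Same function as in the question (inputs in degrees, result in miles)
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat / 2.0) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0) ** 2
    c = 2 * np.arcsin(np.sqrt(a))
    return 6367 * c / 1.609


def min_dist_chunked(cust_lon, cust_lat, store_lon, store_lat, chunk_size=10_000):
    """Hypothetical helper: distance (miles) from each customer to the nearest store."""
    cust_lon = np.asarray(cust_lon, dtype=float)
    cust_lat = np.asarray(cust_lat, dtype=float)
    # Store coordinates as a (1, m) row so they broadcast against customers
    store_lon = np.asarray(store_lon, dtype=float)[None, :]
    store_lat = np.asarray(store_lat, dtype=float)[None, :]
    out = np.empty(cust_lon.shape[0])
    for start in range(0, cust_lon.shape[0], chunk_size):
        stop = start + chunk_size
        # (chunk, 1) customers vs (1, m) stores -> (chunk, m) distance matrix
        d = haversine_np(cust_lon[start:stop, None], cust_lat[start:stop, None],
                         store_lon, store_lat)
        # Minimum over the store axis gives the nearest-store distance per customer
        out[start:stop] = d.min(axis=1)
    return out


# Usage with the made-up sample coordinates from the question
cust_lon = np.array([-40.800, -41.759, -77.348])
cust_lat = np.array([39.342, 38.978, 36.237])
store_lon = np.array([-60.800, -71.759, -87.348])
store_lat = np.array([59.342, 28.978, 56.237])
mindist = min_dist_chunked(cust_lon, cust_lat, store_lon, store_lat, chunk_size=2)
print(mindist)
```

With the real DataFrames, the arrays would come from e.g. `cst_geo.longitude.to_numpy()` (again assuming lowercase column names). Each temporary array in a chunk is about 10,000 x 1,500 x 8 bytes ≈ 120 MB, so `chunk_size` can be lowered if memory is tight. A spatial index such as scikit-learn's `BallTree` with `metric='haversine'` is another route that can avoid computing all ~1 billion pairs.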