I have two DataFrames: df1 holds the locations of places and df2 holds the locations of stations. I am trying to find a more efficient way to apply a distance function that finds which stations are within a certain range of each place and returns the station's name. If the distance function is a latitude difference of +/- 1, this is my expected outcome:
# df1
   Lat  Long 
0   30    31    
1   37    48    
2   54    62    
3   67    63     
# df2
   Station_Lat  Station_Long Station
0           30            32     ABC    
1           43            48     DEF    
2           84            87     GHI    
3           67            62     JKL    
# ....Some Code that compares df1 and df2....
# result
   Lat  Long  Station_Lat  Station_Long Station
    30    31           30            32     ABC
    67    63           67            62     JKL
I have a solution that uses a cartesian product (cross join) to combine both into a single DataFrame and then applies the function to it. This solution works, but my real dataset has millions of rows, which makes the cartesian product very slow.
import pandas as pd
df1 = pd.DataFrame({'Lat' : [30, 37, 54, 67],
                    'Long' : [31, 48, 62, 63]})
df2 = pd.DataFrame({'Station_Lat' : [30, 43, 84, 67],
                    'Station_Long' : [32, 48, 87, 62],
                    'Station':['ABC', 'DEF','GHI','JKL']})
# creating a 'key' for a cartesian product
df1['key'] = 1
df2['key'] = 1
# Creating the cartesian Join
df3 = pd.merge(df1, df2, on='key')
# some distance function that returns True or False
# assuming the distance function I want is +/- 1 of two values
def some_distance_func(x, y):
    return -1 <= x - y <= 1
# applying the function row by row with map (faster than DataFrame.apply, but not truly vectorized)
# https://stackoverflow.com/questions/52673285/performance-of-pandas-apply-vs-np-vectorize-to-create-new-column-from-existing-c
df3['t_or_f'] =  list(map(some_distance_func,df3['Lat'],df3['Station_Lat']))
# result
print(df3.loc[df3['t_or_f']][['Lat','Long','Station_Lat','Station_Long','Station']].reset_index(drop=True))
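For reference, the same cross join and filter can be written more compactly. This is only a sketch, assuming pandas 1.2 or newer (for `how='cross'`); the names `pairs` and `within_range` are my own. It replaces the helper 'key' column and the `map` call with a column-wise comparison, but it is still a cartesian product.

```python
import pandas as pd

df1 = pd.DataFrame({'Lat': [30, 37, 54, 67],
                    'Long': [31, 48, 62, 63]})
df2 = pd.DataFrame({'Station_Lat': [30, 43, 84, 67],
                    'Station_Long': [32, 48, 87, 62],
                    'Station': ['ABC', 'DEF', 'GHI', 'JKL']})

# how='cross' (assumes pandas >= 1.2) builds the cartesian product
# without needing a helper 'key' column
pairs = df1.merge(df2, how='cross')

# column-wise comparison: True where the latitudes differ by at most 1
within_range = (pairs['Lat'] - pairs['Station_Lat']).abs() <= 1

result = pairs.loc[within_range].reset_index(drop=True)
print(result)
```

This still materializes len(df1) * len(df2) rows before filtering, so it does not solve the scaling problem on its own.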
I have also tried a looping approach with iterrows(), but that is slower than the cross join method. Is there a more Pythonic or more efficient way to achieve what I am looking for?