I'm looking for the fastest way to compute the distance between two latitude/longitude pairs. One pair comes from the user and the other from a marker. Below is my code:
import geopy.distance
import pandas as pd
marker = pd.read_csv(file_path)
coords_2 = (4.620881605,101.119911)
marker['Distance'] = round(geopy.distance.geodesic((marker['Latitude'].values,marker['Longitude'].values), (coords_2)).m,2)
Previously, I used apply, which is extremely slow:
marker['Distance2'] = marker.apply(lambda x: round(geopy.distance.geodesic((x.Latitude,x.Longitude), (coords_2)).m,2), axis = 1)
Then I tried to vectorize the call by passing the pandas Series values directly:
marker['Distance'] = round(geopy.distance.geodesic((marker['Latitude'].values,marker['Longitude'].values), (coords_2)).m,2)
I'm receiving this error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I added all() and any() to test (for example marker['Latitude'].values.all(), marker['Longitude'].values.all(), and vice versa). However, the results from both any() and all() were entirely wrong.
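A minimal sketch of why any() and all() cannot help here: both reduce the whole array to a single boolean, so the per-row coordinates are lost before the distance is computed. The sample values below are hypothetical, rounded from the table in this post. If geopy then converts that boolean to a number (an assumption about its coordinate handling), every row is measured from roughly (1.0, 1.0), which would be consistent with one identical, very large Distance for all rows.

```python
import numpy as np

# Hypothetical sample coordinates, rounded from the table in this post
lat = np.array([4.620882, 4.620125, 4.619368])
lon = np.array([101.119911, 101.120399, 101.120885])

# .all() answers "is every element truthy?" and returns ONE boolean,
# not one value per row
lat_all = lat.all()
lon_all = lon.all()
print(lat_all, lon_all)

# Interpreted as numbers, the two booleans become a single point
collapsed = (float(lat_all), float(lon_all))
print(collapsed)
```

So the ValueError goes away, but only because the array has been replaced by a single unrelated value.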
This is my result:
    Latitude    Longitude   Distance    Distance2
0   4.620882    101.119911  11132307.42 0.00
1   4.620125    101.120399  11132307.42 99.72
2   4.619368    101.120885  11132307.42 199.26
Here, Distance is the result of the vectorized attempt, which is incorrect, whereas Distance2 is the result of apply, which is correct. In short, Distance2 is my expected output.
How can I get the correct output faster, without using apply?
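For reference, a minimal sketch of what a vectorized calculation could look like, assuming a spherical-earth haversine formula is an acceptable substitute. This is not the same method as geopy.distance.geodesic, which uses an ellipsoidal model, so the values will differ slightly from Distance2 (typically by a fraction of a percent). The helper name haversine_m, the mean-radius constant, and the sample rows are illustrative assumptions, not part of my original code.

```python
import numpy as np
import pandas as pd

# Assumption: mean Earth radius in metres (spherical model, not WGS-84 ellipsoid)
EARTH_RADIUS_M = 6_371_008.8

def haversine_m(lat, lon, ref_lat, ref_lon):
    """Great-circle distance in metres from each (lat, lon) to one reference point."""
    lat, lon = np.radians(lat), np.radians(lon)
    ref_lat, ref_lon = np.radians(ref_lat), np.radians(ref_lon)
    dlat = lat - ref_lat
    dlon = lon - ref_lon
    # Haversine formula, evaluated element-wise over the whole array
    a = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(ref_lat) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))

# Hypothetical sample rows standing in for the CSV contents
marker = pd.DataFrame({
    "Latitude": [4.620881605, 4.620125, 4.619368],
    "Longitude": [101.119911, 101.120399, 101.120885],
})
coords_2 = (4.620881605, 101.119911)

# One call over the full columns, no apply and no Python-level loop
marker["Distance"] = haversine_m(
    marker["Latitude"].to_numpy(),
    marker["Longitude"].to_numpy(),
    *coords_2,
).round(2)
print(marker)
```

If exact agreement with geodesic is required, this sketch would not be enough on its own, since it trades ellipsoidal accuracy for speed.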