I wrote some code that finds the distance between consecutive GPS coordinates for machines that have the same serial number.
I believe it would be more efficient if it could be simplified to use iterrows or df.apply; however, I cannot seem to figure it out.
Because I only need to execute the function when ser_no[i] == ser_no[i+1], and insert a NaN value at the location where the ser_no changes, I cannot see how to apply the pandas methodology to make the code more efficient. I have looked at:
- Vectorised Haversine formula with a pandas dataframe
- Python function to calculate distance using haversine formula in pandas
- Vectorizing a function in pandas
Unfortunately, I don't readily see the leap I need to make even after looking over these posts.
What I have:
import numpy as np

def haversine(lat1, long1, lat2, long2):
    r = 6371  # radius of Earth in km
    # convert decimal degrees to radians
    lat1, long1, lat2, long2 = map(np.radians, [lat1, long1, lat2, long2])
    # haversine formula
    lat = lat2 - lat1
    lon = long2 - long1
    a = np.sin(lat/2)**2 + np.cos(lat1)*np.cos(lat2)*np.sin(lon/2)**2
    c = 2*np.arcsin(np.sqrt(a))
    d = r*c
    return d
# pre-allocate the output vector
hdist = np.zeros(len(mttt_pings.index), dtype=float)
# haversine loop calculation
for i in range(len(mttt_pings.index) - 1):
    if mttt_pings.ser_no.iloc[i] == mttt_pings.ser_no.iloc[i + 1]:
        # ser_no at i and i + 1 are the same: calculate the distance
        # between them using the haversine formula and put the distance
        # in the i + 1 location
        hdist[i + 1] = haversine(
            mttt_pings.EQP_GPS_SPEC_LAT_CORD.iloc[i],
            mttt_pings.EQP_GPS_SPEC_LONG_CORD.iloc[i],
            mttt_pings.EQP_GPS_SPEC_LAT_CORD.iloc[i + 1],
            mttt_pings.EQP_GPS_SPEC_LONG_CORD.iloc[i + 1],
        )
    else:
        # ser_no at i and i + 1 are not the same: assign NaN in place at
        # the row where the ser_no changes, so hdist keeps its length
        hdist[i + 1] = np.nan
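For reference, here is a minimal sketch of the vectorized form I think I am after, not a tested solution. It assumes the rows are already ordered so that pings from the same machine are adjacent, and it uses a small made-up DataFrame in place of the real mttt_pings. The idea is to compare each row with the previous one via shift() and mask the result with where(); unlike the loop, the very first row comes out as NaN rather than 0.

```python
import numpy as np
import pandas as pd


def haversine(lat1, long1, lat2, long2):
    r = 6371  # radius of Earth in km
    # convert decimal degrees to radians (works on scalars, arrays, Series)
    lat1, long1, lat2, long2 = map(np.radians, [lat1, long1, lat2, long2])
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((long2 - long1) / 2) ** 2)
    return 2 * r * np.arcsin(np.sqrt(a))


# Hypothetical sample data standing in for the real mttt_pings frame
mttt_pings = pd.DataFrame({
    "ser_no": ["A", "A", "A", "B", "B"],
    "EQP_GPS_SPEC_LAT_CORD": [0.0, 0.0, 1.0, 10.0, 10.0],
    "EQP_GPS_SPEC_LONG_CORD": [0.0, 1.0, 1.0, 20.0, 21.0],
})

lat = mttt_pings["EQP_GPS_SPEC_LAT_CORD"]
lon = mttt_pings["EQP_GPS_SPEC_LONG_CORD"]

# True where this row has the same ser_no as the row directly above it
same_machine = mttt_pings["ser_no"].eq(mttt_pings["ser_no"].shift())

# Distance from the previous row to this row, computed for every row at
# once, then set to NaN wherever the ser_no changed (and on the first row)
mttt_pings["hdist"] = haversine(lat.shift(), lon.shift(), lat, lon).where(same_machine)

print(mttt_pings)
```

If the pings of one machine are not contiguous, grouping by ser_no before shifting would be needed instead of comparing adjacent rows.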