I have two data frames: df1 has 800 rows and df2 has 9 million rows. Both have latitude and longitude columns, and df2 has some additional columns that I need to add to df1. Because the lat and lon values do not match exactly between the two data frames, each row of df1 should be matched to the df2 row at the shortest distance. I used geo_join from the fuzzyjoin package but get an error.
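For reference, "shortest distance" between two latitude/longitude points is usually the great-circle distance. Below is a minimal base-R sketch of the haversine formula (the same method name passed to geo_join further down). It assumes a spherical Earth with a mean radius of 6371 km, and the name haversine_km is hypothetical, not part of fuzzyjoin:

```r
# Hypothetical helper: great-circle distance in km via the haversine formula.
# Assumes a spherical Earth (mean radius 6371 km); inputs in decimal degrees.
# Vectorized: lat2/lon2 may be long vectors while lat1/lon1 are scalars.
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against sqrt(a) drifting slightly above 1 from rounding
  2 * radius * asin(pmin(1, sqrt(a)))
}

# One degree of latitude along a meridian is about 111.19 km
haversine_km(40, -100, 41, -100)
```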
Summary of df1:
summary(df1)
          lat             lon           
 Min.   :25.39   Min.   :-124.62   
 1st Qu.:36.20   1st Qu.:-104.94    
 Median :40.63   Median : -84.15   
 Mean   :39.32   Mean   : -89.44    
 3rd Qu.:42.08   3rd Qu.: -73.97    
 Max.   :48.73   Max.   : -67.27  
Summary of df2:
summary(df2)
lon               lat                    x1                 x2                x3 
 Min.   :-124.73   Min.   :24.98   Min.   :-2230806   Min.   :-1569579   Min.   :     0.0  
 1st Qu.:-110.13   1st Qu.:34.78   1st Qu.:-1126720   1st Qu.: -508033   1st Qu.:   670.8  
 Median : -99.17   Median :39.06   Median : -263314   Median :  -15116   Median :  1507.5  
 Mean   : -99.17   Mean   :38.97   Mean   : -239487   Mean   :  -30086   Mean   :  2856.3  
 3rd Qu.: -88.94   3rd Qu.:43.25   3rd Qu.:  578810   3rd Qu.:  466600   3rd Qu.:  3354.7  
 Max.   : -66.97   Max.   :49.38   Max.   : 2122143   Max.   : 1270878   Max.   :395131.9  
Here is my code:
merged.dfs <- geo_join(df1, df2, by = NULL, method = "haversine", mode = "left", max_dist = 1) 
Here is the error I get:
Joining by: c("lat", "lon") 
Error in fuzzy_join(x, y, multi_by = by, multi_match_fun = match_fun, : long vectors not supported yet: ../../src/include/Rinlinedfuns.h:522
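A likely cause (not confirmed against the fuzzyjoin source): geo_join compares every row of df1 with every row of df2, which here is roughly 800 × 9 million = 7.2 billion pairs, and that appears to exceed the vector length R supports in that step. One way around it is to skip the cross join and, for each of the 800 rows in df1, search df2 for its single nearest point. The sketch below does that in base R. The function name nearest_join is hypothetical; it assumes both data frames have numeric lat and lon columns and appends every other df2 column plus the distance in km:

```r
# Hypothetical helper: great-circle distance in km (haversine, spherical Earth).
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical nearest-neighbour join: for each row of x, find the single
# closest row of y and append y's non-coordinate columns plus the distance.
# Only one distance vector of length nrow(y) is held at a time, instead of
# nrow(x) * nrow(y) pairs.
nearest_join <- function(x, y) {
  idx <- integer(nrow(x))
  dist <- numeric(nrow(x))
  for (i in seq_len(nrow(x))) {
    d <- haversine_km(x$lat[i], x$lon[i], y$lat, y$lon)
    idx[i] <- which.min(d)
    dist[i] <- d[idx[i]]
  }
  extra <- y[idx, setdiff(names(y), c("lat", "lon")), drop = FALSE]
  rownames(extra) <- NULL  # avoid "2", "2.1"-style row names from repeats
  cbind(x, extra, dist_km = dist)
}

# Toy usage with made-up coordinates (not the real df1/df2):
toy_df1 <- data.frame(lat = c(40.6, 36.2), lon = c(-84.1, -104.9))
toy_df2 <- data.frame(lon = c(-105.0, -84.0, -99.0),
                      lat = c(36.3, 40.7, 39.0),
                      x3 = c(10, 20, 30))
toy_merged <- nearest_join(toy_df1, toy_df2)
toy_merged
```

On the real data the call would be nearest_join(df1, df2). Each iteration is one vectorized pass over the 9 million df2 rows, so it should be memory-safe but may take a while. If speed matters, a k-d tree search (for example RANN::nn2 on lat/lon converted to 3-D unit-sphere coordinates, where Euclidean nearest equals great-circle nearest) is a possible faster alternative.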
I appreciate your help!
