I am trying to find, for each location in dataset 1, the distance to the nearest location in dataset 2. The two data sets are different sizes. I've looked into using the Haversine function, but I'm unsure what to do after that.
1 Answer
Since you have not provided a sample of your data, I will use the oregon.tract data set from the UScensus2000tract package as a reproducible example.
Here is a fast solution based on data.table, adapted from another answer.
# load libraries
  library(data.table)
  library(geosphere)
  library(UScensus2000tract)
  library(rgeos)
Now create a new data.table with every possible pairing of origins (census centroids) and destinations (facilities):
# get all combinations of origin and destination pairs
# Note: this assumes the distance from A -> B equals the distance from B -> A.
  odmatrix <- CJ(Datatwo$Code_A , Dataone$Code_B)
  names(odmatrix) <- c('Code_A', 'Code_B') # update names of columns
# add coordinates of Datatwo centroids (origin)
  odmatrix[Datatwo, c('lat_orig', 'long_orig') := list(i.Latitude, i.Longitude), on = "Code_A"]
# add coordinates of facilities (destination)
  odmatrix[Dataone, c('lat_dest', 'long_dest') := list(i.Latitude, i.Longitude), on = "Code_B"]
Now you just need to:
# calculate distances
  odmatrix[, dist := distHaversine(matrix(c(long_orig, lat_orig), ncol = 2),
                                   matrix(c(long_dest, lat_dest), ncol = 2))]
# and get the nearest destinations for each origin
  odmatrix[, .(Code_B = Code_B[which.min(dist)],
               dist = min(dist)),
           by = Code_A]
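The same nearest-destination lookup can also be sketched without building the long cross-join table, by asking `geosphere::distm` for an origin-by-destination distance matrix and taking the minimum of each row. This is only a minimal sketch: the coordinates below are made up for illustration, and the column names simply follow the `Code_A` / `Code_B` / `Longitude` / `Latitude` convention used above.

```r
library(data.table)
library(geosphere)

# Toy data: made-up coordinates, for illustration only
Datatwo <- data.table(Code_A    = c("a1", "a2", "a3"),
                      Longitude = c(-122.68, -123.09, -121.31),
                      Latitude  = c(45.52, 44.05, 44.06))
Dataone <- data.table(Code_B    = c("b1", "b2"),
                      Longitude = c(-122.60, -121.30),
                      Latitude  = c(45.50, 44.00))

# Distance matrix in meters: one row per origin, one column per destination.
# distm expects longitude first, then latitude.
d <- distm(as.matrix(Datatwo[, .(Longitude, Latitude)]),
           as.matrix(Dataone[, .(Longitude, Latitude)]),
           fun = distHaversine)

# For each origin (row), the column index of the closest destination
idx <- apply(d, 1, which.min)

nearest <- data.table(Code_A = Datatwo$Code_A,
                      Code_B = Dataone$Code_B[idx],
                      dist   = d[cbind(seq_len(nrow(d)), idx)])
nearest
```

The matrix still holds one distance per origin-destination pair, so for very large inputs it may need to be computed in chunks of origins.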
### Prepare data for this reproducible example (run this before the steps above)
# load data
  data("oregon.tract")
# get centroids as a data.frame
  centroids <- as.data.frame(gCentroid(oregon.tract,byid=TRUE))
# Convert row names into first column
  setDT(centroids, keep.rownames = TRUE)[]
# create two tables standing in for your census (Datatwo) and facility (Dataone) data
  Datatwo <- copy(centroids)
  Dataone <- copy(centroids)
  names(Datatwo) <- c('Code_A', 'Longitude', 'Latitude')
  names(Dataone) <- c('Code_B', 'Longitude', 'Latitude')
 
    
    
        ajj
        
 
    
    
        rafa.pereira
        
- I have changed the code/reproducible example to make it more similar to your own data. I hope the answer/explanation is clearer now. – rafa.pereira Jun 18 '17 at 01:15
- I don't know, but I just googled it and found these: https://stackoverflow.com/questions/36110815/how-to-use-disthaversine-function and https://stackoverflow.com/questions/21496587/error-in-pointstomatrixp1-latitude-90 – rafa.pereira Jun 18 '17 at 19:15
- Read the help file for `?geosphere::distHaversine`. It says "Value : Vector of distances in the same unit as r (default is meters)". – SymbolixAU Jun 18 '17 at 22:57
- Have you 'overwritten' the object first? `odmatrix <- odmatrix[, .( NPRI.ID = NPRI.ID[which.min(dist)], dist = min(dist)), by = Geo_Code]` – SymbolixAU Jun 18 '17 at 23:13
- You would need to `merge` the two data sets you have. There are many tutorials on how to do it on the web. Here is a good explanation: https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right – rafa.pereira Jun 19 '17 at 23:30
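To make the last two points concrete, here is a minimal sketch of merging a nearest-destination result back onto the destination table and converting the distances from meters to kilometers. The `nearest` table, its values, and the `dist_km` column name are hypothetical and only stand in for the output of the steps in the answer.

```r
library(data.table)

# Hypothetical output of the nearest-destination step (values are made up)
nearest <- data.table(Code_A = c("a1", "a2", "a3"),
                      Code_B = c("b1", "b2", "b2"),
                      dist   = c(6600, 143000, 6700))  # meters

# Hypothetical destination table with made-up coordinates
Dataone <- data.table(Code_B    = c("b1", "b2"),
                      Longitude = c(-122.60, -121.30),
                      Latitude  = c(45.50, 44.00))

# Left join: keep every origin and attach the coordinates of its nearest destination
result <- merge(nearest, Dataone, by = "Code_B", all.x = TRUE)

# distHaversine returns meters by default, so divide by 1000 for kilometers
result[, dist_km := dist / 1000]
result
```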