I am using foreach::foreach() in R to run an analysis in parallel on a computing cluster with 1 node, 500 GB of RAM, and 30 cores. I initialize the cluster using:
myCluster <- parallel::makeCluster(28)
doParallel::registerDoParallel(myCluster)
The process runs to completion in around 8 hours. However, the foreach loop does not combine the results and returns a NULL object (lcp_network). The loop code looks like this (not a reprex):
lcp_network <- foreach::foreach(i = 1:nrow(comps), .errorhandling = "remove", .combine = "rbind", .packages = c("sf", "terra","leastcostpath","dplyr")) %dopar% {
  
  lcp <- leastcostpath::create_lcp(cost_surface = tr1,
                                   origin = nodes_sp[comps[i,1],, drop = FALSE],
                                   destination = nodes_sp[comps[i,2],, drop = FALSE])
  
  lcp$origin_ID <- nodes_sp[comps[i,1],]$layer
  lcp$destination_ID <- nodes_sp[comps[i,2],]$layer
  lcp <- lcp %>%
    st_as_sf() %>%
    mutate(length = st_length(.)) %>%
    st_drop_geometry()
  attributes(lcp$length) <- NULL
  
  return(lcp)
  
}
Notably, this same code runs on a smaller subset of the data on my personal computer (8 GB of RAM, 8 cores) and combines the results without a problem. The output produced with the .verbose argument is:
numValues: 43, numResults: 0, stopped: TRUE
got results for task 1
accumulate got an error result
numValues: 43, numResults: 1, stopped: TRUE
returning status FALSE
got results for task 2
...
returning status FALSE
got results for task 43
numValues: 43, numResults: 43, stopped: TRUE
not calling combine function due to errors
returning status TRUE
Any advice is helpful. I have tried adding gc() within the loop, among other attempted fixes.
EDIT: I noticed that after the first task, the verbose output reports:
accumulate got an error result
and after every task thereafter, it reports:
returning status FALSE
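The "accumulate got an error result" line suggests that a task raised an error, which .errorhandling = "remove" then discards silently. A minimal sketch of how the hidden error could be surfaced follows, assuming a toy task in place of the real create_lcp() call. The function risky_task() and its simulated failure are hypothetical, and %do% is used only so the sketch runs without a cluster; the same .errorhandling argument applies with %dopar%.

```r
library(foreach)

# Hypothetical stand-in for the real per-pair computation: fails for i == 2.
risky_task <- function(i) {
  if (i == 2) stop("simulated failure in task ", i)
  data.frame(id = i, value = i^2)
}

# .errorhandling = "pass" returns the error object instead of dropping it.
# No .combine is given, so the results come back as a plain list.
res <- foreach(i = 1:3, .errorhandling = "pass") %do% risky_task(i)

# Separate failed tasks from successful ones and read the error messages.
failed <- vapply(res, inherits, logical(1), what = "error")
error_messages <- vapply(res[failed], conditionMessage, character(1))
print(error_messages)

# Combine only the successful results.
ok <- do.call(rbind, res[!failed])
```

Running a handful of iterations this way should show the actual message from a failing task, which the "remove" setting hides.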
EDIT 2: I ran the same code on a different server with the same resources (500 GB of RAM, 30 cores). The error output is different now:
numValues: 43, numResults: 0, stopped: TRUE
Error in unserialize(socklist[[n]]) : error reading from connection
Calls: %dopar% ... recvOneData -> recvOneData.SOCKcluster -> unserialize
Execution halted
slurmstepd: error: Detected 8 oom-kill event(s) in StepId=13251537.batch. 
Some of your processes may have been killed by the cgroup out-of-memory handler.
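The oom-kill message indicates that the job exceeded its memory limit, which under Slurm may be the job's allocation rather than the node's full RAM. Each socket worker also holds its own copy of the objects it uses (here tr1 and nodes_sp), so memory use grows with the number of workers. The sketch below picks a worker count from a memory budget. The helper pick_workers(), the 40 GB per-worker figure, and the 80% headroom are assumptions for illustration; the per-worker figure would need to be measured on a small run.

```r
# Choose a worker count so that (workers x per-worker memory) stays inside
# a fraction of the memory limit. All numeric inputs here are assumptions.
pick_workers <- function(mem_limit_gb, per_worker_gb, max_cores, headroom = 0.8) {
  by_memory <- floor(mem_limit_gb * headroom / per_worker_gb)
  max(1L, as.integer(min(max_cores, by_memory)))
}

# SLURM_MEM_PER_NODE is given in megabytes when it is set by the scheduler;
# otherwise fall back to the 500 GB quoted for the node.
slurm_mem_mb <- suppressWarnings(
  as.numeric(Sys.getenv("SLURM_MEM_PER_NODE", unset = NA))
)
mem_limit_gb <- if (is.na(slurm_mem_mb)) 500 else slurm_mem_mb / 1024

# 40 GB per worker is a hypothetical peak, not a measured value.
n_workers <- pick_workers(mem_limit_gb, per_worker_gb = 40, max_cores = 28)

# The result would then replace the hard-coded 28:
# myCluster <- parallel::makeCluster(n_workers)
```

If the estimate comes out well below 28, that would be consistent with the same code succeeding on a smaller subset while failing at full scale.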