I'm using the foreach package to try to parallelize a function on Windows (this was the only approach to parallelization I could follow easily). I basically need to call a function for g = 1, then g = 2, and so on, and want to do this faster.
- My function works perfectly fine with a regular for loop or with %do% instead of %dopar%.
- I believe I am passing all of the packages I use and, hopefully, the correct variables/objects.
- I have very little understanding of parallelization and nodes, though, and the error messages don't give me enough to troubleshoot with.
- I only included my main function, not the other functions it calls, but I can provide them if needed.
- I would appreciate any help with this error, with parallelizing on Windows in general, and with what I need to keep in mind so that code that works with %do% also runs correctly across the nodes used by %dopar% (see the minimal sketch just below).
Thank you very much for your help!!
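For reference, here is the minimal %dopar% pattern I based my attempt on, as I understand it (a toy sketch, assuming results are supposed to come back as the value of the foreach call rather than by assigning into a list from inside the loop):

library(foreach)
library(doParallel)
registerDoParallel(cores = 2)

# each iteration returns a value; foreach collects the returned values
# (here combined with c() into a simple vector)
res <- foreach(g = 1:4, .combine = c) %dopar% {
  g^2  # the last expression is that iteration's result
}
res  # 1 4 9 16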
My code:
# agonize, parallel version
# main function
par_agonize <- function(datfile, num_groups, regen_pref_matrices = FALSE, graph_groups = num_groups) {
  if (regen_pref_matrices) mm <- gen_pref_matrices(datfile)
  out <- list()
  tic.clearlog()
  improve <- tibble(groups=numeric(), agony=numeric(), abs_dec=numeric(), percent_dec=numeric(), total_dec=numeric(), tot_per_dec=numeric())
  foreach(g = 1:num_groups, .packages = loaded.package.names, .export = c(loaded.functions, loaded.objects), .verbose = TRUE) %dopar% { #key line where I use dopar/foreach
    tic()
    out[[g]] <- find_groups(mm, g) #this is the critical line, the improve and tic/toc log are just accessories
    toc(log = TRUE, quiet = FALSE) #calculates time
    log.lst <- tic.log(format = FALSE)
    if (g == 1) { #this calculates summary statistics, not important
      improve <- add_row(improve, groups = g, agony = out[[g]]$ag, abs_dec = 0, percent_dec = 0, total_dec = 0, tot_per_dec=0)
    } else {
      improve <- add_row(improve, groups = g, agony = out[[g]]$ag, abs_dec = out[[g]]$ag - out[[g-1]]$ag, 
                         percent_dec = (out[[g]]$ag - out[[g-1]]$ag)/(out[[g-1]]$ag), total_dec = out[[g]]$ag - out[[1]]$ag,
                         tot_per_dec = (out[[g]]$ag - out[[1]]$ag)/(out[[1]]$ag))
    }
  }
  #just saves output to my list
  out[["summary_stats"]] <- improve
  out[["timings"]] <- tibble(num_groups = 1:g, run_time = unlist(lapply(log.lst, function(x) x$toc - x$tic))) %>% 
    add_row("num_groups" = "Total", "run_time" = sum(out[["timings"]]$run_time[1:g]))
  out[["agony_graph"]] <- graph_agony(out, graph_groups)
  social_rank <<- out
  return(social_rank$agony_graph)
}
#test code
registerDoParallel(cores = detectCores() - 1)
loaded.package.names <- c(sessionInfo()$basePkgs, names(sessionInfo()$otherPkgs))
loaded.package.names #works
loaded.functions <- c("assign_groups", "find_agony", "find_groups", "generate_hierarchy", "gen_pref_matrices", "graph_agony", "init")
loaded.objects <- c("mm") #I can regenerate mm within my code... or use the mm that's already there, so I figured I would export it him
system.time(par_agonize("./data/hof17.csv", 2, regen=F)) #this is the MAIN line that runs my function
stopCluster(cl) #not clear if needed
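For context on that last line: the only setup where stopCluster() clearly seems to apply is the explicit-cluster form below (a sketch of that alternative; it is not what I'm running above, where I register cores directly):

# alternative setup with an explicit cluster object
cl <- makeCluster(detectCores() - 1)  # from the parallel package
registerDoParallel(cl)
# ... run par_agonize() here ...
stopCluster(cl)  # needed here because cl was created explicitly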
My current error is:
automatically exporting the following variables from the local environment:
  improve, out
explicitly exporting variables(s): assign_groups, find_agony, find_groups, generate_hierarchy, gen_pref_matrices, graph_agony, init, mm
numValues: 2, numResults: 0, stopped: TRUE
got results for task 1
numValues: 2, numResults: 1, stopped: TRUE
returning status FALSE
got results for task 2
accumulate got an error result
numValues: 2, numResults: 2, stopped: TRUE
calling combine function
evaluating call object to combine results:
    fun(accum, result.1)
returning status TRUE
Error in { : task 2 failed - "replacement has length zero"