The code below produces different results on Windows and Ubuntu. I understand this is because the two platforms handle parallel processing differently.
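As a minimal sketch of what I assume is going on: on Unix-alikes `mclapply` forks the R session, so a change made to an environment inside a worker happens in the child's copy of memory and never reaches the parent. The `counter` environment below is a hypothetical stand-in for an R6 object, not part of my real code.

```r
library(parallel)

# Hypothetical counter held in an environment (reference semantics, like an R6 object)
counter <- new.env()
counter$n <- 0L

# Forking is unavailable on Windows, where mclapply only accepts mc.cores = 1
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each worker increments the counter and returns the value it sees
seen <- mclapply(1:2, function(k) {
  counter$n <- counter$n + 1L
  counter$n
}, mc.cores = cores)

unlist(seen)  # values as seen inside the workers
counter$n     # value in the parent session; stays 0 when the workers were forked
```

On Windows the same call runs serially in the current session, so there the increments do reach `counter`.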
Summarizing:
I cannot insert / rbind data in parallel on Linux (mclapply, mcmapply), while I can do it on Windows.
Thanks to @Hong Ooi for pointing out that
mclapply does not run in parallel on Windows, yet the question below is still valid.
Of course, there are no multiple inserts into the same data.frame; each insert goes into a separate data.frame.
library(R6)
library(parallel)
# storage objects generator
cl <- R6Class(
  classname = "cl",
  public = list(
    data = data.frame(NULL),
    initialize = function() invisible(self),
    insert = function(x) self$data <- rbind(self$data, x)
  )
)
N <- 4L # number of entities
i <- setNames(seq_len(N),paste0("n",seq_len(N)))
# random data.frames
set.seed(1)
ldt <- lapply(i, function(i) data.frame(replicate(sample(3:10,1),sample(letters,1e5,rep=TRUE))))
# entity storage
lcl1 <- lapply(i, function(i) cl$new())
lcl2 <- lapply(i, function(i) cl$new())
lcl3 <- lapply(i, function(i) cl$new())
# insert data
invisible({
  mclapply(names(i), FUN = function(n) lcl1[[n]]$insert(ldt[[n]]))
  mcmapply(FUN = function(dt, cl) cl$insert(dt), ldt, lcl2, SIMPLIFY=FALSE)
  lapply(names(i), FUN = function(n) lcl3[[n]]$insert(ldt[[n]]))
})
### Windows
sapply(lcl1, function(cl) nrow(cl$data)) # mclapply
# n1 n2 n3 n4
# 100000 100000 100000 100000
sapply(lcl2, function(cl) nrow(cl$data)) # mcmapply
# n1 n2 n3 n4
# 100000 100000 100000 100000
sapply(lcl3, function(cl) nrow(cl$data)) # lapply
# n1 n2 n3 n4
# 100000 100000 100000 100000
### Unix
sapply(lcl1, function(cl) nrow(cl$data)) # mclapply
#n1 n2 n3 n4
# 0 0 0 0
sapply(lcl2, function(cl) nrow(cl$data)) # mcmapply
#n1 n2 n3 n4
# 0 0 0 0
sapply(lcl3, function(cl) nrow(cl$data)) # lapply
# n1 n2 n3 n4
# 100000 100000 100000 100000
And the question:
How can I rbind in parallel into separate data.frames on a Linux platform?
P.S. Off-memory storage such as SQLite cannot be considered as a solution in my case.
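One direction that might work, sketched under the assumption that the filled objects are small enough to be sent back to the parent: let each worker create and fill its own object and return it, instead of modifying objects created beforehand in the parent. The `Store` class, `ids`, and `dfs` below are hypothetical, scaled-down stand-ins for `cl`, `i`, and `ldt` above. Note that this does not update pre-existing objects in place; the parent receives new, deserialized copies.

```r
library(R6)
library(parallel)

# Hypothetical storage class, analogous to `cl` in the code above
Store <- R6Class(
  classname = "Store",
  public = list(
    data = data.frame(NULL),
    insert = function(x) {
      self$data <- rbind(self$data, x)
      invisible(self)
    }
  )
)

# Small illustrative inputs (assumed sizes, not the real data)
set.seed(1)
ids <- setNames(1:4, paste0("n", 1:4))
dfs <- lapply(ids, function(k) data.frame(x = sample(letters, 1000, replace = TRUE)))

# Forking is unavailable on Windows, where mclapply only accepts mc.cores = 1
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each worker builds and fills its own object, then returns it;
# mclapply serializes the return value back to the parent session
stores <- mclapply(ids, function(k) {
  s <- Store$new()
  s$insert(dfs[[k]])
  s
}, mc.cores = cores)

# Row counts of the objects received by the parent
sapply(stores, function(s) nrow(s$data))
```

The cost of this pattern is that every filled object is serialized once on its way back from the worker, which may matter for large data.frames.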