I am trying to parallelize a nested loop that copies values from one dataset (int2006) into another (x2006r). For each variable the two datasets have in common (changevars) and each country (v5), it replaces the value of every observation, matched by its id (v3). I have to match on country + id because the ids are duplicated across countries.
My loop code is:
for (var in changevars) {
  print(var)
  for (i in unique(int2006$v5)) {
    print(i)
    for (id in unique(int2006$v3)) {
      x2006r[x2006r$v5 == i & x2006r$v3 == id, var] <- int2006[int2006$v5 == i & int2006$v3 == id, var]
    }
  }
}
Although it works, it is really slow, which is why I want to parallelize it. However, I do not understand the logic of converting a for loop into a foreach loop with %dopar%. I have tried to follow other answers, but all of my attempts have failed.
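For reference, here is a minimal sketch of how the outer loop over changevars might be handed to foreach with %dopar%. It assumes the foreach and doParallel packages are installed and that each country + id pair identifies a single row. The toy data and the names cl and new_cols are made up for illustration. The key point is that workers operate on copies of the data, so an assignment to x2006r inside %dopar% is lost. Each task has to return its result, which is then written back afterwards.

```r
library(foreach)
library(doParallel)

# Toy data (hypothetical values) using the column names from the post;
# ids repeat across countries, so rows are identified by v5 + v3.
int2006 <- data.frame(
  v3 = c(10001, 10002, 10001, 10002),
  v5 = c(36, 36, 40, 40),
  v8 = c(1, 2, 1, NA),
  v9 = c(NA, 2, 4, 1)
)
x2006r <- data.frame(
  v3 = c(10002, 10001, 10002, 10001),  # deliberately in a different row order
  v5 = c(40, 36, 36, 40),
  v8 = c(9, 9, 9, 9),
  v9 = c(9, 9, 9, 9)
)
changevars <- c("v8", "v9")

# Start two worker processes and register them as the %dopar% backend.
cl <- makeCluster(2)
registerDoParallel(cl)

# One task per variable. Each task builds and RETURNS the updated column.
new_cols <- foreach(var = changevars) %dopar% {
  col <- x2006r[[var]]
  for (i in unique(int2006$v5)) {
    # Only loop over ids that actually occur in country i.
    for (id in unique(int2006$v3[int2006$v5 == i])) {
      col[x2006r$v5 == i & x2006r$v3 == id] <-
        int2006[int2006$v5 == i & int2006$v3 == id, var]
    }
  }
  col
}

stopCluster(cl)

# foreach returns a list (one element per variable); write it back here,
# in the main session, where the change actually persists.
x2006r[changevars] <- new_cols
```

With only three variables this gives at most three parallel tasks, and every worker needs its own copy of both data frames, so whether it ends up faster than the plain loop would have to be measured.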
Reproducible example of datasets:
- Source Dataset
> dput(int2006)
structure(list(v3 = c(10001, 10002, 10003, 10004, 10005, 10006, 
10007, 10008, 10009, 10010, 10011, 10012, 10013, 10014, 10015, 
10016, 10017, 10018, 10019, 10020), v5 = c(36, 36, 36, 36, 36, 
36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36), 
    v7 = c(3606, 3606, 3606, 3606, 3606, 3606, 3606, 3606, 3606, 
    3606, 3606, 3606, 3606, 3606, 3606, 3606, 3606, 3606, 3606, 
    3606), v8 = c(1, 1, 2, 1, NA, NA, 1, 2, 2, 2, NA, 2, 2, 1, 
    1, 1, 2, 2, 1, 2), v9 = c(NA, 2, 1, 2, 1, 1, 1, 2, 4, 1, 
    NA, 1, NA, 1, 1, 1, 1, 1, 1, 2)), row.names = c(NA, 20L), class = "data.frame")
- Target Dataset (the one to which the cells of the source dataset should be copied):
> dput(x2006r)
structure(list(v3 = c(10001, 10002, 10003, 10004, 10005, 10006, 
10007, 10008, 10009, 10010, 10011, 10012, 10013, 10014, 10015, 
10016, 10017, 10018, 10019, 10020), v5 = c(36, 36, 36, 36, 36, 
36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36, 36), 
    v7 = c("3606", "3606", "3606", "3606", "3606", "3606", "3606", 
    "3606", "3606", "3606", "3606", "3606", "3606", "3606", "3606", 
    "3606", "3606", "3606", "3606", "3606"), v8 = c(1, 1, 2, 
    1, NA, NA, 1, 2, 2, 2, NA, 2, 2, 1, 1, 1, 2, 2, 1, 2), v9 = c(NA, 
    2, 1, 2, 1, 1, 1, 2, 4, 1, NA, 1, NA, 1, 1, 1, 1, 1, 1, 2
    )), row.names = c(NA, 20L), class = "data.frame")
- Variables to iterate
changevars <- c("v7","v8","v9")
Can someone help me? I'm really stuck. Also, I am not sure if parallelizing this loop will help me in terms of speed.
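On the speed question, one non-parallel alternative that may be worth timing first is a vectorized lookup on a combined country + id key using base R's match(). The sketch below assumes each v5 + v3 pair occurs at most once in the source. The toy data and the helper names key_src, key_tgt, idx and found are hypothetical.

```r
# Toy data (hypothetical values) using the column names from the post;
# ids repeat across countries, so the key must combine v5 and v3.
int2006 <- data.frame(
  v3 = c(10001, 10002, 10001, 10002),
  v5 = c(36, 36, 40, 40),
  v8 = c(1, 2, 1, NA),
  v9 = c(NA, 2, 4, 1)
)
x2006r <- data.frame(
  v3 = c(10002, 10001, 10002, 10001),  # deliberately in a different row order
  v5 = c(40, 36, 36, 40),
  v8 = c(9, 9, 9, 9),
  v9 = c(9, 9, 9, 9)
)
changevars <- c("v8", "v9")

# Composite country + id key for each row of both data frames.
key_src <- paste(int2006$v5, int2006$v3, sep = "_")
key_tgt <- paste(x2006r$v5, x2006r$v3, sep = "_")

# For every target row, the index of the matching source row (NA if none).
idx <- match(key_tgt, key_src)
found <- !is.na(idx)

# Copy all shared variables in one step; unmatched target rows stay as they are.
x2006r[found, changevars] <- int2006[idx[found], changevars]
```

This replaces the three nested loops with a single indexed assignment, so the data frames are no longer scanned once per variable, country and id.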
Thank you very much!
 
    