I have a large dataframe that is structured as follows:
> vals
  idx v
1   1 3
2   2 2
3   3 0
4   4 2
5   5 0
6   6 0
7   7 0
.
.
.
I need to write the contents of this data frame to a CSV file in the following way: I iterate through the 'idx' column in steps of, say, 2, and for every second idx value I need the 'v' value in the corresponding row plus the next 2 'v' values below it.
For example, taking the first 7 rows of the data frame above:
> d=data.frame()
> temp=seq(vals[1,1],vals[nrow(vals),1]-1,2)
> for(i in temp){d=rbind(d,c(vals[which(vals[,1]==i)[1],1],vals[which(vals[,1]>=i & vals[,1]<=i+2),2]))}
> d
  X1 X3 X2 X0
1  1  3  2  0
2  3  0  2  0
3  5  0  0  0
The code above gives me what I want. However, the real 'vals' data frame is very large, and this loop takes an infeasible amount of time to process. I am trying to get a parallelized version of the code working:
> library(parallel)
> d=data.frame()
> temp=seq(vals[1,1],vals[nrow(vals),1]-1,2)
> put_it=function(i){d=rbind(d,c(vals[which(vals[,1]==i)[1],1],vals[which(vals[,1]>=i & vals[,1]<=i+2),2]))}
> mclapply(temp,put_it,mc.cores = detectCores())
[[1]]
  X1 X3 X2 X0
1  1  3  2  0
[[2]]
  X3 X0 X2 X0.1
1  3  0  2    0
[[3]]
  X5 X0 X0.1 X0.2
1  5  0    0    0
Hence the 'd' data frame is reset on each call, and I get a list of separate one-row data frames. This is not what I want, as I need all of the data in the same data frame.
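Since mclapply returns a list with one element per start index, one direction I have sketched is to have each call return its row and to bind the rows together afterwards, instead of modifying 'd' inside the function. This is only a minimal sketch on the 7-row example: the name get_row is my own, mc.cores = 1 is used purely to keep it portable, and it assumes every window contains the same number of values.

```r
library(parallel)

# Toy data matching the first 7 rows of the example above
vals <- data.frame(idx = 1:7, v = c(3, 2, 0, 2, 0, 0, 0))

temp <- seq(vals[1, 1], vals[nrow(vals), 1] - 1, 2)

# Hypothetical helper: return one row per start index
# instead of growing 'd' inside the worker
get_row <- function(i) {
  c(i, vals[vals[, 1] >= i & vals[, 1] <= i + 2, 2])
}

# mc.cores = 1 keeps the sketch portable (forking is not available on Windows);
# a larger value could be tried on Linux/macOS
rows <- mclapply(temp, get_row, mc.cores = 1)

# Bind the per-index vectors into one matrix, then convert to a data frame.
# Assumes all vectors have the same length.
d <- as.data.frame(do.call(rbind, rows))
```

On the toy data this gives the same three rows as the loop version, although the column names differ (V1 to V4 rather than X1, X3, X2, X0).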
I also considered writing the data, as a new row, to a file each time an iteration was complete:
temp=seq(vals[1,1],vals[nrow(vals),1]-1,2)
put_it=function(i){cat(vals[which(vals[,1]==i)[1],1],
         ',',paste(vals[which(vals[,1]>=i & vals[,1]<=i+10000),2],
          sep=' '),'\n',sep=' ',append=T,
           file='~/files/test.csv')}
mclapply(temp,put_it,mc.cores = detectCores())
Note that this time I am adding vectors of 10000 values rather than just the next 2. However, this runs into problems when 2 jobs write at the same time: I get a file in which new rows start in the middle of other rows:
 [middle of a row]........0 0 0 0 01  0,  00  00  00  00  0 0 0 0 0 0 0 .....
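To avoid several jobs appending to the same file at once, a variant I am considering is to let each job return its row as a string and to write all rows in a single call once the jobs have finished. Again this is only a sketch on the toy data: make_line, width and out_file are names I made up, and the temporary file stands in for my real output path.

```r
library(parallel)

# Toy data matching the first 7 rows of the example above
vals <- data.frame(idx = 1:7, v = c(3, 2, 0, 2, 0, 0, 0))

temp <- seq(vals[1, 1], vals[nrow(vals), 1] - 1, 2)
width <- 2  # hypothetical window size; 10000 in my real case

# Hypothetical helper: build the text for one row without touching the file
make_line <- function(i) {
  v <- vals[vals[, 1] >= i & vals[, 1] <= i + width, 2]
  paste(i, ",", paste(v, collapse = " "))
}

# The result list is in the same order as 'temp';
# mc.cores = 1 keeps the sketch portable
lines <- unlist(mclapply(temp, make_line, mc.cores = 1))

# Only the parent process writes, so rows cannot interleave
out_file <- tempfile(fileext = ".csv")  # hypothetical output path
writeLines(lines, out_file)
```

The trade-off is that all rows are held in memory until the final write, which may or may not be acceptable with windows of 10000 values.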