I need to analyze a dataset sequentially, where each step uses the results of the previous steps.
As I am familiar with R, I decided to work with it, and one of the solutions I tried is a for loop.
The dataset I loop through has around 8 million rows and 4 columns.
I use a data.table, and the columns are of type character, e.g. "XXXXXXXXX".
Looping through it takes approx. 0.7 seconds per iteration, of which the "<-" assignment accounts for about half a second.
Can anybody recommend a better technique? Potentially Rcpp, apply, or something else?
Thx for your support,
Holger

    # Negated %in%: TRUE where x is NOT found in y
    `%!in%` <- function(x, y) !(x %in% y)

    library(data.table)

    n <- 8000000L  # about 8 million rows, matching the description above
    dt_loop <- data.table(
      m      = paste0("XXXXXXXXXX", seq_len(n)),
      c      = paste0("YXXXXXXXXX", seq_len(n)),
      ma     = paste0("ZXXXXXXXXX", seq_len(n)),
      unused = paste0("AXXXXXXXXX", seq_len(n))
    )

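    for (i in seq_len(nrow(dt_loop))) {
      m <- dt_loop$m[i]
      c <- dt_loop$c[i]
      # Neither value has appeared in ma up to row i: use m
      if (m %!in% dt_loop$ma[1:i] && c %!in% dt_loop$ma[1:i]) {
        dt_loop$ma[i] <- m
      } else {
        # At least one value is already in ma: prefer m, otherwise fall back to c
        if (m %in% dt_loop$ma[1:i]) {
          dt_loop$ma[i] <- m
        } else {
          dt_loop$ma[i] <- c
        }
      }
    }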
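
For reference, here is a minimal sketch of one direction that might avoid the repeated `%in%` scans and the per-row `<-` on the data.table. It assumes the intended rule is "use `c` only when `m` has not been seen in `ma` so far but `c` has, otherwise use `m`", and that all values are non-empty, non-NA strings. The function name `assign_ma` and its argument names are hypothetical, and it has not been benchmarked on the full data.

```r
# Hypothetical helper: same row-by-row rule as the loop, but on plain vectors,
# with an environment used as a hash set of the ma values assigned so far.
# Assumes non-empty, non-NA character values.
assign_ma <- function(m_col, c_col, ma_col) {
  n <- length(m_col)
  out <- character(n)
  seen <- new.env(hash = TRUE)

  # TRUE if x is among the new ma values of earlier rows or equals the
  # current row's original ma value (this mirrors ma[1:i] in the loop)
  in_ma <- function(x, i) {
    x == ma_col[i] || exists(x, envir = seen, inherits = FALSE)
  }

  for (i in seq_len(n)) {
    m_in <- in_ma(m_col[i], i)
    c_in <- in_ma(c_col[i], i)
    # c is used only when m is unseen and c is already present
    out[i] <- if (!m_in && c_in) c_col[i] else m_col[i]
    assign(out[i], TRUE, envir = seen)
  }
  out
}
```

With the table above, it could be applied in one step as `dt_loop[, ma := assign_ma(m, c, ma)]`, so the data.table is assigned to only once instead of once per row.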
 
    