I'm stuck with a fairly complex problem. I have a data frame with three columns: id, info and row. The data looks like this:
id   info   row
 1      a     1
 1      b     2
 1      c     3
 2      a     4
 3      b     5
 3      a     6
 4      b     7
 4      c     8
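To make the example reproducible, the sample above can be built roughly like this. It is only a sketch: the object name data matches the loop further down, while the column types and stringsAsFactors = FALSE are my assumptions about the real data.

```r
# Sample data from the table above (column types are assumed)
data <- data.frame(
  id   = c(1, 1, 1, 2, 3, 3, 4, 4),
  info = c("a", "b", "c", "a", "b", "a", "b", "c"),
  row  = 1:8,
  stringsAsFactors = FALSE  # keep info as character, not factor
)
```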
What I want to do is this: if any row of an id has the info value a, all other rows of that id should be deleted. For example, rows 2 and 3 should be removed because row 1's info column contains the value a. Please note that the info values are not ordered (id 3, rows 5 and 6) and cannot be ordered due to other data limitations.
I solved the case using a for loop:
# select all ids that have at least one "a" value
# (exact match; grep("a", ...) would also match values that merely contain an "a")
a_val <- unique(data$id[data$info == "a"])
# collect the row numbers to delete
delete_rows <- c()
# check every id containing an "a" value
for (i in a_val) {
   temp_data <- data[which(data$id == i), ]
   # only go on if the given id has more than one row
   if (nrow(temp_data) > 1) {
      # seq_len() visits every row; a bare nrow() would visit only the last one
      for (ii in seq_len(nrow(temp_data))) {
         if (temp_data$info[ii] != "a") {
            delete_rows <- c(delete_rows, temp_data$row[ii])
         }
      }
   }
}
# drop the collected rows
data <- data[!(data$row %in% delete_rows), ]
My solution works, but it is very slow: the original data contains more than 700k rows, and more than 150k of them have an "a" value.
I could use a foreach loop with 4 cores to speed it up, but maybe someone could give me a hint for a better solution.
Best regards,
Arne
[UPDATE]
The outcome should be:
id   info   row
 1      a     1
 2      a     4
 3      a     6
 4      b     7
 4      c     8
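For reference, the rule behind this outcome can also be written without a loop. The following is only a sketch on the sample data, not a tested solution for the full 700k rows; the names ids_with_a, keep and result are made up for illustration, and I'm assuming info is a character column that should match "a" exactly.

```r
# Sample data (column types are assumed)
data <- data.frame(
  id   = c(1, 1, 1, 2, 3, 3, 4, 4),
  info = c("a", "b", "c", "a", "b", "a", "b", "c"),
  row  = 1:8,
  stringsAsFactors = FALSE
)

# ids that have at least one row with info == "a"
ids_with_a <- unique(data$id[data$info == "a"])

# keep a row if it is an "a" row itself,
# or if its id has no "a" row at all
keep <- data$info == "a" | !(data$id %in% ids_with_a)

result <- data[keep, ]
print(result)
```

Both comparisons are vectorised over the whole column, so no per-id subsetting is needed.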