I have a data.table with two columns of genes, where each row is treated as a pair. Some gene pairs are duplicated with the order reversed. I'm looking for a faster method, preferably without a loop like the one below, to keep only the unique pairs in my table.
library(data.table)
genes <- data.table(geneA = LETTERS[1:10], geneB = c("C", "G", "B", "E", "D", "I", "H", "J", "F", "A"))
revG <- genes[, .(geneA = geneB, geneB = geneA)]  # every pair with its columns swapped
d <- fintersect(genes, revG)                      # pairs that appear in both orders
for (x in 1:nrow(d)) {
  entry <- d[, c(geneA[x], geneB[x])]
  revEntry <- rev(entry)
  # find the reversed copy of the current pair and, if present, drop it
  dupEntry <- d[geneA %chin% revEntry[1] & geneB %chin% revEntry[2]]
  if (nrow(dupEntry) > 0) {
    d <- d[!(geneA %chin% dupEntry[, geneA] & geneB %chin% dupEntry[, geneB])]
  }
}
The table object d contains the duplicated, reversed pairs; after the loop, one copy of each remains. I then subset the original genes table, excluding the copies left in d, and store the resulting row index. I also have a list whose names match the first column of genes; the index is used to filter that list so it lines up with the deduplicated pairs.
# anti-join on both columns: keep the rows of genes not among the copies left in d
# (matching whole rows avoids the accidental cross-matches that
#  geneA %chin% d[, geneA] & geneB %chin% d[, geneB] can produce)
idx <- genes[!d, on = c("geneA", "geneB"), which = TRUE]
geneList <- vector("list", length = nrow(genes))
names(geneList) <- genes[, geneA]
geneList <- geneList[idx]
The above method isn't necessarily too slow, but I plan on using ~12K genes, so the speed difference might become noticeable at that scale. I found a question describing the same problem, but without data.table; it uses an apply function, which may also be slow on larger inputs.
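For comparison, here is a loop-free sketch I've been considering (my own attempt, not code from the linked question): put each pair into a canonical alphabetical order with pmin/pmax, then keep only the first occurrence of each canonical pair.

```r
library(data.table)
genes <- data.table(geneA = LETTERS[1:10],
                    geneB = c("C", "G", "B", "E", "D", "I", "H", "J", "F", "A"))

# canonical form of each pair: smaller gene first, larger gene second
# (pmin/pmax work elementwise on character vectors)
keep <- !duplicated(genes[, .(pmin(geneA, geneB), pmax(geneA, geneB))])
uniquePairs <- genes[keep]
```

Note that duplicated() keeps the first occurrence of each pair, so which copy survives may differ from the loop above; `idx <- which(keep)` could then be used to filter geneList the same way as before.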