I am working with a matrix, set_onco, of 206 rows x 196 columns, and I have a vector, genes_100 (it is actually a matrix, but I only use the first column), with 101 names.
Here's a snippet of how they look:
> set_onco[1:10,1:10]
                             V2       V3        V4        V5      V6     V7     V8      V9     V10      V11
GLI1_UP.V1_DN             COPZ1 C10orf46 C20orf118   TMEM181   CCNL2  YIPF1  GTDC1    OPN3   RSAD2  SLC22A1
GLI1_UP.V1_UP            IGFBP6 HLA-DQB1     CCND2     PTH1R TXNDC12   M6PR   PPT2   STAU1     IGJ    TMOD3
E2F1_UP.V1_DN           TGFB1I1    CXCL5    POU5F1    SAMD10    KLF2  STAT6 ENTPD6    VCAN  HMGCS1    ANXA8
E2F1_UP.V1_UP             RRP1B     HES1     ADCY6    CHAF1B  VPS37B  GRSF1   TLX2  SSX2IP    DNA2     CMA1
EGFR_UP.V1_DN             NPY1R    PDZK1     GFRA1     GREB1    MSMB   DLC1    MYB SLC6A14   IFI44   IFI44L
EGFR_UP.V1_UP               FGG     GBP1 TNFRSF11B       FGB    GJA1  DUSP6 S100A9     ADM   ITGB6    DUSP4
ERB2_UP.V1_DN             NPY1R    PDZK1     ANXA3     GREB1   HSPB8   DLC1  NRIP1    FHL2    EGR3    IFI44
FAM18B1                                                                                                    
ERB2_UP.V1_UP            CYP1A1  CEACAM5   FAM129A TNFRSF11B   DUSP4 CYP1B1   UPK2    DAB2 CEACAM6 KIAA1199
GCNP_SHH_UP_EARLY.V1_DN   SRRM2 KIAA1217     DEFA1      DLK1   PITX2   CCL2  UPK3B    SEZ6   TAF15     EMP1
> genes_100[1:10,1]
 [1] AL591845.1   B3GALT6      RAP1GAP      HSPG2        BX293535.1   RP1-159A19.1 IFI6         FAM76A       FAM176B      CSF3R       
101 Levels: 5_8S_rRNA AC018470.1 AC091179.2 AC103702.3 AC138972.1 ACVR1B AL049829.5 AL137797.2 AL139260.2 AL450326.2 AL591845.1 AL607122.2 B3GALT6 BX293535.1 ... ZNF678
What I want to do is go through the matrix and count, for each row, how many of its entries match a name in genes_100.
To do that I wrote 3 nested for loops: the first moves down one row at a time, the second moves across the columns of that row, and the third loops over genes_100 checking for matches.
At the end I save in a matrix how many times genes_100 matched the terms in each row, together with the row names from set_onco (so that I know which count belongs to which row).
The code works and gives me the correct output, but it's really slow!
A snippet of the output is:
> head(result_matrix_100)
                    freq_100
[1,] "GLI1_UP.V1_DN" "0"     
[2,] "GLI1_UP.V1_UP" "0"     
[3,] "E2F1_UP.V1_DN" "0"     
[4,] "E2F1_UP.V1_UP" "0"     
[5,] "EGFR_UP.V1_DN" "0"     
[6,] "EGFR_UP.V1_UP" "0" 
I used system.time() and I get:
  user  system elapsed 
 525.38    0.06  530.34
This is way too slow: I have even bigger matrices to parse, and in some cases I have to repeat this 10,000 times!
The code is:
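result_matrix_100 <- matrix(nrow=0, ncol=2)
for (q in seq_len(nrow(set_onco))) {            # one row (gene set) at a time
  freq_100 <- 0                                 # reset the match counter for this row
  for (j in seq_len(ncol(set_onco))) {          # every column of that row
    for (x in seq_len(nrow(genes_100))) {       # every name in genes_100
      if (as.character(genes_100[x,1]) == as.character(set_onco[q,j])) {
        freq_100 <- freq_100+1
      }
    }
  }
  # append (row name, count) to the result
  result_matrix_100 <- rbind(result_matrix_100, cbind(row.names(set_onco)[q], freq_100))
}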
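A vectorised membership test with %in% may avoid the three loops altogether. Below is a minimal sketch on toy data, not a benchmarked solution: set_toy, genes_toy and count_matches are hypothetical names made up for illustration, and the sketch assumes the names in genes_100 are unique (each matching cell is counted once, which agrees with the loop above only in that case).

```r
# Toy stand-ins for set_onco and genes_100 (hypothetical data, not the real sets)
set_toy <- matrix(
  c("COPZ1", "IFI6",  "HSPG2",
    "CCND2", "",      "IFI6",
    "KLF2",  "STAT6", ""),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("SET_A", "SET_B", "SET_C"), c("V2", "V3", "V4"))
)
genes_toy <- factor(c("IFI6", "HSPG2", "CSF3R"))

# Count, for each row of `sets`, how many cells appear in `genes`
count_matches <- function(sets, genes) {
  m <- as.matrix(sets)                       # data frame or matrix -> character matrix
  g <- as.character(genes)                   # factor -> plain strings
  hits <- matrix(m %in% g, nrow = nrow(m))   # %in% drops dimensions, so restore them
  data.frame(set = rownames(m), freq = rowSums(hits), stringsAsFactors = FALSE)
}

res <- count_matches(set_toy, genes_toy)
print(res)
```

On the real data the call would presumably be count_matches(set_onco, genes_100[,1]). Returning a data frame keeps the counts numeric instead of turning them into strings, as cbind() does in the loop version.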
What would you suggest?
Thanks in advance :)