I have several processed microarray datasets (normalized, .txt files) from which I want to extract a list of 300 candidate genes (ILMN_IDs). In the output I need not only the gene names, but also the expression values and statistics (already present in the original file). I have 2 data frames:
- normalizedData, with the identifiers (gene names) in the first column, named "Name".
- candidateGenes, with a single column named "Name", containing the identifiers.
I've tried
1).
all=normalizedData  
subset=candidateGenes  
x=all%in%subset 
2).
all[which(all$gene_id %in% subset)]  # as suggested on another bioinformatics forum
but it returns a data frame with 0 columns and >4000 rows. This is not correct, since normalizedData has 24 columns. I have tried other ways to compare them, but I always get an error.
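The zero-column result can be reproduced on a small example. This is only a sketch: all_toy and subset_toy are made-up stand-ins for my real data frames, with invented values. It suggests the cause is that there is no column called gene_id (so the lookup is NULL) and that single-bracket indexing without a comma selects columns, not rows.

```r
# Toy stand-ins for the real data (names and values are made up)
all_toy <- data.frame(Name  = c("ILMN_1", "ILMN_2", "ILMN_3"),
                      meant = c(10, 20, 30))
subset_toy <- data.frame(V1 = c("ILMN_2", "ILMN_3"))

# The data frame has no gene_id column, so this lookup is NULL
is.null(all_toy$gene_id)

# NULL %in% anything is logical(0), so which() returns integer(0)
idx <- which(all_toy$gene_id %in% subset_toy)

# Without a comma, [ ] selects columns; an empty index keeps every row
# but drops every column
res <- all_toy[idx]
dim(res)
```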
The key is to be able to compare the first column of all ("Name") with subset. Here is the info about all:
> class(all)
[1] "data.frame"
> dim(all)
[1] 4312 24
> str(all)
'data.frame': 4312 obs. of 24 variables:
$ Name: Factor w/ 4312 levels "ILMN_1651253": 3401.. 
$ meanbgt: num 0 ..
$ meanbgc: num .. 
$ cvt: num 0.11 .. 
$ cvc: num 0.23 ..
$ meant: num 4618 ..
$ stderrt: num 314.6 ..
$ meanc: num 113.8 ... 
$ stderrc: num 15.6 ...
$ ratio: num 40.6 ...     
$ ratiose: num 6.21 ...
$ logratio: num 5.34 ... 
$ tp: num 1.3e-04 ... 
$ t2p: num 0.00476 ... 
$ wilcoxonp: num 0.0809 ...
$ tq: num 0.0256 ...
$ t2q: num 0.165 ...
$ wilcoxonq: num 0.346 ...
$ limmap: num 4.03e-10 ... 
$ limmapa: num 4.34e-06 ... 
$ SYMBOL: Factor w/ 3696 levels "","A2LD1",..
$ ENSEMBL: Factor w/ 3143 levels "ENSG00000000003",..
and here is the info about subset:
> class(subset)
[1] "data.frame"
> dim(subset)
[1] 328 1
> str(subset)
'data.frame': 328 obs. of 1 variable:
$ V1: Factor w/ 328 levels "ILMN_1651429",..: 177 286 47 169 123 109 268 284 234 186 ...
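To make the goal concrete, here is a minimal sketch of the comparison I am after, on made-up toy data. all_toy, subset_toy, and their values are hypothetical. I am assuming the identifier column is "Name" in the big table and "V1" in the candidate list, as the str() output above suggests. The idea is to compare one ID column with the other (not one whole data frame with the other) and to put a comma after the row index so that all columns are kept.

```r
# Toy stand-ins for normalizedData and candidateGenes (values are made up)
all_toy <- data.frame(
  Name  = factor(c("ILMN_1", "ILMN_2", "ILMN_3", "ILMN_4")),
  meant = c(10, 20, 30, 40),
  ratio = c(1.5, 0.8, 2.1, 1.0)
)
subset_toy <- data.frame(V1 = factor(c("ILMN_4", "ILMN_2", "ILMN_9")))

# Compare ID column against ID column; as.character() makes it explicit
# that the match is on the labels, not on the factor codes
keep <- as.character(all_toy$Name) %in% as.character(subset_toy$V1)

# The comma after the row index keeps every column of the matching rows
hits <- all_toy[keep, ]

# Candidate IDs that do not occur in the data at all
missing_ids <- setdiff(as.character(subset_toy$V1),
                       as.character(all_toy$Name))

print(hits)
print(missing_ids)
```

On the real data this would presumably become normalizedData[normalizedData$Name %in% candidateGenes$V1, ], and the result could then be written out with write.table().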
I really appreciate your help!
 
    