I have two sets of data that I would like to investigate. The first is gene/genome-related data for different 'cell-states'. The second relates the genes to biological pathways. I believe my question is a relational db one.
How can I take the data in one dataframe and relate it to another? In other words, I want to graph the cell-state data and relate it to pathways and their specific genes. (I think in pictures, so here goes.)
dataframe1: data from an Affymetrix gene chip
gene, cell-state1, cell-state2...
gene1, x1, y1,...
gene2, x2, y2,...
gene.x, ... ...
"1" "gene"  "log_b" "log_b_rich"    "Fc_cdt_rich_tot"   "fc_Etoh_CDT_tot_mono"  "fc_Etoh_CDT_tot_poly"  "fc_Etoh_CDT_mono_poly" "fc_Etoh_Rich_tot_mono" "fc_Etoh_Rich_tot_poly" "fc_Etoh_Rich_mono_poly"
"2" "PHF13" -2.712616698    -1.47923545 -0.791138043    -0.549610558    0.143808182 0.69341874  0.320812876 1.089260116 0.76844724
"3" "SPSB1" -1.808348454    -1.965601198    -1.349135752    -0.780105329    0.410647447 1.190752776 0.587287796 1.260350195 0.673062399
dataframe2: data from the KEGG db
pathway1, gene-x1, gene-x2, ...
pathway2, gene-y1, gene-y2, ...
pathway3, gene-z1, ...
"1" "KEGG_GLYCOLYSIS_GLUCONEOGENESIS"   "PHF13" "LDHB"  "LDHA"  "PGAM1" "ADH1C" "PGAM2" "ADH1B" "ADH1A" "ACSS2" "PDHB"  "ACSS1" "PGAM4" "PDHA2" "PDHA1" "LDHAL6B"   "PFKL"  "LDHAL6A"   "FBP1"  "PFKP"  "ALDH3B2"   "FBP2"  "PFKM"  "ALDH3B1"   "PGM2"  "G6PC"  "ALDH7A1"   "ALDH1B1"   "PKM2"  "PGM1"  "DLD"   "PKLR"  "ALDH9A1"   "ALDOA" "ALDOC" "ALDOB" "ADH5"  "HK2"   "HK1"   "ADH6"  "ADH7"  "ALDH3A2"   "G6PC2" "ALDH3A1"   "GALM"  "TPI1"  "AKR1A1"    "ADH4"  "HK3"   "ALDH1A3"   "ENO2"  "ENO3"  "GAPDH" "ENO1"  "BPGM"  "DLAT"  "PCK2"  "PCK1"  "GPI"   "GCK"   "ALDH2" "PGK1"  "PGK2"
"2" "KEGG_CITRATE_CYCLE_TCA_CYCLE"  "PHF13" "OGDHL" "OGDH"  "PDHB"  "IDH3G" "LOC283398" "IDH2"  "IDH1"  "PDHA2" "PDHA1" "SUCLA2"    "FH"    "DLST"  "ACO2"  "SUCLG2"    "ACO1"
"PHF13" is highlighted to show relevance in each step.
What I want to do is see whether 'cell-state1' (in-)activates different genes/pathways from 'cell-state2'. Furthermore, I would like to test for correlation (t-test and maybe graphing) between cell-states 1 and 2 for specific pathways.
My question is: which commands or methods would allow me to do this most easily/efficiently, merge or dummy variables?
HTH
        mccurcio
        
Please rephrase your question in such a way that it actually becomes a programming problem, and the problem itself is clear (including the structure of your data). What is gene-x1, ... what is cell-state, ... ? Give an example dataset so we actually have a clue. See also http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Joris Meys Jul 27 '11 at 12:37
1 Answer
            What I want to do is, see if 'cell-state1' (in-)activates different genes pathways from 'cell-state2.'
It sounds like what you need is a factor analysis. You could ask the good people of statistics.stackexchange.com about that.
 
    
    
        Bernd Elkemann
        
I don't believe my question is necessarily a stats one, but a relational db one. Maybe my question could be: 'How can I take the data in one dataframe and relate it to another?' I want to graph the cell-state data and relate it to the genes and pathways. – mccurcio Jul 27 '11 at 14:49
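For the relational part of the question, one possible approach (a minimal sketch, not the only way) is to reshape the pathway table into a long gene/pathway table and then use merge() on the gene column, rather than building dummy variables. In the base R sketch below, the object names expr, pathways, membership and merged are illustrative assumptions. The PHF13 and SPSB1 values are rounded from the question; the other expression values and the shortened pathway gene lists are made up purely so the example runs.

```r
# Expression table (dataframe1): one row per gene, one column per cell state.
# PHF13 and SPSB1 are rounded from the question; the other rows are invented.
expr <- data.frame(
  gene       = c("PHF13", "SPSB1", "LDHA", "LDHB", "PDHB", "IDH1"),
  log_b      = c(-2.71, -1.81, 0.40, 0.90, -0.30, 1.20),
  log_b_rich = c(-1.48, -1.97, 1.10, 1.30, 0.20, 1.00),
  stringsAsFactors = FALSE
)

# Pathway table (dataframe2) held as a named list: pathway -> member genes.
# The gene lists are shortened for illustration.
pathways <- list(
  KEGG_GLYCOLYSIS_GLUCONEOGENESIS = c("PHF13", "LDHB", "LDHA", "PDHB"),
  KEGG_CITRATE_CYCLE_TCA_CYCLE    = c("PHF13", "PDHB", "IDH1")
)

# Reshape to long format: one row per (gene, pathway) pair.
membership <- stack(pathways)
names(membership) <- c("gene", "pathway")
membership$pathway <- as.character(membership$pathway)

# Relational (inner) join on gene: each gene picks up its expression values
# once for every pathway it belongs to; genes in no pathway are dropped.
merged <- merge(membership, expr, by = "gene")

# Paired t-test of the two cell states within each pathway
# (paired because the same genes are measured in both states).
tests <- lapply(split(merged, merged$pathway), function(d) {
  t.test(d$log_b, d$log_b_rich, paired = TRUE)
})
p_values <- sapply(tests, function(tt) tt$p.value)

print(merged)
print(p_values)
```

With merged in this shape, per-pathway plots follow from the same table, for example boxplot(log_b ~ pathway, data = merged). Whether a t-test is the right statistic for fold-change data like this is a separate question that this sketch does not settle.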