I have a codon usage table (http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=10029&aa=1&style=GCG). I would like to generate a vector of the most used codons (one for each amino acid residue). There are 20 naturally occurring amino acids plus the stop codon (End), so my vector length will be 21. I've tried using grep, but it takes only one pattern at a time, or searches for all patterns at once, which doesn't help. Is there a way of doing this without a loop?
- What does your input data look like? Do you already know the correct open reading frame? Please create a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) that clearly shows some sample input and desired output. – MrFlick Jul 03 '14 at 06:55
- The input is the webpage I added. I could essentially generate a text file with the data, but the webpage would be even better, because switching to a different codon usage table would mean just pasting a different web address into the parser. – biomiha Jul 03 '14 at 12:15
1 Answer
Here's what I think you would like to do. You can use the XML package to read the data and then dplyr to find the most used codon for each amino acid.
# load packages
require(XML)
require(dplyr)
# read the table
tt <- htmlParse('http://www.kazusa.or.jp/codon/cgi-bin/showcodon.cgi?species=10029&aa=1&style=GCG')
df <- read.table(text=xpathSApply(tt, "//pre", xmlValue), 
                 header=TRUE, 
                 fill=TRUE)
# keep, for each amino acid, the codon with the highest count
df.max <- group_by(df, AmAcid) %>% 
  filter(Number == max(Number)) %>% 
  select(AmAcid, Codon)
The result is then a data.frame with 21 rows (more if two codons tie for the maximum within an amino acid). You can access the Codon column, e.g. df.max$Codon, if you want to get a vector.
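If a plain vector with exactly one codon per amino acid is needed, even when two codons share the highest count, a base-R variant along the following lines may help. This is only a sketch: the small table is a toy excerpt in the same AmAcid/Codon/Number layout with invented counts (not real Kazusa values), and the names txt, ord, top and best are illustrative.

```r
# Toy excerpt in the GCG-style layout; the counts are invented for illustration.
txt <- "AmAcid Codon Number
Gly GGG 10
Gly GGA 30
Gly GGT 20
End TGA 5
End TAA 2
Glu GAG 40
Glu GAA 15"

df <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)

# Sort rows by descending count, then keep the first row seen per amino acid.
# duplicated() marks every later occurrence, so ties yield a single codon.
ord <- df[order(-df$Number), ]
top <- ord[!duplicated(ord$AmAcid), ]

# Named character vector: names are amino acids, values are the chosen codons.
best <- setNames(top$Codon, top$AmAcid)
best <- best[order(names(best))]
print(best)
```

On the full table, df from the read.table call above can be used in place of the toy data; no explicit loop is involved, and best["Gly"] style lookups return the chosen codon by amino acid name.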
– shadow