I have a data frame with many genes (in a column named "gene"). Some of the genes appear more than once. I want to subset the data frame so that it contains only the genes that appear MORE than once. In other words, I want to REMOVE the rows whose value in the "gene" column is unique.
    2 Answers
4
            
            
We can use subset with table in base R. Get the frequency count of 'genes' with table, build a logical vector that flags counts greater than 1, take the names of those genes, and use %in% to keep only the rows that match them.
subset(df1, genes %in% names(which(table(genes) > 1)))
Another option is duplicated, applied from both ends so that the first occurrence of each repeated gene is kept as well:
subset(df1, duplicated(genes)|duplicated(genes, fromLast = TRUE))
Or using dplyr
library(dplyr)
df1 %>%
   group_by(genes) %>%
   filter(n() > 1) %>%
   ungroup()
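To check that the two base R approaches agree, here is a minimal sketch on a made-up data frame. The contents of `df1` and the `value` column are assumptions for illustration, not data from the question.

```r
# Hypothetical toy data: "A" and "C" repeat, "B" and "D" are unique
df1 <- data.frame(
  genes = c("A", "B", "A", "C", "D", "C", "C"),
  value = 1:7,
  stringsAsFactors = FALSE
)

# Frequency-table approach: keep genes whose count exceeds 1
by_table <- subset(df1, genes %in% names(which(table(genes) > 1)))

# duplicated() approach: flag repeats scanning forward and backward
by_dup <- subset(df1, duplicated(genes) | duplicated(genes, fromLast = TRUE))

# Compare the two results; both keep rows in their original order
identical(by_table$genes, by_dup$genes)
by_table$genes
```

Using duplicated in one direction only would drop the first row of each repeated gene, which is why the fromLast = TRUE pass is needed.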
        akrun
        
 
1
            
            
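Here is another base R option, using subset with ave. ave returns, for every row, the number of rows that share its gene. Passing seq_along(gene) as the first argument keeps those counts numeric even when gene is a character or factor column:
subset(df, ave(seq_along(gene), gene, FUN = length) > 1)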
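To make the ave step concrete, here is a small sketch on a made-up data frame. The contents of `df`, the `value` column, and the helper name `n_per_gene` are assumptions for illustration. It passes `seq_along()` as the first argument to ave, a choice made here so that the counts come back as integers whether `gene` is character or factor.

```r
# Hypothetical toy data: "A" and "C" repeat, "B" is unique
df <- data.frame(
  gene = c("A", "B", "A", "C", "C"),
  value = 1:5,
  stringsAsFactors = FALSE
)

# Per-row group size: each row gets the number of rows sharing its gene
n_per_gene <- ave(seq_along(df$gene), df$gene, FUN = length)

# Keep only the rows whose gene occurs more than once
repeated <- df[n_per_gene > 1, ]
repeated$gene
```

Because ave returns a vector aligned with the rows of the data frame, the comparison can be used directly as a row filter.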
        ThomasIsCoding
        