I have a large data frame and I want to filter its columns. Specifically, I want to keep only the columns whose entries are larger than k in N% of the rows. Can someone help me do this in R? I'm new to R.
- Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. – Jaap Jun 04 '16 at 17:17
1 Answer
It's good to have a reproducible example.
I will use the diamonds data set (shipped with the ggplot2 package) as an illustration:
library(ggplot2)  # provides the diamonds data set
data(diamonds)
keepCol <- function(df, K, N){
  # df: data.frame
  # K: threshold value
  # N: fraction of rows (e.g. 0.2 for 20%) that must exceed K
  # Number of rows that must fulfil the criterion (N% of all rows)
  minRows <- N * nrow(df)
  # Keep only the numeric columns; is.numeric() also covers integer
  # columns, which a test for class == "numeric" would miss
  isNum <- vapply(df, is.numeric, logical(1))
  df <- df[, isNum, drop = FALSE]
  # For each numeric column, count how many entries are greater than K
  cnt <- vapply(df, function(x) sum(x > K, na.rm = TRUE), numeric(1))
  # Keep only the columns where that count exceeds the required number
  df[, cnt > minRows, drop = FALSE]
}
keepCol(diamonds, K=4, N=0.2)
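The same idea can be tried on a tiny data frame where the result is easy to check by hand. This is only a sketch: the `toy` data frame and the values of `k` and `n` are made up for illustration, and it assumes there are no missing values. It relies on the fact that comparing a data frame with a number gives a logical matrix, so `colMeans()` returns the fraction of entries above the threshold in each column.

```r
# Toy data frame (hypothetical values, for illustration only)
toy <- data.frame(
  a = c(1, 5, 6, 7, 8),        # 4 of 5 entries are > 4
  b = c(1, 2, 3, 4, 5),        # 1 of 5 entries is > 4
  c = c(9L, 9L, 1L, 1L, 1L),   # 2 of 5 entries are > 4
  d = letters[1:5],            # non-numeric, dropped before comparing
  stringsAsFactors = FALSE
)

k <- 4    # threshold
n <- 0.5  # required fraction of rows

# Restrict to the numeric columns first
num <- toy[, vapply(toy, is.numeric, logical(1)), drop = FALSE]

# num > k is a logical matrix; its column means are the
# fractions of rows whose entry exceeds k
frac <- colMeans(num > k)

# Keep the columns whose fraction exceeds n
kept <- num[, frac > n, drop = FALSE]
names(kept)
```

Here only column `a` passes, since 80% of its entries are above 4, while `b` and `c` reach 20% and 40%. If the columns may contain `NA`, decide first whether missing entries should count as failing the test, since `colMeans(..., na.rm = TRUE)` would divide by the number of non-missing rows instead of all rows.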
 
    
    
        dimitris_ps
        
