I have a number of large datasets with ~10 columns, and ~200000 rows. Not all columns contain values for each row, although at least one column must contain a value for the row to be present, I would like to set a threshold for how many NAs are allowed in a row.
My Dataframe looks something like this:
 ID q  r  s  t  u  v  w  x  y  z
 A  1  5  NA 3  8  9  NA 8  6  4
 B  5  NA 4  6  1  9  7  4  9  3 
 C  NA 9  4  NA 4  8  4  NA 5  NA
 D  2  2  6  8  4  NA 3  7  1  32 
And I would like to be able to delete the rows that contain more than 2 cells containing NA to get
ID q  r  s  t  u  v  w  x  y  z
 A 1  5  NA 3  8  9  NA 8  6  4
 B 5  NA 4  6  1  9  7  4  9  3 
 D 2  2  6  8  4  NA 3  7  1  32 
complete.cases removes all rows containing any NA, and I know one can delete rows that contain NA in certain columns but is there a way to modify it so that it is non-specific about which columns contain NA, but how many of the total do?
Alternatively, this dataframe is generated by merging several dataframes using
    file1<-read.delim("~/file1.txt")
    file2<-read.delim(file=args[1])
    file1<-merge(file1,file2,by="chr.pos",all=TRUE)
Perhaps the merge function could be altered?
Thanks
 
     
     
     
     
    