I have a data-frame with several missing values (NAs), grouped into a number of groups (A,B,C,D,E,F) in a column named Group. I am trying to analyse it in R.
I want to tabulate the number of rows/records belonging to each group, or better still the proportion of rows/records within each group (ie out of the total number of rows/records within that group), which have: >= 1 NA (i.e. at least 1 missing val), >=2 NAs, >=3 NAs, >=4 NAs, so forth (up to n NAs, which I would predefine).
I want to do it with just base R and dplyr. I guess a solution would start with df %>% group_by(Group) but I'm not sure where to go from there.
Any ideas?
Edit: Let's take iris for data:
iris0 <- iris
set.seed(101)
iris0[sample(150,40),1] <- NA
iris0[sample(150,40),2] <- NA
iris0[sample(150,40),3] <- NA
iris0[sample(150,40),4] <- NA
At this point I would like (after supplying say max_n=7) a table somewhat like the following
                1 NAs 2 NAs 3 NAs 4 NAs
1   setosa      30    13    3   0   0   0   0
2   versicolor  40    18    5   0   0   0   0
3   virginica   36    13    2   0   0   0   0
4   Total      106    44    10  0   0   0   0
5   (%)       70.67 29.33   6.67    0.00    0.00    0.00    0.00
Actually I came up with a method that does it using rowsums and lapply but I wonder if anyone can help with a more efficient method (using base R and dplyr) e.g. with aggregate or something. Unboxing the lapply results into a table like the above is also a major pain.
Also, better still would be something that divides the numbers by the total records in each class (for the proportion in each class).
 
    