I know how to do this the long way, but I suspect there is a shorter and simpler solution in R. I have two data frames: "tpm", which has sample IDs as column names, genes as row names, and TPMs as values; and "mani", which has sample IDs in the "sample" column and mutations in the "mutation" column. I want to filter "tpm" to keep the genes that are expressed at >= 5 TPM in at least 30% of the samples for at least 1 of the 20 mutations.
Input:
tpm df[18,000 x 1500]
Gene    Sample A        Sample B        Sample C        Sample D        Sample E ... 
6kbHsap 5               10              2               0               2
ACRO1   0               0               3               4               5
ALINE   0               0               2              10               1
ALR     7               1               21              1               0
...
mani df[1500 x 2]
    sample         mutation
1   Sample A       X
2   Sample B       X
3   Sample C       X
4   Sample D       Y
5   Sample E       X
...
Result:
tpm df[10,000 x 1500]
Gene    Sample A        Sample B        Sample C        Sample D        Sample E ... 
6kbHsap 5               10              2               0               2
ALINE   0               0               2              10               1
ALR     7               1               21              1               0
...
How could I do this in as few lines of code as possible?
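For reference, here is a minimal base-R sketch of the filter described above, run on only the toy rows shown in the tables. It assumes that "30% of samples" means "at least 30% of the samples carrying a given mutation", that every ID in mani$sample is also a column name of tpm, and that tpm has more than one gene (so sapply returns a matrix). The names frac, keep, and tpm_filtered are made up for illustration.

```r
# Toy data copied from the example tables (only the rows and columns shown there).
tpm <- data.frame(
  "Sample A" = c(5, 0, 0, 7),
  "Sample B" = c(10, 0, 0, 1),
  "Sample C" = c(2, 3, 2, 21),
  "Sample D" = c(0, 4, 10, 1),
  "Sample E" = c(2, 5, 1, 0),
  row.names = c("6kbHsap", "ACRO1", "ALINE", "ALR"),
  check.names = FALSE  # keep the spaces in the sample names
)
mani <- data.frame(
  sample   = c("Sample A", "Sample B", "Sample C", "Sample D", "Sample E"),
  mutation = c("X", "X", "X", "Y", "X")
)

# For each mutation, the fraction of its samples in which each gene has TPM >= 5.
# Result: a genes x mutations matrix (assumes more than one gene).
frac <- sapply(
  split(as.character(mani$sample), mani$mutation),
  function(s) rowMeans(tpm[, s, drop = FALSE] >= 5)
)

# Keep genes that reach the 30% cutoff in at least one mutation group
# (assumption: "30%" is read as "at least 30%").
keep <- rowSums(frac >= 0.3) > 0
tpm_filtered <- tpm[keep, , drop = FALSE]
```

On the toy rows this drops ACRO1 (1 of 4 X samples and 0 of 1 Y samples at >= 5 TPM) and keeps the other three genes, which matches the Result table.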
