I have a large dataset with 89 variables, several of which are unique identifiers that link the data to tables in a relational DB. I want to count the number of unique identifiers (PID) within each level of a second variable, which is a factor (ICD_Grouping).
For example, this does not work, but it is roughly what I had in mind:
length(unique(data$PID ~ data$ICD_Grouping))
I would like it to return a table like:
ICD_Grouping        unique.PID
C43                   6
C47/49                1
C50                   2
C56                   1
C57-58                1
C80                   1
Sample data:
    PID ICD_Grouping
1     1          C80
2   918          C43
3   919          C43
4   919          C43
5  1284             
6  1285             
7   550          C43
8   550          C43
9   550          C43
10  550          C50
11  920          C43
12  920          C43
13  921          C50
14  921          C56
15  921       C57-58
16  921       C57-58
17  549          C43
18  549          C43
19  922       C47/49
20  551          C43
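
To make this reproducible, the 20 sample rows can be rebuilt as below, followed by a minimal sketch of the kind of per-group count I am after. It assumes the data frame is called `data` and that the blank groupings are empty strings rather than `NA`; using `aggregate()` here is just one base R possibility, not necessarily the best one.

```r
# Rebuild the 20 sample rows shown above (blank groupings kept as empty strings)
data <- data.frame(
  PID = c(1, 918, 919, 919, 1284, 1285, 550, 550, 550, 550,
          920, 920, 921, 921, 921, 921, 549, 549, 922, 551),
  ICD_Grouping = factor(c("C80", "C43", "C43", "C43", "", "",
                          "C43", "C43", "C43", "C50", "C43", "C43",
                          "C50", "C56", "C57-58", "C57-58",
                          "C43", "C43", "C47/49", "C43"))
)

# Count the distinct PIDs within each level of ICD_Grouping
counts <- aggregate(PID ~ ICD_Grouping, data = data,
                    FUN = function(x) length(unique(x)))

# Rename the count column to match the desired output
names(counts)[2] <- "unique.PID"
counts
```

Note that the blank grouping is counted as its own level in this sketch; filtering with `data[data$ICD_Grouping != "", ]` before aggregating should leave it out.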
 
    