This is a toy version of my real data frame:
df <- data.frame(
  sample = c("s1", "s1", "s1", "s2", "s2", "s2", "s1",  "s3", "s4"),
  snp = c("snp1", "snp1", "snp1", "snp1", "snp1", "snp1", "snp2", "snp2", "snp2"),
  random_column = 1:9
)
For each snp, I want to count the number of unique sample-snp pairs (that is, the number of distinct samples carrying that snp) and return that value on every row. In this case, s1 and s2 have snp1, so size should be 2 for rows 1-6 even though those rows contain duplicates, and s1, s3 and s4 have snp2, so size should be 3 for rows 7-9. This would be the expected output:
  sample random   snp  size
   (chr)  (int) (chr) (int)
1     s1      1  snp1     2
2     s1      2  snp1     2
3     s1      3  snp1     2
4     s2      4  snp1     2
5     s2      5  snp1     2
6     s2      6  snp1     2
7     s1      7  snp2     3
8     s3      8  snp2     3
9     s4      9  snp2     3
I guess I could compute the per-snp counts as below and then do some type of left-join back onto df, but I'm wondering if there is an easier way:
library(dplyr)
df[!duplicated(df[, c('sample', 'snp')]), ] %>% group_by(snp) %>% summarize(size = n())
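For reference, here is a minimal sketch of the full left-join version I have in mind, assuming dplyr is installed. The intermediate names `sizes` and `result`, and the use of `distinct()` and `count()` in place of `duplicated()`, are just illustrative choices on my part:

```r
library(dplyr)

df <- data.frame(
  sample = c("s1", "s1", "s1", "s2", "s2", "s2", "s1", "s3", "s4"),
  snp = c("snp1", "snp1", "snp1", "snp1", "snp1", "snp1", "snp2", "snp2", "snp2"),
  random_column = 1:9
)

# Keep one row per unique sample-snp pair, then count the pairs per snp
sizes <- df %>%
  distinct(sample, snp) %>%
  count(snp, name = "size")

# Attach the per-snp count back onto every original row
result <- left_join(df, sizes, by = "snp")
result
```

This keeps all nine rows of df and adds the size column, but it needs the intermediate table and the join, which is the part I would like to avoid.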