I am a new R user and this is my first question submission (hopefully in compliance with the protocol).
I have a data frame with a column v1, to which I add a second column n holding the number of occurrences of each value:
library(dplyr)
df <- data.frame(v1 = c("A", "A", "B", "B", "B", "B", "C", "D", "D", "E"))
# count occurrences of each v1 value, then attach that count to every row
dfc <- df %>% count(v1)
df$n <- with(dfc, n[match(df$v1, v1)])
   v1 n  
1   A 2
2   A 2
3   B 4
4   B 4
5   B 4
6   B 4
7   C 1
8   D 2
9   D 2
10  E 1
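For reference, dplyr's `add_count()` attaches the per-value count in a single step, without the separate `count()` plus `match()` lookup. A minimal sketch, assuming dplyr 1.0 or later is installed:

```r
library(dplyr)

df <- data.frame(v1 = c("A", "A", "B", "B", "B", "B", "C", "D", "D", "E"))

# add_count() adds a column `n` holding the number of rows that share each v1 value
df <- df %>% add_count(v1)
print(df)
```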
I want to delete rows beyond a threshold of 3 occurrences of a value in v1: the first three rows for each value are retained, and any later rows for that value are deleted. In this example, I want to delete row 6 and keep all remaining rows in a subset data frame.
The result would include the following values for v1:
  v1
1  A
2  A
3  B
4  B
5  B
6  C
7  D
8  D
9  E
Row 6 is deleted because it is the 4th occurrence of "B"; the three preceding rows for "B" are retained.
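One way to express this rule, sketched here under the assumption that dplyr is available, is to number the rows within each v1 group with `row_number()` and keep only those numbered 3 or lower. The variable name `threshold` is illustrative, not part of any API.

```r
library(dplyr)

df <- data.frame(v1 = c("A", "A", "B", "B", "B", "B", "C", "D", "D", "E"))
threshold <- 3  # illustrative: maximum number of rows to keep per v1 value

df_sub <- df %>%
  group_by(v1) %>%
  filter(row_number() <= threshold) %>%  # keep the first `threshold` rows of each group
  ungroup()

print(df_sub)
```

Values that occur 3 times or fewer (A, C, D, E) are unaffected, because every one of their rows has a position of at most 3.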
I have read multiple posts that demonstrate how to remove ALL rows for a value whose total count is less than or greater than a threshold, such as 4. For example, I have tried:
df1 <- df %>%
  group_by(v1) %>%
  filter(n() < 4)
This approach keeps only the rows whose v1 value occurs fewer than 4 times in total, so every "B" row is dropped and 6 rows remain.
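To see why `filter(n() < 4)` removes every "B" row, it may help to print the group size next to each row's position within its group. In a grouped data frame, `n()` returns the size of the whole group (4 for every "B" row), while `row_number()` returns the row's position within the group (1 to 4). A small diagnostic sketch, assuming dplyr is loaded; the column names `group_size` and `position` are illustrative:

```r
library(dplyr)

df <- data.frame(v1 = c("A", "A", "B", "B", "B", "B", "C", "D", "D", "E"))

# n() is the size of the whole group; row_number() is the row's position within it
check <- df %>%
  group_by(v1) %>%
  mutate(group_size = n(), position = row_number()) %>%
  ungroup()

print(check)
```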
df2 <- df %>%
  group_by(v1) %>%
  filter(n() > 3)
This approach keeps only the rows whose v1 value occurs more than 3 times in total, i.e. the four "B" rows.
df4 <- subset(df, v1 %in% names(table(df$v1))[table(df$v1) <4])
This approach has the same result as the first approach.
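In base R, in the spirit of the `subset()` attempt, a minimal sketch is to compute a running count within each v1 value using `ave()` with `seq_along()`, then keep the rows whose count is at most 3. This needs no extra packages and keeps the original row order and row names. `occurrence` and `df_keep` are illustrative names.

```r
df <- data.frame(v1 = c("A", "A", "B", "B", "B", "B", "C", "D", "D", "E"))

# ave() applies seq_along() within each v1 group, giving 1, 2, 3, ... per value
occurrence <- ave(seq_along(df$v1), df$v1, FUN = seq_along)

# keep only the first three occurrences of each value;
# drop = FALSE stops a one-column data frame from collapsing to a vector
df_keep <- df[occurrence <= 3, , drop = FALSE]
print(df_keep)
```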
None of these methods produce the result I need.
As previously stated, I need to retain the first three rows where v1 = "B" and delete only the rows beyond the 3rd occurrence of that value.
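A more compact variant, assuming dplyr 1.0 or later (where `slice_head()` was introduced), keeps the first `n` rows of each group directly. In this example the data are already sorted by v1, so the output matches the desired result shown above.

```r
library(dplyr)

df <- data.frame(v1 = c("A", "A", "B", "B", "B", "B", "C", "D", "D", "E"))

# slice_head(n = 3) keeps at most the first 3 rows of each v1 group
res <- df %>%
  group_by(v1) %>%
  slice_head(n = 3) %>%
  ungroup()

print(res)
```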
Because I am new to R, it's possible I am overlooking a very simple solution. Any suggestions would be greatly appreciated.
Thanks.