Follow-up to the igraph solution: Break Down Codes
I don't want my igraph answer too long to read, so I created a new answer to elaborate how the that solution works.
Dummy Example
For simplicity, we can start with a smaller dummy example, and we assume that our objective is to find the maximal number of Sites that share at least 3 years.
set.seed(0)
DF <- data.frame(
Sites = rep(1:10, each = 10),
Years = sample(2000:2020, 100, replace = TRUE)
) %>%
unique()
and the visualization of DF in a graph looks like can be done by
DF %>%
graph_from_data_frame() %>%
set_vertex_attr(name = "color", value = names(V(.)) %in% DF$Sites) %>%
plot()

Steps
1). As we can see, the graph can be treated as a bipartite graph (Sites and Years are two different types of vertices) and the shared Years actually can be projected to edges. Since we need to keep track of the number of shared Years, we can use an edge attribute "weight" to identify counts of shared Years. In that case, an adjacency matrix of Sites is required before the projection and it is obtained by table + tcrossprod, e.g.,
adjmat <- DF %>%
table() %>%
tcrossprod()
which gives
> adjmat
Sites
Sites 1 2 3 4 5 6 7 8 9 10
1 8 2 3 0 4 4 4 2 5 3
2 2 7 5 2 3 1 2 3 2 4
3 3 5 9 4 4 3 2 3 2 4
4 0 2 4 6 3 2 1 2 1 3
5 4 3 4 3 9 5 4 4 4 4
6 4 1 3 2 5 7 2 3 2 3
7 4 2 2 1 4 2 9 4 3 4
8 2 3 3 2 4 3 4 9 2 5
9 5 2 2 1 4 2 3 2 7 4
10 3 4 4 3 4 3 4 5 4 9
2). As stated in the objective, we want to find out the groups that have >=3 shared years, that means, the "weight" of edges should be at least 3. On top of the obtained adjacency matrix adjmat in step 1), we can further apply a filter (>=3) to simplify the matrix, which is equivalent to pruning the network, i.e.,
g <- adjmat %>%
`>=`(3) %>%
graph_from_adjacency_matrix(mode = "undirected", diag = FALSE)
and plot(g) shows the project like below

3). We have known the fact that, the Sites that share the same Years should produce a complete subgraph, namely, clique. Thus, we hereby can enumerate all cliques in graph g, which is done by the function cliques(), i.e.,
clq <- g %>%
cliques(min = 2)
where min = 2 specifies the minimal size of each cliques. If you have no prior information about the size, it is also fine to use cliques() without any additional arguments. Now, clq is a list which looks like
> clq
[[1]]
+ 2/10 vertices, named, from 3b71441:
[1] 5 10
[[2]]
+ 2/10 vertices, named, from 3b71441:
[1] 3 5
[[3]]
+ 3/10 vertices, named, from 3b71441:
[1] 3 5 10
[[4]]
+ 2/10 vertices, named, from 3b71441:
[1] 3 10
[[5]]
+ 2/10 vertices, named, from 3b71441:
[1] 5 7
[[6]]
+ 3/10 vertices, named, from 3b71441:
[1] 5 7 10
[[7]]
+ 2/10 vertices, named, from 3b71441:
[1] 7 10
[[8]]
+ 2/10 vertices, named, from 3b71441:
[1] 7 8
[[9]]
+ 3/10 vertices, named, from 3b71441:
[1] 5 7 8
[[10]]
+ 4/10 vertices, named, from 3b71441:
[1] 5 7 8 10
[[11]]
+ 3/10 vertices, named, from 3b71441:
[1] 7 8 10
[[12]]
+ 2/10 vertices, named, from 3b71441:
[1] 3 8
[[13]]
+ 3/10 vertices, named, from 3b71441:
[1] 3 5 8
[[14]]
+ 4/10 vertices, named, from 3b71441:
[1] 3 5 8 10
[[15]]
+ 3/10 vertices, named, from 3b71441:
[1] 3 8 10
[[16]]
+ 2/10 vertices, named, from 3b71441:
[1] 5 8
[[17]]
+ 3/10 vertices, named, from 3b71441:
[1] 5 8 10
[[18]]
+ 2/10 vertices, named, from 3b71441:
[1] 8 10
[[19]]
+ 2/10 vertices, named, from 3b71441:
[1] 1 7
[[20]]
+ 3/10 vertices, named, from 3b71441:
[1] 1 5 7
[[21]]
+ 4/10 vertices, named, from 3b71441:
[1] 1 5 7 10
[[22]]
+ 3/10 vertices, named, from 3b71441:
[1] 1 7 10
[[23]]
+ 2/10 vertices, named, from 3b71441:
[1] 1 3
[[24]]
+ 3/10 vertices, named, from 3b71441:
[1] 1 3 5
[[25]]
+ 4/10 vertices, named, from 3b71441:
[1] 1 3 5 10
[[26]]
+ 3/10 vertices, named, from 3b71441:
[1] 1 3 10
[[27]]
+ 2/10 vertices, named, from 3b71441:
[1] 1 5
[[28]]
+ 3/10 vertices, named, from 3b71441:
[1] 1 5 10
[[29]]
+ 2/10 vertices, named, from 3b71441:
[1] 1 10
[[30]]
+ 2/10 vertices, named, from 3b71441:
[1] 3 4
[[31]]
+ 3/10 vertices, named, from 3b71441:
[1] 3 4 5
[[32]]
+ 4/10 vertices, named, from 3b71441:
[1] 3 4 5 10
[[33]]
+ 3/10 vertices, named, from 3b71441:
[1] 3 4 10
[[34]]
+ 2/10 vertices, named, from 3b71441:
[1] 4 5
[[35]]
+ 3/10 vertices, named, from 3b71441:
[1] 4 5 10
[[36]]
+ 2/10 vertices, named, from 3b71441:
[1] 4 10
[[37]]
+ 2/10 vertices, named, from 3b71441:
[1] 1 9
[[38]]
+ 3/10 vertices, named, from 3b71441:
[1] 1 7 9
[[39]]
+ 4/10 vertices, named, from 3b71441:
[1] 1 5 7 9
[[40]]
+ 5/10 vertices, named, from 3b71441:
[1] 1 5 7 9 10
[[41]]
+ 4/10 vertices, named, from 3b71441:
[1] 1 7 9 10
[[42]]
+ 3/10 vertices, named, from 3b71441:
[1] 1 5 9
[[43]]
+ 4/10 vertices, named, from 3b71441:
[1] 1 5 9 10
[[44]]
+ 3/10 vertices, named, from 3b71441:
[1] 1 9 10
[[45]]
+ 2/10 vertices, named, from 3b71441:
[1] 7 9
[[46]]
+ 3/10 vertices, named, from 3b71441:
[1] 5 7 9
[[47]]
+ 4/10 vertices, named, from 3b71441:
[1] 5 7 9 10
[[48]]
+ 3/10 vertices, named, from 3b71441:
[1] 7 9 10
[[49]]
+ 2/10 vertices, named, from 3b71441:
[1] 5 9
[[50]]
+ 3/10 vertices, named, from 3b71441:
[1] 5 9 10
[[51]]
+ 2/10 vertices, named, from 3b71441:
[1] 9 10
[[52]]
+ 2/10 vertices, named, from 3b71441:
[1] 1 6
[[53]]
+ 3/10 vertices, named, from 3b71441:
[1] 1 3 6
[[54]]
+ 4/10 vertices, named, from 3b71441:
[1] 1 3 5 6
[[55]]
+ 5/10 vertices, named, from 3b71441:
[1] 1 3 5 6 10
[[56]]
+ 4/10 vertices, named, from 3b71441:
[1] 1 3 6 10
[[57]]
+ 3/10 vertices, named, from 3b71441:
[1] 1 5 6
[[58]]
+ 4/10 vertices, named, from 3b71441:
[1] 1 5 6 10
[[59]]
+ 3/10 vertices, named, from 3b71441:
[1] 1 6 10
[[60]]
+ 2/10 vertices, named, from 3b71441:
[1] 6 8
[[61]]
+ 3/10 vertices, named, from 3b71441:
[1] 3 6 8
[[62]]
+ 4/10 vertices, named, from 3b71441:
[1] 3 5 6 8
[[63]]
+ 5/10 vertices, named, from 3b71441:
[1] 3 5 6 8 10
[[64]]
+ 4/10 vertices, named, from 3b71441:
[1] 3 6 8 10
[[65]]
+ 3/10 vertices, named, from 3b71441:
[1] 5 6 8
[[66]]
+ 4/10 vertices, named, from 3b71441:
[1] 5 6 8 10
[[67]]
+ 3/10 vertices, named, from 3b71441:
[1] 6 8 10
[[68]]
+ 2/10 vertices, named, from 3b71441:
[1] 3 6
[[69]]
+ 3/10 vertices, named, from 3b71441:
[1] 3 5 6
[[70]]
+ 4/10 vertices, named, from 3b71441:
[1] 3 5 6 10
[[71]]
+ 3/10 vertices, named, from 3b71441:
[1] 3 6 10
[[72]]
+ 2/10 vertices, named, from 3b71441:
[1] 5 6
[[73]]
+ 3/10 vertices, named, from 3b71441:
[1] 5 6 10
[[74]]
+ 2/10 vertices, named, from 3b71441:
[1] 6 10
[[75]]
+ 2/10 vertices, named, from 3b71441:
[1] 2 8
[[76]]
+ 3/10 vertices, named, from 3b71441:
[1] 2 3 8
[[77]]
+ 4/10 vertices, named, from 3b71441:
[1] 2 3 5 8
[[78]]
+ 5/10 vertices, named, from 3b71441:
[1] 2 3 5 8 10
[[79]]
+ 4/10 vertices, named, from 3b71441:
[1] 2 3 8 10
[[80]]
+ 3/10 vertices, named, from 3b71441:
[1] 2 5 8
[[81]]
+ 4/10 vertices, named, from 3b71441:
[1] 2 5 8 10
[[82]]
+ 3/10 vertices, named, from 3b71441:
[1] 2 8 10
[[83]]
+ 2/10 vertices, named, from 3b71441:
[1] 2 3
[[84]]
+ 3/10 vertices, named, from 3b71441:
[1] 2 3 5
[[85]]
+ 4/10 vertices, named, from 3b71441:
[1] 2 3 5 10
[[86]]
+ 3/10 vertices, named, from 3b71441:
[1] 2 3 10
[[87]]
+ 2/10 vertices, named, from 3b71441:
[1] 2 5
[[88]]
+ 3/10 vertices, named, from 3b71441:
[1] 2 5 10
[[89]]
+ 2/10 vertices, named, from 3b71441:
[1] 2 10
4). It should be noted that, not all of the cliques meet >=3 requirement in terms of shared Years since the "weight" indicates the count of total shared Years, instead of the count of distinguished shared Years. In other words, 2000, 2001, 2002 is valid but 2000, 2000, 2002 is not, although the latter has a count of 3. We therefore need to check the distribution of shared Years in each clique.
For example, if we look into the third clique, i.e., clq[[3]], and check the distribution of shared Years
> q <- clq[[3]]
> table(subset(DF, Sites %in% names(q)))
Years
Sites 2000 2001 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
3 1 1 1 0 0 0 1 1 0 1 0 1 1 0
5 0 1 0 1 1 1 0 1 0 1 1 1 0 0
10 1 0 0 0 1 1 0 1 1 1 0 0 1 1
Years
Sites 2018 2019 2020
3 0 1 0
5 1 0 0
10 0 0 1
we see that columns with all 1s are 2009 and 2011, which means they have only 2 shared Years and thus invalid. To pick the valid cliques, we can use Filter with a filtering criteria, e.g.,
validclq <- clq %>%
Filter(
\(q) {
sum(table(with(DF, Years[Sites %in% names(q)])) == length(q)) >= 3
}, .
)
and we will see
> validclq
[[1]]
+ 2/10 vertices, named, from 9bd9430:
[1] 5 10
[[2]]
+ 2/10 vertices, named, from 9bd9430:
[1] 3 5
[[3]]
+ 2/10 vertices, named, from 9bd9430:
[1] 3 10
[[4]]
+ 2/10 vertices, named, from 9bd9430:
[1] 5 7
[[5]]
+ 2/10 vertices, named, from 9bd9430:
[1] 7 10
[[6]]
+ 2/10 vertices, named, from 9bd9430:
[1] 7 8
[[7]]
+ 2/10 vertices, named, from 9bd9430:
[1] 3 8
[[8]]
+ 2/10 vertices, named, from 9bd9430:
[1] 5 8
[[9]]
+ 2/10 vertices, named, from 9bd9430:
[1] 8 10
[[10]]
+ 2/10 vertices, named, from 9bd9430:
[1] 1 7
[[11]]
+ 2/10 vertices, named, from 9bd9430:
[1] 1 3
[[12]]
+ 2/10 vertices, named, from 9bd9430:
[1] 1 5
[[13]]
+ 2/10 vertices, named, from 9bd9430:
[1] 1 10
[[14]]
+ 2/10 vertices, named, from 9bd9430:
[1] 3 4
[[15]]
+ 3/10 vertices, named, from 9bd9430:
[1] 3 4 10
[[16]]
+ 2/10 vertices, named, from 9bd9430:
[1] 4 5
[[17]]
+ 2/10 vertices, named, from 9bd9430:
[1] 4 10
[[18]]
+ 2/10 vertices, named, from 9bd9430:
[1] 1 9
[[19]]
+ 3/10 vertices, named, from 9bd9430:
[1] 1 5 9
[[20]]
+ 2/10 vertices, named, from 9bd9430:
[1] 7 9
[[21]]
+ 3/10 vertices, named, from 9bd9430:
[1] 7 9 10
[[22]]
+ 2/10 vertices, named, from 9bd9430:
[1] 5 9
[[23]]
+ 2/10 vertices, named, from 9bd9430:
[1] 9 10
[[24]]
+ 2/10 vertices, named, from 9bd9430:
[1] 1 6
[[25]]
+ 2/10 vertices, named, from 9bd9430:
[1] 6 8
[[26]]
+ 2/10 vertices, named, from 9bd9430:
[1] 3 6
[[27]]
+ 2/10 vertices, named, from 9bd9430:
[1] 5 6
[[28]]
+ 2/10 vertices, named, from 9bd9430:
[1] 6 10
[[29]]
+ 2/10 vertices, named, from 9bd9430:
[1] 2 8
[[30]]
+ 2/10 vertices, named, from 9bd9430:
[1] 2 3
[[31]]
+ 2/10 vertices, named, from 9bd9430:
[1] 2 5
[[32]]
+ 2/10 vertices, named, from 9bd9430:
[1] 2 10
where 32 out of 89 cliques meets the requirement.
5). As the last step, we select the cliques that have the largest size (since we are looking for the maximal number of Sites) from the valid cliques validclq. Using lenghts to obtain the size of each clique, we can filter the ones with the largest size, e.g.,
res <- validclq %>%
`[`(lengths(.) == max(lengths(.)))
and we finally obtained
[[1]]
+ 3/10 vertices, named, from 75803c6:
[1] 3 4 10
[[2]]
+ 3/10 vertices, named, from 75803c6:
[1] 1 5 9
[[3]]
+ 3/10 vertices, named, from 75803c6:
[1] 7 9 10