I have a dataframe with this structure:
```
> df
factor  y  x
1       2  0
1       3  0
1       1  0
1       2  0
2       3  0
2       1  0
2       3  1
3       4  1
3       3  1
3       6  3
3       5  2
4       4  1
4       7  8
4       2  1
2       5  3
```
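For anyone who wants to reproduce the example, the printout above can be rebuilt as follows. This is a sketch: storing `factor` as an R factor and `y`/`x` as numeric columns is an assumption, since the printout does not show the column types.

```r
# Rebuild the example data frame from the printout above.
# Assumption: `factor` is an R factor; y and x are numeric.
df <- data.frame(
  factor = factor(c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 2)),
  y      = c(2, 3, 1, 2, 3, 1, 3, 4, 3, 6, 5, 4, 7, 2, 5),
  x      = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 3, 2, 1, 8, 1, 3)
)

# Rows per level: 4, 4, 4 and 3 for levels 1 to 4
table(df$factor)
```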
In the actual dataset, I have 200 rows and several variables: a few continuous variables and a factor variable with 70 levels, each with up to 4 observations.
I would like to randomly split my entire data frame into 4 groups of equal size, sampling without replacement only with respect to the factor variable. In other words, each level of the factor variable should occur at most once per group.
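To make the requirement concrete, here is a small checker. `valid_grouping` is a hypothetical helper, not from any package: it returns TRUE when a vector of group labels `g` satisfies the requirement for a factor `f`, i.e. no level appears twice in the same group and all groups have the same size. Note that the 15-row example cannot be split into 4 equal groups (15 is not divisible by 4), whereas the 200-row dataset can (50 rows per group).

```r
# Hypothetical helper: does the grouping g satisfy the requirement for factor f?
valid_grouping <- function(f, g) {
  counts <- table(f, g)   # rows = levels of f, columns = groups
  sizes  <- table(g)      # number of rows in each group
  all(counts <= 1) && length(unique(as.vector(sizes))) == 1
}

valid_grouping(c(1, 1, 2, 2), c(1, 2, 1, 2))   # TRUE
valid_grouping(c(1, 1, 2, 2), c(1, 1, 2, 2))   # FALSE: level 1 appears twice in group 1
```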
I've tried several approaches. For instance, I tried sampling the "factor" variable into four groups without replacement as follows:
```r
factor1 <- as.character(df$factor)
set.seed(123)
group1 <- sample(factor1, 35, replace = FALSE)   # 35 draws from all factor values
factor2 <- setdiff(factor1, group1)              # note: setdiff() returns unique values only
group2 <- sample(factor2, 35, replace = FALSE)
# and the same for "group3" and "group4"
```
but then I don't know how to link the group vectors (`group1`, `group2`, etc.) back to the other variables in `df` (`x` and `y`).
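One way to keep groups linked to `x` and `y` is to store one group label per row as a column, instead of building separate vectors of levels, and then split the data frame by that column. A minimal sketch on a hypothetical 4-row toy data frame, with labels written by hand for illustration (a real solution would generate them at random):

```r
# Toy data (hypothetical): two levels, two rows each.
toy <- data.frame(factor = factor(c("a", "a", "b", "b")),
                  y = c(2, 3, 4, 1), x = c(0, 0, 1, 1))

# One label per row, stored as a column so it stays attached to y and x.
toy$groups <- c(1, 2, 2, 1)

# split() returns a list with one data frame per group.
by_group <- split(toy, toy$groups)
by_group[["1"]]   # rows 1 and 4: levels "a" and "b", with their y and x
```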
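I've also tried:

```r
library(dplyr)
group1 <- sample_n(df, 35, replace = FALSE)
```

but this fails as well: `replace = FALSE` only prevents the same row from being drawn twice, and my data frame has no duplicated rows. The only duplicated values are in the factor variable, which `sample_n()` does not take into account.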
Finally, I tried the solution proposed in an answer to a similar question here, adapted to my case:
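```r
random.groups <- function(n.items = 200L, n.groups = 4L,
                          factor = rep(1L, n.items)) {
  # row indices, split by level of `factor`
  splitted.items <- split(seq.int(n.items), factor)
  # shuffle within each level; i[sample.int(length(i))] avoids sample()'s
  # special case for a single value (sample(5) permutes 1:5, not just 5)
  shuffled <- lapply(splitted.items, function(i) i[sample.int(length(i))])
  # group label from each row's position in the concatenated shuffled list
  1L + (order(unlist(shuffled)) %% n.groups)
}
df$groups <- random.groups(nrow(df), n.groups = 4)
```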
However, the resulting 4 groups include duplicated values for the factor variable, so something is not working properly.
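It looks like the call above never passes `factor`, so it defaults to `rep(1L, n.items)`: every row is treated as the same level and the grouping ignores `df$factor` entirely. Below is a minimal sketch of the same positional idea with the factor passed explicitly. It assumes no level has more than `n.groups` rows (true here: at most 4 rows per level, 4 groups). `random_groups_by_level`, `f` and `g` are hypothetical names, and `f` is simulated data with 200 rows and 70 levels standing in for the real dataset. Unlike the original function, it also shuffles the order of the levels, so levels with fewer than 4 rows do not always land in the same groups.

```r
# Hypothetical variant of random.groups(): the grouping factor is passed explicitly.
random_groups_by_level <- function(f, n.groups = 4L) {
  # Row indices for each level of f.
  rows_by_level <- split(seq_along(f), f)
  # Shuffle rows within each level (safe for single-row levels).
  shuffled <- lapply(rows_by_level, function(i) i[sample.int(length(i))])
  # Also shuffle the order of the levels themselves.
  shuffled <- shuffled[sample.int(length(shuffled))]
  # order() on a permutation gives each row's position in the concatenated
  # list; rows of one level occupy consecutive positions, so the position
  # modulo n.groups gives them distinct labels when a level has at most
  # n.groups rows.
  1L + (order(unlist(shuffled, use.names = FALSE)) %% n.groups)
}

# Simulated stand-in for the real data: 200 rows, 70 levels
# (60 levels with 3 rows, 10 levels with 2 rows).
set.seed(123)
f <- factor(c(rep(1:60, each = 3), rep(61:70, each = 2)))
g <- random_groups_by_level(f, n.groups = 4L)

table(g)           # 50 rows in each of the 4 groups
max(table(f, g))   # 1: no level repeats within a group
```

On the real data this would be `df$groups <- random_groups_by_level(df$factor)`, after which `split(df, df$groups)` returns the four data frames with all other variables attached. Equal group sizes follow from the positions 1 to `n` being spread evenly over the residues modulo `n.groups`, which requires `n` to be divisible by `n.groups`.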
I would really appreciate any idea or suggestion to solve this problem!