I frequently work with data frames and have to run some sophisticated data wrangling / manipulations by subgroup, where the subgroups are defined in one of the columns. I am aware of dplyr and group_by and know that many things can be solved using group_by. However, I often have to do some pretty intricate calculations and end up just using a 'for' loop.
I was wondering whether there is some other general approach or paradigm that is faster or more elegant. Maybe map (which I am not very familiar with)?
Below is an example. Note that it is fake and meaningless, so let's ignore why I need to do those things, or the fact that there could be 2 consecutive NAs in a column, etc. That's not the focus of my question. The point is that I often have to operate "within the constraints of a subgroup" and then, inside that subgroup, do operations columnwise, rowwise and sometimes even cellwise.
I also realize that I could probably put most of that code inside a function, split my data frame into a list based on 'group', apply this function to each element of that list and then do.call(rbind...) at the end. But is this the only way?
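To make that idea concrete, here is a minimal sketch of the split / apply / rbind pattern I have in mind. The toy data and the helper name process_group are made up purely for illustration; the real per-group logic would be far more involved.

```r
# Toy data, made up for illustration only
toy <- data.frame(group = rep(c("a", "b"), each = 3),
                  value = c(1, 2, 3, 10, 20, 30))

# Hypothetical per-group function: the intricate logic would go here
process_group <- function(df) {
  data.frame(group = df$group[1], total = sum(df$value))
}

pieces    <- split(toy, toy$group)          # one data frame per group
processed <- lapply(pieces, process_group)  # apply the function to each piece
combined  <- do.call(rbind, processed)      # stack the pieces back together
```

This gives one row per group, but it still feels like a lot of bookkeeping for something so common.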
Thanks a lot for any hints!
library(dplyr)
library(forcats)
set.seed(123)
x <- tibble(group = c(rep('a', 10), rep('b', 10), rep('c', 10)),
                attrib = c(sample(c("one", "two", "three", "four"), 10, replace = TRUE),
                           sample(c("one", "two", "three"), 10, replace = TRUE),
                           sample(c("one", "three", "four"), 10, replace = TRUE)),
                v1 = sample(c(1:5, NA), 30, replace = TRUE),
                v2 = sample(c(1:5, NA), 30, replace = TRUE),
                v3 = sample(c(1:5, NA), 30, replace = TRUE),
                n1 = abs(rnorm(30)), n2 = abs(rnorm(30)), n3 = abs(rnorm(30)))
v_vars <- paste0("v", 1:3)
n_vars <- paste0("n", 1:3)
groups <- unique(x$group)
results <- vector("list", length(groups))  # Placeholder for final results
for (i in seq_along(groups)) { # loop through groups
  mygroup <- groups[i]
  mysubtable <- x %>% filter(group == mygroup)
  # IMPUTE NAs in v columns
  # Replace every NA with a mean of values above and below it; and if it's the first or 
  # the last value, with the mean of 2 values below or above it.
  for (v in v_vars) {  # loop through v columns
    which_nas <- which(is.na(mysubtable[[v]])) # create index of NAs for column v
    # If there are no NAs, which_nas is empty and the loop below does not run
    for (na in which_nas) { # loop through indexes of column values that are NAs
      if (na == 1) {
        mysubtable[[v]][na] <- mean(c(mysubtable[[v]][na + 1],
                                      mysubtable[[v]][na + 2]), na.rm = TRUE)
      } else if (na == nrow(mysubtable)) {
        mysubtable[[v]][na] <- mean(c(mysubtable[[v]][na - 2],
                                      mysubtable[[v]][na - 1]), na.rm = TRUE)
      } else {
        mysubtable[[v]][na] <- mean(c(mysubtable[[v]][na - 1],
                                      mysubtable[[v]][na + 1]), na.rm = TRUE)
      }
    } # end of loop through NA indexes
  } # end of loop through v vars
  # Aggregate v columns (mean) for each value of column 'attrib'
  result1 <- mysubtable %>% group_by(attrib) %>% 
    summarize_at(v_vars, mean)
  # Aggregate n columns (sum) for each value of column 'attrib'
  result2 <- mysubtable %>% group_by(attrib) %>% 
    summarize_at(n_vars, sum)
  # final result should contain the name of the group
  results[[i]] <- cbind(mygroup, result1, result2[-1])
}
results <- do.call(rbind, results)