I am still new to data.table. My question is similar to this one and this one. The difference is that I want to compute weighted means for multiple variables, by groups, but using more than one weight for each mean.
Consider the following data.table (the actual is much larger):
library(data.table)
set.seed(123456)
mydata <- data.table(CLID = rep("CNK", 10),
ITNUM = rep(c("First", "Second", "First", "First", "Second"), 2),
SATS = rep(c("Always", "Amost always", "Sometimes", "Never", "Always"), 2),
ASSETS = rep(c("0-10", "11-25", "26-100", "101-200", "MORE THAN 200"), 2),
AVGVALUE1 = rnorm(10, 10, 2),
AVGVALUE2 = rnorm(10, 10, 2),
WGT1 = rnorm(10, 3, 1),
WGT2 = rnorm(10, 3, 1),
WGT3 = rnorm(10, 3, 1))
#I set the key of the table to the variables I want to group by,
#so the output is sorted
setkeyv(mydata, c("CLID", "ITNUM", "SATS", "ASSETS"))
What I want to achieve is to compute the weighted means for AVGVALUE1 and AVGVALUE2 (and possibly more variables) by groups defined by ITNUM, SATS, ASSETS using each one of the weight variables WGT1, WGT2, WGT3 (and possibly more of them). So for each of the variables I want to compute the weighted means I will have three weighted means by groups (or whatever the number of weights is).
I can do it for each variable separately, e.g.:
all.weights <- c("WGT1", "WGT2", "WGT3")
avg.var <- "AVGVALUE1"
split.vars <- c("ITNUM", "SATS", "ASSETS")
mydata[ , Map(f = weighted.mean, x = .(get(avg.var)), w = mget(all.weights),
na.rm = TRUE), by = c(key(mydata)[1], split.vars)]
I add the first key variable in by, although it is a constant, because I would like to have it as a column in the output. And I get:
CLID ITNUM SATS ASSETS V1 V2 V3
1: CNK First Always 0-10 11.66824 11.66819 11.66829
2: CNK First Never 101-200 11.37378 12.21008 11.60182
3: CNK First Sometimes 26-100 12.43004 13.13450 12.01330
4: CNK Second Always MORE THAN 200 12.32265 11.81613 12.56786
5: CNK Second Amost always 11-25 10.76556 11.34669 10.52458
However, with the actual data.table, where I have much more columns to compute the weighted means for (as well as much more weights to use), it would be rather cumberosme to do it one by one. What I imagine is a function where the mean for each of the variables (AVGVALUE1, AVGVALUE2 and so on) is computed with each of the weight variables (WGT1, WGT2, WGT3 and so on) and the output for each variable for which the weighted mean is computed is added to a list. I guess the list would be the best option because if all estimates are in the same output, the number of columns might be endless. So something like this:
[[1]]
CLID ITNUM SATS ASSETS V1 V2 V3
1: CNK First Always 0-10 11.66824 11.66819 11.66829
2: CNK First Never 101-200 11.37378 12.21008 11.60182
3: CNK First Sometimes 26-100 12.43004 13.13450 12.01330
4: CNK Second Always MORE THAN 200 12.32265 11.81613 12.56786
5: CNK Second Amost always 11-25 10.76556 11.34669 10.52458
[[2]]
CLID ITNUM SATS ASSETS V1 V2 V3
1: CNK First Always 0-10 9.132899 9.060045 9.197005
2: CNK First Never 101-200 12.896584 13.278680 13.000772
3: CNK First Sometimes 26-100 10.972260 11.215390 10.828431
4: CNK Second Always MORE THAN 200 11.704404 11.611072 11.749586
5: CNK Second Amost always 11-25 8.086409 8.225030 8.028928
What I tried so far:
Using
lapplyall.weights <- c("WGT1", "WGT2", "WGT3") avg.vars <- c("AVGVALUE1", "AVGVALUE2") split.vars <- c("ITNUM", "SATS", "ASSETS") lapply(mydata, function(i) { mydata[ , Map(f = weighted.mean, x = mget(avg.vars)[i], w = mget(all.weights), na.rm = TRUE), by = c(key(mydata)[1], split.vars)] }) Error in weighted.mean.default(x = dots[[1L]][[1L]], w = dots[[2L]][[1L]], : 'x' and 'w' must have the same lengthUsing
mapplymyfun <- function(data, spl.v, avg.v, wgts) { data[ , Map(f = weighted.mean, x = mget(avg.v), w = mget(all.weights), na.rm = TRUE), by = c(key(data)[1], spl.v)] } mapply(FUN = myfun, data = mydata, spl.v = split.vars, avg.v = avg.vars, wgts = all.weights) Error: value for ‘AVGVALUE2’ not found
I tried to wrap the mget(avg.v) as a list - .(mget(avg.v)), but then get another error:
Error in mapply(FUN = f, ..., SIMPLIFY = FALSE) :
could not find function "."
Can someone help?