I have been able to create a table with the median mean and sd, based on several conditions, using the dplry package. To do so I prepped the data. It looks a bit sloppy to me and I am using a big dataset (>3gb) so I have been using data.table already.
I am wondering how I can create a similar table more efficiently using the data.table package, so, if it is possible to do this without prepping the data too much. That would be great as I need to do this for many other conditions as well. Btw, I checked out this post but it didn't add conditions.
This is what I have and need:
I need the mean of con by id and date= 1 year since first i or i2 = A04 or A01.
EDIT (because it wasn't running before):
DATA:
DT <- structure(list(id = c(123L, 123L, 332L, 332L, 332L, 100L, 100L, 113L, 113L, 113L, 113L, 551L, 551L),
i = c("D95", "F85", "A01", "A04", "K20", "B10", "A04", "T08", "P28", "D95", "A04", "B12", "D95"),
i2 = c("F15", "", "", "", "", "", "", "", "", "A01", "", "A01", ""),
date = c("2015-06-19", "2016-08-15", "2013-03-16", "2017-01-17", "2013-01-16", "2009-05-08", "2011-04-03", "2015-05-04",
"2011-04-04", "2017-08-04", "2011-05-24", "2013-11-04", "2013-05-04"),
con = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"),
PS = c("0", "0", "1", "0", "0", "0", "1", "0", "0", "0", "1", "1", "0")),
class = "data.frame", row.names = c(NA, -13L))
The following is what I have done so far (and which works):
I created the columns PS (= having either A01 or A04 in i or i2) and ds(=days since first A01 or A01 (days aren't correct here)) based on the data.
With the dplyr package I first made a data.table with only rows with only ds between -365&0.
j.ds <- subset(DT, ds >= -365 & ds < 0)
Than agregated to a table with sum of con per id, like so:
j <- j.ds %>%
group_by(id) %>%
summarise(con = sum(con))
From there I made the desired table:
jP <- j %>%
summarise(median = median(con), mean = mean(con), SD = sd(con))