The details of t_id and date are not described in the question so in (1) we have assumed that there are up to 3 t_id values per email and that they appear in the order corresponding to date1, date2 and date3 respectively with all other date values being NA. For example, if there are 2 t_id values for a particular email then the first one will have date1 as a date and date2 and date3 be NA. The second will have date2 as a date and date1 and date3 will be NA. In (2) we assume the same except we generalize to k instead of 3.
No packages are used.
1) Use by to split on email and then manually construct the row for each. Finally rbind the rows together.
do.call("rbind",
by(DF, DF$email, function(x) {
t_id <- c(x$t_id, NA, NA, NA)[1:3]
date <- c(na.omit(c(x$date1, x$date2, x$date3)), NA, NA, NA)[1:3]
data.frame(email = x$email[1],
t_id1 = t_id[1], date1 = date[1],
t_id2 = t_id[2], date2 = date[2],
t_id3 = t_id[3], date3 = date[3]
)
}
))
giving:
email t_id1 date1 t_id2 date2
10@line.nl 10@line.nl 100689822 2016-09-18 NA <NA>
1000988@boerman.nl 1000988@boerman.nl 100732587 2016-11-01 NA <NA>
1111tk68@net.nl 1111tk68@net.nl 100640340 2016-08-01 100641415 2016-08-02
natas@gmail.com natas@gmail.com 100678318 2016-09-05 100678319 2016-10-05
t_id3 date3
10@line.nl NA <NA>
1000988@boerman.nl NA <NA>
1111tk68@net.nl NA <NA>
natas@gmail.com NA <NA>
2) If desired we could generalize this to up to k dates and t_id values. In this case rbind/by produces a new data frame long which has k rows for each email. The first row for each email in long corresponds to the first tid and date and so on up to the kth. long is subsequently reshaped to wide.
is.date <- grepl("date", names(DF))
k <- sum(is.date)
long <- do.call("rbind",
by(DF, DF$email, function(x)
data.frame(email = x$email[1],
time = 1:k,
t_id = c(x$t_id, rep(NA, k))[1:k],
date = c(na.omit(do.call("c", x[is.date])), rep(NA, k))[1:k]
)
)
)
reshape(long, dir = "wide", idvar = "email")
giving:
email t_id.1 date.1 t_id.2 date.2 t_id.3 date.3
10@line.nl.1 10@line.nl 100689822 2016-09-18 NA <NA> NA <NA>
1000988@boerman.nl.1 1000988@boerman.nl 100732587 2016-11-01 NA <NA> NA <NA>
1111tk68@net.nl.date11 1111tk68@net.nl 100640340 2016-08-01 100641415 2016-08-02 NA <NA>
natas@gmail.com.date11 natas@gmail.com 100678318 2016-09-05 100678319 2016-10-05 NA <NA>
Note: The input DF in reproducible form was assumed to be:
Lines <- "t_id date1 date2 date3 email
100678318 2016-09-05 <NA> <NA> natas@gmail.com
100678319 <NA> 2016-10-05 <NA> natas@gmail.com
100732587 2016-11-01 <NA> <NA> 1000988@boerman.nl
100689822 2016-09-18 <NA> <NA> 10@line.nl
100640340 2016-08-01 <NA> <NA> 1111tk68@net.nl
100641415 <NA> 2016-08-02 <NA> 1111tk68@net.nl"
DF <- transform(read.table(text = Lines, header = TRUE, na.strings = "<NA>"),
date1 = as.Date(date1),
date2 = as.Date(date2),
date3 = as.Date(date3))