A method which uses only base R and seems relatively straightforward could look like:
# ensure that ActivityType is a factor
dt$ActivityType <- factor(dt$ActivityType)
# convert character dates to Date types
dt$ActivityDate <- as.Date(dt$ActivityDate, "%m/%d/%Y")
# ensure that dt is ordered by Name and ActivityDate
dt <- dt[order(dt$Name, dt$ActivityDate),]
# for each ActivityDate and Name, count the number of instances within a 21 day window for each ActivityType
dt <- cbind(dt, t(sapply(split(dt, list(dt$ActivityDate, dt$Name)),
FUN=function(y) { z= (dt$Name == y$Name & dt$ActivityDate <= y$ActivityDate & dt$ActivityDate > (y$ActivityDate -21));
sapply(levels(dt$ActivityType), FUN=function(x) sum(z & dt$ActivityType == x )) } )) )
This version is somewhat faster and a bit simpler than the previous version.
EXPLANATION
I'll try to answer your question by going through the steps in the last statement although this makes the answer a bit long. I thought of dt in your problem as providing two sets of overlappoing data. First the Name and ActivityDates columns provide the set of names and dates for which results are required. Second the Name, ActivityDates, and ActivityTypes columns provide the data for computing the results. So, the code does the following
sapply with split sets up the loop over all Name-ActvityDates pairs. In ths case, split breaks dt into a list where each element of the list is a data frame containing a row of dt. The code uses sapply-split since it passes the row as a data frame with all columns to its FUN argument. Rather than use a named function from an R package, the code defines a function using the usual function definition format i.e. function(argument) { statements using arguments }. In this case, the function has no name and is called an anonymous function.
- Now that we have set up the loop over
Name-AcivityDates pairs, we will think of dt in the second way described above, as the source of data used to compute the results required for each row of dt. The function function(y) where y is the name of the argument for this function does this. The first statement in function(y) identifies all rows of dt which match the Name and the ActivityDate range of the input row y (a single-row data frame) where y$Name is the Name column and y$ActivityDate is the ActivityDate for the ActivityDate range we are trying to match. The results of this matching process are stored in z as a logical vector where a value of TRUE in the i-th element of z indicates that the i-th row of dt matches y while a value of FALSE means no match. z may contain matches for multiple values of ActivityType but it's important for the performance of the code that finding the rows of dt which match range defined by ActivityDates only be done once per row of dt.
- The second statement of
function(y) counts the number of matches in z for each value of ActivityType. sapply loops over the unique values of ActivityType (e.g. Email and webinar), passes each value to the anonymous function function(x), finds the rows in dt which match x, and forms a logical vector with z containing only the matches for that Name, AcivityType, and range of dates. z is available to function(x) since it was defined in function(y), the parent environment of function(x). sum acting on a logical vector counts the number of TRUE elements which gives the counts reported in the results.
sapply returns a matrix with ActivityType as rows and Name-ActivityDates as columns so t is used to form the transpose which is then bound to dt using cbind.
I tried to make the code have good performance and be comprehensible. If futher performance improvements are needed, the base R code can be made a bit faster or this approach could be moved to use either dplyr or data.table which might also help. Let me know if you have any more questions.