Can you make this R code faster? I can't see how to vectorize it. I have a data frame as follows (sample rows below):
> str(tt)
'data.frame':   1008142 obs. of  4 variables:
 $ customer_id: int, visit_date : Date, format: "2010-04-04", ...
I want to compute the difference between consecutive visit_dates for each customer.
So I do diff(tt$visit_date), but I have to enforce a discontinuity (NA) wherever customer_id changes, because the diff is meaningless there, e.g. row 74 below.
The code at the bottom does this, but takes >15 min on the 1M-row dataset.
I also tried computing the result piecewise per customer_id (using which()) and cbind'ing the subresults; that was also slow.
Any suggestions? Thanks. I did search SO, R-intro, R manpages, etc.
   customer_id visit_date visit_spend ivi
72          40 2011-03-15       18.38   5
73          40 2011-03-20       23.45   5
74          79 2010-04-07      150.87  NA
75          79 2010-04-17      101.90  10
76          79 2010-05-02      111.90  15
Code:
all_tt_cids <- unique(tt$customer_id)
# Append ivi (Intervisit interval) column
tt$ivi <- c(NA,diff(tt$visit_date))
for (cid in all_tt_cids) {
  # ivi has a discontinuity when customer_id changes
  tt$ivi[min(which(tt$customer_id==cid))] <- NA
}
(I'm wondering whether we can create a logical index marking where customer_id differs from the row above?)
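A minimal sketch of that logical-index idea, not benchmarked on the full dataset. It assumes the rows are already sorted by customer_id and then visit_date, as in the sample above. The toy data frame and the name new_customer are made up for illustration; only base R is used.

```r
# Toy data mirroring the sample rows (hypothetical subset, already sorted
# by customer_id and then visit_date).
tt <- data.frame(
  customer_id = c(40L, 40L, 79L, 79L, 79L),
  visit_date  = as.Date(c("2011-03-15", "2011-03-20",
                          "2010-04-07", "2010-04-17", "2010-05-02"))
)

n <- nrow(tt)

# TRUE where customer_id differs from the row above.
# The first row has no predecessor, so it always counts as a boundary.
new_customer <- c(TRUE, tt$customer_id[-1] != tt$customer_id[-n])

# Intervisit interval in days, computed for all rows in one pass,
# then blanked at every customer boundary in a single vectorized assignment.
tt$ivi <- c(NA, as.numeric(diff(tt$visit_date)))
tt$ivi[new_customer] <- NA

print(tt)
```

On the toy data this gives ivi = NA, 5, NA, 10, 15. The single comparison of the column against itself shifted by one row replaces the per-customer loop, so tt$customer_id is scanned once rather than once per unique customer.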