I am extremely impressed by the speedup I get for tapply-like operations when using data.table instead of data frames. 
For example:
df = data.frame(class = round(runif(1e6,1,1000)), x=rnorm(1e6))
DT = data.table(df)
# takes ages if somefun is complex
res1 = tapply(df$x, df$class, somefun) 
# much faster 
setkey(DT, class)
res2 = DT[,somefun(x),by=class] 
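For anyone who wants to reproduce this, here is a minimal sketch of the grouped comparison. The definition of somefun below is only a hypothetical stand-in (my real function is more complex), and I use a smaller n and sample() for the class labels so it runs quickly:

```r
library(data.table)

set.seed(1)
n <- 1e5  # smaller than the 1e6 above, just to keep the run short

# hypothetical stand-in for a "complex" per-group function
somefun <- function(v) mean(v) + sd(v)

# integer class labels in 1..1000
df <- data.frame(class = sample(1000, n, replace = TRUE), x = rnorm(n))
DT <- data.table(df)

# base R: split x by class and apply somefun to each group
t_tapply <- system.time(res1 <- tapply(df$x, df$class, somefun))

# data.table: key on class, then aggregate by group
setkey(DT, class)
t_dt <- system.time(res2 <- DT[, .(V1 = somefun(x)), by = class])

# both results are ordered by class, so they can be compared directly
same_result <- isTRUE(all.equal(as.numeric(res1), res2$V1))
print(rbind(tapply = t_tapply, data.table = t_dt))
```

The timings will depend on the machine and on how expensive somefun really is, so treat the numbers as indicative only.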
However, I haven't managed to make it noticeably faster than data frames for apply-like operations (i.e., cases in which a function needs to be applied to each row). 
df = data.frame(x1 = rnorm(1e6), x2=rnorm(1e6))
DT = data.table(df)
# takes ages if somefun is complex
res1 = apply(df, 1, somefun) 
# not much improvement, if at all 
DT[,rowid:=.I] # or: DT$rowid = 1:nrow(DT)
setkey(DT, rowid)
res2 = DT[,somefun1(x1,x2),by=rowid] 
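Again as a reproducible sketch: somefun and somefun1 below are hypothetical placeholders (somefun takes a whole row as a vector, somefun1 takes the two columns as separate arguments), and n is reduced so that the per-row grouping finishes quickly:

```r
library(data.table)

set.seed(1)
n <- 1e4  # reduced from 1e6; one group per row is slow

# hypothetical row-wise functions computing the same quantity
somefun  <- function(row) row[1]^2 + abs(row[2])   # receives one row as a vector
somefun1 <- function(a, b) a^2 + abs(b)            # receives the two values separately

df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
DT <- data.table(df)

# base R: apply over rows (df is converted to a matrix internally)
t_apply <- system.time(res1 <- apply(df, 1, somefun))

# data.table: one group per row via a row id column
DT[, rowid := .I]
setkey(DT, rowid)
t_dt <- system.time(res2 <- DT[, .(V1 = somefun1(x1, x2)), by = rowid])

# drop any names from the apply() result before comparing
same_result <- isTRUE(all.equal(unname(res1), res2$V1))
print(rbind(apply = t_apply, data.table = t_dt))
```

In both versions the function is still called once per row, which is the situation I am asking about.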
Is this simply to be expected, or are there some tricks?