This post compares the timings of various base R methods for this calculation. It was inspired by the comments on the original post, in particular @josilber's comment that the method posted by Jake Burkhead was the fastest.
Below, a variety of methods are used to calculate the random walk. Each method draws 1000 values of either 1 or -1 using the sampler fnc defined below. The timing test uses microbenchmark with 1000 replications of each method.
fnc <- function(n) sample(c(1L, -1L), n, replace=TRUE)
library(microbenchmark)
microbenchmark(all=cumsum(fnc(1000L)),
               reduce=Reduce("+", fnc(1000L), accumulate=TRUE),
               laplyRpCln=cumsum(unlist(lapply(rep.int(1L, 1000L), fnc))),
               laplyRpAn=cumsum(unlist(lapply(rep.int(1L, 1000L), function(x) fnc(1L)))),
               laplySqAn=cumsum(unlist(lapply(seq_len(1000L), function(x) fnc(1L)))),
               saplyRpCln=cumsum(sapply(rep.int(1L, 1000L), fnc)),
               saplyRpAn=cumsum(sapply(rep.int(1L, 1000L), function(x) fnc(1L))),
               saplySqAn=cumsum(sapply(seq_len(1000L), function(x) fnc(1L))),
               vaplyRpCln=cumsum(vapply(rep.int(1L, 1000L), fnc, FUN.VALUE=0)),
               vaplyRpAn=cumsum(vapply(rep.int(1L, 1000L), function(x) fnc(1L), FUN.VALUE=0)),
               vaplySqAn=cumsum(vapply(seq_len(1000L), function(x) fnc(1L), FUN.VALUE=0)),
               replicate=cumsum(replicate(1000L, fnc(1L))),
               forPre={vals <- numeric(1000L); for(i in seq_along(vals)) vals[i] <- fnc(1L); cumsum(vals)},
               forNoPre={vals <- numeric(0L); for(i in seq_len(1000L)) vals <- c(vals, fnc(1L)); cumsum(vals)},
               times=1000)
Here,
- "all" uses the suggestion of Jake Burkhead: pulling the full sample at once and applying cumsum.
- "reduce" also pulls the sample all at once, but uses Reduce to perform the summation.
- laplyRpCln uses lapply and unlist to return a vector, iterating through 1000 instances of 1 and calling the function directly by name.
- laplyRpAn differs in using an anonymous function.
- laplySqAn uses an anonymous function and creates the iterating variable with seq_len rather than rep.int.
- saplyRpCln, saplyRpAn, and saplySqAn are the same as laplyRpCln, etc., except that sapply is called instead of lapply/unlist.
- vaplyRpCln, etc. are the same as laplyRpCln, etc., except that vapply is used in place of lapply/unlist.
- replicate is a call to replicate, where the default is simplify=TRUE.
- forPre uses a for loop that pre-allocates the vector and then fills it in.
- forNoPre uses a for loop that starts from an empty numeric(0) vector and uses c to concatenate onto it at each iteration.
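As a quick sanity check (not part of the benchmark itself), seeding the RNG identically shows that the vectorised cumsum version and the Reduce version produce the same walk, since both consume the same 1000 draws. This is a minimal, self-contained sketch; the seed value is arbitrary.

```r
# Both approaches yield the identical walk when fed the same random draws.
fnc <- function(n) sample(c(1L, -1L), n, replace=TRUE)

set.seed(42)  # arbitrary seed, only for reproducibility
walk_cumsum <- cumsum(fnc(1000L))

set.seed(42)  # reset so Reduce sees the same draws
walk_reduce <- Reduce("+", fnc(1000L), accumulate=TRUE)

stopifnot(identical(walk_cumsum, walk_reduce))
```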
This returns:
Unit: microseconds
expr min lq mean median uq max neval cld
all 25.634 31.0705 85.66495 33.6890 35.3400 49240.30 1000 a
reduce 542.073 646.7720 780.13592 696.4775 750.2025 51685.44 1000 b
laplyRpCln 4349.384 5026.4015 6433.60754 5409.2485 7209.3405 58494.44 1000 c e
laplyRpAn 4600.200 5281.6190 6513.58733 5682.0570 7488.0865 55239.04 1000 c e
laplySqAn 4616.986 5251.4685 6514.09770 5634.9065 7488.1560 54263.04 1000 c e
saplyRpCln 4362.324 5080.3970 6325.66531 5506.5330 7294.6225 59075.02 1000 cd
saplyRpAn 4701.140 5386.1350 6781.95655 5786.6905 7587.8525 55429.02 1000 e
saplySqAn 4651.682 5342.5390 6551.35939 5735.0610 7525.4725 55148.32 1000 c e
vaplyRpCln 4366.322 5046.0625 6270.66501 5482.8565 7208.0680 63756.83 1000 c
vaplyRpAn 4657.256 5347.2190 6724.35226 5818.5225 7580.3695 64513.37 1000 de
vaplySqAn 4623.897 5325.6230 6475.97938 5769.8130 7541.3895 14614.67 1000 c e
replicate 4722.540 5395.1420 6653.90306 5777.3045 7638.8085 59376.89 1000 c e
forPre 5911.107 6823.3040 8172.41411 7226.7820 9038.9550 56119.11 1000 f
forNoPre 8431.855 10584.6535 11401.64190 10910.0480 11267.5605 58566.27 1000 g
Notice that the first method ("all") is clearly the fastest. It is followed by pulling the full sample at once and using Reduce to perform the summation. Among the *apply functions, the "clean" versions that pass the function directly by name show a minor performance edge, and lapply appears to be on par with vapply, though given the spread of the timings this conclusion is not clear-cut. sapply appears to be the slowest of the three, but the style of the function call matters more than which *apply variant is used.
The two for loops performed the worst, with the pre-allocating for loop outperforming the loop that grows the vector with c.
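The gap between the two loops is the classic cost of growing a vector: each call to c copies everything accumulated so far, giving roughly quadratic work overall, while writing into a pre-allocated vector touches each element once. A minimal sketch of the two patterns, using a simple counting loop rather than the random walk (the helper names grow and prealloc are illustrative, not from the benchmark):

```r
# Growing with c() copies the whole vector on every iteration (~O(n^2) total).
grow <- function(n) {
  vals <- numeric(0)
  for (i in seq_len(n)) vals <- c(vals, i)
  vals
}

# Pre-allocating writes each element once into an existing vector (~O(n) total).
prealloc <- function(n) {
  vals <- numeric(n)
  for (i in seq_len(n)) vals[i] <- i
  vals
}

stopifnot(identical(grow(100L), prealloc(100L)))
```

Both functions return the same result; only the amount of copying differs, which is why the gap widens as n grows.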
Here, I'm running R 3.4.1 (patched, roughly August 23, 2017) on openSUSE 42.1.
Please let me know if you spot any mistakes and I'll fix them as soon as I can. Thanks to Ben Bolker for prodding me to investigate the final function further, where I found a couple of bugs.