This is a version based on Tommy's answer but avoiding all loops:
library(HI)        # provides convex.bounds() and the native "arms" routine
library(multicore) # or library(parallel) in R 2.14.x
set.seed(42)
m <- 100 # rows of the result grid (outer index j)
n <- 30  # columns of the result grid (inner index i)
system.time({
    arms.C <- getNativeSymbolInfo("arms")$address # resolve HI's C routine once, up front
    bounds <- 0.3 + convex.bounds(0.3, dir = 1, function(x) (x>1e-4)*(x<20))
    if (diff(bounds) < 1e-07) stop("pointless!")
    # create the vector of z values
    zval <- 0.00001 * rep(seq.int(n), m) * rep(seq.int(m), each = n)
    # apply the inner function to each grid point and return the matrix
    dmat <- matrix(unlist(mclapply(zval, function(z)
            sum(unlist(lapply(seq.int(100), function(i)
                .Call(arms.C, bounds, function(x) (3.5 + z * i) * log(x) - x,
                      0.3, 1L, parent.frame())
            )))
        )), m, byrow=TRUE)
}) 
On a multicore machine this will be really fast, since it spreads the load across cores. On a single-core machine (or for poor Windows users) you can replace mclapply above with lapply and you'll get only a slight speedup compared to Tommy's answer. But note that the parallel result will differ from the serial one, since the workers use different RNG sequences.
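If you need the parallel run to be reproducible, one option (a sketch, assuming the parallel package from R 2.14.x; inner.fun is a hypothetical name for the function(z) used above) is the L'Ecuyer-CMRG generator, which gives each worker its own reproducible stream:

library(parallel)
RNGkind("L'Ecuyer-CMRG") # parallel-safe generator with per-worker streams
set.seed(42)
res <- mclapply(zval, inner.fun, mc.set.seed = TRUE)
dmat <- matrix(unlist(res), m, byrow = TRUE)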
Note that any C code that needs to evaluate R functions will be inherently slow, because interpreted code is slow. I have added the arms.C lookup just to remove all R->C overhead and make moli happy ;), but it doesn't make any difference.
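For comparison, here is roughly what the plain R-level call looks like (a sketch, assuming the HI package; arms() performs the convex.bounds() setup internally, which the code above hoists out of the loop):

library(HI)
# one draw via the R wrapper: start value, log-density, support indicator, sample size
one.draw <- arms(0.3,
                 function(x) (3.5 + 0.001) * log(x) - x,
                 function(x) (x > 1e-4) * (x < 20),
                 1)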
You could squeeze out a few more milliseconds by using column-major processing: the code in the question was row-major, which requires an extra copy, since R matrices are always column-major.
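A column-major variant could look like this (a sketch; inner.fun again stands for the function(z) above). The z values are enumerated so that the row index varies fastest and matrix() can fill its columns natively, without the byrow copy:

# j (row index) varies fastest, so results arrive in column order
zval.col <- 0.00001 * rep(seq.int(m), n) * rep(seq.int(n), each = m)
dmat <- matrix(unlist(mclapply(zval.col, inner.fun)), m)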
Edit: I noticed that moli changed the question slightly after Tommy answered: the y[i] values are now interdependent, so the sum(...) part has to be replaced by a sequential loop, and function(z) would look like this:
function(z) {
    y <- 0
    for (i in seq.int(99))
        y <- y + .Call(arms.C, bounds,
                       function(x) (3.5 + z * y) * log(x) - x,
                       0.3, 1L, parent.frame())
    y
}
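The loop is sequential in i, but the outer mclapply still parallelizes across z, so the surrounding code stays the same (a sketch; inner.loop is a hypothetical name for the function(z) above):

dmat <- matrix(unlist(mclapply(zval, inner.loop)), m, byrow = TRUE)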