This is a version based on Tommy's answer but avoiding all loops:
library(HI)        # provides convex.bounds() and the native "arms" routine
library(multicore) # or library(parallel) in R 2.14.x
set.seed(42)
m <- 100 # rows of the result grid (outer index j)
n <- 30  # columns of the result grid (inner index i)
system.time({
    arms.C <- getNativeSymbolInfo("arms")$address # resolve HI's C routine once, up front
    bounds <- 0.3 + convex.bounds(0.3, dir = 1, function(x) (x>1e-4)*(x<20))
    if (diff(bounds) < 1e-07) stop("pointless!")
    # create the vector of z values
    zval <- 0.00001 * rep(seq.int(n), m) * rep(seq.int(m), each = n)
    # apply the inner function to each grid point and return the matrix
    dmat <- matrix(unlist(mclapply(zval, function(z)
            sum(unlist(lapply(seq.int(100), function(i)
                .Call(arms.C, bounds, function(x) (3.5 + z * i) * log(x) - x,
                      0.3, 1L, parent.frame())
            )))
        )), m, byrow=TRUE)
}) 
On a multicore machine this will be really fast, since it spreads the load across cores. On a single-core machine (or for poor Windows users) you can replace mclapply above with lapply and you'll get only a slight speedup compared to Tommy's answer. But note that the parallel result will differ from the serial one, since the workers use different RNG sequences.
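If you need the parallel run to be reproducible, one option (a sketch, assuming the parallel package from R 2.14.x; inner.fun is a hypothetical name for the function(z) used above) is the L'Ecuyer-CMRG generator, which gives each worker its own reproducible stream:

library(parallel)
RNGkind("L'Ecuyer-CMRG") # parallel-safe generator with per-worker streams
set.seed(42)
res <- mclapply(zval, inner.fun, mc.set.seed = TRUE)
dmat <- matrix(unlist(res), m, byrow = TRUE)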
Note that any C code that needs to evaluate R functions will be inherently slow, because interpreted code is slow. I have added the arms.C lookup just to remove all R->C overhead and make moli happy ;), but it doesn't make any difference.
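For comparison, here is roughly what the plain R-level call looks like (a sketch, assuming the HI package; arms() performs the convex.bounds() setup internally, which the code above hoists out of the loop):

library(HI)
# one draw via the R wrapper: start value, log-density, support indicator, sample size
one.draw <- arms(0.3,
                 function(x) (3.5 + 0.001) * log(x) - x,
                 function(x) (x > 1e-4) * (x < 20),
                 1)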
You could squeeze out a few more milliseconds by using column-major processing: the code in the question was row-major, which requires an extra copy, since R matrices are always column-major.
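A column-major variant could look like this (a sketch; inner.fun again stands for the function(z) above). The z values are enumerated so that the row index varies fastest and matrix() can fill its columns natively, without the byrow copy:

# j (row index) varies fastest, so results arrive in column order
zval.col <- 0.00001 * rep(seq.int(m), n) * rep(seq.int(n), each = m)
dmat <- matrix(unlist(mclapply(zval.col, inner.fun)), m)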
Edit: I noticed that moli changed the question slightly after Tommy answered: the y[i] values are now interdependent, so the sum(...) part has to be replaced by a sequential loop, and function(z) would look like this:
function(z) {
    y <- 0
    for (i in seq.int(99))
        y <- y + .Call(arms.C, bounds,
                       function(x) (3.5 + z * y) * log(x) - x,
                       0.3, 1L, parent.frame())
    y
}
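The loop is sequential in i, but the outer mclapply still parallelizes across z, so the surrounding code stays the same (a sketch; inner.loop is a hypothetical name for the function(z) above):

dmat <- matrix(unlist(mclapply(zval, inner.loop)), m, byrow = TRUE)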