Another answer with some notes of efficiency (although this QA is not about speed).
Firstly, it could be better to avoid the conversion of a "list"-y structure to a "matrix"; sometimes it's worth to convert to a "matrix" and use a function that handles efficiently a 'vector with a "dim" attribute' (i.e. a "matrix"/"array") - other times it's not. Both max.col and apply convert to a "matrix".
Secondly, in situations like these, where we do not need to check all the data while getting to a solution, we could benefit from a solution with a loop that controls what goes through to the next iteration. Here we know that we can stop when we've found the first "1". Both max.col (and which.max) have to loop once to, actually, find the maximum value; the fact that we know that "max == 1" is not taken advantage of.
Thirdly, match is potentially slower when we seek only one value in another vector of values because match's setup is rather complicated and costly:
x = 5; set.seed(199); tab = sample(1e6)
identical(match(x, tab), which.max(x == tab))
#[1] TRUE
microbenchmark::microbenchmark(match(x, tab), which.max(x == tab), times = 25)
#Unit: milliseconds
# expr min lq median uq max neval
# match(x, tab) 142.22327 142.50103 142.79737 143.19547 145.37669 25
# which.max(x == tab) 18.91427 18.93728 18.96225 19.58932 38.34253 25
To sum up, a way to work on the "list" structure of a "data.frame" and to stop computations when we find a "1", could be a loop like the following:
ff = function(x)
{
x = as.list(x)
ans = as.integer(x[[1]])
for(i in 2:length(x)) {
inds = ans == 0L
if(!any(inds)) return(ans)
ans[inds] = i * (x[[i]][inds] == 1)
}
return(ans)
}
And the solutions in the other answers (ignoring the extra steps for the output):
david = function(x) max.col(x, "first")
plafort = function(x) apply(x, 1, match, x = 1)
ff(df[-1])
#[1] 1 3 4 1
david(df[-1])
#[1] 1 3 4 1
plafort(df[-1])
#[1] 1 3 4 1
And some benchmarks:
set.seed(007)
DF = data.frame(id = seq_len(1e6),
"colnames<-"(matrix(sample(0:1, 1e7, T, c(0.25, 0.75)), 1e6),
paste("in", 11:20, sep = "")))
identical(ff(DF[-1]), david(DF[-1]))
#[1] TRUE
identical(ff(DF[-1]), plafort(DF[-1]))
#[1] TRUE
microbenchmark::microbenchmark(ff(DF[-1]), david(DF[-1]), as.matrix(DF[-1]), times = 30)
#Unit: milliseconds
# expr min lq median uq max neval
# ff(DF[-1]) 64.83577 65.45432 67.87486 70.32073 86.72838 30
# david(DF[-1]) 112.74108 115.12361 120.16118 132.04803 145.45819 30
# as.matrix(DF[-1]) 20.87947 22.01819 27.52460 32.60509 45.84561 30
system.time(plafort(DF[-1]))
# user system elapsed
# 4.117 0.000 4.125
Not really an apocalypse, but worth to see that simple, straightforward algorithmic approaches can -indeed- prove to be equally good or even better depending on the problem. Obviously, (most) other times looping in R can be laborious.