A data.table answer for your consideration. We're just using setattr() from it, which works on data.frame, and columns of data.frame. No need to convert to data.table.
The test data again :
dat <- cbind(rep(1:5,50000),rep(5:1,50000),rep(c(1L,2L,4L,5L,3L),50000))
dat <- cbind(dat,dat,dat,dat,dat,dat,dat,dat,dat,dat,dat,dat)
dat <- as.data.frame(dat)
re.codes <- c("This","That","And","The","Other")
Now change the class and set the levels of each column directly, by reference :
require(data.table)
system.time(for (i in 1:ncol(dat)) {
setattr(dat[[i]],"levels",re.codes)
setattr(dat[[i]],"class","factor")
})
# user system elapsed
# 0 0 0
identical(dat, <result in question>)
# [1] TRUE
Does 0.00 win? As you increase the size of the data, this method stays at 0.00.
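A minimal sketch of why the time stays flat, assuming data.table's address() helper is available: setattr() only changes the attributes of the existing vector, so nothing the size of the data gets copied. The vector x and the codes vector are made up for illustration.

```r
library(data.table)

codes <- c("This", "That", "And", "The", "Other")
x <- rep(1:5, 2)              # small illustrative integer vector

before <- address(x)          # memory address before the change
setattr(x, "levels", codes)   # attach the labels, by reference
setattr(x, "class", "factor") # mark the vector as a factor, by reference
after <- address(x)

identical(before, after)      # same address means x was modified in place
is.factor(x)
as.character(x)
```

If the two addresses match, the integer codes themselves were never touched, which is why the cost should not grow with the number of rows.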
Ok, I admit, I changed the input data slightly so that every column is integer. The question builds its third column from doubles, and because cbind() returns a matrix of a single type, that makes every column double, not just a third of them. Double columns have to be converted to integer first, because a factor is only valid on integer vectors, as mentioned in the other answers.
So, strictly with the input data in the question, and including the double to integer conversion :
dat <- cbind(rep(1:5,50000),rep(5:1,50000),rep(c(1,2,4,5,3),50000))
dat <- cbind(dat,dat,dat,dat,dat,dat,dat,dat,dat,dat,dat,dat)
dat <- as.data.frame(dat)
re.codes <- c("This","That","And","The","Other")
system.time(for (i in 1:ncol(dat)) {
if (!is.integer(dat[[i]]))
set(dat,j=i,value=as.integer(dat[[i]]))
setattr(dat[[i]],"levels",re.codes)
setattr(dat[[i]],"class","factor")
})
# user system elapsed
# 0.06 0.01 0.08 # on my slow netbook
identical(dat, <result in question>)
# [1] TRUE
Note that set() works on data.frame too; you don't have to convert to data.table to use it.
These are very small times, clearly, since it's only a small input dataset :
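As a small sketch of that point, here is set() applied to a plain data.frame. The data frame df and its columns are hypothetical; only set() itself comes from data.table.

```r
library(data.table)

df <- data.frame(a = c(1, 2, 3), b = c(3L, 2L, 1L))  # 'a' is double, 'b' is integer

for (i in seq_along(df)) {
  # Convert only the columns that are not already integer
  if (!is.integer(df[[i]]))
    set(df, j = i, value = as.integer(df[[i]]))
}

class(df)            # still a plain data.frame
sapply(df, typeof)   # column types after the conversion
```

The is.integer() check skips columns that need no work, so already-integer data costs nothing extra.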
dim(dat)
# [1] 250000 36
object.size(dat)
# 68.7 Mb
Scaling up from this should reveal larger differences. Even at this size I think it is (just about) measurably the fastest, though not by a margin anyone would care about.
The setattr function is also in the bit package, btw. So the 0.00 method can be done with either data.table or bit. To do the type conversion by reference (if required) either set or := (both in data.table) is needed, afaik.
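For comparison, the := route might look roughly like this. Unlike set(), := needs a data.table, so this sketch builds one directly (an existing data.frame could be converted by reference with setDT()). The table dt and its columns are illustrative only.

```r
library(data.table)

dt <- data.table(a = c(1, 2, 3), b = c(5, 4, 3))   # two double columns
cols <- names(dt)

# Replace every listed column with its integer version, by reference
dt[, (cols) := lapply(.SD, as.integer), .SDcols = cols]

sapply(dt, typeof)   # column types after the conversion
```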