The General Problem
I want to vary the additional arguments passed to a function in an lapply/sapply (or maybe mapply?) call, so that each element of X gets its own argument value. It would be nice to know how to do this in general. If it matters, I am trying to incorporate this into a custom function, so hopefully the approach can scale.
Specific Example of Problem
Assume I have the following data frame:
df <- data.frame(column1 = letters[1:4],
                 column2 = LETTERS[1:4],
                 column3 = 1:4,
                 stringsAsFactors = FALSE)
As an example, I would like to convert column1 and column2 to factors, each with different levels. I might note the columns and levels as such:
# Columns in df I want to apply the factor() function to.
cols <- c("column1", "column2")
# Desired levels for column1
column1_lvl <- c(letters[1:5])
# Desired levels for column2
column2_lvl <- c(LETTERS[1:6])
Note that I have specified a separate set of levels for each column, each with more levels than exist in df. This serves as a motivation for varying the arguments. Now I test out an lapply call without varying the levels argument to factor:
df[cols] <- lapply(df[,cols], factor)
This works and successfully converts those columns to factors. I then reset df to its original structure for the next step. Now I want to specify the levels for each of the columns. ?lapply says that you can pass additional arguments to FUN, but it doesn't say how to vary those arguments over each vector in X. Trying this with a single column, I can write this:
df["column1"]<- factor(df[,"column1"], levels = column1_lvl)
This works. But now I want to abstract the levels argument so that it differs per column. Unfortunately, passing levels through lapply doesn't do this, because whatever you assign to levels, R passes that same value to FUN for every vector in X.
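To make that behaviour concrete, here is a small sketch of what happens when a single levels value is forwarded through lapply. The same vector reaches every call to factor(), so column2 (upper case) finds none of its values among the lower-case levels. The name res is only illustrative and is not part of my real code.

```r
df <- data.frame(column1 = letters[1:4],
                 column2 = LETTERS[1:4],
                 column3 = 1:4,
                 stringsAsFactors = FALSE)
cols <- c("column1", "column2")

# The extra argument is forwarded unchanged to every call of factor().
res <- lapply(df[cols], factor, levels = letters[1:5])

levels(res$column1)      # the five lower-case levels
levels(res$column2)      # the same five lower-case levels
anyNA(res$column1)       # FALSE: "a" to "d" are all among the levels
all(is.na(res$column2))  # TRUE: "A" to "D" match none of the levels
```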
Ideally, something like the following would work. The following is FAKE CODE that I wish would work the way I want it, but doesn't:
df[cols] <- lapply(df[,cols], factor, levels = list(column1_lvl, column2_lvl))
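One possible workaround, sketched here under my own assumptions, is to loop over the column names rather than the columns and look up the matching levels inside an anonymous function. The named list lvls is hypothetical (it just pairs each column name with its desired levels), and this still wraps factor() in a small function, so it may not be what I am ultimately after.

```r
df <- data.frame(column1 = letters[1:4],
                 column2 = LETTERS[1:4],
                 column3 = 1:4,
                 stringsAsFactors = FALSE)

# Hypothetical lookup: column name -> desired levels for that column.
lvls <- list(column1 = letters[1:5],
             column2 = LETTERS[1:6])

# Iterate over the names, so each call can pick its own levels.
df[names(lvls)] <- lapply(names(lvls), function(nm) {
  factor(df[[nm]], levels = lvls[[nm]])
})

str(df)
```

Because the levels are looked up by name, adding another column only means adding another entry to lvls.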
What I have tried
I have not been able to find many resources that explain how I might be able to accomplish this. Or perhaps, I don't see what needs to be done. This post helped me a little, but I'm wondering if there is a way around creating my own factor function, for example.
Additionally, this person's answer to their own question encouraged me to check out mapply. Though I've read ?mapply's documentation, and followed along with some tutorials, I haven't been able to figure it out. On that front, I have tried the following code, which doesn't work (for my purposes):
col_levels <- list(column1_lvl, column2_lvl)
df[cols] <- mapply(factor, df[,cols], MoreArgs = col_levels)
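For comparison, here is a hedged sketch of how I understand ?mapply: the arguments to vectorize over go in `...`, while MoreArgs is a list of other arguments to FUN that stay the same for every call. Under that reading, the per-column levels belong in `...` as a named argument rather than in MoreArgs. mapply also simplifies its result by default, so the sketch uses Map (which calls mapply with SIMPLIFY = FALSE) to get back a list of factors. I have only tried this on the toy data from this question.

```r
df <- data.frame(column1 = letters[1:4],
                 column2 = LETTERS[1:4],
                 column3 = 1:4,
                 stringsAsFactors = FALSE)
cols <- c("column1", "column2")
col_levels <- list(letters[1:5], LETTERS[1:6])

# Pair the i-th column with the i-th set of levels:
# factor(df$column1, levels = col_levels[[1]]), and so on.
df[cols] <- Map(factor, df[cols], levels = col_levels)

# Equivalent call with mapply itself:
# df[cols] <- mapply(factor, df[cols], levels = col_levels, SIMPLIFY = FALSE)

str(df)
```

The order of col_levels has to line up with the order of cols, since the pairing is positional.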
SessionInfo
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1 yaml_2.1.19
Final Thoughts
I could just be having a difficult time knowing what to search for. I am always open to figuring out the problem myself, if you are able to point me in the right direction. Any additional resources are more than welcome.
Thanks, in advance!