I should know this, but I don't. And that's because factors in R can be an absolute nightmare. This is a follow-up to my previous question. I'm hoping a few of you might be able to explain, in a bit more detail than the R manuals do, how to preserve column attributes when passing a data frame to a custom function. So far, the most useful information I've dug up is from Hadley's Advanced R Programming site, but that section is quite short. Here's what I have:
Edits: I've added the source code to my GitHub (EDIT: link goes to gsub.dataframe.R now). Also, I think I may have a good way to determine whether to set stringsAsFactors = FALSE in the new data frame. Or, as a much easier alternative, I could add a stringsAsFactors argument. Is it possible to use ... for more than one set of further arguments? For example, could ... carry the further arguments to both grep and data.frame?
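On the ... question, here is a minimal sketch of one common approach, not a definitive answer: capture ... as a list and route each named argument to whichever function lists it among its formals. The helper name split_dots is hypothetical, and gsub stands in for the regex function here; the sketch also assumes every extra argument is passed by name.

```r
# Hypothetical helper: split the arguments in ... between gsub() and
# data.frame() by matching their names against each function's formals.
split_dots <- function(...) {
  dots <- list(...)
  # arguments the wrapper supplies itself are excluded from the gsub set
  gsub_names <- setdiff(names(formals(gsub)), c("pattern", "replacement", "x"))
  df_names   <- setdiff(names(formals(data.frame)), "...")
  list(
    gsub       = dots[names(dots) %in% gsub_names],
    data.frame = dots[names(dots) %in% df_names]
  )
}

routed <- split_dots(ignore.case = TRUE, stringsAsFactors = FALSE)

# Each subset can then be spliced into the matching call with do.call()
replaced <- do.call(gsub, c(list("a", "XXX", "banana"), routed$gsub))
```

Unnamed arguments and names that neither function knows are silently dropped by this sketch, so a real wrapper would probably want to warn about them.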
Set up some data
set.seed(24)
num <- rep(1, 10); int <- 1:10; fac <- sample(LETTERS[1:3], 10, TRUE)
D <- data.frame(num, int, fac); D$char <- as.character(letters[1:10])
Here's a call to the custom function, and the result.
(newD <- grep.dataframe("6|(a|f)", D, sub = "XXX", ignore.case = TRUE))
#    num int fac char
# 1    1   1 XXX  XXX
# 2    1   2   B    b
# 3    1   3   C    c
# 4    1   4 XXX    d
# 5    1   5 XXX    e
# 6    1 XXX   C  XXX
# 7    1   7 XXX    g
# 8    1   8   B    h
# 9    1   9   B    i
# 10   1  10 XXX    j
Nothing I've done has worked, even though I've tried everything I can think of to preserve as much information about the columns as I can (e.g. class(x) <-, attr(x, "name") <-, attributes(x) <-, I(x)). The values in the result above are exactly right. However, the result below is troubling. I could use a little help getting the final data structure to match the original data structure. I'm thinking a switch statement might do the trick?
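To make the switch idea concrete, here is a rough sketch under stated assumptions, not the function from my GitHub: gsub_keep_class is a hypothetical name, it only handles factor, integer, numeric and character columns, and it falls back to character whenever a replacement makes a value impossible to coerce back.

```r
# Hypothetical helper: run gsub() on every column, then use switch() on the
# original class to try to restore each column's type.
gsub_keep_class <- function(pattern, replacement, X, ...) {
  fix_column <- function(col) {
    new <- gsub(pattern, replacement, as.character(col), ...)
    restored <- switch(class(col)[1],
      factor  = factor(new),
      integer = suppressWarnings(as.integer(new)),
      numeric = suppressWarnings(as.numeric(new)),
      new  # character and any other class stay character
    )
    # If coercion introduced NAs that were not there before, a replacement
    # made the column non-numeric, so keep the character version instead.
    if (anyNA(restored) && !anyNA(new)) new else restored
  }
  # Assigning into X[] keeps the data frame's own attributes (names, row names)
  X[] <- lapply(X, fix_column)
  X
}

df <- data.frame(num = c(1, 1, 1), int = 4:6,
                 fac = factor(c("A", "B", "A")),
                 chr = c("d", "e", "f"), stringsAsFactors = FALSE)
res <- gsub_keep_class("6|(a|f)", "XXX", df, ignore.case = TRUE)
sapply(res, class)
```

In this toy example the untouched num column stays numeric and fac stays a factor, while int has to become character because one of its values is now "XXX".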
Note that
> args(grep.dataframe)
function (pattern, X, sub = NULL, ...)
NULL
where a non-NULL sub argument makes the function call gsub.
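As far as I can tell, the root of the problem is that gsub coerces its input to character and returns a character vector, whatever class went in. A small illustration (the vectors x_int and x_fac are made up for this example):

```r
x_int <- 1:10
x_fac <- factor(c("A", "B", "C"))

# gsub() coerces non-character input with as.character() before substituting
res_int <- gsub("6", "XXX", x_int)
res_fac <- gsub("A", "XXX", x_fac)

class(res_int)  # "character"
class(res_fac)  # "character"
```

So any class or attribute information has to be recorded before the substitution and reapplied afterwards.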
As always, I appreciate the help.
Note: I took Hadley's advice (why wouldn't you?) and split this into two functions. My answer below is a new function that only calls gsub for regular expression matching.