The title is a little garbled, but I'm not sure how else to describe it. I'm coming from Stata, so I'm still getting the hang of factors.
Basically, I want to assign factor levels and labels explicitly, and have any values I don't list fall into a default level/label instead of becoming NA.
Take the following:
library(dplyr)
dt <- as.data.frame(mtcars) # load demo data
dt$carb[4:6] <- NA # set some rows to NA for example
dt <- dt %>%
  mutate(
    carb_f = factor(carb,
      levels = c(1, 2, 3, 4),
      labels = c("One", "Two", "Three", "Four")
    )
  )
table(dt$carb, dt$carb_f, exclude = NULL)
which yields the following:
       One Two Three Four <NA>
  1      5   0     0    0    0
  2      0   9     0    0    0
  3      0   0     3    0    0
  4      0   0     0   10    0
  6      0   0     0    0    1
  8      0   0     0    0    1
  <NA>   0   0     0    0    3
The unlisted values 6 and 8 are set to NA in the resulting factor carb_f. Although this is expected behaviour, I want to be able to request something like this:
dt <- dt %>%
  mutate(
    carb_f = factor(carb,
      levels = c(1, 2, 3, 4),
      labels = c("One", "Two", "Three", "Four"),
      non-na(10, "Unk") # obvious pseudocode
    )
  )
to yield this:
       One Two Three Four Unk <NA>
  1      5   0     0    0   0    0
  2      0   9     0    0   0    0
  3      0   0     3    0   0    0
  4      0   0     0   10   0    0
  6      0   0     0    0   1    0
  8      0   0     0    0   1    0
  <NA>   0   0     0    0   0    3
...where the unlisted values 6 and 8 are assigned to a default level/label of 10 and Unk, but the true NAs remain NA.
Is there a way of handling this without explicitly referencing 6 and 8?
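One possible workaround, sketched here only to make the goal concrete, is to recode the unlisted values to the default code before calling factor(), so that 6 and 8 never have to be named. The names known, default_level and carb_tmp are made up for this illustration; nothing like the non-na() argument in the pseudocode exists in factor() as far as I know.

```r
library(dplyr)

dt <- as.data.frame(mtcars)  # load demo data
dt$carb[4:6] <- NA           # set some rows to NA for example

known <- c(1, 2, 3, 4)       # values that get explicit labels
default_level <- 10          # hypothetical catch-all code for everything else

dt <- dt %>%
  mutate(
    # Keep true NAs and listed values as they are; send any other
    # non-missing value to the catch-all code.
    carb_tmp = ifelse(is.na(carb) | carb %in% known, carb, default_level),
    carb_f = factor(carb_tmp,
      levels = c(known, default_level),
      labels = c("One", "Two", "Three", "Four", "Unk")
    )
  ) %>%
  select(-carb_tmp)          # drop the temporary helper column

table(dt$carb, dt$carb_f, exclude = NULL)
```

This relies on is.na() being checked before the %in% test, so missing values pass through untouched and only genuinely unlisted values land in Unk.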