I'm trying to learn the data.table package, which seems fantastic.
One behaviour I'm unable to google my way to understanding is the following:
I want to use a function to create a variable by reference. If the input is a data.frame, I also want the function to convert it to a data.table. The conversion works, but the new variable is only created if the input is already a data.table. This seems strange to me. Does anyone know the reason for this?
E.g. after running the bar function (which makes the input a data.table if it isn't already, and then defines a variable bar) on the data.frame X, X will be a data.table but the variable bar is not created. However, on Y (X coerced to a data.table), the variable bar is created. Why isn't the variable bar created in X?
library(data.table)
bar <- function(x){
  if(!"data.table" %in% class(x)) setDT(x)
  x[, bar := 1:.N]
  invisible(NULL)
}
X <- data.frame(foo = letters[1:2])
Y <- as.data.table(X)
str(X)
## 'data.frame': 2 obs. of 1 variable:
## $ foo: chr "a" "b"
bar(X)
str(X)
## Classes 'data.table' and 'data.frame': 2 obs. of 1 variable:
## $ foo: chr "a" "b"
str(Y)
## Classes 'data.table' and 'data.frame': 2 obs. of 1 variable:
## $ foo: chr "a" "b"
## - attr(*, ".internal.selfref")=<externalptr>
bar(Y)
str(Y)
## Classes 'data.table' and 'data.frame': 2 obs. of 2 variables:
## $ foo: chr "a" "b"
## $ bar: int 1 2
## - attr(*, ".internal.selfref")=<externalptr>
I'm using R 4.2.0 and data.table 1.14.2
UPDATE: this is essentially the same question as in
and the answer provided there clarifies the issue. Thanks to Waldi for pointing me to this.
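For anyone landing here without following the link, below is a minimal sketch of one possible workaround. It assumes the cause is that setDT(x) inside the function ends up rebinding only the function's local x, so the later := adds the column to an object that the caller's X no longer points to. The name bar2 is hypothetical and not part of the original code; the idea is simply to return the table and reassign it in the caller.

```r
library(data.table)

# Hypothetical variant of bar(): same steps, but it returns the table
# instead of NULL so the caller can rebind its own variable.
bar2 <- function(x) {
  if (!is.data.table(x)) setDT(x)  # assumption: this may only rebind the local `x`
  x[, bar := seq_len(.N)]          # adds the column to whatever local `x` now is
  invisible(x)                     # hand the modified table back to the caller
}

X <- data.frame(foo = letters[1:2])
X <- bar2(X)   # reassign, so X refers to the table that holds `bar`
names(X)
```

Calling setDT(X) in the calling scope before passing X to the original bar() should work as well, since the function then receives an object that is already a data.table.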