When creating a data frame with multiple variables using data.frame(), a variable cannot be defined in terms of another variable created in the same data.frame() call. This is demonstrated in the code sample below: Example 1 succeeds because the expressions for x and y do not refer to any object in the environment, Example 2 returns an error because x is not in the global environment, and Example 3 runs but uses the global x rather than the new column.
Why does this happen?
I can think of two possible explanations, but I do not know how to evaluate them (pun intended):
Scoping: each assignment expression is evaluated sequentially (i.e. x is assigned, then y is assigned) but only looks for objects in the environment in which data.frame() was called. Since data.frame() was called in the global environment but x is not in the global environment, an error is returned in Example 2. This may also be why y = 6 rather than y = 1 in Example 3.
Evaluation: all assignment expressions are evaluated simultaneously (i.e. in parallel), causing x to not exist in any environment at the time y is assigned a value that is a function of x. While R employs lexical (i.e. static) scoping, perhaps data.frame() is designed to look for x in both the environment in which x was called and the child environments within the function.
# Example 1 (success)
data.frame(x = 0, y = 0 + 1)
#>   x y
#> 1 0 1
# Example 2 (failure)
data.frame(x = 0, y = x + 1)
#> Error in data.frame(x = 0, y = x + 1): object 'x' not found
# Example 3
x <- 5
data.frame(x = 0, y = x + 1)
#>   x y
#> 1 0 6
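One way to probe the scoping explanation is to call data.frame() somewhere other than the global environment. The following is only a minimal sketch: the use of local() and the name res are my own choices for illustration, and it rests on the assumption that function arguments in R are evaluated in the environment of the caller.

```r
# Illustrative check: inside local(), x lives only in a temporary
# environment, not in the global environment.
res <- local({
  x <- 10
  # The argument expression x + 1 is evaluated where data.frame() is called,
  # so it sees the local x (10), not the column x = 0 being created.
  data.frame(x = 0, y = x + 1)
})
res
#>   x  y
#> 1 0 11
```

If this reasoning is right, the lookup depends on the calling environment in general rather than on the global environment specifically, and the column x never becomes visible to the expression for y.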
Note: I am trying to understand why data.frame() exhibits this behavior. As observed in the comments and demonstrated below, tibble::tibble() is an excellent option for users who wish to generate variables in a data.frame conditional on other variables in the data.frame.
library(tibble)
# Tibble Example 1: y uses x!
tibble(x = 0, y = x + 1)
#> # A tibble: 1 x 2
#>       x     y
#>   <dbl> <dbl>
#> 1     0     1
# Tibble Example 2: y uses x, ignoring the global x!
x <- 5
tibble(x = 0, y = x + 1)
#> # A tibble: 1 x 2
#>       x     y
#>   <dbl> <dbl>
#> 1     0     1
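For comparison, base R appears to allow something similar in two steps. This is a rough sketch, not a claim about how data.frame() itself works: it assumes that transform() and within() evaluate their expressions with the data frame's columns in scope, and the names df, df_t and df_w are placeholders.

```r
x <- 5                             # global x, as in Example 3

df <- data.frame(x = 0)            # step 1: create the column x

# Step 2: derive y from the column x; the column is found before the global x.
df_t <- transform(df, y = x + 1)
df_w <- within(df, y <- x + 1)

df_t$y                             # 1, not 6
df_w$y                             # 1, not 6
```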