I will use the following data set to illustrate my questions:
library(dplyr)

my_df <- data.frame(
  a = 1:10,
  b = 10:1
)
colnames(my_df) <- c("a", "b")
Part 1
I use the mutate() function to create two new variables in my data set and I would like to compute the row means of the two new columns inside the same mutate() call. However, I would really like to be able to use the select() helpers such as starts_with(), ends_with() or contains().
My first try:
my_df %>%
  mutate(
    a_2 = a^2,
    b_2 = b^2,
    mean = rowMeans(select(ends_with("2")))
  )
Error in mutate_impl(.data, dots) :
Evaluation error: No tidyselect variables were registered.
I understand why there is an error: select() is not given any .data argument. So, in my second try, I added "." as the first argument of select():
my_df %>%
  mutate(
    a_2 = a^2,
    b_2 = b^2,
    mean = rowMeans(select(., ends_with("2")))
  )
    a  b a_2 b_2 mean
1   1 10   1 100  NaN
2   2  9   4  81  NaN
3   3  8   9  64  NaN
4   4  7  16  49  NaN
5   5  6  25  36  NaN
6   6  5  36  25  NaN
7   7  4  49  16  NaN
8   8  3  64   9  NaN
9   9  2  81   4  NaN
10 10  1 100   1  NaN
The new problem after the second try is that the mean column contains only NaNs instead of the means of a_2 and b_2. After studying the code a bit, I understood why. The "." inside select() refers to the original my_df data frame, which does not have the a_2 and b_2 columns. So ends_with("2") matches no columns, and rowMeans() returns NaN for every row because it is averaging over zero values.
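The explanation above can be checked directly. The sketch below, which assumes dplyr is installed, runs the same selection on the original my_df and shows that the helper matches no columns, so rowMeans() has nothing to average. The names `selected` and `row_means` are only illustrative.

```r
library(dplyr)

my_df <- data.frame(a = 1:10, b = 10:1)

# "." inside mutate() is the data frame that was piped in, i.e. my_df
# without a_2 and b_2, so the helper matches no columns.
selected <- select(my_df, ends_with("2"))
dim(selected)   # 10 rows, 0 columns

# Averaging over zero columns gives NaN for every row.
row_means <- rowMeans(selected)
all(is.nan(row_means))
```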
I then tried to use dplyr functions such as current_vars() to see if it would make a difference:
my_df %>%
  mutate(
    a_2 = a^2,
    b_2 = b^2,
    mean = rowMeans(select(current_vars(), ends_with("2")))
  )
Error in mutate_impl(.data, dots) :
Evaluation error: Variable context not set.
However, this is obviously NOT the way to use this function. A solution that works is to simply add a second mutate() call:
my_df %>%
  mutate(
    a_2 = a^2,
    b_2 = b^2
  ) %>%
  mutate(mean = rowMeans(select(., ends_with("2"))))
    a  b a_2 b_2 mean
1   1 10   1 100 50.5
2   2  9   4  81 42.5
3   3  8   9  64 36.5
4   4  7  16  49 32.5
5   5  6  25  36 30.5
6   6  5  36  25 30.5
7   7  4  49  16 32.5
8   8  3  64   9 36.5
9   9  2  81   4 42.5
10 10  1 100   1 50.5
Question 1: Is there any way to perform this task in a single mutate() call? Using a second mutate() call is not really an issue, but I am curious whether there is a way to refer to the columns as they currently exist, including the ones just created. mutate() allows variables to be used as soon as they are created inside the same call, but this breaks down when functions are nested, as in my example above.
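For reference, one possible way to do this in a single call is sketched below. It assumes dplyr 1.0.0 or later, where across() is available and evaluates its selection against the columns as they currently exist inside mutate(), including ones created earlier in the same call. This is not part of the attempts above, and the name `result` is only illustrative.

```r
library(dplyr)

my_df <- data.frame(a = 1:10, b = 10:1)

# Assumes dplyr >= 1.0.0, where across() is available.
result <- my_df %>%
  mutate(
    a_2 = a^2,
    b_2 = b^2,
    # across() with only a selection returns the matching columns as a
    # data frame, including a_2 and b_2 created just above.
    mean = rowMeans(across(ends_with("2")))
  )

result$mean
```

In dplyr 1.1.0 and later, pick(ends_with("2")) is meant for this kind of column selection inside mutate() and could likely replace the across() call here.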
Part 2
I also realize that using rowMeans() works in my solution; however, it is not really the dplyr way of doing things, especially because I need to use select() inside it. So, I decided to use the rowwise() and mean() functions instead. But once again, I would like to use one of the select() helpers for that and not have to list all variables in a c() call. I tried:
my_df %>%
  mutate(
    a_2 = a^2,
    b_2 = b^2
  ) %>%
  rowwise() %>%
  mutate(
    mean = mean(ends_with("2"))
  )
Error in mutate_impl(.data, dots) :
Evaluation error: No tidyselect variables were registered.
I suspect that the error occurs because ends_with() is not used inside select(). I am showing this attempt to ask whether there is a way to pick the variables I want with a helper, without having to specify them individually.
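For reference, a minimal sketch of a row-wise version that accepts the select() helpers is given below. It assumes dplyr 1.0.0 or later, which provides c_across() for use with rowwise(). The name `result_rowwise` is only illustrative.

```r
library(dplyr)

my_df <- data.frame(a = 1:10, b = 10:1)

# Assumes dplyr >= 1.0.0, where c_across() is available.
result_rowwise <- my_df %>%
  mutate(
    a_2 = a^2,
    b_2 = b^2
  ) %>%
  rowwise() %>%
  # c_across() collects the selected columns of the current row into a
  # single vector, which mean() can then average.
  mutate(mean = mean(c_across(ends_with("2")))) %>%
  ungroup()

result_rowwise$mean
```

The ungroup() at the end drops the row-wise grouping so that later verbs operate on whole columns again. On large data frames, row-wise evaluation is usually slower than a vectorised rowMeans() call.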
Thank you for your time.