I'm trying to figure out the arguments for gather in the tidyr package.
I looked at the documentation, and the syntax looks like:
gather(data, key, value, ..., na.rm = FALSE, convert = FALSE)
There is an example in the help files:
stocks <- data.frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)
gather(stocks, stock, price, -time)
I'm curious about the last line:
gather(stocks, stock, price, -time)
Here, stocks is clearly the data we want to modify, which is fine.
So I can read that stock and price are the names for a key-value pair, but how does this function decide which columns to use to create that key-value pair? The original data frame looks like this:
      time           X          Y          Z
2009-01-01  1.10177950 -1.1926213 -7.4149618
2009-01-02  0.75578151 -4.3705737 -0.3117843
2009-01-03 -0.23823356 -1.3497319  3.8742654
2009-01-04  0.98744470 -4.2381224  0.7397038
2009-01-05  0.74139013 -2.5303960 -5.5197743
I don't see any indication that we should use any combination of X, Y or Z. When I use this function, I feel like I'm just choosing names for the columns in my long-format data frame and praying that gather magically works. Come to think of it, I feel the same way when I use melt.
Does gather look at the column's type? How does it map from wide to long?
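To make the question concrete, here is a minimal sketch of what I think is going on, assuming the `...` in the signature is where the columns to gather are selected, so that -time means "every column except time". Under that assumption the three calls below should pick the same columns (X, Y and Z) without looking at column types. The set.seed call and the names by_exclusion, by_name and by_range are my own additions for illustration, not part of the help file.

```r
library(tidyr)

set.seed(1)  # added only so the random values are reproducible
stocks <- data.frame(
  time = as.Date('2009-01-01') + 0:9,
  X = rnorm(10, 0, 1),
  Y = rnorm(10, 0, 2),
  Z = rnorm(10, 0, 4)
)

# Select the columns to gather by excluding the one to keep as-is
by_exclusion <- gather(stocks, stock, price, -time)

# Select the same columns by naming them explicitly
by_name <- gather(stocks, stock, price, X, Y, Z)

# Select the same columns as a contiguous range
by_range <- gather(stocks, stock, price, X:Z)

# Compare the results of the three selection styles
print(all.equal(by_exclusion, by_name))
print(all.equal(by_exclusion, by_range))

# The old column names end up in `stock`, their values in `price`
print(names(by_exclusion))
print(nrow(by_exclusion))
```

If this reading is right, stock and price really are just names for the two new columns, and the column selection is done entirely by whatever follows them in the call.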
EDIT
Great answer and great discussion below. Anyone else wanting more info on the philosophy and use of the tidyr package should definitely read this paper, although the vignette doesn't explain the syntax.