I have data in the following format (excerpt):
# install.packages("data.table")
# install.packages("tidyverse")
library("data.table")
library("tidyverse")
dt <- data.table(
date = lubridate::as_date(c("2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04")),
location = c("Westpark", "Northpark", "Estpark", "Southpark"),
'receiver_a: max' = c(20, 30, 25, 15),
'receiver_a: min' = c(10, 15, 20, 5),
'receiver_b: max' = c(15, 45, 10, 50),
'receiver_b: min' = c(15, 45, 10, 50)
)
> dt
date location receiver_a: max receiver_a: min receiver_b: max receiver_b: min
1: 2021-01-01 Westpark 20 10 15 15
2: 2021-01-02 Northpark 30 15 45 45
3: 2021-01-03 Estpark 25 20 10 10
4: 2021-01-04 Southpark 15 5 50 50
I now want to split all columns starting with receiver_... in the column name and convert them to a long format:
- Split all columns with
receiver_...in the column name into new columnsreceiver,maxandmin. Everything after the:in the "old" column names starting withreceiver...is used for the new column names, heremaxandmin. - The new column
receivercontains e.g.receiver_a,receiver_betc as values. The new columnsmaxandmincontain the respective numerical values.
This can be implemented with tidyr::pivot_longer():
# dt <- dt %>%
dt %>%
tidyr::pivot_longer(
cols = dplyr::contains(":"),
names_to = c("receiver", ".value"),
names_sep = ": ",
names_repair = "minimal"
)
# A tibble: 8 x 5
date location receiver max min
<date> <chr> <chr> <dbl> <dbl>
1 2021-01-01 Westpark receiver_a 20 10
2 2021-01-01 Westpark receiver_b 15 15
3 2021-01-02 Northpark receiver_a 30 15
4 2021-01-02 Northpark receiver_b 45 45
5 2021-01-03 Estpark receiver_a 25 20
6 2021-01-03 Estpark receiver_b 10 10
7 2021-01-04 Southpark receiver_a 15 5
8 2021-01-04 Southpark receiver_b 50 50
cols = dplyr::contains(":"): select all columns with:inside the name, for examplereceiver_a: maxnames_to = c("receiver", ".value"): Split selected columns intoreceiverand.value..valueindicates that component of the name defines the name of the column containing the cell valuesnames_sep = ": "Ifnames_tocontains multiple values, these arguments control how the column name is broken up, here by:(whitespace after colon)
My question: Can this also be done with a (faster) data.table solution (e.g. with melt())?