I have some data which looks like:
head:
dfID date group groupValues
<chr> <date> <chr> <dbl>
1 df1 2020-03-01 grp1 0.175
2 df1 2020-03-01 grp2 0.150
3 df1 2020-03-01 grp3 0.0509
tail:
dfID date group groupValues
<chr> <date> <chr> <dbl>
1 df3 2020-06-29 grp7 0.705
2 df3 2020-06-29 grp8 0.473
3 df3 2020-06-29 grp9 0.900
This is time series data with 3 unique dfIDs and 9 unique groups. Filtering the date column to a single day gives (3 dfIDs and 9 groups):
df %>%
filter(date == "2020-03-01")
dfID date group groupValues
<chr> <date> <chr> <dbl>
1 df1 2020-03-01 grp1 0.175
2 df1 2020-03-01 grp2 0.150
3 df1 2020-03-01 grp3 0.0509
4 df2 2020-03-01 grp4 0.133
5 df2 2020-03-01 grp5 0.779
6 df2 2020-03-01 grp6 0.506
7 df3 2020-03-01 grp7 0.868
8 df3 2020-03-01 grp8 0.552
9 df3 2020-03-01 grp9 0.274
Next I want to split the data into pairwise combinations of the dfIDs.
combinedSplit <- combn(levels(as.factor(df$dfID)), m = 2, FUN = function(x)
df %>%
filter(dfID %in% x), simplify = FALSE)
names(combinedSplit) <- combn(levels(as.factor(df$dfID)), m = 2, str_c, collapse="_")
So now I have a list of 3 data frames, one for each pairwise combination of the dfIDs:
df1_df2
df1_df3
df2_df3
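As a quick sanity check on this split (just a check, not part of the workflow; it uses dplyr's n_distinct on the combinedSplit list above), each element should contain two dfIDs and six groups:
library(dplyr)
library(purrr)

# each element of the result should show dfIDs = 2 and groups = 6
map(combinedSplit, ~ c(dfIDs = n_distinct(.x$dfID), groups = n_distinct(.x$group)))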
Now I want to apply the rolling_origin function from the rsample package to split the time series into training and testing data. I can naively apply the function by mapping over the list.
rolledData <- combinedSplit %>%
map(., ~ rolling_origin(
data = .,
initial = 60,
assess = 1,
cumulative = FALSE,
skip = 0
)
)
I can access the tail of the first split for the combination df1_df2.
map(rolledData$df1_df2$splits, ~analysis(.x))[[1]] %>% tail()
dfID date group groupValues
<chr> <date> <chr> <dbl>
1 df1 2020-03-19 grp1 0.528
2 df1 2020-03-19 grp2 0.394
3 df1 2020-03-19 grp3 0.532
4 df1 2020-03-20 grp1 0.586
5 df1 2020-03-20 grp2 0.369
6 df1 2020-03-20 grp3 0.153
This is incorrect. In the rolling_origin call I set the training period to 60 periods (days), but this data ends on 20 March. That is because the function takes the first 60 observations of the data, not 60 days of the time series (3 groups * 20 days = 60 rows).
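To see the problem concretely, here is a quick check on the naive rolledData object (firstSplit is just a throwaway name):
# the first analysis set has 60 rows, but each date appears once per group,
# so it only covers 20 calendar days instead of 60
firstSplit <- analysis(rolledData$df1_df2$splits[[1]])
nrow(firstSplit)             # 60 rows
n_distinct(firstSplit$date)  # 20 distinct dates, not 60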
So I want to apply the rolling_origin function to each of the grps, with each grp having 60-day rolling windows.
Here, I thought it would be best to first nest() the groups and then apply the rolling_origin function, since each group would then be separate and 60 observations would correspond to 60 days for grp1, grp2 and grp3. Then, unnest() the groups to put grp1, grp2 and grp3 back into the same data frame.
nestedRolledData <- map(combinedSplit, ~ group_by(.x, group) %>%
  nest() %>%
  mutate(
    rolledData = map(data, ~ rolling_origin(
      data = .x,
      initial = 60,
      assess = 0,
      cumulative = FALSE,
      skip = 0
    ))
  )
)
I am having difficulty with this unnesting.
Inspecting the structure of the lists:
- layer 1 = contains the df1_df2, df1_df3 and df2_df3 combinations.
- layer 2 = contains the nested data with the rolledData column of interest.
- layer 3 = (expanding rolledData, which contains 6 lists) contains a list for each of the grps (each dfID has 3 grps).
- layer 4 = (expanding one of the grps) contains a splits list (generated from the rolling_origin function).
- layer 5 = (expanding the splits list) contains [[1]] ... [[63]]. The lists here correspond to each of the rolling_origin training splits. In the rolling_origin function I set initial = 60 and it creates 63 splits of the data between 2020-03-01 and 2020-06-30. (If I change initial = 90 I get 33 splits, so the number of lists here depends on the time span of the data and the initial argument of rolling_origin.)
I can access the first split using map(nestedRolledData$df1_df2$rolledData[[1]]$splits, ~analysis(.x))[[1]], which gives me a data frame of 60 observations starting on 2020-03-01 and ending on 2020-04-29. For the second split in this list, map(nestedRolledData$df1_df2$rolledData[[1]]$splits, ~analysis(.x))[[2]], I get another data frame of 60 observations, this time starting on 2020-03-02 and ending on 2020-04-30 (so the data has been shifted by a single day). I can do this up until [[63]], which starts on 2020-05-02 and ends on 2020-06-30 (the last day in my data).
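As a quick confirmation of the date ranges described above (firstGrpSplits is just a throwaway name):
firstGrpSplits <- nestedRolledData$df1_df2$rolledData[[1]]$splits
range(analysis(firstGrpSplits[[1]])$date)   # 2020-03-01 to 2020-04-29
range(analysis(firstGrpSplits[[63]])$date)  # 2020-05-02 to 2020-06-30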
This is what I want, i.e. the correct time series splits for each grp. Now I want to unnest these and put them back into the correct data frames. Going back to layer 3 of the list, which contains the 6 lists [[1]], ..., [[6]]: these correspond to each of the 3 groups for the 2 dfIDs, so I essentially want to merge these lists together.
If I were to construct these manually it might look like:
# grp 1:6 for rolling_origin split 1
map(nestedRolledData$df1_df2$rolledData[[1]]$splits, ~analysis(.x))[[1]]
map(nestedRolledData$df1_df2$rolledData[[2]]$splits, ~analysis(.x))[[1]]
map(nestedRolledData$df1_df2$rolledData[[3]]$splits, ~analysis(.x))[[1]]
map(nestedRolledData$df1_df2$rolledData[[4]]$splits, ~analysis(.x))[[1]]
map(nestedRolledData$df1_df2$rolledData[[5]]$splits, ~analysis(.x))[[1]]
map(nestedRolledData$df1_df2$rolledData[[6]]$splits, ~analysis(.x))[[1]]
# grp 1:6 for rolling_origin split 2
map(nestedRolledData$df1_df2$rolledData[[1]]$splits, ~analysis(.x))[[2]]
map(nestedRolledData$df1_df2$rolledData[[2]]$splits, ~analysis(.x))[[2]]
map(nestedRolledData$df1_df2$rolledData[[3]]$splits, ~analysis(.x))[[2]]
map(nestedRolledData$df1_df2$rolledData[[4]]$splits, ~analysis(.x))[[2]]
map(nestedRolledData$df1_df2$rolledData[[5]]$splits, ~analysis(.x))[[2]]
map(nestedRolledData$df1_df2$rolledData[[6]]$splits, ~analysis(.x))[[2]]
# ...
# grp 1:6 for rolling_origin split 63
map(nestedRolledData$df1_df2$rolledData[[1]]$splits, ~analysis(.x))[[63]]
map(nestedRolledData$df1_df2$rolledData[[2]]$splits, ~analysis(.x))[[63]]
map(nestedRolledData$df1_df2$rolledData[[3]]$splits, ~analysis(.x))[[63]]
map(nestedRolledData$df1_df2$rolledData[[4]]$splits, ~analysis(.x))[[63]]
map(nestedRolledData$df1_df2$rolledData[[5]]$splits, ~analysis(.x))[[63]]
map(nestedRolledData$df1_df2$rolledData[[6]]$splits, ~analysis(.x))[[63]]
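I imagine the generalisation of this manual pattern looks something like the sketch below (mergedSplits and nSplits are just placeholder names), where for each rolling_origin split index the analysis sets of all 6 grps are row-bound back together:
nSplits <- nrow(nestedRolledData$df1_df2$rolledData[[1]])  # 63 splits here

mergedSplits <- map(seq_len(nSplits), function(s) {
  # stack the analysis data of grps 1:6 for split s
  map_dfr(nestedRolledData$df1_df2$rolledData, ~ analysis(.x$splits[[s]]))
})
# mergedSplits[[1]] would then hold grps 1:6 for rolling_origin split 1, and so on
This feels clunky, hence my question about how best to unnest/merge these lists.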
Data:
library(tidyverse)  # dplyr, tidyr, tibble, purrr, stringr
library(rsample)
df1 = data.frame(
date = seq.Date(from = as.Date("2020-03-01"), to = as.Date("2020-06-30"), by = "days"),
grp1 = runif(122),
grp2 = runif(122),
grp3 = runif(122)
) %>%
pivot_longer(cols = where(is.numeric), names_to = "group", values_to = "groupValues") %>%
add_column(dfID = "df1")
df2 = data.frame(
date = seq.Date(from = as.Date("2020-03-01"), to = as.Date("2020-06-30"), by = "days"),
grp4 = runif(122),
grp5 = runif(122),
grp6 = runif(122)
) %>%
pivot_longer(cols = where(is.numeric), names_to = "group", values_to = "groupValues") %>%
add_column(dfID = "df2")
df3 = data.frame(
date = seq.Date(from = as.Date("2020-03-01"), to = as.Date("2020-06-30"), by = "days"),
grp7 = runif(122),
grp8 = runif(122),
grp9 = runif(122)
) %>%
pivot_longer(cols = where(is.numeric), names_to = "group", values_to = "groupValues") %>%
add_column(dfID = "df3")
df <- bind_rows(df1, df2, df3) %>%
relocate(dfID, .before = date)
combinedSplit <- combn(levels(as.factor(df$dfID)), m = 2, FUN = function(x)
df %>%
filter(dfID %in% x), simplify = FALSE)
names(combinedSplit) <- combn(levels(as.factor(df$dfID)), m = 2, str_c, collapse="_")
rolledData <- combinedSplit %>%
map(., ~ rolling_origin(
data = .,
initial = 60,
assess = 1,
cumulative = FALSE,
skip = 0
)
)
map(rolledData$df1_df2$splits, ~analysis(.x))[[1]] %>% tail()
nestedRolledData <- map(combinedSplit, ~ group_by(.x, group) %>%
  nest() %>%
  mutate(
    rolledData = map(data, ~ rolling_origin(
      data = .x,
      initial = 60,
      assess = 0,
      cumulative = FALSE,
      skip = 0
    ))
  )
)