I am trying to iterate through global epidemic data stored in a database. The data consists of daily cases, cumulative cases, daily deaths, and cumulative deaths (plus some other covariates that aren't relevant here). The table is structured as follows: for each country (identified by name, region, and ID) and each date (though not every date is present for every country*), the daily and cumulative cases, deaths, etc. are listed.
The data looks something like this:
# A tibble: 40 x 7
   iso_code continent location    date       total_cases new_cases week
   <chr>    <chr>     <chr>       <date>           <dbl>     <dbl> <chr>
 1 AFG      Asia      Afghanistan 2020-02-24           5         5 2020-08
 2 AFG      Asia      Afghanistan 2020-02-25           5         0 2020-08
 3 AFG      Asia      Afghanistan 2020-02-26           5         0 2020-08
 4 AFG      Asia      Afghanistan 2020-02-27           5         0 2020-08
 5 AFG      Asia      Afghanistan 2020-02-28           5         0 2020-08
 6 AFG      Asia      Afghanistan 2020-02-29           5         0 2020-08
 7 AFG      Asia      Afghanistan 2020-03-01           5         0 2020-09
 8 AFG      Asia      Afghanistan 2020-03-02           5         0 2020-09
 9 AFG      Asia      Afghanistan 2020-03-03           5         0 2020-09
10 AFG      Asia      Afghanistan 2020-03-04           5         0 2020-09
# ... with 30 more rows
I need to summarize the daily data into weekly data. This is no problem for a single column: using methods described here, I should be able to aggregate the data for each week and each country as follows:
library(dplyr)

sumByColumn <- function(df, colName) {
  # for daily columns (new cases/deaths, raw or smoothed): sum within each week
  df %>%
    group_by(location, week) %>%
    summarize(colName = sum(!! sym(colName)))
}

idByColumn <- function(df, colName) {
  # for cumulative columns (total cases/deaths): pass the values through unchanged
  df %>%
    group_by(location, week) %>%
    summarize(colName = identity(!! sym(colName)))
}
(Note that the daily case/death columns are summed within each week, whereas the cumulative case/death columns are passed through the identity function as given. The names of the cumulative columns, taken from the column names of df, are stored in id_cols.)
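As a possible alternative to calling one helper per column, here is a minimal sketch that treats both kinds of column in a single summarise() call with across(). It rests on assumptions that are not part of the original code: the toy data and the sum_cols vector are hypothetical, the rows are assumed to be sorted by date within each week, and dplyr::last() (the end-of-week value) is swapped in for identity() so that each group yields exactly one row.

```r
library(dplyr)

# Toy data in the same shape as the table above (values are made up).
toy <- data.frame(
  location    = c("A", "A", "A", "B", "B"),
  week        = c("2020-08", "2020-08", "2020-09", "2020-08", "2020-08"),
  new_cases   = c(5, 0, 2, 1, 3),
  total_cases = c(5, 5, 7, 1, 4)
)

id_cols  <- c("total_cases")  # cumulative columns (assumed contents)
sum_cols <- c("new_cases")    # daily columns (hypothetical helper vector)

weekly <- toy %>%
  dplyr::group_by(location, week) %>%
  dplyr::summarise(
    # daily columns: add up the days of each week
    dplyr::across(dplyr::all_of(sum_cols), ~ sum(.x, na.rm = TRUE)),
    # cumulative columns: keep the last value of the week (assumes date order)
    dplyr::across(dplyr::all_of(id_cols), ~ dplyr::last(.x)),
    .groups = "drop"
  )

print(weekly)
```

Whether the end-of-week value is the right summary for cumulative columns depends on the analysis; the first value or the maximum would be equally easy to substitute.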
However, when I apply sumByColumn()/idByColumn() to every column of df in a loop, I run into an error. The loop is:
for (col in 1:ncol(df)) {
  colName = colnames(df)[col]
  # id_cols holds column names, so compare by name rather than by index
  if (colName %in% id_cols) {
    df_weekly = idByColumn(df_weekly, colName)
  } else {
    df_weekly = sumByColumn(df_weekly, colName)
  }
}
I get:
Error in !sym(colName) : invalid argument type
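One possible cause, which the error message alone cannot confirm: the session info at the end of this post lists plyr among the attached packages alongside dplyr. If plyr was attached after dplyr, a bare summarize() would resolve to plyr::summarize(), which does not understand the !! operator and would evaluate it as two ordinary negations of a symbol. The sketch below sidesteps that by namespacing the dplyr verbs explicitly. It also uses the .data pronoun and a glue-style name on the left of := so that the output column keeps the name passed in colName; both are dplyr/rlang features, but their use here is a suggestion, not the original method.

```r
library(dplyr)

sumByColumn <- function(df, colName) {
  df %>%
    dplyr::group_by(location, week) %>%
    # "{colName}" := ... names the result after the string held in colName
    dplyr::summarise("{colName}" := sum(.data[[colName]], na.rm = TRUE),
                     .groups = "drop")
}

# Toy data (made-up values) to exercise the function.
toy <- data.frame(
  location  = c("A", "A", "B"),
  week      = c("2020-08", "2020-08", "2020-08"),
  new_cases = c(5, 1, 2)
)

out <- sumByColumn(toy, "new_cases")
print(out)
```

Running `environmentName(environment(summarize))` in the affected session should show which package's summarize() is actually being called.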
Note (*): I have counted how many times each country appears in the dataframe, which corresponds to the number of days the disease was tracked in that country. The counts differ between countries (the first ten are listed below). Is there a way to account for this, e.g. so that a week with no data, or a week in which only some countries report data, is ignored rather than returned as NA?
916
916
910
892
884
899
971
938
899
946
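A minimal sketch of how uneven coverage might be handled, using made-up data: group_by() only creates groups for the location/week combinations that actually occur, so a week with no rows for a country produces no output row at all, and na.rm = TRUE drops missing days inside a week. The days_reported column is a hypothetical addition that records how many days contributed to each weekly total.

```r
library(dplyr)

# Toy data: country B has no rows for week 2020-08, and A has one missing day.
toy <- data.frame(
  location  = c("A", "A", "A", "B"),
  week      = c("2020-08", "2020-08", "2020-09", "2020-09"),
  new_cases = c(5, NA, 2, 3)
)

weekly <- toy %>%
  dplyr::group_by(location, week) %>%
  dplyr::summarise(
    # computed first, while new_cases still refers to the daily values
    days_reported = sum(!is.na(new_cases)),
    new_cases     = sum(new_cases, na.rm = TRUE),
    .groups = "drop"
  )

print(weekly)
```

Weeks with few reported days could then be filtered out or flagged using days_reported, depending on how incomplete weeks should be treated.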
Edit:
R Session Info is:
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_(Country).1252 LC_CTYPE=English_(Country).1252 LC_MONETARY=English_(Country).1252
[4] LC_NUMERIC=C LC_TIME=English_(Country).1252
system code page: 65001
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] nnet_7.3-17 plyr_1.8.6 car_3.0-12 carData_3.0-5
[5] nlme_3.1-153 lubridate_1.8.0 gridExtra_2.3 ExcelFunctionsR_0.1.4
[9] forcats_0.5.1 stringr_1.4.0 purrr_0.3.4 readr_2.1.2
[13] tidyr_1.2.0 tibble_3.1.6 ggplot2_3.3.5 tidyverse_1.3.1
[17] dplyr_1.0.7 readxl_1.3.1 poisson_1.0
loaded via a namespace (and not attached):
[1] tseries_0.10-49 httr_1.4.2 jsonlite_1.7.3 splines_4.1.2 modelr_0.1.8
[6] assertthat_0.2.1 TTR_0.24.3 sp_1.4-6 roperators_1.2.0 cellranger_1.1.0
[11] pillar_1.7.0 backports_1.4.1 lattice_0.20-45 glue_1.6.1 quadprog_1.5-8
[16] digest_0.6.29 rvest_1.0.2 colorspace_2.0-2 Matrix_1.3-4 timeDate_3043.102
[21] pkgconfig_2.0.3 broom_0.7.12 haven_2.4.3 scales_1.1.1 tzdb_0.2.0
[26] mgcv_1.8-38 generics_0.1.2 farver_2.1.0 ellipsis_0.3.2 withr_2.5.0
[31] urca_1.3-0 cli_3.1.1 quantmod_0.4.18 magrittr_2.0.2 crayon_1.5.0
[36] forecast_8.16 fs_1.5.2 fansi_1.0.2 xts_0.12.1 xml2_1.3.3
[41] tools_4.1.2 hms_1.1.1 lifecycle_1.0.1 munsell_0.5.0 reprex_2.0.1
[46] compiler_4.1.2 rlang_1.0.1 grid_4.1.2 rstudioapi_0.13 INLA_21.11.22
[51] labeling_0.4.2 gtable_0.3.0 fracdiff_1.5-1 abind_1.4-5 DBI_1.1.2
[56] curl_4.3.2 R6_2.5.1 zoo_1.8-9 utf8_1.2.2 stringi_1.7.6
[61] parallel_4.1.2 Rcpp_1.0.8 vctrs_0.3.8 dbplyr_2.1.1 tidyselect_1.1.2
[66] lmtest_0.9-39