I need to group my data for province IDs (MUN_RESID) and population (V16). My dataframe contains 8,627,071 observations. I've been trying solutions provided in this forum for days now such as this and this, but nothing works. Any help on this would be greatly appreciated. Thank you very much
This is what the sample looks like:
          X MUN_RESID   V16 X08.2005_P  X09.2005_P X10.2005_P
1             1    110001 13203          0 0.007574036          0
2             2    110001 13203          0 0.007574036          0
3             3    110001 13203          0 0.007574036          0
4             4    110001 13203          0 0.007574036          0
5             5    110001 13203          0 0.007574036          0
6             6    110001 13203          0 0.007574036          0
7             7    110001 13203          0 0.007574036          0
8627069 8627069    530010 14802          0 0.000000000          0
8627070 8627070    530010 14802          0 0.000000000          0
8627071 8627071    530010 14802          0 0.000000000          0
==X==============================================================X==
     Copy+Paste this part. (If on a Mac, it is already copied!)
==X==============================================================X==
 months0606 <- structure(list(X = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8627069L, 8627070L,8627071L),
          MUN_RESID = c(110001L, 110001L, 110001L,
          110001L,110001L, 110001L, 110001L, 530010L, 530010L, 530010L),
          V16 = c(13203L,13203L, 13203L, 13203L, 13203L, 13203L, 13203L, 14802L, 14802L,14802L),
          X08.2005_P = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
          X09.2005_P = c(0.00757403620389305,0.00757403620389305,
          0.00757403620389305, 0.00757403620389305,0.00757403620389305,
          0.00757403620389305, 0.00757403620389305,0, 0, 0),
          X10.2005_P = c(0, 0, 0, 0, 0, 0, 0, 0, 0,
          0)), class = "data.frame", row.names =
          c(1L,2L, 3L, 4L, 5L, 6L, 7L, 8627069L, 8627070L, 8627071L))
==X==============================================================X==
I've tried
months0606_grouped <- ddply(months0606, .(V16))
(does not give me any output at all)
library(dplyr)
months0606 %>% group_by(MUN_RESID, V16)
months0606 %>% dplyr::group_by(MUN_RESID)
(does not give me any error warning, but does no grouping either. This is the output:
# A tibble: 8,627,071 x 20
# Groups:   MUN_RESID [5,227]
       X MUN_RESID   V16 X08.2005_P X09.2005_P X10.2005_P
   <int>     <int> <int>      <dbl>      <dbl>      <dbl>
 1     1    110001 13203          0    0.00757          0
 2     2    110001 13203          0    0.00757          0
 3     3    110001 13203          0    0.00757          0
 4     4    110001 13203          0    0.00757          0
 5     5    110001 13203          0    0.00757          0
 6     6    110001 13203          0    0.00757          0
 7     7    110001 13203          0    0.00757          0
 8     8    110001 13203          0    0.00757          0
 9     9    110001 13203          0    0.00757          0
 10    10    110001 13203          0    0.00757          0
# ... with 8,627,061 more rows, and 14 more variables:
#   X11.2005_P <dbl>, X12.2005_P <dbl>,
#   X01.2006_P <dbl>, X02.2006_P <dbl>,
#   X03.2006_P <dbl>, X04.2006_P <dbl>,
#   X05.2006_P <dbl>, X06.2006_P <dbl>,
#   X07.2006_P <dbl>, X08.2006_P <dbl>,
#   X09.2006_P <dbl>, X10.2006_P <dbl>,
#   X11.2006_P <dbl>, X12.2006_P <dbl>
Also tried:
months0606$V16 <- with(months0606, ifelse(V16 %in% months0606, "V16"))
My goal is to have my dataframe look like this: Every combination of MUN_RESID and population level(V16) only contains one row:
MUN_RESID   V16     X08.2005_P  X09.2005_P 
110001      13203   0           0.007507
530010      530010  0           0
 
     
    