I have a 1.4 GB dataset, df, and I am trying to reshape it using the following function:
library(dplyr)
library(tidyr)
library(readr)

reshaped <- function(df){
  # count occurrences per subject/concept pair, then pivot to wide:
  # one row per subject_num, one column per concept_code
  df %>%
    select(subject_num, concept_code) %>%
    group_by(subject_num, concept_code) %>%
    count() %>%
    spread(concept_code, n, fill = 0)
}
df <- read_rds('df.RDs') %>%
  mutate(a = paste(a, b, sep = "|"))
df <- reshaped(df)
write_rds(df, 'df_reshaped.RDs')
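
To make the intended output concrete, here is a tiny made-up sample (same column names as my data, values invented purely for illustration) and the wide shape I expect spread to return:

library(dplyr)
library(tidyr)

# Toy data: three rows, two subjects, two concept codes
toy <- tibble(
  subject_num  = c(1, 1, 2),
  concept_code = c("A", "B", "A")
)

toy %>%
  count(subject_num, concept_code) %>%
  spread(concept_code, n, fill = 0)
# Expected result: one row per subject_num, one column per concept_code,
# with the counts as values (here 2 rows x 3 columns; subject 2 gets B = 0).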
I get: Error: cannot allocate vector of size 1205.6 GB. While debugging I found that the code gets stuck at the spread() call inside reshaped. I don't see how a 1.4 GB dataset could ask for 1205.6 GB of memory in the dplyr/tidyr code I wrote, and nothing in the code above looks like it should duplicate the dataset roughly 900 times. Could anyone suggest why this is happening?
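
For what it's worth, this is the back-of-the-envelope check I would run on the size of the dense wide table that spread has to build, assuming one 8-byte numeric cell per subject/concept pair:

library(dplyr)

# Number of rows and columns the wide table would have
n_subjects <- n_distinct(df$subject_num)
n_concepts <- n_distinct(df$concept_code)

# Dense table of counts: roughly 8 bytes per cell
est_gb <- n_subjects * n_concepts * 8 / 1024^3
est_gb

If that estimate comes out anywhere near 1205.6 GB, the error would at least be consistent with the number of distinct subject_num and concept_code values, but I have not confirmed that this is what is going on.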
