The World Development Indicator looks as follows
library(data.table)
WDI <- fread("CountryName CountryCode IndicatorName IndicatorCode 1960 1961 2017
ArabWorld ARB A FX.OWN.TOTL.ZS 37.16521072 37.16521072 37.16521072
ArabWorld ARB B FX.OWN.TOTL.FE.ZS 25.63540268 25.63540268 25.63540268
ArabWorld ARB C FX.OWN.TOTL.MA.ZS 48.32851791 48.32851791 48.32851791
ArabWorld ARB D FX.OWN.TOTL.OL.ZS 42.54204559 42.54204559 42.54204559
ArabWorld ARB E FX.OWN.TOTL.40.ZS 27.72478104 27.72478104 27.72478104
ArabWorld ARB F FX.OWN.TOTL.PL.ZS 26.45811081 26.45811081 26.45811081
ArabWorld ARB G FX.OWN.TOTL.60.ZS 43.44695282 43.44695282 43.44695282
ArabWorld ARB H FX.OWN.TOTL.SO.ZS 48.66697693 48.66697693 48.66697693
ArabWorld ARB I FX.OWN.TOTL.YG.ZS 20.95479965 20.95479965 20.95479965
", header = TRUE)
I used the following code to reshape the World Development Indicator Database from the World Bank.
library(dplyr)
library(tidyr)
WDI <- WDI %>%
select(-`Indicator Name`) %>%
gather(Year, val,`1960`:`2017`) %>%
spread(`Indicator Code`, val)
It used to work without any issues, but for some reason it now requires too much memory for the operation to be completed.
I have tried to remove all other databases from the work space, gc(), closing all other programs on the computer and to reduce the years used to reshape, but that did not fix the problem. As a result, I wonder whether there is a less memory intensive way to deal with the issue.
EDIT 1: According to this post dcast.data.table or reshape (because of not running out of memory) is the way to go. I am however having a lot of trouble rewriting the syntax (I got the dplyr answer my posting a question as well) How would I rewrite the dplyr code for using dcast/reshape?
How do the terms select, gather, spread, relates to dcast and reshape?
EDIT 2: I have first tried to melt the data.table as follows:
WDI = melt(WDI, id.vars = c("IndicatorCode", "CountryName", "CountryCode"),
# measure.vars = -c("IndicatorCode", "CountryName", "CountryCode", "IndicatorName"))
measure.vars = c("1960", "1961","2017"))
colnames(WDI)[4] <- "year"
WDI = dcast(WDI, CountryName + CountryCode + year ~ IndicatorCode, value.var="value")
But then I get the "warning" Aggregation function missing: defaulting to length and all entries are just 1 instead of the values. This apparently happens when the combination of entries are not unique. I am however pretty sure that they are (the combination of the country and the Indicator, should make the entry unique).