tl;dr: This type of modeling is sensitive to the starting values. I detail how to successfully fix the issue. The three main things to aid in this fix are:
- Increase maximum iteration and function calls
- Acquire informed starting values
- Potentially restrict/subset the data as countries initially added have less information available about them
Briefly in overview:
I started tinkering by just blindly changing starting values. Changing c(1000, 20, 20) to c(20000, 10, 10) I got the model to run error/warning free as baseModel.naive with a Log-likelihood: -23451.37. Using increased maxima for iteration and function calls and acquiring informed starting values enabled the model to improve substantially with baseModel.bellcurve reporting a Log-likelihood: -20487.86.
Naively change starting values will get rid of errors
I adjusted the starting values. Also, I showed how to increase the maximum number of functional calls and iterations and use the verbose option. More can be found using ?nlme and ?nlmeControl. In my experience, these types of models can be sensitive to starting values and maximum iterations and function calls.
baseModel.naive <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
data = dat,
fixed = list(d ~ 1, mu ~ 1, var ~ 1),
random = d + mu + var ~ 1|Country.Region,
start = list(fixed = c(20000, 10, 10)),
na.action = na.omit,
control=list(maxIter=1e2, msMaxIter = 1e6, msVerbose = TRUE))
baseModel.naive
Output:
> baseModel.naive <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
+ data = dat,
+ fixed = list(d ~ 1, mu ~ 1, var ~ 1),
+ random = d + mu + var ~ 1|Country.Region,
+ start = list(fixed = c(20000, 10, 10)),
+ na.action = na.omit,
+ control=list(maxIter=1e2, msMaxIter = 1e6, msVerbose = TRUE))
0: 30455.774: 2.23045 10.6880 8.80935 -11177.6 3277.46 -1877.96
1: 30450.763: 2.80826 11.2726 9.37889 -11177.6 3277.46 -1877.96
2: 30449.789: 2.94771 11.5283 9.80691 -11177.6 3277.46 -1877.96
3: 30449.150: 3.30454 11.8277 10.0329 -11177.6 3277.46 -1877.96
4: 30448.727: 3.64209 12.2237 10.4571 -11177.6 3277.46 -1877.96
5: 30448.540: 3.97875 12.5637 10.8217 -11177.6 3277.46 -1877.96
6: 30448.445: 4.31754 12.9055 11.1826 -11177.6 3277.46 -1877.96
7: 30448.397: 4.65727 13.2480 11.5420 -11177.6 3277.46 -1877.96
8: 30448.372: 4.99746 13.5908 11.9008 -11177.6 3277.46 -1877.96
9: 30448.360: 5.33789 13.9337 12.2591 -11177.6 3277.46 -1877.96
10: 30448.354: 5.67844 14.2767 12.6173 -11177.6 3277.46 -1877.96
11: 30448.351: 6.01905 14.6197 12.9753 -11177.6 3277.46 -1877.96
12: 30448.349: 6.35969 14.9627 13.3333 -11177.6 3277.46 -1877.96
13: 30448.349: 6.70035 15.3058 13.6913 -11177.6 3277.46 -1877.96
14: 30448.348: 7.04102 15.6489 14.0493 -11177.6 3277.46 -1877.96
15: 30448.348: 7.38170 15.9919 14.4073 -11177.6 3277.46 -1877.96
16: 30448.348: 7.72237 16.3350 14.7652 -11177.6 3277.46 -1877.96
17: 30448.348: 8.06305 16.6781 15.1232 -11177.6 3277.46 -1877.96
18: 30448.348: 8.40373 17.0211 15.4811 -11177.6 3277.46 -1877.96
19: 30448.348: 8.74441 17.3642 15.8391 -11177.6 3277.46 -1877.96
20: 30448.348: 9.08508 17.7073 16.1971 -11177.6 3277.46 -1877.96
21: 30448.348: 9.42576 18.0503 16.5550 -11177.6 3277.46 -1877.96
0: 30108.384: 9.42573 18.0503 16.5550 -11183.6 3279.09 -1879.43
1: 30108.384: 9.42568 18.0503 16.5550 -11183.6 3279.09 -1879.43
2: 30108.384: 9.42544 18.0502 16.5548 -11183.6 3279.09 -1879.43
0: 30108.056: 9.42539 18.0502 16.5548 -11168.0 3239.13 -1879.51
1: 30108.056: 9.42526 18.0502 16.5549 -11168.0 3239.13 -1879.51
0: 30108.055: 9.42523 18.0502 16.5549 -11150.0 3223.70 -1879.28
1: 30108.055: 9.42523 18.0502 16.5549 -11150.0 3223.70 -1879.28
> baseModel.naive
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ bellcurve.model(d, mu, var, x = day)
Data: dat
Log-likelihood: -23451.37
Fixed: list(d ~ 1, mu ~ 1, var ~ 1)
d mu var
4290.47415 35.32178 38.44048
Random effects:
Formula: list(d ~ 1, mu ~ 1, var ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
d 1.402353e-01 d mu
mu 2.518250e-05 0
var 1.123208e-04 0 0
Residual 1.738523e+03
Number of Observations: 2641
Number of Groups: 126
But then maybe that isn't enough. See the Corr with the 0's? Something seems off. So I took a page from Roland ( https://stackoverflow.com/a/27549369/2727349) and decided to get a self starting function that was similar to your bellcurve.model.
I also tried subsetting the data to Country.Regions that have a minimum number of days in case that was an issue. I am detailing my results/code here in sections and at the end (in the Appendix) it is all together for a quick copy and paste.
Preliminaries & Data exploration
To explore things, I tried limiting the data to countries that had a minimum number of days. I chose 45 days (5 countries) and slowly increased it up to 1 day (full dataset). I used data.table to calculate and display pertinent things.
library(nlme)
library(nlraa)
library(data.table)
dat <- readRDS("dat.rds")
str(dat)
setDT(dat)
# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
f <- d*exp(-((x - mu)^2) / (2*var))
return(f)
}
dim(dat)
head(dat)
## how many countries?
dat[,uniqueN(Country.Region)]
## number of days/rows per country.region:
print(dat[,.N,by=Country.Region][order(N)], nrows=200)
dat[,N:=.N, by=Country.Region]
minimum_days <- 45
## model restricted to following countries/regions:
data.frame(Country.Region=(as.character(unique(dat[N>=minimum_days]$Country.Region))))
Get automatic starting values from nls() and the data
This is where Roland's answer in a related post comes into play. We need an automated way to get informed starting values, and one way of doing that is using nls() and a self starting function. You can make a self starting function out of your bellcurve.model but I found SSbell that was similar and decided to make use of it. I run nls() with SSbell and get starting values for the SSbell formulation which has ymax (corresponds to d of bellcurve.model) and xc (corresponds to mu in bellcurve.model). For the var starting value in bellcurve.model I first calculate each and every Country.Region's variance and then selected one of the smaller ones, settling on the 20%-tile because that worked for minimum_days<-45 and minimum_days<-1 (full data).
## use this approach https://stackoverflow.com/a/27549369/2727349
## but instead of SSlogis
## use nlraa::SSbell
## we can get starting values for d and mu.
plot(new_cases~day, data=dat[N>=minimum_days])
nls_starting_values <- nls(new_cases ~ SSbell(day, ymax, a, b, xc), data=dat[N>=minimum_days])
summary(nls_starting_values)
## calculate a starting value for var: the median of Country.Region specific variances
variances <- sort(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1)
range(variances)
quantile(variances, seq(0,1,0.05))
##var.start <- median(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
##var.start <- min(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
var.start <- quantile(variances, 0.20)
var.start
Again -- I had to play around a bit -- but if you take the 20%-tile of empirical variances the nlme call for bellcurve.model the code will run for minimum_days <- 45 as well as minimum_days <- 1 (full dataset). With our starting values calculated and potentially our dataset restricted, we fit two nlme models: one using SSbell and the other bellcurve.model. Both will run for minimum_days<-45 and only bellcurve.model will converge for minimum_days<-1 (full dataset). Lucky!
Two nlme calls: one with SSbell, the other for bellcurve.model
# NLME Model: using ssbell and nls_starting values for ymax, a, b, xc
baseModel.ssbell <- nlme(new_cases ~ SSbell(day, ymax, a, b, xc),
data = dat[N>=minimum_days],
fixed = list(ymax ~ 1, a ~ 1, b~1, xc~1),
random = ymax + a + b + xc ~ 1|Country.Region,
start = list(fixed = c( coef(nls_starting_values) )),
na.action = na.omit,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)
# NLME Model: using bellcurve.model and nls_starting values for ymax for d;
# NLME Model: using bellcurve.model and nls_starting values for xc for mu;
# NLME Model: using bellcurve.model and var.start for var
baseModel.bellcurve <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
data = dat[N>=minimum_days],
fixed = list(d ~ 1, mu ~ 1, var ~ 1),
random = d + mu + var ~ 1|Country.Region,
start = list(fixed = c(coef(nls_starting_values)[c(1,4)], var.start )),
na.action = na.omit,
control=list(maxIter=1e6,
msMaxIter = 1e6,
msVerbose = TRUE)
)
Compare the output
baseModel.ssbell
baseModel.bellcurve
Showing the output for minimum_days<-45 shows similar fits (look at likelihood).
## 5 countries DATA minimum_days <- 45
> baseModel.ssbell
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ SSbell(day, ymax, a, b, xc)
Data: dat[N >= minimum_days]
Log-likelihood: -2264.706
Fixed: list(ymax ~ 1, a ~ 1, b ~ 1, xc ~ 1)
ymax a b xc
1.026599e+03 -1.164625e-02 -1.626079e-04 1.288924e+01
Random effects:
Formula: list(ymax ~ 1, a ~ 1, b ~ 1, xc ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
ymax 1.731582e+03 ymax a b
a 4.467475e-06 -0.999
b 1.016493e-04 -0.998 0.999
xc 3.528238e+00 1.000 -0.999 -0.999
Residual 8.045770e+02
Number of Observations: 278
Number of Groups: 5
## 5 countries DATA minimum_days <- 45
> baseModel.bellcurve
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ bellcurve.model(d, mu, var, x = day)
Data: dat[N >= minimum_days]
Log-likelihood: -2267.406
Fixed: list(d ~ 1, mu ~ 1, var ~ 1)
d mu var
965.81525 12.73549 58.22049
Random effects:
Formula: list(d ~ 1, mu ~ 1, var ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
d 1.633432e+03 d mu
mu 2.815230e+00 1.00
var 5.582379e-03 -0.01 -0.01
Residual 8.127932e+02
Number of Observations: 278
Number of Groups: 5
Showing output for baseModel.bellcurve for minimum_days<-1 (full dataset) shows an improvement from baseModel.naive where I blindly and arbitrarily fiddled with the starting values for the sole purpose of getting rid of the errors and warnings.
## FULL DATA minimum_days <- 1
> baseModel.bellcurve
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ bellcurve.model(d, mu, var, x = day)
Data: dat[N >= minimum_days]
Log-likelihood: -20487.86
Fixed: list(d ~ 1, mu ~ 1, var ~ 1)
d mu var
1044.52101 25.00288 81.79004
Random effects:
Formula: list(d ~ 1, mu ~ 1, var ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
d 4.285702e+03 d mu
mu 5.423452e+00 0.545
var 3.008379e-03 0.028 0.050
Residual 4.985698e+02
Number of Observations: 2641
Number of Groups: 126
Log-likelihood: -20487.86 for baseModel.bellcurve vs Log-likelihood: -23451.37 for baseModel.naive The Corr matrix in baseModel.bellcurve looks better, too.
Appendix: Code in one grab.
library(nlme)
library(nlraa)
library(data.table)
dat <- readRDS("dat.rds")
str(dat)
setDT(dat)
# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
f <- d*exp(-((x - mu)^2) / (2*var))
return(f)
}
dim(dat)
head(dat)
## how many countries?
dat[,uniqueN(Country.Region)]
## number of days/rows per country.region:
print(dat[,.N,by=Country.Region][order(N)], nrows=200)
dat[,N:=.N, by=Country.Region]
minimum_days <- 45
## model restricted to following countries/regions:
data.frame(Country.Region=(as.character(unique(dat[N>=minimum_days]$Country.Region))))
## use this approach https://stackoverflow.com/a/27549369/2727349
## but instead of SSlogis
## use nlraa::SSbell
## we can get starting values for d and mu.
plot(new_cases~day, data=dat[N>=minimum_days])
nls_starting_values <- nls(new_cases ~ SSbell(day, ymax, a, b, xc), data=dat[N>=minimum_days])
summary(nls_starting_values)
## calculate a starting value for var: the median of Country.Region specific variances
variances <- sort(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1)
range(variances)
quantile(variances, seq(0,1,0.05))
##var.start <- median(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
##var.start <- min(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
var.start <- quantile(variances, 0.20)
var.start
# NLME Model: using ssbell and nls_starting values for ymax, a, b, xc
baseModel.ssbell <- nlme(new_cases ~ SSbell(day, ymax, a, b, xc),
data = dat[N>=minimum_days],
fixed = list(ymax ~ 1, a ~ 1, b~1, xc~1),
random = ymax + a + b + xc ~ 1|Country.Region,
start = list(fixed = c( coef(nls_starting_values) )),
na.action = na.omit,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)
# NLME Model: using bellcurve.model and nls_starting values for ymax for d;
# NLME Model: using bellcurve.model and nls_starting values for xc for mu;
# NLME Model: using bellcurve.model and var.start for var
baseModel.bellcurve <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
data = dat[N>=minimum_days],
fixed = list(d ~ 1, mu ~ 1, var ~ 1),
random = d + mu + var ~ 1|Country.Region,
start = list(fixed = c(coef(nls_starting_values)[c(1,4)], var.start )),
na.action = na.omit,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)
baseModel.ssbell
baseModel.bellcurve