I am using the nlsLM function from the minpack.lm package to find the values of parameters a, e, and c that give the best fit to the data out.
Here is my code:
n <- seq(0, 70000, by = 1)
TR <- 0.946
b <- 2000
k <- 50000
nr <- 25
na <- 4000
nd <- 3200
d <- 0.05775
y <- d + ((TR*b)/k)*(nr/(na + nd + nr))*n
## summary(y)
out <- data.frame(n = n, y = y)
plot(out$n, out$y)
## Estimate the parameters of a nonlinear model
library(minpack.lm)
k1 <- 50000
k2 <- 5000
fit_r <- nlsLM(y ~ a*(e*n + k1*k2 + c), data=out,
               start=list(a = 2e-10,
                          e = 6e+05, 
                          c = 1e+07), lower = c(0, 0, 0), algorithm="port")
print(fit_r)
## summary(fit_r)
df_fit <- data.frame(n = seq(0, 70000, by = 1))
df_fit$y <- predict(fit_r, newdata = df_fit)
plot(out$n, out$y, type = "l", col = "red", ylim = c(0,10))
lines(df_fit$n, df_fit$y, col="green")
legend(0,ceiling(max(out$y)),legend=c("observed","predicted"), col=c("red","green"), lty=c(1,1), ncol=1)
The fitting to the data seems to be very sensitive to initial conditions. For example:
- with list(a = 2e-10, e = 6e+05, c = 1e+07), this gives a good fit:
Nonlinear regression model model: y ~ a * (e * n + k1 * k2 + c) data: out a e c 2.221e-10 5.895e+05 9.996e+06 residual sum-of-squares: 3.225e-26 Algorithm "port", convergence message: Relative error between `par' and the solution is at most `ptol'.
- with list(a = 2e-01, e = 100, c = 2), this gives a bad fit:
Nonlinear regression model model: y ~ a * (e * n + k1 * k2 + c) data: out a e c 1.839e-08 1.000e+02 0.000e+00 residual sum-of-squares: 476410 Algorithm "port", convergence message: Relative error in the sum of squares is at most `ftol'.
So, Is there an efficient way to find initial conditions that give a good fit to the data ?
EDIT:
I added the following code to explain better the problem. The code is used to find the values of a, e, and c that give the best fit to data from several data sets. Each line in Y corresponds with one data set. By running the code, there is an error message for the 3rd data set (or 3rd line in Y): singular gradient matrix at initial parameter estimates. Here is the code:
TR <- 0.946
b <- 2000
k <- 50000
nr <- 25
na <- 4000
nd <- 3200
d <- 0.05775
Y <- data.frame(k1 = c(114000, 72000, 2000, 100000), k2 = c(47356, 30697, 214, 3568), n = c(114000, 72000, 2000, 100000), 
           na = c(3936, 9245, 6834, 2967), nd = c(191, 2409, 2668, 2776), nr = c(57, 36, 1, 50), a = NA, e = NA, c = NA)
## Create a function to round values
roundDown <- function(x) {
  k <- floor(log10(x))
  out <- floor(x*10^(-k))*10^k
  return(out)
}
ID_line_NA <- which(is.na(Y[,c("a")]), arr.ind=TRUE)
## print(ID_line_NA)
for(i in ID_line_NA){
  print(i)
  ## Define the variable y
  seq_n <- seq(0, Y[i, c("n")], by = 1)
  y <- d + (((TR*b)/(Y[i, c("k1")]))*(Y[i, c("nr")]/(Y[i, c("na")] + Y[i, c("nd")] + Y[i, c("nr")])))*seq_n
  ## summary(y)
  out <- data.frame(n = seq_n, y = y)
  ## plot(out$n, out$y)
  ## Build the linear model to find the values of parameters that give the best fit 
  mod <- lm(y ~ n, data = out)
  ## print(mod)
  ## Define initial conditions
  test_a <- roundDown(as.vector(coefficients(mod)[1])/(Y[i, c("k1")]*Y[i, c("k2")]))
  test_e <- as.vector(coefficients(mod)[2])/test_a
  test_c <- (as.vector(coefficients(mod)[1])/test_a) - (Y[i, c("k1")]*Y[i, c("k2")])
  ## Build the nonlinear model
  fit <- tryCatch( nlsLM(y ~ a*(e*n + Y[i, c("k1")]*Y[i, c("k2")] + c), data=out,
                                   start=list(a = test_a,
                                              e = test_e,
                                              c = test_c), lower = c(0, 0, 0)),
                             warning = function(w) return(1), error = function(e) return(2))
  ## print(fit)
  if(is(fit,"nls")){
    ## Plot
    tiff(paste("F:/Sources/Test_", i, ".tiff", sep=""), width = 10, height = 8, units = 'in', res = 300)
    par(mfrow=c(1,2),oma = c(0, 0, 2, 0))
    df_fit <- data.frame(n = seq_n)
    df_fit$y <- predict(fit, newdata = df_fit)
    plot(out$n, out$y, type = "l", col = "red", ylim = c(0, ceiling(max(out$y))))
    lines(df_fit$n, df_fit$y, col="green")
    dev.off()
    ## Add the parameters a, e and c in the data frame
    Y[i, c("a")] <- as.vector(coef(fit)[c("a")])
    Y[i, c("e")] <- as.vector(coef(fit)[c("e")])
    Y[i, c("c")] <- as.vector(coef(fit)[c("c")])
  } else{
    print("Error in the NLM")
  }
}
So, using the constraints a > 0, e > 0, and c > 0, is there an efficient way to find initial conditions for the nlsLM function that give a good fit to the data and to avoid error messages ?
I added some conditions to define initial conditions for the parameters a, e, and c:
Using the result of the linear model lm(y ~ n):
c = intercept/a - k1*k2 > 0 and e = slope/a > 0 0 < a < intercept/(k1*k2)
, where intercept and slope is the intercept and slope of lm(y ~ n), respectively.


 
    