I am using mlflow from R to register a model that can be retrieved and invoked from a separate container. The mlflow_log_model() function takes a crated function as an argument.
The TRCDetect model can be found on GitHub and relies on 3 separate R6Class objects which are sourced from 2 additional files, Transform.R and SESD.R.
Below is the carrier::crate call.
predictor <- crate(
    function(x) TRCDetector(data=x$value, time=x$timestamps, train_size=TRAIN_SIZE, dwin=DWIN, rwin=RWIN, alpha=ALPHA, maxr=MAXR),
    TRCDetector=TRCDetector,
    SESD=SESD,
    TRES=TRES,
    TCHA=TCHA,
    TRAIN_SIZE=TRAIN_SIZE,
    DWIN=DWIN,
    RWIN=RWIN,
    ALPHA=ALPHA,
    MAXR=MAXR)
Printing the crate shows the following sizes:
<crate> 400.24 kB
* function: 14.38 kB
* `SESD`: 268.21 kB
* `TRES`: 227.38 kB
* `TCHA`: 198.78 kB
* `TRCDetector`: 38.52 kB
* `ALPHA`: 56 B
* `DWIN`: 56 B
* `MAXR`: 56 B
* `RWIN`: 56 B
* `TRAIN_SIZE`: 56 B
function(x) TRCDetector(data=x$value, time=x$timestamps, train_size=TRAIN_SIZE, dwin=DWIN, rwin=RWIN, alpha=ALPHA, maxr=MAXR)
I am able to stash the crate in S3 and recover it from a separate container using mlflow_load_model().
Below is the recovered crate
function(x) TRCDetector(data=x$value, time=x$timestamps, train_size=TRAIN_SIZE, dwin=DWIN, rwin=RWIN, alpha=ALPHA, maxr=MAXR)
<environment: 0x556e0ae93c38>
attr(,"class")
Unfortunately, I am only able to use the recovered crate (p below) if I re-create the 3 dependent objects in the current environment, like so:
TRES <- get("TRES", get_env(p))
TCHA <- get("TCHA", get_env(p))
SESD <- get("SESD", get_env(p))
Without this, I get an 'Object not found' error.
This prevents serving the R model with the standard mlflow models serve command.
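A minimal base-R sketch of the symptom, under the unconfirmed assumption that the dependent objects are being looked up in the session's global environment rather than in the crate. Everything here is hypothetical: helper() stands in for SESD/TRES/TCHA, detector() for TRCDetector, and the environments `session` and `bundle` simulate the global environment and the crate's environment. It uses no mlflow or carrier calls.

```r
# Hypothetical stand-ins, not the real model code.
# `session` simulates the global environment of the training session.
session <- new.env(parent = baseenv())
evalq({
  helper   <- function(v) v * 2
  detector <- function(v) helper(v) + 1  # resolves `helper` through its enclosure, `session`
}, session)

# Simulate the crate: an isolated environment holding copies of both objects.
bundle <- new.env(parent = baseenv())
bundle$helper   <- session$helper
bundle$detector <- session$detector
predictor <- function(x) detector(x)
environment(predictor) <- bundle

# Simulate a fresh session in which the helper was never defined.
rm("helper", envir = session)

# detector() still searches `session`, not `bundle`, so the call fails.
failed <- tryCatch({ predictor(1); FALSE }, error = function(e) TRUE)

# Workaround analogous to the get(..., get_env(p)) lines above:
# copy the helper out of the bundle into the environment that is searched.
assign("helper", get("helper", envir = bundle), envir = session)
result <- predictor(1)
```

In this sketch the copy of helper() inside `bundle` is never consulted by detector(), because a function resolves free variables through the environment it was defined in, not through the environment of its caller. Whether the same mechanism is at work in the real crate is an assumption.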
I have looked into R environments in related questions but I am failing to come up with a mechanism which ensures the model is usable straight after the mlflow_load_model() call.
The option of creating an Rserve wrapper which lets me revive the 3 dependent objects is not ideal. Am I missing something when I call crate?
EDIT: I had a look at the often-cited page Deploying R Models with MLflow and Docker, under Option 2: Don’t install the package. While I didn't quite follow the suggestion of passing set_env(), I tried it as follows:
predictor <- crate(
    function(x) TRCDetector(data=x$value, time=x$timestamps, train_size=TRAIN_SIZE, dwin=DWIN, rwin=RWIN, alpha=ALPHA, maxr=MAXR),
    TRCDetector=set_env(TRCDetector),
    SESD=set_env(SESD),
    TRES=set_env(TRES),
    TCHA=set_env(TCHA),
    TRAIN_SIZE=TRAIN_SIZE,
    DWIN=DWIN,
    RWIN=RWIN,
    ALPHA=ALPHA,
    MAXR=MAXR)
Which results in the following error:
> p(data.frame(timestamps=rep(1:500,1),value=v))
Error in TRCDetector(data = x$value, time = x$timestamps, train_size = TRAIN_SIZE,  :
  attempt to apply non-function
