I've got a `keras.models.Model` that I load with `tf.keras.models.load_model`.
Now there are two options to use this model. I can call `model.predict(x)` or I can call `model(x).numpy()`. Both options give me the same result, but `model.predict(x)` takes over 10x longer to run.
The comments in the source code state:
Computation is done in batches. This method is designed for performance in large scale inputs. For small amount of inputs that fit in one batch, directly using
`__call__` is recommended for faster execution, e.g., `model(x)`, or `model(x, training=False)`
I've tested with `x` containing 1; 1,000,000; and 10,000,000 rows, and `model(x)` still performs better.
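For reference, this is roughly how I'm timing the two calls. It's a minimal sketch with a stand-in dense model and random data; my actual model is the one loaded from disk, but the timing pattern is the same:

```python
import time
import numpy as np
import tensorflow as tf

# Stand-in model (the real one is loaded with tf.keras.models.load_model).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])

for n_rows in (1, 1_000_000, 10_000_000):
    x = np.random.rand(n_rows, 10).astype("float32")

    # Option 1: model.predict(x)
    start = time.perf_counter()
    _ = model.predict(x)
    predict_time = time.perf_counter() - start

    # Option 2: model(x).numpy()
    start = time.perf_counter()
    _ = model(x, training=False).numpy()
    call_time = time.perf_counter() - start

    print(f"{n_rows:>10} rows: predict={predict_time:.3f}s  __call__={call_time:.3f}s")
```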
How large does the input need to be to count as a "large scale input", so that `model.predict(x)` performs better?