Can TensorFlow cache (sub-)graph computations?

Question

Can TensorFlow automatically cache computations if they involve multiple calls to the same computation (sub-)graph?

For example, I have a matrix F in which each entry represents a computation based on trainable variables W. My objective function multiplies this matrix several times with different vectors (each time with unchanged W).

Will TensorFlow recompute, for example, F[1,2] whenever I access it, or will it cache that value?

In theory, one could precompute the matrix F given a fixed W, such that each entry in F is a tf.constant. But that would prevent the correct computation of the gradients of W.

This question seems related: http://stackoverflow.com/questions/34536340/how-to-use-tensorflow-optimizer-without-recomputing-activations-in-reinforcement — Markus, Mar 11 '16 at 02:07

score 2 · Answer 1 · answered Mar 10 '16 at 04:43

TensorFlow performs a limited amount of caching, but it probably doesn't cover the case that you describe.

If you create a tf.Session with the following options, constant folding will be enabled:

import tensorflow as tf

# Optimizer level L2 enables constant folding.
config = tf.ConfigProto(graph_options=tf.GraphOptions(
    optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L2)))
sess = tf.Session(config=config)

When you call sess.run() with this configuration, TensorFlow determines which nodes need to run, identifies the subgraph of those nodes whose outputs are constant, evaluates that subgraph, and caches the results. It can then reuse the cached values instead of re-executing the redundant computation.
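To make the idea of constant folding concrete, here is a minimal toy sketch in plain Python. It is not TensorFlow's implementation: the Node class and the const, var, op, fold, and evaluate helpers are hypothetical names invented for this illustration. The sketch only shows the principle that a subgraph with purely constant inputs can be replaced by its precomputed value.

```python
import operator


class Node:
    """A toy graph node: a constant, a variable read, or a binary op."""

    def __init__(self, kind, value=None, fn=None, inputs=()):
        self.kind = kind      # "const", "var", or "op"
        self.value = value    # payload for "const"; variable name for "var"
        self.fn = fn          # binary function for "op"
        self.inputs = inputs  # input nodes for "op"


def const(x):
    return Node("const", value=x)


def var(name):
    return Node("var", value=name)


def op(fn, a, b):
    return Node("op", fn=fn, inputs=(a, b))


def fold(node):
    """Replace every op whose inputs are all constants by one constant."""
    if node.kind != "op":
        return node
    folded = tuple(fold(i) for i in node.inputs)
    if all(i.kind == "const" for i in folded):
        return const(node.fn(*(i.value for i in folded)))
    return Node("op", fn=node.fn, inputs=folded)


def evaluate(node, variables):
    """Evaluate a node given the current variable values."""
    if node.kind == "const":
        return node.value
    if node.kind == "var":
        return variables[node.value]
    return node.fn(*(evaluate(i, variables) for i in node.inputs))


# (2 * 3) + w: the product depends only on constants, the sum depends on w.
graph = op(operator.add, op(operator.mul, const(2), const(3)), var("w"))
folded = fold(graph)
```

After folding, the product is stored as a single constant node, while the addition stays an op because one of its inputs is a variable.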

However, in your question you mention that F is a function of some trainable variables. From TensorFlow's point of view, these variables are volatile (they may change at any time), so it does not cache values that are derived from them. If you want to reuse the same value for F multiple times, you could consider storing it in a tf.constant() so that the constant folding optimization is more useful, although, as you point out, no gradient with respect to W will flow through a constant.
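The difference between reuse within one run and caching across runs can be sketched with a toy evaluator in plain Python. This is an assumption-laden illustration, not TensorFlow's executor: the Graph class, its add and run methods, and the node names are all hypothetical. It assumes the usual dataflow behaviour in which a node that is defined once and consumed by several other nodes is computed once per run, while anything derived from a variable is recomputed on the next run.

```python
class Graph:
    """Toy evaluator that memoizes node values within one run() call only."""

    def __init__(self):
        self.nodes = {}      # node name -> (function, input names)
        self.variables = {}  # variable name -> current value
        self.calls = {}      # node name -> number of times it was computed

    def add(self, name, fn, *inputs):
        self.nodes[name] = (fn, inputs)
        self.calls[name] = 0

    def run(self, fetch):
        cache = {}  # discarded when run() returns

        def value(name):
            if name in self.variables:
                return self.variables[name]
            if name not in cache:
                fn, inputs = self.nodes[name]
                cache[name] = fn(*(value(i) for i in inputs))
                self.calls[name] += 1
            return cache[name]

        return value(fetch)


g = Graph()
g.variables["W"] = 2.0
g.add("F", lambda w: w * w, "W")     # F depends on the variable W
g.add("y1", lambda f: 3.0 * f, "F")  # first consumer of F
g.add("y2", lambda f: 5.0 * f, "F")  # second consumer of F
g.add("loss", lambda a, b: a + b, "y1", "y2")

first = g.run("loss")    # F is computed once although two nodes consume it
g.variables["W"] = 3.0   # a training step changes W
second = g.run("loss")   # F is recomputed from the new W
```

In this toy model, g.calls["F"] counts one computation per run rather than one per consumer, which is why building F once and reusing that same node matters more than any cross-run cache.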

Thanks for the info. I see that option `tf.OptimizerOptions.L1` does `common_subexpression_elimination` and `tf.OptimizerOptions.L2` does `constant_folding`. But if those options do not handle the gradients correctly it wouldn't work for my case. — Markus, Mar 10 '16 at 16:37
Both optimizations should have no effect on the semantics of your program, so should handle the gradients correctly. However, since the vast majority of the gradient calculation depends on the current variable values, I wouldn't expect a large speedup (maybe a few percent, due to eliminating op dispatch overhead). — mrry, Mar 10 '16 at 16:58
