It seems that each TensorFlow session I open and close consumes 1280 bytes of GPU memory that are not released until the Python kernel is terminated.
To reproduce, save the following Python script as memory_test.py:
import tensorflow as tf
import sys

n_iterations = int(sys.argv[1])

def open_and_close_session():
    # Open a session and close it again without adding any ops to the graph.
    with tf.Session() as sess:
        pass

for _ in range(n_iterations):
    open_and_close_session()

# Measure the GPU memory still in use after all the sessions above were closed.
with tf.Session() as sess:
    print("bytes used=", sess.run(tf.contrib.memory_stats.BytesInUse()))
Then run it from the command line with different numbers of iterations:
python memory_test.py 0 yields bytes used= 1280
python memory_test.py 1 yields bytes used= 2560
python memory_test.py 10 yields bytes used= 14080
python memory_test.py 100 yields bytes used= 129280
python memory_test.py 1000 yields bytes used= 1281280
The math is simple: each session opened and closed leaks 1280 bytes. I tested this script on two different Ubuntu 17.10 workstations with tensorflow-gpu 1.6 and 1.7 and different NVIDIA GPUs.
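As a sanity check on that arithmetic, here is a tiny plain-Python sketch (no TensorFlow required) that verifies the reported numbers against bytes_used = 1280 * (iterations + 1):

reported = {0: 1280, 1: 2560, 10: 14080, 100: 129280, 1000: 1281280}
for iterations, bytes_used in reported.items():
    # Every run costs a fixed 1280 bytes for the final measuring session,
    # plus 1280 bytes per session opened and closed in the loop.
    assert bytes_used == 1280 * (iterations + 1)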
Did I miss some explicit garbage collection step, or is this a TensorFlow bug?
Edit: Note that, unlike the case described in this question, I add nothing to the default global graph within the loop, unless the tf.Session() objects themselves 'count'. If that is the case, how can one delete them? Neither tf.reset_default_graph() nor using with tf.Graph().as_default(), tf.Session() as sess: helps; see the sketch below.
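For completeness, here is roughly what that attempt looks like: a minimal sketch (the helper name open_and_close_session_with_own_graph is just illustrative) that builds a fresh graph for each session and resets the default graph afterwards, yet BytesInUse still grows by 1280 per iteration:

import tensorflow as tf

def open_and_close_session_with_own_graph():
    # Use a dedicated graph so nothing is ever added to the default global
    # graph, then drop the default graph as well for good measure.
    with tf.Graph().as_default(), tf.Session() as sess:
        pass
    tf.reset_default_graph()

for _ in range(10):
    open_and_close_session_with_own_graph()

with tf.Session() as sess:
    print("bytes used=", sess.run(tf.contrib.memory_stats.BytesInUse()))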