Yes, this also happens on my PC with the following configuration:
- Ubuntu 20.04.1
- PyTorch 1.7.1+cu110

According to this fastai discussion: https://forums.fast.ai/t/gpu-memory-not-being-freed-after-training-is-over/10265/8, this is related to the Python garbage collector in the IPython environment.
```python
import gc
import torch

def pretty_size(size):
    """Pretty prints a torch.Size object."""
    assert isinstance(size, torch.Size)
    return " × ".join(map(str, size))

def dump_tensors(gpu_only=True):
    """Prints a list of the Tensors being tracked by the garbage collector."""
    total_size = 0
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                if not gpu_only or obj.is_cuda:
                    print("%s:%s%s %s" % (type(obj).__name__,
                                          " GPU" if obj.is_cuda else "",
                                          " pinned" if obj.is_pinned() else "",
                                          pretty_size(obj.size())))
                    total_size += obj.numel()
            elif hasattr(obj, "data") and torch.is_tensor(obj.data):
                if not gpu_only or obj.data.is_cuda:
                    print("%s → %s:%s%s%s %s" % (type(obj).__name__,
                                                 type(obj.data).__name__,
                                                 " GPU" if obj.data.is_cuda else "",
                                                 " pinned" if obj.data.is_pinned() else "",
                                                 " grad" if obj.requires_grad else "",
                                                 pretty_size(obj.data.size())))
                    total_size += obj.data.numel()
        except Exception:
            pass
    print("Total size:", total_size)
```
If I do something like

```python
import torch as th

a = th.randn(10, 1000, 1000)
aa = a.cuda()
del aa
th.cuda.empty_cache()
```
you will not see any corresponding decrease in `nvidia-smi`/`nvtop`.
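Besides `nvidia-smi`, PyTorch itself can report what its caching allocator holds; a minimal check (assuming a CUDA-enabled build, `report_cuda_memory` is just an illustrative helper name) is:

```python
import torch

def report_cuda_memory(tag=""):
    """Print the bytes PyTorch has allocated and reserved (cached) on the current GPU."""
    if not torch.cuda.is_available():
        print("CUDA not available")
        return
    print(tag,
          "allocated:", torch.cuda.memory_allocated(),
          "reserved:", torch.cuda.memory_reserved())

report_cuda_memory("before:")
if torch.cuda.is_available():
    x = torch.randn(10, 1000, 1000, device="cuda")  # ~40 MB of float32
    report_cuda_memory("after alloc:")
    del x
    torch.cuda.empty_cache()
    report_cuda_memory("after free:")
```

Note that `memory_allocated()` only counts live tensors, while `nvidia-smi` also shows the cached blocks PyTorch reserves, so the two numbers normally differ.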
But you can find out what is happening with the handy function `dump_tensors()`, which may print something like:

```
Tensor: GPU 10 × 1000 × 1000
Total size: 10000000
```
That means the garbage collector still holds references to the tensor. For more discussion of Python's gc mechanism, see:
- Force garbage collection in Python to free memory
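In practice, the fix is usually to drop every remaining Python reference (in IPython this includes the `Out`/`_` output-history entries and any stored exception traceback), force a collection pass, and only then empty the cache; a minimal sketch, assuming a CUDA-enabled build (`free_cuda_memory` is an illustrative helper name, not a PyTorch API):

```python
import gc
import torch

def free_cuda_memory():
    """Force a garbage-collection pass, then release cached blocks back to the driver."""
    gc.collect()                      # collect cycles / lingering tensor references
    if torch.cuda.is_available():
        torch.cuda.empty_cache()      # return freed, cached blocks to the driver

# Usage: after `del`-ing your tensors, call free_cuda_memory();
# nvidia-smi should then show the memory released.
free_cuda_memory()
```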