detach()
One example without detach():
import torch
from torchviz import make_dot

x = torch.ones(2, requires_grad=True)
y = 2 * x
z = 3 + x
r = (y + z).sum()
make_dot(r)

In the resulting graph, the green node r is the root of the AD computational graph, and the blue node is the leaf tensor x.
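You can double-check which node plays which role directly on the tensors (this check is not part of the torchviz output itself): only x is a leaf, and only r carries a grad_fn for the final sum.

print(x.is_leaf, x.grad_fn)   # True None              -> x is a leaf created by the user
print(r.is_leaf, r.grad_fn)   # False <SumBackward0>    -> r is an interior node, the root of this graph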
Another example with detach():
import torch
from torchviz import make_dot

x = torch.ones(2, requires_grad=True)
y = 2 * x
z = 3 + x.detach()   # this branch no longer tracks x
r = (y + z).sum()
make_dot(r)

This is the same as:
import torch
from torchviz import make_dot

x = torch.ones(2, requires_grad=True)
y = 2 * x
z = 3 + x.data
r = (y + z).sum()
make_dot(r)
But x.data is the old notation; x.detach() is the newer, recommended way.
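The practical effect is that gradients only flow back through the y branch. A quick sketch continuing the detach example above, so dr/dx is 2 per element instead of the 3 you would get without detach():

import torch

x = torch.ones(2, requires_grad=True)
y = 2 * x
z = 3 + x.detach()   # this branch is cut off from the graph
r = (y + z).sum()
r.backward()
print(x.grad)        # tensor([2., 2.]) -- only y contributed; without detach it would be tensor([3., 3.])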
So what is the difference with x.detach()?
print(x)
print(x.detach())
Out:
tensor([1., 1.], requires_grad=True)
tensor([1., 1.])
So x.detach() returns a new tensor with requires_grad=False, detached from the AD computational graph.
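One thing to keep in mind (not shown in the output above) is that the detached tensor still shares the same underlying storage with the original, so in-place changes on one are visible through the other:

a = torch.ones(2, requires_grad=True)
b = a.detach()
b[0] = 5.0    # in-place change on the detached tensor...
print(a)      # tensor([5., 1.], requires_grad=True) -- ...is visible through a as well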
torch.no_grad
torch.no_grad is actually a class.
x = torch.ones(2, requires_grad=True)
with torch.no_grad():
    y = x * 2
print(y.requires_grad)
Out:
False
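Because torch.no_grad is a class (a context manager), it can also be used as a decorator, which disables gradient tracking for the whole function. A minimal sketch with a throwaway doubler function:

@torch.no_grad()
def doubler(t):
    return t * 2

x = torch.ones(2, requires_grad=True)
print(doubler(x).requires_grad)   # False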
From help(torch.no_grad):
Disabling gradient calculation is useful for inference, when you are sure that you will not call Tensor.backward(). It will reduce memory consumption for computations that would otherwise have requires_grad=True.

In this mode, the result of every computation will have requires_grad=False, even when the inputs have requires_grad=True.
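A typical use of torch.no_grad is inference: wrapping the forward pass so no graph is recorded and no grad_fn is attached to the output. A minimal sketch with a hypothetical nn.Linear model (the model and shapes here are just placeholders):

import torch
import torch.nn as nn

model = nn.Linear(4, 1)    # placeholder model for illustration
inp = torch.randn(3, 4)

with torch.no_grad():      # nothing inside is recorded in the AD graph
    out = model(inp)

print(out.requires_grad)   # False
print(out.grad_fn)         # None -- no backward node was created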