Why does this:
with torch.no_grad():
     w = w - lr*w.grad
     print(w)
result in:
tensor(0.9871)
while this:
with torch.no_grad():
     w -= lr*w.grad
     print(w)
results in:
tensor(0.9871, requires_grad=True)
Aren't both operations the same?
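To take the training loop out of the picture, here is a minimal standalone check (my own reduction; the 0.001 is just an arbitrary stand-in for lr*w.grad):

import torch

w = torch.tensor(1.0, requires_grad=True)
with torch.no_grad():
    w = w - 0.001   # rebinds the name w to a brand-new tensor
print(w)            # tensor(0.9990) -- requires_grad is gone

w = torch.tensor(1.0, requires_grad=True)
with torch.no_grad():
    w -= 0.001      # mutates the original tensor in place
print(w)            # tensor(0.9990, requires_grad=True)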
Here is some test code:
import numpy as np
import torch

def test_stack():
    np.random.seed(0)
    n = 50
    feat1 = np.random.randn(n, 1)
    feat2 = np.random.randn(n, 1)

    X = torch.tensor(feat1).view(-1, 1)
    Y = torch.tensor(feat2).view(-1, 1)

    w = torch.tensor(1.0, requires_grad=True)

    epochs = 1
    lr = 0.001

    for epoch in range(epochs):
        for i in range(len(X)):
            y_pred = w * X[i]
            loss = (y_pred - Y[i]) ** 2
            loss.backward()

            with torch.no_grad():
                #w = w - lr*w.grad  # DOESN'T WORK!!!! requires_grad disappears
                #print(w); return
                w -= lr*w.grad
                print(w); return  # stop after the first update to inspect w
                w.grad.zero_()
Swap the comments (so the w = w - lr*w.grad line runs instead of w -= lr*w.grad) and you'll see requires_grad disappear from the output. Could this be a bug?
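One more data point: if I keep the out-of-place subtraction but copy the result back into the original tensor with copy_ (just a variant I tried inside the same loop):

with torch.no_grad():
    w.copy_(w - lr*w.grad)
    print(w)

this prints tensor(0.9871, requires_grad=True) again, so it seems to be the rebinding of the name w, not the subtraction itself, that drops requires_grad.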