I am trying to implement the standard gradient descent algorithm with PyTorch in order to perform dimensionality reduction (PCA) on the Indian Pines dataset. More specifically, I am trying to estimate the matrix U1 that minimizes ||X - (U1 @ U1.T) @ X||^2, where U1.T denotes the transpose of U1, @ denotes matrix multiplication, ||.|| denotes the Frobenius norm and X denotes the data (reconstruction error minimization).
For starters, I have vectorized the data, so the variable indian_pines is of size torch.Size([220, 21025]) (220 spectral bands, 145*145 = 21025 pixels), and I initialize U1 randomly with U1 = torch.rand(size=(220, 150), dtype=torch.float64, requires_grad=True).
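In case it matters, this is roughly how I build indian_pines (a sketch; the .mat file name and dictionary key come from my local copy of the dataset, so they may differ for you):

import scipy.io
import torch

# load the hyperspectral cube; file name and key are from my local copy
cube = scipy.io.loadmat('Indian_pines.mat')['indian_pines']   # (145, 145, 220)

# flatten the two spatial dimensions: one row per band, one column per pixel
indian_pines = torch.tensor(cube.reshape(-1, cube.shape[-1]).T,
                            dtype=torch.float64)               # (220, 21025)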
For the method itself, I have the following code:
n_iters = 100
learning_rate = 2e-9
for epoch in range(n_iters):
    
    # forward pass: reconstruction (U1 @ U1.T) @ X
    # (contracting the first axes with tensordot is equivalent here because U1 @ U1.T is symmetric)
    y_pred = torch.tensordot(U1 @ torch.t(U1), indian_pines, ([0], [0]))

    # loss: Frobenius norm of the reconstruction error
    # (not squared, but it has the same minimizer as the squared norm)
    l = torch.norm(indian_pines - y_pred, 'fro')
    
    if epoch % 10 == 0: print(f'epoch: {epoch} loss: {l}')
    
    #gradient
    l.backward()
    
    #update
    with torch.no_grad():
        U1 -= learning_rate * U1.grad
        U1.grad.zero_()
which produces output like the following (exact values vary because of the random initialization):
epoch: 0 loss: 44439840488.652824
epoch: 10 loss: 27657067086.461464
epoch: 20 loss: 17353003250.14576
epoch: 30 loss: 10980377562.427532
epoch: 40 loss: 7000015690.042022
epoch: 50 loss: 4478747227.40419
epoch: 60 loss: 2847777701.784741
epoch: 70 loss: 1757431994.7743077
epoch: 80 loss: 990962121.4576876
epoch: 90 loss: 426658102.95583844
This loss seems very high, and it gets even worse if I increase learning_rate; decreasing it, of course, makes the loss drop at a much slower rate. My question is: is there something wrong with the way I am using autograd that results in such a high loss, and how could I improve the quality of the reconstruction?
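In case the manual parameter update is the culprit, here is the equivalent loop written with torch.optim.SGD instead of the hand-written in-place step (just a sketch, with the same tensors and hyperparameters as above):

optimizer = torch.optim.SGD([U1], lr=learning_rate)

for epoch in range(n_iters):
    # forward pass and loss, exactly as before
    y_pred = torch.tensordot(U1 @ torch.t(U1), indian_pines, ([0], [0]))
    l = torch.norm(indian_pines - y_pred, 'fro')

    if epoch % 10 == 0: print(f'epoch: {epoch} loss: {l}')

    # backward pass and SGD step
    optimizer.zero_grad()
    l.backward()
    optimizer.step()

Thanks in advance.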
