TL;DR: using a generator fails, using a list succeeds. Why?
I am trying to change my model's parameters manually like so:
(1st code, works)
    # delta is the TD error: r_t + gamma * V(s_{t+1}) - V(s_t)
    delta = r_t + gamma * expected_reward_from_t1.data - expected_reward_from_t.data
    negative_expected_reward_from_t = -expected_reward_from_t

    self.critic_optimizer.zero_grad()
    negative_expected_reward_from_t.backward()

    # scale every freshly computed gradient in place
    for i, p in enumerate(self.critic_nn.parameters()):
        if not p.requires_grad:
            continue
        p.grad[:] = delta.squeeze() * discount * p.grad

    self.critic_optimizer.step()
and it seems to converge to the correct result 100% of the time.

However, when I move that loop into a method like so:

(2nd code, fails)
    def _update_grads(self, delta, discount):
        # parameters() is stored in a temporary local variable here
        params = self.critic_nn.parameters()
        for i, p in enumerate(params):
            if not p.requires_grad:
                continue
            p.grad[:] = delta.squeeze() * discount * p.grad
and then call it:
    delta = r_t + gamma * expected_reward_from_t1.data - expected_reward_from_t.data
    negative_expected_reward_from_t = -expected_reward_from_t

    self.critic_optimizer.zero_grad()
    negative_expected_reward_from_t.backward()
    self._update_grads(delta=delta, discount=discount)
    self.critic_optimizer.step()
The only change seems to be that I stored self.critic_nn.parameters() in a temporary local variable params, yet now the network does not converge.
(3rd code, again, works)

When I replace params = self.critic_nn.parameters() inside _update_grads with params = list(self.critic_nn.parameters()), convergence is restored.
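In full, the working method is identical except for the list() call:

    def _update_grads(self, delta, discount):
        # materializing the generator into a list restores convergence
        params = list(self.critic_nn.parameters())
        for i, p in enumerate(params):
            if not p.requires_grad:
                continue
            p.grad[:] = delta.squeeze() * discount * p.grad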
This seems like a referencing issue in PyTorch that I do not completely understand; in particular, I don't fully understand what parameters() actually returns.
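A minimal standalone check (using a throwaway nn.Linear as a stand-in, not my actual critic network) does show that parameters() returns a one-shot generator rather than a list:

    import torch.nn as nn

    net = nn.Linear(2, 2)

    gen = net.parameters()        # a generator object, not a list
    print(sum(1 for _ in gen))    # 2 -- the weight and bias tensors
    print(sum(1 for _ in gen))    # 0 -- the generator is already exhausted

    lst = list(net.parameters())  # materialized once into a list
    print(len(lst))               # 2
    print(len(lst))               # 2 -- a list can be traversed repeatedly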
The question: Why do the 1st and 3rd codes work, but the 2nd one doesn't?
