Relation between the optimizer and loss.backward()

        opt = torch.optim.SGD(model.parameters(), lr=1e-5)
        # Compute gradients
        loss.backward()
        # Update parameters using gradients
        opt.step()

While building a linear regression model using PyTorch built-ins, I came across the code above. How are the gradients computed by loss.backward() linked to the optimizer here? The link to the Jupyter notebook cell is given here.

The optimizer holds the model parameters by reference (meaning that if it modifies them, they change in the model, and if something else modifies them, the optimizer sees those changes).

When loss.backward() computes the gradients, they are stored on the parameters themselves (each parameter has a grad field).

Since the optimizer keeps a reference to the parameters, it can read those gradients and apply whatever optimization technique it implements.
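To make the link concrete, here is a minimal sketch, assuming a toy nn.Linear model, random data, and plain SGD with no momentum or weight decay (all of these are my own illustrative choices, not the notebook's code). It checks that the optimizer holds the same tensor objects as the model, and that opt.step() applies p - lr * p.grad using the gradients left behind by loss.backward().

```python
import torch

torch.manual_seed(0)

# Toy model and optimizer (illustrative sizes and learning rate).
model = torch.nn.Linear(3, 1)
lr = 0.1
opt = torch.optim.SGD(model.parameters(), lr=lr)

# The optimizer stores the very same tensor objects that the model owns.
opt_params = opt.param_groups[0]["params"]
same_objects = all(p is q for p, q in zip(model.parameters(), opt_params))

# Random data, just to have a loss to differentiate.
x = torch.randn(8, 3)
y = torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

# backward() writes the gradient of the loss into p.grad for each parameter.
loss.backward()

# What plain SGD is expected to produce: p - lr * p.grad
expected = [p.detach().clone() - lr * p.grad for p in model.parameters()]

# step() reads p.grad from the shared parameters and updates them in place.
opt.step()

matches = all(
    torch.allclose(p.detach(), e) for p, e in zip(model.parameters(), expected)
)
print(same_objects, matches)
```

So loss.backward() and the optimizer never talk to each other directly; the grad field on the shared parameter tensors is the only connection between them.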

Thank you for your reply. So before computing the loss, I printed the parameters using:


and I printed the parameters again after calling loss.backward(). The values remain the same, which makes me wonder how the optimizer keeps a reference to the parameters/gradients. Kindly let me know.


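You are confusing the parameters' gradients with their values. loss.backward() only fills in the gradients; the values themselves do not change until the optimizer's step() is called.

You should print the gradients (the grad field) instead of just the values. Then you will see a difference.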

Untested code (should work anyway):
print(*[p.grad for p in model.parameters()])
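For a fuller picture, here is a rough sketch of the whole sequence, assuming a small nn.Linear model and random inputs and targets (hypothetical stand-ins for the notebook's model and data). It records the gradients and values before backward(), after backward(), and after step().

```python
import torch

torch.manual_seed(0)

# Stand-in model, optimizer, and data (illustrative only).
model = torch.nn.Linear(2, 1)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
inputs = torch.randn(4, 2)
targets = torch.randn(4, 1)

# Before backward(): values exist, gradients are still None.
values_before = [p.detach().clone() for p in model.parameters()]
grads_before = [p.grad for p in model.parameters()]

loss = torch.nn.functional.mse_loss(model(inputs), targets)
loss.backward()

# After backward(): gradients are populated, values are untouched.
values_after_backward = [p.detach().clone() for p in model.parameters()]
grads_after_backward = [p.grad.clone() for p in model.parameters()]

opt.step()

# After step(): the optimizer has used the gradients to change the values.
values_after_step = [p.detach().clone() for p in model.parameters()]

values_unchanged_by_backward = all(
    torch.equal(a, b) for a, b in zip(values_before, values_after_backward)
)
values_changed_by_step = any(
    not torch.equal(a, b) for a, b in zip(values_after_backward, values_after_step)
)

print("grads before backward:", grads_before)
print("grads after backward:", grads_after_backward)
print("values unchanged by backward:", values_unchanged_by_backward)
print("values changed by step:", values_changed_by_step)
```

This matches what you observed: printing the parameters before and after loss.backward() shows identical values, because only the grad field has changed at that point.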

Ah, it makes sense. Thank you! It did work :)