Gradient Descent Implementation

# Train for 100 epochs
for i in range(100):
    preds = model(inputs)          # forward pass with the current w and b
    loss = mse(preds, targets)     # compute the loss
    loss.backward()                # compute d(loss)/dw and d(loss)/db
    with torch.no_grad():          # don't track the update itself in autograd
        w -= w.grad * 1e-5         # gradient descent step on the weights
        b -= b.grad * 1e-5         # gradient descent step on the bias
        w.grad.zero_()             # reset gradients so the next backward()
        b.grad.zero_()             # doesn't accumulate on top of them
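(This snippet assumes model, mse, w, b, inputs, and targets were defined earlier in the notebook. A minimal sketch of what those definitions might look like, with illustrative shapes:)

import torch

inputs = torch.randn(5, 3)                   # 5 samples, 3 features (made up)
targets = torch.randn(5, 2)                  # 2 target values per sample (made up)
w = torch.randn(2, 3, requires_grad=True)    # random normal weights
b = torch.randn(2, requires_grad=True)       # random normal bias

def model(x):
    return x @ w.t() + b                     # simple linear model

def mse(preds, targets):
    diff = preds - targets
    return (diff * diff).mean()              # mean squared error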

In this loop, shouldn't we calculate the loss after adjusting w and b? I'm also in doubt: since we zero w.grad each time, how are we calculating a new gradient when we are not taking a new random w (weight)?

If I’m understanding your question correctly:

  • loss.backward() calculates new gradients
  • w -= w.grad * 1e-5 & b -= b.grad * 1e-5 adjust your weights and bias
  • w.grad.zero_() & b.grad.zero_() reset the gradients to zero, because otherwise PyTorch would add the newly calculated gradients to the existing values when the for loop goes into its next iteration (see the small demo below).
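A small demo of that accumulation behaviour (the tensor x here is just for illustration):

import torch

x = torch.ones(3, requires_grad=True)
y = (x * x).sum()
y.backward()
print(x.grad)      # tensor([2., 2., 2.])

y = (x * x).sum()
y.backward()       # gradients are *added* to x.grad, not replaced
print(x.grad)      # tensor([4., 4., 4.])

x.grad.zero_()     # reset, just like in the training loop
print(x.grad)      # tensor([0., 0., 0.])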

As you can see from the above, we are:

  • getting predictions, then
  • calculating loss, then
  • calculating gradients, then
  • adjusting our weights, then
  • resetting gradients for our next calculation.

The reason you do it in that order, instead of re-calculating the loss after adjusting the weights and biases, is that your predictions have not changed yet, and so your loss has not changed. Does this clear up your confusion?
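To make that last point concrete, here is a sketch reusing the names from the loop above: after the in-place update, loss still holds the value computed from the old weights, and it only changes once you run the forward pass again.

preds = model(inputs)
loss = mse(preds, targets)
loss.backward()
old_loss = loss.item()

with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()

print(loss.item() == old_loss)            # True: loss was not recomputed
new_loss = mse(model(inputs), targets)    # fresh forward pass with updated w, b
print(new_loss.item() <= old_loss)        # usually True after a small step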

Nah, we have to calculate the loss first in order to find d(loss) w.r.t. w and b; without the loss there are no gradients.
After calculating the loss, the gradients are multiplied by the learning rate

(w -= w.grad * 1e-5
b -= b.grad * 1e-5)

which gives the new weight and bias.
And after getting the loss and reassigning the weights and bias, we then, after every batch or epoch, compute the loss with these new weights and bias.
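In symbols, this is the standard gradient descent update rule (writing the learning rate 1e-5 as $\eta$):

$$w \leftarrow w - \eta \, \frac{\partial \text{loss}}{\partial w}, \qquad b \leftarrow b - \eta \, \frac{\partial \text{loss}}{\partial b}$$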

If you are thinking about how the weights and bias are initialized: they are set to normally distributed random values to avoid gradient descent (in our case) getting stuck at a local minimum.
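Relatedly, note that w.grad.zero_() only clears the gradient buffer; it does not touch w itself, so the next forward pass uses the updated weights and backward() produces fresh gradients. A tiny check:

import torch

w = torch.randn(2, 3, requires_grad=True)
loss = (w * w).sum()
loss.backward()
w.grad.zero_()
print(w)         # w is unchanged by zeroing its grad
print(w.grad)    # all zeros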
