Regarding the backward() function

a = torch.tensor([5.0, 3.0], requires_grad=True)
b = torch.tensor([1.0, 4.0])
ab = ((a + b) ** 2).sum()
ab.backward()

RuntimeError                              Traceback (most recent call last)
<ipython-input-...> in <module>()
----> 1 ab.backward()

2 frames
/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in _make_grads(outputs, grads)
     48     if out.requires_grad:
     49         if out.numel() != 1:
---> 50             raise RuntimeError("grad can be implicitly created only for scalar outputs")
     51         new_grads.append(torch.ones_like(out, memory_format=torch.preserve_format))
     52     else:

RuntimeError: grad can be implicitly created only for scalar outputs

Why should the grad be implicitly created only for scalar outputs?

a = torch.tensor([5.0, 3.0], requires_grad=True)
b = torch.tensor([1.0, 4.0])
ab = ((a + b) ** 2).sum()
ab.backward()

This code runs for me, so I don't know why you got the error.
Anyway, to your question, 'Why should the grad be implicitly created only for scalar outputs?':

First of all, a scalar output means a tensor with only one element in it.
Each element of the input tensor is an input variable, and if the expression we write produces a tensor with two or more elements, it is giving us two or more outputs, i.e. not a scalar output. The gradient is defined for a scalar-valued function of the inputs (remember: we compute the gradient of a single scalar output with respect to the inputs), so for a non-scalar output autograd cannot create the gradient implicitly; you would have to supply a gradient argument to backward() yourself. 🙂
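To make this concrete, here is a minimal sketch using the same tensors as in the question above: reducing the output to a scalar with .sum() lets backward() work implicitly, while for the vector output you have to pass an explicit gradient argument (the vector used for the vector-Jacobian product):

    import torch

    a = torch.tensor([5.0, 3.0], requires_grad=True)
    b = torch.tensor([1.0, 4.0])

    # Option 1: reduce to a scalar first, then backward() works implicitly
    y = (a + b) ** 2
    y.sum().backward()
    print(a.grad)  # tensor([12., 14.]), i.e. 2 * (a + b)

    # Option 2: keep the vector output, but supply an explicit gradient
    # (the vector for the vector-Jacobian product)
    a.grad = None  # reset the accumulated gradient
    y = (a + b) ** 2
    y.backward(gradient=torch.ones_like(y))
    print(a.grad)  # same result: tensor([12., 14.])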

What does the learning rate do when training the model? Does the val_loss decrease according to the lr we give? Please explain.

The learning rate controls how large a step you take when increasing or decreasing the weights and biases of the model in each update.

Remember the 2nd notebook of this course (zerotogans), i.e. gradient descent and linear regression: https://jovian.ai/aakashns/02-linear-regression. In that notebook, we updated the weights and biases as shown below:

 with torch.no_grad():
     w -= w.grad * 1e-5  # step the weights against their gradient
     b -= b.grad * 1e-5  # step the biases against their gradient

Here, we decreased the weights and biases by a multiple of their gradients with respect to the loss. That multiplier, 1e-5, is the learning rate (lr).
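As a quick illustration (a toy sketch, not code from the notebook), here is the same update rule applied with two different learning rates; a larger lr simply takes a bigger step along the negative gradient:

    import torch

    w = torch.tensor([2.0, -1.0])
    grad = torch.tensor([0.5, 0.25])  # pretend this is w.grad after backward()

    for lr in (1e-5, 1e-1):
        w_new = w - lr * grad  # same update rule as in the notebook
        print(lr, w_new)
    # With lr = 1e-5 the parameters barely move; with lr = 1e-1 the step is 10,000x larger.

The only thing that changes between the two runs is how far each update moves the parameters.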