Confusion on torch

Can anyone explain to me in detail what is happening here? What exactly is torch doing?

import torch

input_ids = torch.tensor(padded)               # list of padded token ids -> tensor
attention_mask = torch.tensor(attention_mask)  # matching padding mask -> tensor

with torch.no_grad():  # disable gradient tracking for this forward pass
    last_hidden_states = model(input_ids, attention_mask=attention_mask)
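For context, padded, attention_mask, and model are defined earlier in my code. A minimal self-contained version, assuming a Hugging Face DistilBERT model (just an example, any transformer with this call signature would do), would be:

import torch
from transformers import DistilBertModel, DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert-base-uncased")

# Tokenize and pad two example sentences to the same length
batch = tokenizer(["first sentence", "a slightly longer second sentence"],
                  padding=True, return_tensors="pt")
input_ids = batch["input_ids"]
attention_mask = batch["attention_mask"]

with torch.no_grad():
    outputs = model(input_ids, attention_mask=attention_mask)

print(outputs.last_hidden_state.shape)  # (batch_size, seq_len, hidden_size)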

The output is the same with and without using torch. Moreover, I didn't observe any significant difference in run time either.

I’m not sure which part of torch you mean here. I’m sure the output wouldn’t be the same if you didn’t use torch at all.

If it's torch.no_grad(), then the output will not change, because no_grad only disables gradient tracking; the forward computation itself is exactly the same.
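You can check this with a toy example (hypothetical tensors, nothing model-specific): the values are identical, but under no_grad no autograd graph is attached to the result:

import torch

x = torch.randn(3, requires_grad=True)

y = (x * 2).sum()        # normal mode: autograd records the operation
with torch.no_grad():
    z = (x * 2).sum()    # no_grad mode: same math, no graph

print(torch.equal(y, z))  # True -> identical values
print(y.grad_fn)          # <SumBackward0 ...> -> graph was built
print(z.grad_fn)          # None               -> no graph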

What I meant was:

with torch.no_grad():
    last_hidden_states = model(input_ids, attention_mask=attention_mask)

gives the same output as
last_hidden_states = model(input_ids, attention_mask=attention_mask)

Why am I using torch.no_grad() here?

TL;DR:
To avoid gradient calculation.

A bit longer answer:
You are only asking for the output for a given input (a plain forward pass), so there is no need to track gradients, which are only useful for backpropagation during training.
By avoiding gradient tracking, you keep memory usage low: with autograd enabled, PyTorch stores the intermediate activations needed for the backward pass, and those usually take up a lot of memory.
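As a rough sketch (a hypothetical toy model, just for illustration): outside no_grad the output stays connected to the autograd graph, while inside no_grad nothing is recorded, so backward() has nothing to work with:

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.ReLU(),
    torch.nn.Linear(512, 512), torch.nn.ReLU(),
)
x = torch.randn(32, 512)

out = model(x)             # autograd keeps intermediate activations for backward()
print(out.requires_grad)   # True

with torch.no_grad():
    out = model(x)         # intermediates are freed as soon as they are used
print(out.requires_grad)   # False

try:
    out.sum().backward()   # no graph to backpropagate through
except RuntimeError as err:
    print(err)             # "element 0 of tensors does not require grad ..."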

Thanks. I get it now.