Why transpose the weights?

I’m wondering why we need to transpose the weights at all

Weights and biases

w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
def model(x):
    return x @ w.t() + b

why not
w = torch.randn(3, 2, requires_grad=True)
b = torch.randn(2, requires_grad=True)
def model(x):
    return x @ w + b
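To check the intuition that the two formulations are equivalent, here is a small sketch (the tensor values and the batch size are made up for illustration) comparing a `(2, 3)` weight used with `.t()` against the same weights stored directly in `(3, 2)` shape:

```python
import torch

torch.manual_seed(0)
x = torch.randn(5, 3)  # a batch of 5 inputs, each with 3 features

# first formulation: weights stored as (outputs, inputs), transposed inside the model
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)
out1 = x @ w.t() + b

# second formulation: the same numbers stored already transposed, shape (inputs, outputs)
w2 = w.t().detach().clone().requires_grad_(True)
out2 = x @ w2 + b

print(out1.shape)                   # both give a (5, 2) output
print(torch.allclose(out1, out2))   # and identical values
```

So numerically the two versions compute the same thing; the question is only about the convention for storing the weights.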

Well, the answer lies in linear algebra. I have written and deleted this post several times, and concluded I should ask: how much linear algebra do you know? However I try to explain it, I need to know your level first.

I don’t know if we both share the same scale of “linear algebra knowledge”. Can you please post the answer you have in mind? I can follow up with questions if necessary.

In your example it looks like you have 2 variables with 6 weights, but from the linear-algebra standpoint you have 6 variables: since we are trying to find the values of the weights, the weights themselves are the variables. The columns group these variables into 3 groups (rows after the transpose, because they are the variables), for example temperature, rainfall, and humidity from the first lesson, and we can only combine values within the same group.

We are not actually solving the equation x @ w + b = y for x; we are trying to solve it for w.
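To make that concrete: in training, x and y are fixed data and gradient descent adjusts w and b until x @ w.t() + b matches y. A minimal sketch (the data, learning rate, and iteration count here are made up for illustration):

```python
import torch

torch.manual_seed(1)
# x and y are fixed, noise-free data generated from known "true" parameters
x = torch.randn(100, 3)
true_w = torch.tensor([[2.0, -1.0, 0.5], [0.0, 3.0, -2.0]])
true_b = torch.tensor([1.0, -1.0])
y = x @ true_w.t() + true_b

# w and b are the unknowns we solve for
w = torch.randn(2, 3, requires_grad=True)
b = torch.randn(2, requires_grad=True)

for _ in range(2000):
    loss = ((x @ w.t() + b - y) ** 2).mean()  # mean squared error
    loss.backward()
    with torch.no_grad():
        w -= 1e-2 * w.grad  # gradient descent step
        b -= 1e-2 * b.grad
        w.grad.zero_()
        b.grad.zero_()

# after training, w and b should be close to the parameters that generated y
print(torch.allclose(w, true_w, atol=1e-2))
print(torch.allclose(b, true_b, atol=1e-2))
```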

I don’t know if this helps

Thanks Shaica!! But I’m still not sure this is the answer I was looking for. I wanted to know why we need to transpose instead of creating an array that already has the same shape as the transposed one.

If you look at the example, the only things I changed are in bold. My guess is that both pieces of code are the same. So why does the Jovian notebook use the first version and not the second?

If you write it the second way you are still transposing the matrix, just by hand this time. I suppose it will work without a problem (as long as we define the linear regression model by hand rather than with the nn module).
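The nn module point can be checked directly: `nn.Linear` stores its weight with shape `(out_features, in_features)` and applies the transpose internally, which is why the hand-written model in the lesson mirrors that convention. A quick sketch:

```python
import torch
from torch import nn

layer = nn.Linear(in_features=3, out_features=2)
print(layer.weight.shape)  # (2, 3): stored as (out_features, in_features)

x = torch.randn(5, 3)
out = layer(x)

# nn.Linear computes x @ weight.t() + bias, matching the lesson's convention
manual = x @ layer.weight.t() + layer.bias
print(torch.allclose(out, manual))
```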

But from the mathematical standpoint that’s how it is defined.

I hope this helps better :thinking:
