I’m new to this field and would like to ask: in a logistic regression model, why do we take the transpose of the weights? Is it only because we defined the weights as a 2x3 tensor? If we had defined the weight matrix as 3x2 instead, would the transpose still be necessary?

Searched the net for a bit, and there are some apparent answers:

- It’s just sort of the unspoken rule, and every implementation seems to follow it (most likely to make it possible to use learned weights between different frameworks).
- Transposing the second matrix helps with data locality on the CPU (fewer cache misses, etc.).
- Transposing in forward is easier to do than in backpropagation (not sure about this, matrix transpose is actually easy to perform).
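
To make the shape question concrete, here is a minimal sketch in plain Python. The helper names `matmul` and `transpose` and the example numbers are made up for illustration and are not taken from any framework. It assumes one input sample with 3 features and 2 outputs, and shows that a 2x3 weight matrix needs the transpose while a 3x2 one holding the same numbers does not:

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix (illustrative helper)."""
    if len(a[0]) != len(b):
        raise ValueError(
            f"shape mismatch: {len(a)}x{len(a[0])} @ {len(b)}x{len(b[0])}"
        )
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


x = [[1.0, 2.0, 3.0]]            # one sample, 3 features: shape 1x3

w_out_in = [[0.1, 0.2, 0.3],     # shape 2x3: one row per output
            [0.4, 0.5, 0.6]]
w_in_out = transpose(w_out_in)   # shape 3x2: same numbers, other layout

logits_a = matmul(x, transpose(w_out_in))  # 1x3 @ 3x2 -> 1x2, transpose needed
logits_b = matmul(x, w_in_out)             # 1x3 @ 3x2 -> 1x2, no transpose

assert logits_a == logits_b      # both layouts give the same result
```

So the transpose is only there to make the inner dimensions line up. With the weights stored as 3x2, `x @ w` works directly, and `matmul(x, w_out_in)` without the transpose fails with a shape mismatch.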

I think the second answer makes the most sense, since matrix multiplication is a bit complex (implementing it from scratch in basic form requires three nested `for` loops). I guess if it’s possible to gain any performance, then it’s a good idea to do it.
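
As a rough illustration of the three loops and of the access pattern behind the data-locality argument, here is a sketch in plain Python. The function names `matmul_naive` and `matmul_bt` are hypothetical. Python lists will not show a real cache effect; the point is only which index changes in the innermost loop, which is what matters for contiguous row-major arrays:

```python
def matmul_naive(a, b):
    """a is n x k, b is k x m. The inner loop walks down a column of b."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                # b[p][j]: every step of p moves to a different row of b
                out[i][j] += a[i][p] * b[p][j]
    return out


def matmul_bt(a, bt):
    """a is n x k, bt is m x k (b stored transposed). The inner loop walks
    along a row of a and a row of bt, so both are read sequentially."""
    n, k, m = len(a), len(a[0]), len(bt)
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                out[i][j] += a[i][p] * bt[j][p]
    return out


a = [[1, 2, 3],
     [4, 5, 6]]                  # 2 samples, 3 features
w = [[1, 0, 1],
     [0, 1, 0]]                  # 2x3: stored as (outputs, inputs)
b = [list(col) for col in zip(*w)]  # 3x2: the transposed layout

assert matmul_naive(a, b) == matmul_bt(a, w)
```

Storing the weights as (outputs, inputs) means the second variant can be used directly, without materializing the transpose first.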

I think with such small matrices it doesn’t matter, though. It’s done like this because of answer 1.

