I actually meant the insides of the model, rather than any training step.
y' = f(x) → if f is non-linear, then the transformation is considered non-linear as well.
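To make that concrete, here is a minimal sketch (the names `f_lin`, `f_non` and the tanh choice are mine, not from the thread): the same affine map, once used directly and once wrapped in a non-linearity. A quick homogeneity check f(2x) = 2·f(x) shows which one is actually linear.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))
x = rng.normal(size=2)

f_lin = lambda v: W @ v           # linear f: y' = Wx
f_non = lambda v: np.tanh(W @ v)  # non-linear f: tanh wraps the linear map

# Homogeneity holds only for the linear map:
print(np.allclose(f_lin(2 * x), 2 * f_lin(x)))  # True
print(np.allclose(f_non(2 * x), 2 * f_non(x)))  # False
```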
z = g(y, y') → this is something totally separate from the above transform.
If f is linear and g is non-linear, the transformation above is still considered linear — this is because your y' (the predictions, or however you call them) depends directly on f, not on g (you can't use z and call it a prediction, because it's a totally different measure).
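A toy sketch of that point (variable names are mine): a linear model paired with a non-linear loss. The loss z is a scalar error measure, not a prediction — the model's output y' remains the linear one.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=2)   # weights of a linear model
x = rng.normal(size=2)   # one input
y = 1.0                  # target

y_pred = w @ x                # y' = f(x): linear in x
z = (y - y_pred) ** 2         # z = g(y, y'): squared error, non-linear in y'

# The model is still linear in x even though g is non-linear:
print(np.allclose(w @ (2 * x), 2 * y_pred))  # True
```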
Notice that it’s usually hard to confuse the final activation function with the loss/cost function → the latter accepts two inputs, because its task is to measure how well the model has performed.
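The signatures alone tell them apart. A hypothetical minimal pair (sigmoid and mean squared error chosen only as familiar examples): the activation takes one input, the loss takes two.

```python
import numpy as np

def activation(z):
    # one input: the raw model output (here, a sigmoid)
    return 1.0 / (1.0 + np.exp(-z))

def loss(y_true, y_pred):
    # two inputs: it compares predictions against targets
    return np.mean((y_true - y_pred) ** 2)

y_pred = activation(np.array([0.0, 2.0]))
print(loss(np.array([0.0, 1.0]), y_pred))
```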