@aakashns I have a lot of questions. Thanks for your patience with a newbie — hopefully answering some of these will help other students who come across the answers.
-
Is it useful to augment images and add those copies to the dataset or would the duplicates not be helpful?
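To make this concrete, here's the two approaches I'm picturing (a toy NumPy sketch — the 3x3 arrays and the horizontal flip are just stand-ins for real images and a real augmentation):

```python
import numpy as np

rng = np.random.default_rng(0)
dataset = [rng.random((3, 3)) for _ in range(4)]  # tiny stand-in "images"

# Option A: pre-generate augmented copies and append them (dataset grows)
augmented = [img[:, ::-1] for img in dataset]      # horizontal flip
enlarged_dataset = dataset + augmented             # now 8 samples

# Option B: augment on the fly each epoch (dataset size stays the same,
# but the model sees a different variant of each image every pass)
def sample_epoch(data):
    return [img[:, ::-1] if rng.random() < 0.5 else img for img in data]

epoch_view = sample_epoch(dataset)                 # still 4 samples
```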
-
Is there any literature or research you can point to that explains the math or reasoning behind residual blocks that compute g(f(x)) + x versus g(f(x) + x)?
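Here's a toy sketch of the two variants I mean (f and g are just scalar stand-ins for the conv stack and the final activation):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Toy stand-ins: f plays the conv stack, g the final activation
f = lambda x: 2.0 * x
g = relu

x = np.array([-1.0, 3.0])

post_add = g(f(x)) + x   # g(f(x)) + x: skip added AFTER the activation
pre_add  = g(f(x) + x)   # g(f(x) + x): skip added BEFORE the activation
```

With these numbers, post_add keeps the negative identity value (-1) in its output, while pre_add clips it to 0 — which is part of why I'm curious where the add "should" go.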
-
Is there a reason that we add f(x) and x together rather than subtracting them? I'm also imagining that you could apply other functions to combine f(x) and x and learn things from that.
-
Is there a reason to do two conv layers in the residual block instead of one?
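Here's my guess at what the second layer buys you, as a toy sketch (w1 and w2 are scalar stand-ins for the two conv layers): with one layer the residual branch is purely linear, but with a ReLU between two layers the branch becomes non-linear.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Scalar stand-ins for the two conv layers (toy weights)
w1, w2 = -1.0, 1.0

def one_conv_branch(x):
    return w1 * x                # no nonlinearity inside the branch -> linear

def two_conv_branch(x):
    return w2 * relu(w1 * x)     # ReLU between the two "convs" -> non-linear

x, y = 1.0, -1.0
# The one-layer branch is additive (linear)...
assert one_conv_branch(x + y) == one_conv_branch(x) + one_conv_branch(y)
# ...but the two-layer branch with a ReLU in between is not
assert two_conv_branch(x + y) != two_conv_branch(x) + two_conv_branch(y)
```

Is that the right intuition, or is there more to it?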
-
In a perfect world we wouldn't need batches at all, right? If we processed all the images in a single batch, would you still want to apply batch normalization to that one batch?
-
Would it make sense to process all the batches first and then compute the normalization statistics for a given layer across all batches at once, based on their summary stats?
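To show the difference I mean, here's a toy NumPy example for one unit in a layer — per-batch normalization versus normalizing once with pooled statistics over all batches (the numbers are made up):

```python
import numpy as np

# Two toy "batches" of activations for a single unit
b1 = np.array([0.0, 2.0])     # mean 1, std 1
b2 = np.array([10.0, 12.0])   # mean 11, std 1

# Per-batch normalization (each batch uses its own mean/std)
per_batch = [(b - b.mean()) / b.std() for b in (b1, b2)]

# Pooled normalization using summary stats over ALL batches at once
all_x = np.concatenate([b1, b2])
pooled = (all_x - all_x.mean()) / all_x.std()
```

Per-batch, both batches come out as [-1, 1]; pooled, the two batches stay far apart because the shared mean sits between them — so the two schemes really do give different results, which is why I'm asking.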
-
Is it best practice to shrink the feature maps down to 4x4? Is it just that you can't max-pool a 2x2 matrix because it's already as small as possible?
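Here's the shape arithmetic I'm picturing, with a toy NumPy 2x2 max-pool (stride 2, even dims assumed) — though from this it looks like a 2x2 map can still be pooled once more down to 1x1, which is part of what confuses me:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max-pool with stride 2 (assumes even height and width)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
p1 = max_pool_2x2(x)   # 4x4 -> 2x2
p2 = max_pool_2x2(p1)  # 2x2 -> 1x1: one more pool is still possible
```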