@aakashns Since we are using batch norm, the weights won't grow too large, right? Then why do we also do gradient clipping? @PrajwalPrashanth
Correction: batch norm is applied to layer outputs, not to the weights. Weight decay is what prevents the weights from becoming too large. Regardless, the purpose of gradient clipping is to prevent sudden, undesired changes in the weights caused by large gradients.
See https://towardsdatascience.com/what-is-gradient-clipping-b8e815cdfb48
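Here is a minimal PyTorch sketch (the model, data, and hyperparameters are made up for illustration) showing that the three techniques act at different points: batch norm on layer outputs, weight decay on the weights via the optimizer, and gradient clipping on the gradients just before the optimizer step.

```python
import torch
import torch.nn as nn

# Hypothetical model: batch norm normalizes the layer's outputs, not its weights
model = nn.Sequential(
    nn.Linear(10, 32),
    nn.BatchNorm1d(32),
    nn.ReLU(),
    nn.Linear(32, 1),
)

# weight_decay penalizes large weights during the update
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
loss_fn = nn.MSELoss()

# Dummy batch, just to make the sketch runnable
inputs, targets = torch.randn(64, 10), torch.randn(64, 1)

loss = loss_fn(model(inputs), targets)
loss.backward()

# Gradient clipping: cap the total gradient norm so one large gradient
# cannot cause a sudden jump in the weights
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()
optimizer.zero_grad()
```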