When we are doing batch norm, the weights won't go high, right? Then why do we do gradient clipping?

@aakashns We are doing batch norm, so the weights won't go high, right? Then why do we also need gradient clipping? @PrajwalPrashanth


Correction: Batch norm is applied to layer outputs, not weights. It is weight decay that prevents weights from becoming too large. Regardless, the purpose of gradient clipping is to prevent sudden undesired changes in weights due to large gradients.
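
To make the distinction concrete, here is a minimal PyTorch sketch of one training step (the tiny model, fake batch, and the `max_norm=1.0` value are made up for illustration, not from the course notebook): batch norm normalizes layer outputs, `weight_decay` on the optimizer penalizes large weights, and `clip_grad_norm_` caps the gradient norm right before the optimizer step.

```python
import torch
import torch.nn as nn

# Hypothetical tiny model just for illustration; BatchNorm1d normalizes the layer's outputs.
model = nn.Sequential(nn.Linear(10, 32), nn.BatchNorm1d(32), nn.ReLU(), nn.Linear(32, 1))

# Weight decay is what keeps the weights themselves from growing too large.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)
loss_fn = nn.MSELoss()

xb, yb = torch.randn(64, 10), torch.randn(64, 1)  # one fake batch

optimizer.zero_grad()
loss = loss_fn(model(xb), yb)
loss.backward()

# Gradient clipping: if the overall gradient norm exceeds max_norm (1.0 here, an arbitrary
# choice), rescale the gradients so one large gradient can't cause a sudden jump in the weights.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()
```

So all three mechanisms act on different things: outputs (batch norm), weights (weight decay), and gradients (clipping).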

See https://towardsdatascience.com/what-is-gradient-clipping-b8e815cdfb48