BatchNorm
https://en.wikipedia.org/wiki/Batch_normalization
Each layer of a neural network has inputs with a corresponding distribution, which is affected during the training process by the randomness in the parameter initialization and the randomness in the input data. The effect of these sources of randomness on the distribution of the inputs to internal layers during training is described as internal co-variate shift.
Batch normalization was initially proposed to mitigate internal co-variate shift.
Transformation
Let us use B to denote a mini-batch of size m of the entire training set. The empirical mean and variance of B could thus be denoted as
Normalized as
BP
How to update
Inference
Last modified: 10 March 2024