Code associated with these tutorials can be downloaded from here: https://github.com/bnsreenu/python_fo...
The essence of deep learning is to find the best weights (and biases) for the network, i.e. the ones that minimize the error (loss).
This is done via an iterative process in which the weights are updated at each iteration in a direction that reduces the loss.
But how do we assign the initial weights?
Zero initialization: Setting all initial weights to 0.
This means the derivative of the loss function w.r.t. the weights is identical (or zero) for every weight in a layer, in every iteration,
so all neurons receive the same update, symmetry is never broken, and the network effectively fails to learn.
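A minimal sketch of the symmetry problem, assuming a toy one-hidden-layer network with mean-squared-error loss (an illustration, not code from the tutorial): with all-zero weights, every hidden-to-output weight receives the same gradient and the hidden weights receive no gradient at all.

```python
# Toy example (assumed, not from the tutorial): zero-initialized two-layer network.
# Every hidden neuron outputs the same value, so gradients cannot break symmetry.
import numpy as np

np.random.seed(0)
X = np.random.rand(4, 3)          # 4 samples, 3 features
y = np.array([[0.], [1.], [1.], [0.]])

W1 = np.zeros((3, 5))             # zero-initialized hidden weights
W2 = np.zeros((5, 1))             # zero-initialized output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward pass
h = sigmoid(X @ W1)               # every hidden unit outputs 0.5
out = sigmoid(h @ W2)

# Backward pass (mean squared error, for illustration)
d_out = (out - y) * out * (1 - out)
dW2 = h.T @ d_out
d_h = (d_out @ W2.T) * h * (1 - h)
dW1 = X.T @ d_h

print(dW2.ravel())                # identical gradient for every hidden-to-output weight
print(dW1)                        # all zeros: hidden weights never move
```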
Random weight initialization: Can assign weights that are too large or too small.
Large weights: Lead to large values at the respective nodes (neurons), and when sigmoid is applied the outputs are pushed close to 0 or 1.
In this saturated region the gradient is nearly flat, so learning takes a long time.
Small weights: Lead to very small values at the nodes, and the signal (and its gradients) shrinks layer by layer, causing a similar slow-learning problem.
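A small illustration of the large-weight case, assuming a single sigmoid neuron with 100 inputs (an arbitrary choice, not from the tutorial): as the standard deviation of the random weights grows, the pre-activation moves far from zero and the sigmoid's local gradient collapses toward zero.

```python
# Assumed illustration: large random weights saturate the sigmoid, whose derivative
# sigmoid(z) * (1 - sigmoid(z)) then becomes vanishingly small.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.random(100)                             # one input vector with 100 features

for scale in [0.01, 1.0, 10.0]:                 # progressively larger initial weights
    w = rng.normal(0.0, scale, size=100)        # weights drawn with this std deviation
    z = float(w @ x)                            # pre-activation of a single neuron
    grad = sigmoid(z) * (1.0 - sigmoid(z))      # local gradient of the sigmoid
    print(f"weight std {scale:>5}: pre-activation {z:8.2f}, sigmoid gradient {grad:.2e}")
```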
He et al. (2015) proposed activation-aware initialization of weights (for the ReLU activation function).
He initialization: Initialize the weights randomly, but from a distribution whose variance is scaled to the layer size (for ReLU, variance = 2 / fan_in, where fan_in is the number of inputs to the layer).
Other similar initializers (e.g. Glorot/Xavier) also try to find a good variance for the distribution from which the initial parameters are drawn.
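A rough numerical check of the idea, assuming a stack of fully connected ReLU layers of hypothetical width 512 (not from the tutorial): drawing weights with variance 2 / fan_in keeps the activation magnitudes roughly stable from layer to layer instead of exploding or shrinking.

```python
# Assumed sketch: hand-rolled He-normal initialization, checking that ReLU activations
# keep a roughly constant spread across layers.
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=(1000, 512))                     # batch of inputs, unit variance

for layer in range(5):
    fan_in = x.shape[1]
    W = rng.normal(0.0, np.sqrt(2.0 / fan_in),       # He init: variance = 2 / fan_in
                   size=(fan_in, 512))
    x = np.maximum(0.0, x @ W)                       # ReLU activation
    print(f"layer {layer + 1}: activation std = {x.std():.3f}")
```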
Summary: Use he_uniform initialization for the ReLU activation function.
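A minimal Keras sketch of how this is typically requested in practice (layer sizes and input shape are placeholders, not from the tutorial): pass kernel_initializer='he_uniform' to the ReLU layers.

```python
# Assumed Keras/TensorFlow example: He uniform initialization for ReLU layers.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(64, 64, 3)),                  # placeholder image input shape
    layers.Conv2D(32, (3, 3), activation='relu',
                  kernel_initializer='he_uniform',
                  padding='same'),                   # 'same' padding keeps spatial size
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu',
                 kernel_initializer='he_uniform'),
    layers.Dense(1, activation='sigmoid')            # output layer; default init is fine here
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()
```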