Tutorial 98 - Deep Learning terminology explained - Kernel (weights) initialization and padding

Published: 8 April 2021
on the channel: ZEISS arivis

Code associated with these tutorials can be downloaded from here: https://github.com/bnsreenu/python_fo...

The essence of deep learning is to find the best weights (and biases) for the network, i.e., those that minimize the error (loss).

This is done via an iterative process in which the weights are updated at each iteration in a direction that reduces the loss.

But how do we assign the initial weights?

Zero initialization: Setting all initial weights to 0.

This means the derivative of the loss function w.r.t. each weight is identical for every weight in a layer, in every iteration: all neurons receive the same update (and, with every weight at zero, the earlier-layer gradients are exactly zero), so the symmetry between neurons is never broken and the network cannot learn.
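
A minimal NumPy sketch (not from the video) makes the symmetry problem concrete: with all-zero weights, both hidden neurons compute the same value, the output-layer weights receive identical gradients, and the first-layer gradients are exactly zero.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy network: 2 inputs -> 2 hidden (sigmoid) -> 1 output, all weights zero
x = np.array([0.5, -1.0])       # one training example
y = 1.0                         # target
W1 = np.zeros((2, 2)); b1 = np.zeros(2)
W2 = np.zeros(2);      b2 = 0.0

h = sigmoid(W1 @ x + b1)        # both hidden activations are 0.5 (identical)
y_hat = W2 @ h + b2             # forward pass

d_out = 2 * (y_hat - y)         # squared-error loss gradient
dW2 = d_out * h                 # [-1. -1.] -> identical updates for both weights
d_h = d_out * W2 * h * (1 - h)  # all zeros, because W2 is zero
dW1 = np.outer(d_h, x)          # all-zero gradient: the first layer never moves

print(dW2, dW1)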

Random weight initialization: May assign very large or very small weights.

Large weights: Lead to large values at the corresponding nodes (neurons); when sigmoid is applied, the outputs saturate close to 1.
 In the saturated region the gradient is nearly flat, so learning takes a very long time.

Small weights: Lead to small values at the nodes, resulting in a similar situation to the one above (gradients shrink and learning stalls).
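
A quick NumPy sketch of the saturation argument: as the pre-activation value at a node grows, the sigmoid output approaches 1 and its derivative collapses toward 0, so the weight updates become tiny.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in [0.5, 5.0, 20.0]:       # pre-activation value at a node
    s = sigmoid(z)
    print(z, s, s * (1 - s))     # output and local gradient
# 0.5  -> output 0.62,  gradient 0.24   (healthy)
# 5.0  -> output 0.99,  gradient 0.0066 (learning slows)
# 20.0 -> output ~1.0,  gradient ~2e-9  (effectively stuck)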

He et al. (2015) proposed activation-aware initialization of weights (for the ReLU activation function).

He initialization: Initialize the weights randomly, but from a distribution whose variance is scaled to the layer's number of inputs: Var(W) = 2 / fan_in for ReLU.

Other similar initializers (e.g., Glorot/Xavier, LeCun) also try to find a good variance for the distribution from which the initial parameters are drawn.
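
As a sketch, He initialization can be reproduced in a couple of lines of NumPy (drawing from a zero-mean Gaussian with variance 2 / fan_in; the layer sizes are made up for illustration):

import numpy as np

fan_in, fan_out = 512, 256             # inputs and outputs of the layer
W = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)
print(W.var())                         # ~= 2 / 512 ~= 0.0039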

Summary: Use he_uniform for the ReLU activation function.
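
In Keras this is a one-line change per layer; a minimal sketch (assuming TensorFlow/Keras, as used in this tutorial series; the layer sizes are made up for illustration):

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # he_uniform draws from a uniform distribution with limit sqrt(6 / fan_in),
    # a good match for ReLU; padding='same' zero-pads the input so the output
    # feature map keeps the same spatial size (the 'padding' part of the title)
    layers.Conv2D(32, (3, 3), activation='relu',
                  kernel_initializer='he_uniform', padding='same',
                  input_shape=(64, 64, 3)),
    layers.Flatten(),
    layers.Dense(10, activation='softmax'),
])
model.summary()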

