In this video, I cover the AdamW optimizer and compare it with the classical Adam. I also highlight the differences between L2 regularization and weight decay, which lie at the core of understanding what AdamW brings to the table.
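For quick reference, here is a minimal sketch of the core difference discussed in the video (illustrative NumPy code with hypothetical hyperparameter names lr, beta1, beta2, eps, wd; not code from the video): Adam with L2 regularization folds the penalty into the gradient before the moment estimates, while AdamW decouples the decay and applies it directly to the weights.

import numpy as np

def adam_l2_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, wd=1e-2):
    # Adam with L2 regularization: the penalty enters the gradient...
    g = grad + wd * w
    m = beta1 * m + (1 - beta1) * g       # ...so it flows into both moment
    v = beta2 * v + (1 - beta2) * g**2    # estimates and is rescaled below
    m_hat = m / (1 - beta1**t)            # bias-corrected first moment
    v_hat = v / (1 - beta2**t)            # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, wd=1e-2):
    # AdamW: weight decay is decoupled from the adaptive update
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)  # decay applied directly to w
    return w, m, v

# For plain SGD the two formulations coincide:
#   w - lr * (grad + wd * w) == (1 - lr * wd) * w - lr * grad
# With adaptive methods like Adam they do not, which is the point of AdamW.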
Resources
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
An overview of gradient descent optimization algorithms: https://ruder.io/optimizing-gradient-...
AdamW original paper: https://arxiv.org/abs/1711.05101
Adam original paper: https://arxiv.org/abs/1412.6980
Related Videos
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
AMSGrad Explained: • AMSGrad - Why Adam FAILS to Converge
Why neural networks can learn any function: • Why Neural Networks Can Learn Any Fun...
Why weight regularization reduces overfitting: • Why Weight Regularization Reduces Ove...
Contents
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
00:00 - Intro
00:29 - L2 Regularization vs Weight Decay
00:48 - SGD L2 vs Weight Decay
01:40 - SGD with momentum L2 vs Weight Decay
02:14 - Adam vs AdamW
03:13 - Outro
Follow Me
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
🐦 Twitter: @datamlistic
📸 Instagram: @datamlistic
📱 TikTok: @datamlistic
Channel Support
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
The best way to support the channel is to share the content. ;)
If you'd also like to support the channel financially, donating the price of a coffee is always warmly welcomed! (completely optional and voluntary)
► Patreon: patreon.com/datamlistic
► Bitcoin (BTC): 3C6Pkzyb5CjAUYrJxmpCaaNPVRgRVxxyTq
► Ethereum (ETH): 0x9Ac4eB94386C3e02b96599C05B7a8C71773c9281
► Cardano (ADA): addr1v95rfxlslfzkvd8sr3exkh7st4qmgj4ywf5zcaxgqgdyunsj5juw5
► Tether (USDT): 0xeC261d9b2EE4B6997a6a424067af165BAA4afE1a
#adamw #adam #l2regularization #weightdecay