The machine learning consultancy: https://truetheta.io
Join my email list to get educational and useful articles (and nothing else!): https://mailchi.mp/truetheta/true-the...
Want to work together? See here: https://truetheta.io/about/#want-to-w...
Neural Networks see something special in the softmax function.
SOCIAL MEDIA
LinkedIn : / dj-rich-90b91753
Twitter : / duanejrich
Github: https://github.com/Duane321
Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation
SOURCE NOTES
I decided to make this video while inspecting Jacobians/gradients, starting from the end of a small network. Right near the softmax, the Jacobian looked simple enough that I suspected there was interesting math behind it. And there was. I came across several excellent blogs on the softmax's Jacobian and its interaction with the negative log-likelihood. Source [1] was the primary source, since it was quite well explained and used condensed notation. [2] was useful for understanding the broader context, and [3] offered a separate, thorough perspective.
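To make the "simple Jacobian" observation concrete, here is a minimal pure-Python sketch. It is not taken from the video or its sources; the function names and the example logits are assumptions chosen for illustration. It builds the softmax Jacobian, chains it with the gradient of the negative log-likelihood, and compares the result with the well-known shortcut: softmax output minus the one-hot label.

```python
import math

def softmax(z):
    # Subtract the max logit for numerical stability; the result is unchanged.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(z):
    # J[i][j] = d s_i / d z_j = s_i * (delta_ij - s_j)
    s = softmax(z)
    n = len(s)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(n)]
            for i in range(n)]

def nll_grad_via_chain_rule(z, y):
    # Loss L = -log(s_y). Its gradient w.r.t. s is zero except at index y,
    # where it equals -1 / s_y, so only row y of the Jacobian contributes.
    s = softmax(z)
    J = softmax_jacobian(z)
    return [(-1.0 / s[y]) * J[y][j] for j in range(len(z))]

def nll_grad_shortcut(z, y):
    # The simplified form: softmax output minus the one-hot label.
    s = softmax(z)
    return [s[j] - (1.0 if j == y else 0.0) for j in range(len(z))]

# Hypothetical logits and true class index, chosen only for illustration.
z = [2.0, -1.0, 0.5]
y = 0

grad_chain = nll_grad_via_chain_rule(z, y)
grad_short = nll_grad_shortcut(z, y)
print(grad_chain)
print(grad_short)
```

Both lists should agree up to floating-point error, which is the cancellation that makes the gradient near the softmax look so simple.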
SOURCES
[1] M. Petersen, "Softmax with cross-entropy," https://mattpetersen.github.io/softma..., 2017
[2] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016, section 6.2.2.3
[3] L. J. V. Miranda, "Understanding softmax and the negative log-likelihood," https://ljvmiranda921.github.io/noteb..., 2017
TIME CODES
0:00 Everyone uses the softmax
0:23 A Standard Explanation
3:20 But Why the Exponential Function?
3:57 The Broader Context
6:05 Two Choices Together
6:51 The Gradient
10:07 Other Reasons
Video: Why Do Neural Networks Love the Softmax? Added by Mutual Information on 07 June 2023. Viewed 68,320 times and liked by 3.3 thousand people.