Why Do Neural Networks Love the Softmax?

Published: 07 June 2023
on channel: Mutual Information
Views: 68,320
Likes: 3.3k

The machine learning consultancy: https://truetheta.io
Join my email list to get educational and useful articles (and nothing else!): https://mailchi.mp/truetheta/true-the...
Want to work together? See here: https://truetheta.io/about/#want-to-w...

Neural networks see something special in the softmax function.

SOCIAL MEDIA

LinkedIn: / dj-rich-90b91753
Twitter: / duanejrich
GitHub: https://github.com/Duane321

Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon:   / mutualinformation  

SOURCE NOTES

I decided to make this video while inspecting Jacobians and gradients, working backward from the end of a small network. Right near the softmax, the Jacobian looked simple enough that I suspected there was interesting math behind it, and there was. I came across several excellent blog posts on the softmax's Jacobian and its interaction with the negative log likelihood. Source [1] was my primary source, since it is well explained and uses condensed notation. [2] was useful for understanding the broader context, and [3] offered a separate, thorough perspective.
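
For readers who want to see that simplicity directly, here is a minimal sketch in plain Python. It is not the code from the video. The function names and the example logits are hypothetical, chosen only for illustration. It relies on two standard results: the softmax Jacobian has entries p_i * (delta_ij - p_j), and the gradient of the negative log likelihood with respect to the logits collapses to softmax(z) minus the one-hot label. A finite-difference check is included as an independent sanity test.

```python
import math


def softmax(z):
    """Map logits z to probabilities; subtracting max(z) avoids overflow."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(z):
    """Analytic Jacobian of the softmax: J[i][j] = p_i * (delta_ij - p_j)."""
    p = softmax(z)
    n = len(p)
    return [
        [p[i] * ((1.0 if i == j else 0.0) - p[j]) for j in range(n)]
        for i in range(n)
    ]


def nll(z, label):
    """Negative log likelihood of the true class under softmax(z)."""
    return -math.log(softmax(z)[label])


def nll_grad(z, label):
    """Analytic gradient of nll w.r.t. the logits: softmax(z) - one_hot(label)."""
    p = softmax(z)
    return [p[i] - (1.0 if i == label else 0.0) for i in range(len(p))]


def numeric_grad(f, z, eps=1e-6):
    """Central finite differences, used only as an independent check."""
    grad = []
    for i in range(len(z)):
        up = z[:i] + [z[i] + eps] + z[i + 1:]
        down = z[:i] + [z[i] - eps] + z[i + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


# Hypothetical example values, not taken from the video.
logits = [2.0, -1.0, 0.5]
label = 0

analytic = nll_grad(logits, label)
numeric = numeric_grad(lambda z: nll(z, label), logits)

# Largest disagreement between the closed form and the numerical estimate.
max_gap = max(abs(a - b) for a, b in zip(analytic, numeric))
print("analytic gradient:", analytic)
print("max gap vs finite differences:", max_gap)
```

The exponential in the softmax and the logarithm in the loss cancel, which is why the combined gradient is just the difference between the predicted probabilities and the label.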

SOURCES

[1] M. Petersen, "Softmax with cross-entropy," https://mattpetersen.github.io/softma..., 2017

[2] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016, section 6.2.2.3

[3] L. J. V. Miranda, "Understanding softmax and the negative log-likelihood," https://ljvmiranda921.github.io/noteb..., 2017

TIME CODES
0:00 Everyone uses the softmax
0:23 A Standard Explanation
3:20 But Why the Exponential Function?
3:57 The Broader Context
6:05 Two Choices Together
6:51 The Gradient
10:07 Other Reasons

