The matrix math behind transformer neural networks, one step at a time!!!

Published: 07 April 2024
on channel: StatQuest with Josh Starmer
Views: 61,293
Likes: 1.3k

Transformers, the neural network architecture behind ChatGPT, do a lot of math. Fortunately, that math can be expressed as matrix math, which GPUs are optimized to compute quickly. Matrix math is also how we code neural networks, so understanding how ChatGPT uses it will help you code your own. In this video, we go through the matrix math one step at a time and explain what each step does, so that you can use it on your own with confidence.
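
To give you a feel for the kind of matrix math the video covers, here is a minimal sketch of the Self Attention step in plain NumPy. The dimensions (3 tokens, 4-dimensional embeddings) and the random weights are illustrative assumptions, not values from the video:

import numpy as np

rng = np.random.default_rng(0)

# 3 tokens, each represented by a 4-dimensional embedding
# (small, made-up dimensions so the matrices are easy to inspect)
d_model = 4
X = rng.normal(size=(3, d_model))

# Learned weight matrices (randomly initialized here, just for illustration)
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

# One matrix multiplication each computes the Queries, Keys, and Values
# for every token at once -- this is what GPUs are optimized to do quickly
Q, K, V = X @ W_q, X @ W_k, X @ W_v

# Scaled dot-product similarity between every Query and every Key
scores = Q @ K.T / np.sqrt(d_model)

# SoftMax turns each row of scores into attention weights that sum to 1
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights = weights / weights.sum(axis=1, keepdims=True)

# Each token's output is a weighted sum of all the Values
attention = weights @ V
print(attention.shape)  # (3, 4): one output row per token

# Masked Self Attention (used in the decoder) adds a "look ahead" mask of
# -inf above the diagonal before the SoftMax, so each token can only
# attend to itself and to earlier tokens
mask = np.triu(np.ones((3, 3), dtype=bool), k=1)
masked_scores = np.where(mask, -np.inf, scores)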

NOTE: This StatQuest assumes that you are already familiar with:
Transformers: • Transformer Neural Networks, ChatGPT'...
The essential matrix algebra for neural networks: • Decoder-Only Transformers, ChatGPTs s...

If you'd like to support StatQuest, please consider...
Patreon: / statquest
...or...
YouTube Membership: / @statquest

...buying my book, a study guide, a t-shirt or hoodie, or a song from the StatQuest store...
https://statquest.org/statquest-store/

...or just donating to StatQuest!
PayPal: https://www.paypal.me/statquest
Venmo: @JoshStarmer

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on Twitter:
/ joshuastarmer

0:00 Awesome song and introduction
1:43 Word Embedding
3:37 Position Encoding
4:28 Self Attention
12:09 Residual Connections
13:08 Decoder Word Embedding and Position Encoding
15:33 Masked Self Attention
20:18 Encoder-Decoder Attention
21:31 Fully Connected Layer
22:16 SoftMax

#StatQuest #Transformer #ChatGPT

