Self-Attention Equations - Math + Illustrations

Published: 23 September 2022
on channel: ChrisMcCormickAI
4,785 views · 161 likes

I created this video as supplemental material for my new video course on Decoder-based Transformer models such as GPT-3. Check out the course here!
https://www.chrismccormick.ai/the-inn...

==== Overview ====
The mathematical equations for Multi-Headed Attention can be a little daunting, given the number of steps and variables! They're certainly a difficult place to start if you're trying to understand the algorithm for the first time.

In this tutorial, I'll walk through an illustrated explanation of Multi-Headed Attention and show how each step maps back to the original equations.
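For reference, the equations in question are presumably the standard ones from the original Transformer paper ("Attention Is All You Need", Vaswani et al., 2017), which in LaTeX-style notation read:

    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    head_i = Attention(Q W_i^Q, K W_i^K, V W_i^V)
    MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O

Here d_k is the dimensionality of the key vectors, each head has its own learned projection matrices W_i^Q, W_i^K, and W_i^V, and W^O projects the concatenated heads back to the model dimension.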

==== Pre-Reqs ====
This video assumes some familiarity with Self-Attention in Transformer models. If you're brand new to those concepts, you'll probably want to start with something like my GPT course linked above, or my "BERT Research" series on YouTube, to get all of the context you'll need.

==== Added Insights ====
I'll also share some new perspectives on what Attention is doing, based on a couple of "BERTology" papers I've studied, plus my own interpretation of the math. I believe these insights are particularly helpful for understanding how Attention can be applied beyond just the Self-Attention mechanism in NLP language models like BERT and GPT.

==== Student Discount ====
As mentioned in the video, students and low-income learners can apply for financial aid here:
https://www.chrismccormick.ai/student...
