In the last episode we built self-attention but left out a key ingredient: position embeddings. On its own, self-attention gives the model no information about where words sit in a sentence relative to one another. “John loves Mary,” “Mary loves John,” and “loves John Mary” all look the same to self-attention! That’s why we add explicit information about word order with positional embeddings.
Memorizing Transformers uses a positional encoding scheme from the T5 paper called relative position bias. In this video we’ll refer back to the T5 paper and build out this position embedding scheme line by line.
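To make “relative position” concrete before the chapters below, here’s a tiny sketch (my own illustration, not code from the video) of the kind of relative position matrix we build: entry [i, j] records how far key position j sits from query position i.

```python
import torch

# Relative position of every key with respect to every query for a length-5 sequence.
# Entry [i, j] = j - i: positive means the key is ahead of the query, negative behind.
positions = torch.arange(5)
relative_position = positions[None, :] - positions[:, None]
print(relative_position)
# tensor([[ 0,  1,  2,  3,  4],
#         [-1,  0,  1,  2,  3],
#         [-2, -1,  0,  1,  2],
#         [-3, -2, -1,  0,  1],
#         [-4, -3, -2, -1,  0]])
```

T5’s trick, covered step by step in the chapters below, is to map each of these distances to a learned bucket (exact buckets for nearby positions, log-scaled buckets for distant ones) rather than learning a separate embedding for every exact offset.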
Links:
Link to Colab Notebook: https://colab.research.google.com/dri...
You can follow me on Twitter: @nickcdryan
Check out the membership site for a full course version of the series (coming soon) and lots of other NLP content and code! https://www.chrismccormick.ai/membership
Chapters:
00:00 introduction
00:33 what is relative position bias?
01:57 T5 relative position bias
04:25 building a “vanilla” relative position matrix
09:23 first mask for exact indices
10:59 second mask for log scaled indices
13:30 creating a T5 relative position matrix
14:46 initialize the positional embedding weights
17:27 reshape embeddings for multihead self-attention
19:25 result: relative position embedding class
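For reference, here’s a rough PyTorch sketch of where the chapters above end up. This is my own hedged reconstruction, not the notebook code: the class and argument names are mine, the 32-bucket / 128-max-distance defaults come from the T5 paper, and it assumes causal (decoder-style) attention, so only keys at or before the query get bucketed.

```python
import math
import torch
import torch.nn as nn

def relative_position_bucket(relative_position, num_buckets=32, max_distance=128):
    """Map signed relative positions to bucket ids, T5-style (causal case).

    Nearby positions each get their own exact bucket; distances beyond
    num_buckets // 2 share logarithmically sized buckets up to max_distance.
    """
    # Causal attention: keys sit at or before the query, so key - query <= 0.
    # Flip to a non-negative "how far back" distance.
    n = -torch.clamp(relative_position, max=0)

    # First mask: one exact bucket per small distance.
    max_exact = num_buckets // 2
    is_small = n < max_exact

    # Second mask: log-scaled buckets for larger distances, capped at the last bucket.
    val_if_large = max_exact + (
        torch.log(n.float().clamp(min=1) / max_exact)
        / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    ).long()
    val_if_large = torch.clamp(val_if_large, max=num_buckets - 1)

    return torch.where(is_small, n, val_if_large)


class RelativePositionBias(nn.Module):
    """T5-style relative position bias: one learned scalar per (bucket, head),
    added to the attention logits."""

    def __init__(self, num_heads, num_buckets=32, max_distance=128):
        super().__init__()
        self.num_buckets = num_buckets
        self.max_distance = max_distance
        # One embedding row per bucket, one column per attention head.
        self.relative_attention_bias = nn.Embedding(num_buckets, num_heads)

    def forward(self, query_len, key_len):
        device = self.relative_attention_bias.weight.device
        # Relative position of every key with respect to every query.
        q_pos = torch.arange(query_len, dtype=torch.long, device=device)[:, None]
        k_pos = torch.arange(key_len, dtype=torch.long, device=device)[None, :]
        buckets = relative_position_bucket(
            k_pos - q_pos, self.num_buckets, self.max_distance
        )
        values = self.relative_attention_bias(buckets)   # (query_len, key_len, num_heads)
        # Move heads to the front and add a batch dim so the bias broadcasts over
        # attention scores shaped (batch, num_heads, query_len, key_len).
        return values.permute(2, 0, 1).unsqueeze(0)
```

Inside the attention layer, the returned tensor is simply added to the raw attention scores (shape batch × heads × query_len × key_len) before the softmax.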