This is the third video about the transformer decoder and the final video introducing the transformer architecture. Here we mainly learn about the encoder-decoder multi-head attention layer, which is used to incorporate information from the encoder into the decoder. This layer is also commonly known as the cross-attention layer.
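As a rough sketch of what this layer computes (not code from the video), the snippet below shows cross-attention using PyTorch's nn.MultiheadAttention; all dimensions and tensor names are assumptions chosen for illustration:

import torch
import torch.nn as nn

# Assumed dimensions for illustration only; the paper uses d_model = 512.
d_model, num_heads = 512, 8
batch, src_len, tgt_len = 2, 10, 7

# Encoder output supplies the keys and values; the decoder state
# (after masked self-attention) supplies the queries.
enc_out = torch.randn(batch, src_len, d_model)
dec_hidden = torch.randn(batch, tgt_len, d_model)

cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

# Queries from the decoder, keys and values from the encoder: this
# asymmetry is what makes the layer cross-attention rather than
# self-attention.
out, attn_weights = cross_attn(query=dec_hidden, key=enc_out, value=enc_out)

print(out.shape)           # torch.Size([2, 7, 512]), one vector per target position
print(attn_weights.shape)  # torch.Size([2, 7, 10]), weights over source positions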
The video is part of a series on the transformer architecture (https://arxiv.org/abs/1706.03762). You can find the complete series and a longer motivation here:
• A series of videos on the transformer
Slides are available here:
https://chalmersuniversity.box.com/s/...
Video: Transformer - Part 8 - Decoder (3): Encoder-decoder self-attention, by Lennart Svensson, published 17 November 2020.