This is the third video about the transformer decoder and the final video introducing the transformer architecture. Here we mainly learn about the encoder-decoder multi-head attention layer, which is used to incorporate information from the encoder into the decoder. This layer is also commonly known as the cross-attention layer.
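As a rough sketch of what this layer computes (not code from the video), the snippet below shows cross-attention using PyTorch's nn.MultiheadAttention; all dimensions and tensor names are assumptions chosen for illustration:

import torch
import torch.nn as nn

# Assumed dimensions for illustration only; the paper uses d_model = 512.
d_model, num_heads = 512, 8
batch, src_len, tgt_len = 2, 10, 7

# Encoder output supplies the keys and values; the decoder state
# (after masked self-attention) supplies the queries.
enc_out = torch.randn(batch, src_len, d_model)
dec_hidden = torch.randn(batch, tgt_len, d_model)

cross_attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

# Queries from the decoder, keys and values from the encoder: this
# asymmetry is what makes the layer cross-attention rather than
# self-attention.
out, attn_weights = cross_attn(query=dec_hidden, key=enc_out, value=enc_out)

print(out.shape)           # torch.Size([2, 7, 512]), one vector per target position
print(attn_weights.shape)  # torch.Size([2, 7, 10]), weights over source positions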
The video is part of a series on the transformer architecture (https://arxiv.org/abs/1706.03762). You can find the complete series and a longer motivation here:
• A series of videos on the transformer
Slides are available here:
https://chalmersuniversity.box.com/s/...
Video: Transformer - Part 8 - Decoder (3): Encoder-decoder self-attention, by Lennart Svensson, published 17 November 2020.