In this video we talk about sliding window attention, dilated sliding window attention, and global + sliding window attention, as introduced in the Longformer paper. We take a look at the main disadvantage of the classical attention mechanism introduced in the Transformer paper (i.e. its quadratic time complexity) and how sliding window attention proposes to solve this issue.
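If you'd like to poke at these patterns in code, here is a minimal NumPy sketch of the three attention masks. This is an illustration only, not the official Longformer implementation; the function names and the `seq_len`, `window`, `dilation`, and `global_idx` parameters are illustrative choices of mine, not anything taken from the paper.

```python
# Minimal sketch of the three Longformer attention patterns as boolean
# masks (True = position j is visible to position i). Illustrative only;
# not the official Longformer code.
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Each token attends to the `window` tokens on either side of it,
    so the cost scales as O(seq_len * window) instead of O(seq_len**2)."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return np.abs(i - j) <= window

def dilated_sliding_window_mask(seq_len: int, window: int, dilation: int) -> np.ndarray:
    """Same number of attended positions per token, but spaced `dilation`
    apart, so the receptive field widens without extra compute."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    offset = j - i
    return (np.abs(offset) <= window * dilation) & (offset % dilation == 0)

def global_plus_sliding_mask(seq_len: int, window: int, global_idx) -> np.ndarray:
    """Sliding window plus a few pre-selected tokens (e.g. [CLS]) that
    attend to, and are attended by, every position."""
    mask = sliding_window_mask(seq_len, window)
    mask[global_idx, :] = True  # global tokens attend everywhere
    mask[:, global_idx] = True  # every token attends to the global tokens
    return mask

if __name__ == "__main__":
    n, w = 10, 2
    print(sliding_window_mask(n, w).astype(int))
    print(dilated_sliding_window_mask(n, w, dilation=2).astype(int))
    print(global_plus_sliding_mask(n, w, global_idx=[0]).astype(int))
```

In a real model, such a mask would be applied to the attention scores (disallowed positions set to -inf before the softmax); an efficient implementation would also avoid materializing the full n x n matrix, which this sketch does only for clarity.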
References
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Transformer Self-Attention Mechanism Explained: • Transformer Self-Attention Mechanism ...
"Longformer: The long-document transformer" paper: https://arxiv.org/abs/2004.05150
Related Videos
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
BART model explained: • BART Explained: Denoising Sequence-to...
Why Language Models Hallucinate: • Why Language Models Hallucinate
Grounding DINO, Open-Set Object Detection: • Object Detection Part 8: Grounding DI...
Detection Transformers (DETR), Object Queries: • Object Detection Part 7: Detection Tr...
Wav2vec2 A Framework for Self-Supervised Learning of Speech Representations - Paper Explained: • Wav2vec2 A Framework for Self-Supervi...
Transformer Self-Attention Mechanism Explained: • Transformer Self-Attention Mechanism ...
How to Fine-tune Large Language Models Like ChatGPT with Low-Rank Adaptation (LoRA): • How to Fine-tune Large Language Model...
Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped Query Attention (GQA) Explained: • Multi-Head Attention (MHA), Multi-Que...
LLM Prompt Engineering with Random Sampling: Temperature, Top-k, Top-p: • LLM Prompt Engineering with Random Sa...
Contents
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
00:00 - Intro
00:26 - Original attention mechanism
00:50 - Sliding window attention
01:56 - Dilated sliding window attention
02:40 - Global + Sliding window attention
03:31 - Outro
Follow Me
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
🐦 Twitter: @datamlistic
📸 Instagram: @datamlistic
📱 TikTok: @datamlistic
Channel Support
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
The best way to support the channel is to share the content. ;)
If you'd also like to support the channel financially, donating the price of a coffee is always warmly welcomed! (completely optional and voluntary)
► Patreon: / datamlistic
► Bitcoin (BTC): 3C6Pkzyb5CjAUYrJxmpCaaNPVRgRVxxyTq
► Ethereum (ETH): 0x9Ac4eB94386C3e02b96599C05B7a8C71773c9281
► Cardano (ADA): addr1v95rfxlslfzkvd8sr3exkh7st4qmgj4ywf5zcaxgqgdyunsj5juw5
► Tether (USDT): 0xeC261d9b2EE4B6997a6a424067af165BAA4afE1a
#slidingwindowattention #longformer #attentionmechanism