This video tutorial explains how Bi-Directional Attention works in NLP. This attention mechanism is very similar to the Self-Attention method introduced in the previous video. The main difference is that Bi-Directional Attention does not have the masking operation that Self-Attention has.
Also, while self-attention looks only at the previous words (or tokens), Bi-Directional Attention looks in both directions. That is why this attention is called Bi-Directional.
This video does not cover the math behind this method. The lesson explains, at a high level, the logic of how Bi-Directional Attention works. To implement it in practice, there are ready-made Python packages, and the core idea is sketched in the snippet below.
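The snippet below is a minimal NumPy sketch of the idea (my own illustration, not code from the video): the same scaled dot-product attention is computed with and without a causal mask. With the mask you get the self-attention from the previous video; without it, every token attends to both sides, i.e. bi-directional attention.

```python
import numpy as np

def attention(Q, K, V, causal=False):
    """Scaled dot-product attention with an optional causal mask."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)              # token-to-token similarity
    if causal:
        # Masked self-attention: hide future tokens (upper triangle).
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Row-wise softmax over the (possibly masked) scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy example: 4 tokens with embedding size 3.
np.random.seed(0)
X = np.random.randn(4, 3)
bi_directional = attention(X, X, X, causal=False)  # each token sees all tokens
masked         = attention(X, X, X, causal=True)   # each token sees only the past
```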
Bi-Directional Attention is widely used in BERT, which stands for Bidirectional Encoder Representations from Transformers.
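As a quick usage example (assuming the Hugging Face `transformers` package is installed; this is not code from the video), a pre-trained BERT model applies bi-directional attention under the hood, so each token's output embedding is built from context on both sides:

```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Bi-directional attention looks both ways.", return_tensors="pt")
outputs = model(**inputs)
# No causal mask: every token attends to the whole sentence.
print(outputs.last_hidden_state.shape)  # (1, num_tokens, 768)
```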
This is the 3rd video in the mini-course about Attention in NLP. Check out the previous ones:
1. Encoder-Decoder attention and Dot-Product: • ENCODER-DECODER Attention in NLP | Ho...
2. Self Attention: • SELF-ATTENTION in NLP | How does it w...
3. Bi-Directional Attention (this one).
4. Multi-Head attention (Upcoming).
You can read more about Bi-Directional Attention in the following sources:
Stanford University: Bidirectional Attention Flow with Self-Attention: https://web.stanford.edu/class/archiv...
Medium.com article (BiDAF): https://towardsdatascience.com/the-de...
See you! - @DataScienceGarage
#attention #nlp #bidirectional #tokenizer #bert #selfattention #BiDAF #multihead #python #dotproduct #naturallanguageprocessing