Coding a Paper - Ep. 6: Adding XL Recurrence to Transformers

Published: 29 February 2024
on channel: ChrisMcCormickAI

In this episode we’re adding recurrence into Memorizing Transformers. Specifically, we’re implementing a type of recurrence as outlined in the Transformer-XL architecture.

Transformer-XL was one of the earliest attempts at solving the “long-range dependencies” problem with the transformer architecture (the same problem we’re trying to address in Memorizing Transformers!), and its authors did this by incorporating recurrence between segments of a long document.

While the main contribution and strength of Memorizing Transformers is the kNN memory, the authors found that incorporating Transformer-XL’s recurrence strategy improved the model's performance a little further.

Before diving into the implementation, we’ll provide a brief summary of recurrence, recurrent language models, and why transformers largely replaced them for most language modeling tasks.

Then we’ll examine the original Transformer-XL paper and implement a toy version of XL recurrence in order to understand the internals. After that we’ll take our existing multi-head attention and kNN multi-head attention classes and add XL recurrence to them.
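
For a rough idea of what XL recurrence looks like, here is a minimal sketch. It is not the code from the notebook. It assumes a single attention head, plain NumPy, no relative position encodings, and a cache that holds only the previous segment's layer inputs. The names `xl_attention`, `w_q`, `w_k`, `w_v` and `memory` are made up for illustration. Queries come from the current segment only, keys and values are computed over the cached segment concatenated with the current one, and the causal mask is applied only to the current-segment part.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability; exp(-inf) evaluates to 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def xl_attention(x, memory, w_q, w_k, w_v):
    """Toy single-head attention with XL-style recurrence (illustrative only).

    x:      (seg_len, d_model) inputs for the current segment
    memory: (mem_len, d_model) cached inputs from the previous segment, or None
    Returns (output, new_memory, attention_weights).
    """
    seg_len = x.shape[0]
    # Keys and values see [memory; current segment]; queries see the current segment only.
    context = x if memory is None else np.concatenate([memory, x], axis=0)
    mem_len = context.shape[0] - seg_len

    q = x @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (seg_len, mem_len + seg_len)

    # Every query may attend to all cached positions (they are all in the past),
    # but only to current-segment positions at or before its own index.
    causal = np.tril(np.ones((seg_len, seg_len), dtype=bool))
    visible_memory = np.ones((seg_len, mem_len), dtype=bool)
    mask = np.concatenate([visible_memory, causal], axis=1)
    scores = np.where(mask, scores, -np.inf)

    weights = softmax(scores, axis=-1)
    out = weights @ v

    # Cache this segment for the next call. With an autograd framework the cache
    # would also be detached so that no gradient flows into earlier segments.
    new_memory = x.copy()
    return out, new_memory, weights

# Walk through a "document" one segment at a time, carrying the cache forward.
rng = np.random.default_rng(0)
d_model, seg_len = 8, 4
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
document = rng.normal(size=(3 * seg_len, d_model))

memory = None
for start in range(0, len(document), seg_len):
    segment = document[start:start + seg_len]
    out, memory, weights = xl_attention(segment, memory, w_q, w_k, w_v)
```

On the first segment there is no cache, so this reduces to ordinary causal self-attention. From the second segment on, each query also attends to the previous segment's positions, which is how information crosses segment boundaries.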

Links:
Link to Colab Notebook: https://colab.research.google.com/dri...
You can follow me on Twitter: / nickcdryan
Check out the membership site for a full course version of the series (coming soon) and lots of other NLP content and code! https://www.chrismccormick.ai/membership

Chapters:
00:00 introduction and recap
00:34 what are recurrent models?
02:55 why don't we use recurrent models today?
05:30 transformer XL description
10:33 transformer XL pseudocode
11:46 building transformer XL
17:40 causal language mask on recurrent information
21:30 finishing XL attention
24:00 XL attention class
29:59 KNN XL attention class
32:46 wrapping up and final steps for next episode

