Coding a Paper - Ep. 5: Adding KNN memory to transformers

Published: 22 February 2024
on the channel: ChrisMcCormickAI

In this episode it’s time to finally add memory to Memorizing Transformers! This is the crux of the paper, so there’s lots to do but we will take it step by step.

The Memorizing Transformers model tackles the problem of long documents that don’t fit within a single context window. It does this by storing information (specifically, the key and value projections from attention) in memory as it reads through the document.

Then, while processing the current context window, it retrieves the most relevant memories and incorporates that older information into attention.

This retrieval piece is done using the k-Nearest Neighbors algorithm, a.k.a. vector similarity search.

To implement this feature, we’ll start by looking at Meta’s faiss library (pronounced “face” 😏) for efficient vector similarity search. We’ll do a quick tour of the library and discuss general design considerations for trading off speed, memory, and accuracy of search results.
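To give you a taste of what the quick tour covers, here’s a tiny, self-contained example. The index type and sizes are just illustrative and aren’t necessarily what the notebook uses:

```python
import faiss
import numpy as np

d = 64                        # dimensionality of the key vectors (illustrative)
index = faiss.IndexFlatL2(d)  # exact (brute-force) L2 search -- simple and accurate, slower at scale

# Add some random "key" vectors to the index (faiss expects float32).
keys = np.random.rand(10_000, d).astype('float32')
index.add(keys)

# Retrieve the k nearest neighbors for a batch of query vectors.
queries = np.random.rand(4, d).astype('float32')
k = 32
distances, ids = index.search(queries, k)  # both are (4, k) arrays
print(ids[0])  # row indices of the closest stored keys for the first query
```

Swapping `IndexFlatL2` for an approximate index is where the speed / memory / accuracy trade-offs come in.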

To actually store all of these memory vectors, we’ll build a memory-mapped database with numpy.
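As a rough sketch of the idea (the file name and sizes here are made up for illustration), numpy’s memmap lets the stored key/value vectors live on disk while still being indexed like an ordinary array:

```python
import numpy as np

n_memories, d = 100_000, 64  # illustrative sizes, not the values used in the episode

# Create a file-backed array for the stored key and value vectors.
# The data lives on disk, so the memory bank can be much larger than RAM.
kv_db = np.memmap('kv_memory.dat', dtype='float32', mode='w+',
                  shape=(n_memories, 2, d))  # slot 0 = key, slot 1 = value

# Writing: store the key/value pair for memory id 123.
kv_db[123, 0] = np.random.rand(d).astype('float32')  # key
kv_db[123, 1] = np.random.rand(d).astype('float32')  # value
kv_db.flush()  # push the changes through to disk

# Reading: fetch the key/value pairs for the ids returned by a faiss search.
ids = np.array([5, 123, 42])
retrieved = kv_db[ids]  # shape (3, 2, d)
```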

Finally, we’ll wrap it all together to create a kNN multihead-attention class!
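As a preview of where that’s headed, here’s a rough sketch of how retrieved memories can be folded back into attention with a learned per-head gate, which is the scheme the Memorizing Transformers paper uses. The shapes and variable names are mine for illustration, not necessarily the notebook’s:

```python
import torch

b, h, n, k, d = 2, 8, 128, 32, 64   # batch, heads, seq len, retrieved memories, head dim (illustrative)

q     = torch.randn(b, h, n, d)      # queries for the current context window
mem_k = torch.randn(b, h, n, k, d)   # keys retrieved from memory for each query
mem_v = torch.randn(b, h, n, k, d)   # values retrieved from memory for each query

# Attend over the k retrieved memories at each query position.
mem_scores = torch.einsum('b h n d, b h n k d -> b h n k', q, mem_k) * d ** -0.5
mem_out = torch.einsum('b h n k, b h n k d -> b h n d', mem_scores.softmax(dim=-1), mem_v)

local_out = torch.randn(b, h, n, d)  # stand-in for the ordinary local attention output

# A learned per-head gate (sigmoid of a bias) blends memory attention with local attention.
gate = torch.sigmoid(torch.zeros(h)).view(1, h, 1, 1)
out = gate * mem_out + (1 - gate) * local_out
```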

Integrating the memory vectors into the attention calculations is a bewildering mess of tensor shapes, so we’ll introduce the einops library and the einsum function and see how much easier these tools make the task. They’re great techniques to learn as an ML engineer!
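If you haven’t used them before, here’s a small taste (the shapes are arbitrary): rearrange replaces chains of view/permute calls, and einsum spells out the attention-score contraction explicitly:

```python
import torch
from einops import rearrange

b, n, h, d = 2, 128, 8, 64          # batch, sequence length, heads, head dim (illustrative)
q = torch.randn(b, n, h * d)
k = torch.randn(b, n, h * d)

# Split out the head dimension and move it in front of the sequence dimension
# in one readable step.
q = rearrange(q, 'b n (h d) -> b h n d', h=h)
k = rearrange(k, 'b n (h d) -> b h n d', h=h)

# Attention scores: for each batch and head, dot every query against every key.
scores = torch.einsum('b h i d, b h j d -> b h i j', q, k) * d ** -0.5
print(scores.shape)  # torch.Size([2, 8, 128, 128])
```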

And voila! - we’ve finished one of the most important pieces of our implementation.

Links:
Link to Colab Notebook: https://colab.research.google.com/dri...
You can follow me on Twitter: @nickcdryan
Check out the membership site for a full course version of the series (coming soon) and lots of other NLP content and code! https://www.chrismccormick.ai/membership


Chapters:
00:00 intro and refresher on KNN attention
02:08 faiss introduction
05:10 faiss / vector database considerations for your application
06:12 faiss quick tour
09:58 add database
12:38 recipe for our KNN memory class
13:53 adding to index and database
20:35 search and retrieval
25:45 clearing / removing / simplified scheme
29:50 demoing our KNN class
33:04 integrating our KNN into attention
36:13 einsum / einops
39:10 rearrange and einsum attention
41:43 building KNN attention
46:10 combining local and memory attention
49:47 KNN attention class

