ChatGPT has Never Seen a SINGLE Word (Despite Reading Most of The Internet). Meet LLM Tokenizers.

Published: 26 July 2023
on channel: Jay Alammar

Despite processing internet-scale text data, large language models never see words as we do. Yes, they consume text, but a separate piece of software called a tokenizer takes in the text and translates it into the format that the language model actually operates on: a sequence of integer token IDs. In this video, Jay examines a language model tokenizer to give you a sense of how they work.
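
To make the idea concrete, here is a minimal sketch of what a tokenizer does. It is a toy, not the tokenizer used by ChatGPT or any real model: the vocabulary `TOY_VOCAB`, its IDs, and the greedy longest-match rule are all made up for illustration. Real tokenizers learn a much larger vocabulary from data and apply their own splitting rules, but the principle is the same: text goes in, integer IDs come out, and the IDs are all the model receives.

```python
# Hypothetical toy vocabulary: a lookup table from token strings to integer IDs.
# Real vocabularies are learned from data and hold tens of thousands of entries.
TOY_VOCAB = {
    "<unk>": 0,  # fallback for characters the vocabulary does not cover
    "token": 1,
    "izer": 2,
    "s": 3,
    " ": 4,
    "read": 5,
    "text": 6,
}
ID_TO_TOKEN = {token_id: token for token, token_id in TOY_VOCAB.items()}


def encode(text):
    """Turn text into token IDs with greedy longest-match (an assumed, simplified rule)."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest remaining substring first, then shrink it.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                pos = end
                break
        else:
            # No vocabulary entry starts here: emit <unk> and skip one character.
            ids.append(TOY_VOCAB["<unk>"])
            pos += 1
    return ids


def decode(ids):
    """Map token IDs back to their strings and join them."""
    return "".join(ID_TO_TOKEN[token_id] for token_id in ids)


ids = encode("tokenizers read text")
print(ids)          # [1, 2, 3, 4, 5, 4, 6]
print(decode(ids))  # tokenizers read text
```

Note that the word "tokenizers" never reaches the model as a word. It arrives as three IDs (for "token", "izer", and "s"), which is the sense in which a language model has never seen a single word.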

Follow our upcoming book, Hands-On Large Language Models, for more details about tokenizers and LLMs in general.
Updates on the book coming on https://jayalammar.substack.com/
My co-author: / maartengr / https://maartengrootendorst.substack....
Early access on https://www.oreilly.com/library/view/...

---

Twitter: / jayalammar
Blog: https://jalammar.github.io/
Mailing List: https://jayalammar.substack.com/

---

0:00 Introduction
0:41 We're writing: Hands-On Large Language Models
1:13 Generating text with ChatGPT Cohere Command
2:42 Looking at the generation code
5:03 What is the actual input to a language model?
7:14 What is the actual output when a language model generates?
7:50 The tokenizer's lookup table and embeddings inside a model
9:07 Looking at the model, tokenizer
12:27 Summary
