In this video, we go under the hood of the tokenizer used by Gemini, Gemma-7B and Gemma-2B. We look at the large vocabulary and the impact it has on the size of the model, and at how Google has prioritized people, places, culture, languages and things over an efficient vocabulary and frequent sub-words. Chris also introduces his new tokenizer benchmark test, dataset and tokenizer visualizer tools.
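As a rough illustration of why vocabulary size affects model size, the token embedding table holds one vector per vocabulary entry, so its parameter count is vocabulary size times hidden size. The sketch below is not taken from the video. The vocabulary size of about 256,000 and the hidden sizes of 2048 and 3072 are assumptions used for illustration, and the 32k-vocabulary entry is a hypothetical comparison point.

```python
def embedding_params(vocab_size: int, hidden_size: int) -> int:
    """Number of parameters in a token embedding table (one vector per token)."""
    return vocab_size * hidden_size


# Assumed figures for illustration only; check the model configs before relying on them.
ASSUMED_CONFIGS = {
    "gemma-2b (assumed 256k vocab, 2048 hidden)": (256_000, 2048),
    "gemma-7b (assumed 256k vocab, 3072 hidden)": (256_000, 3072),
    "hypothetical 32k-vocab model, 4096 hidden": (32_000, 4096),
}

for name, (vocab_size, hidden_size) in ASSUMED_CONFIGS.items():
    params = embedding_params(vocab_size, hidden_size)
    print(f"{name}: {params / 1e6:.0f}M embedding parameters")
```

Under these assumptions, a 256k vocabulary costs several hundred million parameters in the embedding table alone, which is a much larger share of a small model than of a large one.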
GitHub
---------------
https://github.com/chrishayuk/tokeniz...
Video: How the Gemma/Gemini Tokenizer Works - Gemma/Gemini vs GPT-4 vs Mistral, by Chris Hay, added 25 February 2024.