This is the 6th video in a series on using large language models (LLMs) in practice. Here, I review the key aspects of developing a foundation LLM, drawing on how models such as GPT-3, Llama, and Falcon were built.
More Resources:
▶️ Series Playlist: https://www.youtube.com/playlist?list...
Read more: https://towardsdatascience.com/how-to...
[1] BloombergGPT: https://arxiv.org/pdf/2303.17564.pdf
[2] Llama 2: https://ai.meta.com/research/publicat...
[3] LLM Energy Costs: https://www.statista.com/statistics/1...
[4] arXiv:2005.14165 [cs.CL]
[5] Falcon 180B Blog: https://huggingface.co/blog/falcon-180b
[6] arXiv:2101.00027 [cs.CL]
[7] Alpaca Repo: https://github.com/gururise/AlpacaDat...
[8] arXiv:2303.18223 [cs.CL]
[9] arXiv:2112.11446 [cs.CL]
[10] arXiv:1508.07909 [cs.CL]
[11] SentencePiece: https://github.com/google/sentencepie...
[12] Tokenizers Doc: https://huggingface.co/docs/tokenizer...
[13] arXiv:1706.03762 [cs.CL]
[14] Andrej Karpathy Lecture: Let's build GPT: from scratch, in cod...
[15] Hugging Face NLP Course: https://huggingface.co/learn/nlp-cour...
[16] arXiv:1810.04805 [cs.CL]
[17] arXiv:1910.13461 [cs.CL]
[18] arXiv:1603.05027 [cs.CV]
[19] arXiv:1607.06450 [stat.ML]
[20] arXiv:1803.02155 [cs.CL]
[21] arXiv:2203.15556 [cs.CL]
[22] Train With Mixed Precision (NVIDIA): https://docs.nvidia.com/deeplearning/...
[23] DeepSpeed Doc: https://www.deepspeed.ai/training/
[24] https://paperswithcode.com/method/wei...
[25] https://towardsdatascience.com/what-i...
[26] arXiv:2001.08361 [cs.LG]
[27] arXiv:1803.05457 [cs.AI]
[28] arXiv:1905.07830 [cs.CL]
[29] arXiv:2009.03300 [cs.CY]
[30] arXiv:2109.07958 [cs.CL]
[31] https://huggingface.co/blog/evaluatin...
[32] https://www.cs.toronto.edu/~hinton/ab...
--
Homepage: https://shawhintalebi.com/
Book a call: https://calendly.com/shawhintalebi
Intro - 0:00
How much does it cost? - 1:30
4 Key Steps - 3:55
Step 1: Data Curation - 4:19
1.1: Data Sources - 5:31
1.2: Data Diversity - 7:45
1.3: Data Preparation - 9:06
Step 2: Model Architecture (Transformers) - 13:17
2.1: 3 Types of Transformers - 15:13
2.2: Other Design Choices - 18:27
2.3: How big do I make it? - 22:45
Step 3: Training at Scale - 24:20
3.1: Training Stability - 26:52
3.2: Hyperparameters - 28:06
Step 4: Evaluation - 29:14
4.1: Multiple-choice Tasks - 30:22
4.2: Open-ended Tasks - 32:59
What's next? - 34:31
Video: How to Build an LLM from Scratch | An Overview. Added by Shaw Talebi on 05 October 2023.