How to Train LLMs to "Think" (o1 & DeepSeek-R1)

Published: 17 February 2025
Channel: Shaw Talebi

Views: 9,118

Likes: 352

🗞️ Get exclusive access to AI resources and project ideas: https://the-data-entrepreneurs.kit.co...
🧑‍🎓 Learn AI in 6 weeks by building it: https://maven.com/shaw-talebi/ai-buil...
--
Here, I discuss the technical details behind the recent “advanced reasoning” models trained with large-scale reinforcement learning, i.e., o1 and DeepSeek-R1.

📰 Read more: https://shawhin.medium.com/how-to-tra...

References
[1] https://openai.com/index/learning-to-...
[2] arXiv:2501.12948 [cs.CL]
[3] Deep Dive into LLMs like ChatGPT
[4] https://huggingface.co/datasets/open-...
[5] https://discovery.ucl.ac.uk/id/eprint...

Intro - 0:00
OpenAI's o1 - 0:33
Test-time Compute - 1:33
"Thinking" Tokens - 3:50
DeepSeek Paper - 5:58
Reinforcement Learning - 7:22
R1-Zero: Prompt Template - 9:28
R1-Zero: Reward - 10:53
R1-Zero: GRPO (technical) - 12:53
R1-Zero: Results - 20:00
DeepSeek R1 - 23:32
Step 1: SFT with CoT - 24:47
Step 2: R1-Zero Style RL - 26:14
Step 3: SFT with Mixed Data - 27:03
Step 4: RL & RLHF - 28:26
Accessing DeepSeek Models - 29:18
Conclusions - 30:10

Homepage: https://www.shawhintalebi.com/
