How to Train LLMs to "Think" (o1 & DeepSeek-R1)

Published: 17 February 2025
on channel: Shaw Talebi
9,118 views · 352 likes

🗞️ Get exclusive access to AI resources and project ideas: https://the-data-entrepreneurs.kit.co...
🧑‍🎓 Learn AI in 6 weeks by building it: https://maven.com/shaw-talebi/ai-buil...
--
Here, I discuss the technical details behind the recent "advanced reasoning" models trained with large-scale reinforcement learning, i.e., o1 and DeepSeek-R1.

📰 Read more: https://shawhin.medium.com/how-to-tra...

References
[1] https://openai.com/index/learning-to-...
[2] arXiv:2501.12948 [cs.CL]
[3] "Deep Dive into LLMs like ChatGPT" (video)
[4] https://huggingface.co/datasets/open-...
[5] https://discovery.ucl.ac.uk/id/eprint...

Intro - 0:00
OpenAI's o1 - 0:33
Test-time Compute - 1:33
"Thinking" Tokens - 3:50
DeepSeek Paper - 5:58
Reinforcement Learning - 7:22
R1-Zero: Prompt Template - 9:28
R1-Zero: Reward - 10:53
R1-Zero: GRPO (technical) - 12:53
R1-Zero: Results - 20:00
DeepSeek R1 - 23:32
Step 1: SFT with CoT - 24:47
Step 2: R1-Zero Style RL - 26:14
Step 3: SFT with Mixed Data - 27:03
Step 4: RL & RLHF - 28:26
Accessing DeepSeek Models - 29:18
Conclusions - 30:10
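
The GRPO chapter above centers on group-relative advantage estimation: for each prompt, a group of responses is sampled and scored, and each response's advantage is its reward normalized against the group's mean and standard deviation (per the DeepSeek-R1 paper, arXiv:2501.12948 [2]). A minimal sketch of that normalization step, with the function name and example rewards my own illustration:

```python
# Sketch of GRPO's group-relative advantage (DeepSeek-R1, arXiv:2501.12948).
# For one prompt, G responses are sampled and scored by a rule-based reward;
# each response's advantage is its reward standardized within the group,
# replacing the learned value function used in PPO.
from statistics import mean, stdev

def group_relative_advantages(rewards):
    """Return (r_i - group mean) / group std for each sampled response."""
    mu = mean(rewards)
    sigma = stdev(rewards)
    return [(r - mu) / sigma for r in rewards]

# Example: a group of 4 responses, two correct (reward 1) and two incorrect (reward 0)
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
print(advantages)  # correct responses get positive advantage, incorrect negative
```

Responses that beat the group average are reinforced and those below it are penalized, which is what lets GRPO skip training a separate critic model.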

Homepage: https://www.shawhintalebi.com/
