Get exclusive access to AI resources and project ideas: https://the-data-entrepreneurs.kit.co...
In this video, I discuss fine-tuning an LLM using QLoRA (Quantized Low-Rank Adaptation). Example code is provided for training a custom YouTube comment responder using Mistral-7b-Instruct.
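As a rough sketch of the kind of setup used in this sort of fine-tune (hyperparameters and the model revision here are illustrative assumptions, not necessarily what the Colab uses), a QLoRA run with Hugging Face `transformers` + `peft` typically looks like:

```python
# Hypothetical QLoRA setup sketch — values are illustrative, not the video's exact config.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NormalFloat with double quantization (QLoRA ingredients 1 & 2)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # 4-bit NormalFloat
    bnb_4bit_use_double_quant=True,      # double quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",  # assumed revision
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA adapters on the attention projections (ingredient 4)
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the small adapter matrices train
```

The base weights stay frozen in 4-bit; only the low-rank adapter matrices are updated, which is what makes single-GPU fine-tuning feasible.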
More Resources:
▶️ Series Playlist: • Large Language Models (LLMs)
🎥 Fine-tuning with OpenAI: • 3 Ways to Make a Custom AI Assistant ...
📰 Read more: https://medium.com/towards-data-scien...
💻 Colab: https://colab.research.google.com/dri...
💻 GitHub: https://github.com/ShawhinT/YouTube-B...
🤗 Model: https://huggingface.co/shawhin/shawgp...
🤗 Dataset: https://huggingface.co/datasets/shawh...
[1] Fine-tuning LLMs: • Fine-tuning Large Language Models (LL...
[2] ZeRO paper: https://arxiv.org/abs/1910.02054
[3] QLoRA paper: https://arxiv.org/abs/2305.14314
[4] Phi-1 paper: https://arxiv.org/abs/2306.11644
[5] LoRA paper: https://arxiv.org/abs/2106.09685
--
Homepage: https://www.shawhintalebi.com/
Intro - 0:00
Fine-tuning (recap) - 0:45
LLMs are (computationally) expensive - 1:22
What is Quantization? - 4:49
4 Ingredients of QLoRA - 7:10
Ingredient 1: 4-bit NormalFloat - 7:28
Ingredient 2: Double Quantization - 9:54
Ingredient 3: Paged Optimizer - 13:45
Ingredient 4: LoRA - 15:40
Bringing it all together - 18:24
Example code: Fine-tuning Mistral-7b-Instruct for YT Comments - 20:35
What's Next? - 35:22
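To make the quantization idea from the chapters above concrete, here is a minimal absmax round-trip in plain Python. It illustrates quantization in general, not the NF4 data type itself (NF4 uses normal-distribution-aware levels rather than uniform ones):

```python
# Minimal absmax quantization sketch: map floats to signed 4-bit ints and back.
# Illustrates the general idea only — QLoRA's NF4 uses different levels.

def quantize_absmax_4bit(weights):
    """Scale by the absolute max, then round to signed 4-bit ints (-7..7)."""
    scale = max(abs(w) for w in weights)
    q = [round(w / scale * 7) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the 4-bit ints and the stored scale."""
    return [qi / 7 * scale for qi in q]

weights = [0.12, -0.83, 0.47, 0.05]
q, scale = quantize_absmax_4bit(weights)
recovered = dequantize(q, scale)
# `q` holds small ints; `recovered` approximates `weights` with some error
```

Storing each weight as 4 bits instead of 32 (plus one scale per block) is where the memory savings come from; double quantization then compresses the scales themselves.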
Video: QLoRA—How to Fine-tune an LLM on a Single GPU (w/ Python Code), uploaded by Shaw Talebi on 27 February 2024.