Multimodal AI: LLMs that can see (and hear)

Published: 20 November 2024
Channel: Shaw Talebi

Views: 5,362
Likes: 223

🗞️ Get exclusive access to AI resources and project ideas: https://the-data-entrepreneurs.kit.co...
🧑‍🎓 Learn AI in 6 weeks by building it: https://maven.com/shaw-talebi/ai-buil...
--
Multimodal (Large) Language Models expand an LLM's text-only capabilities to include other modalities. Here are three ways to do this.

Resources:
📰 Blog: https://towardsdatascience.com/multim...
▶️ LLM Playlist: Fine-tuning Large Language Models (LL...
💻 GitHub Repo: https://github.com/ShawhinT/YouTube-B...

References:
[1] Multimodal Machine Learning: https://arxiv.org/abs/1705.09406
[2] A Survey on Multimodal Large Language Models: https://arxiv.org/abs/2306.13549
[3] Visual Instruction Tuning: https://arxiv.org/abs/2304.08485
[4] GPT-4o System Card: https://arxiv.org/abs/2410.21276
[5] Janus: https://arxiv.org/abs/2410.13848
[6] Learning Transferable Visual Models From Natural Language Supervision: https://arxiv.org/abs/2103.00020
[7] Flamingo: https://arxiv.org/abs/2204.14198
[8] Mini-Omni2: https://arxiv.org/abs/2410.11190
[9] Emu3: https://arxiv.org/abs/2409.18869
[10] Chameleon: https://arxiv.org/abs/2405.09818

--
Homepage: https://www.shawhintalebi.com

Introduction - 0:00
Multimodal LLMs - 1:49
Path 1: LLM + Tools - 4:24
Path 2: LLM + Adapters - 7:20
Path 3: Unified Models - 11:19
Example: LLaMA 3.2 for Vision Tasks (Ollama) - 13:24
What's next? - 19:58

