Multi-Modal RAG: Chat with Text and Images in Documents

Published: 12 July 2024
Channel: Prompt Engineering
Views: 10,463
Likes: 312

In this video, I'll show you how to build an end-to-end multi-modal RAG system using GPT-4 and LlamaIndex. We'll cover data collection, creating vector stores for text and images, and building a retrieval pipeline. It's aimed at anyone who wants to augment large language models with multi-modal data.
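
To make the flow concrete, here is a minimal, dependency-free sketch of the idea: one store for text chunks, one for image descriptions, and a retrieval step that pulls from both. This is not the notebook from the video and it does not call LlamaIndex or GPT-4. The bag-of-words `embed` function, the `VectorStore` class, the sample chunks, and the file paths are all hypothetical stand-ins for a real embedding model, a real vector database, and real documents.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory store: keeps (embedding, payload) pairs."""

    def __init__(self):
        self.items = []

    def add(self, text, payload):
        # `text` is what gets embedded; `payload` is what a query returns.
        self.items.append((embed(text), payload))

    def query(self, question, top_k=1):
        q = embed(question)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [payload for _, payload in ranked[:top_k]]


# Text store: each chunk is embedded and returned as-is.
text_store = VectorStore()
for chunk in [
    "Vector stores index text chunks by embedding similarity.",
    "The retrieval pipeline returns the top matching chunks for a query.",
]:
    text_store.add(chunk, chunk)

# Image store: each image is indexed by a description (assumed to have been
# generated beforehand by a vision-capable model) and returns the image path.
image_store = VectorStore()
image_store.add("Block diagram of the multi-modal RAG architecture.", "figures/architecture.png")
image_store.add("Line chart of training loss over epochs.", "figures/loss_curve.png")


def build_prompt(question, text_store, image_store, top_k=1):
    """Retrieve from both stores and assemble the context for the final LLM call."""
    texts = text_store.query(question, top_k)
    images = image_store.query(question, top_k)
    lines = ["Context:"] + texts + ["Images:"] + images + ["Question: " + question]
    return "\n".join(lines)


prompt = build_prompt("What does the architecture diagram show?", text_store, image_store)
```

In a real pipeline, the retrieved image paths would be passed to a multi-modal model together with the text context, rather than being listed in a plain-text prompt.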

LINKS:
Colab: https://tinyurl.com/25sb2rtu
Architecture: https://tinyurl.com/4x9x9bsc
Multi-modal RAG - Previous Video: Multi-modal RAG: Chat with Docs conta...

💻 RAG Beyond Basics Course:
https://prompt-s-site.thinkific.com/c...

Let's Connect:
🦾 Discord: / discord
☕ Buy me a Coffee: https://ko-fi.com/promptengineering
🔴 Patreon: / promptengineering
💼 Consulting: https://calendly.com/engineerprompt/c...
📧 Business Contact: [email protected]
Become Member: http://tinyurl.com/y5h28s6h

💻 Pre-configured localGPT VM: https://bit.ly/localGPT (use Code: PromptEngineering for 50% off).

Sign up for the newsletter, localGPT:
https://tally.so/r/3y9bb0

TIMESTAMPS:
00:00 Introduction to Multi-Modal RAG Systems
00:23 Overview of the Architecture
02:57 Setting Up the Environment
03:54 Data Collection and Preparation
04:28 Generating Image Descriptions with GPT-4
08:10 Creating Multi-Modal Vector Stores
09:41 Implementing the Retrieval Pipeline
11:05 Generating Final Responses


All Interesting Videos:
Everything LangChain: LangChain
Everything LLM: Large Language Models
Everything Midjourney: MidJourney Tutorials
AI Image Generation: AI Image Generation Tutorials
