Learn best practices for multimodal prompting using Google's Gemini model family!

Published: 28 August 2024
Channel: DeepLearningAI

Enroll now: https://bit.ly/3YPKAUa

Introducing Large Multimodal Model Prompting with Gemini, a new short course built in collaboration with Google Cloud, and taught by Erwin Huizenga, Developer Advocate for Generative AI at Google Cloud.

Large Multimodal Models (LMMs) represent a significant evolution from language models by integrating different data modalities, allowing for more comprehensive outputs based on varied input types such as text, images, and video.

For LMMs, prompt structure becomes even more important. For example, placing text inputs, such as a patient’s medical history, before image inputs, such as an X-ray, can improve the model’s interpretation. Conversely, for tasks like image captioning, leading with the image may yield better results. In this course, you'll explore best practices for multimodal prompting and learn how to set model parameters for better results.
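The ordering idea can be sketched without any model call. The snippet below is a minimal illustration, not the course's code: `TextPart`, `ImagePart`, and `build_prompt` are hypothetical names, and the file paths are placeholders. It only shows how the same two inputs can be arranged text-first or image-first before being handed to a multimodal model.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextPart:
    text: str


@dataclass
class ImagePart:
    path: str  # hypothetical file path; no image is loaded here


Part = Union[TextPart, ImagePart]


def build_prompt(text: str, image_path: str, image_first: bool = False) -> List[Part]:
    """Return prompt parts in the order they would be sent to a multimodal model."""
    text_part, image_part = TextPart(text), ImagePart(image_path)
    return [image_part, text_part] if image_first else [text_part, image_part]


# Context-heavy task: medical history first, then the X-ray.
diagnosis_prompt = build_prompt("Patient history: ...", "xray.png")

# Captioning task: image first, then the instruction.
caption_prompt = build_prompt("Write a one-sentence caption.", "photo.jpg", image_first=True)
```

Whether a given ordering actually helps depends on the model and the task, so it is worth comparing both orders on real examples.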

Additionally, you’ll learn how to integrate Gemini with external APIs and databases using function calling, with the ability to infuse your applications with real-time data and dynamic content.
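As a rough sketch of the function-calling pattern: the model returns a structured request naming a function and its arguments, the application runs that function, and the result goes back to the model. Everything below is an assumption for illustration. `get_exchange_rate`, `TOOLS`, `handle_function_call`, the shape of `model_call`, and the rate value are all made up, and no Gemini SDK call is shown.

```python
import json
from typing import Any, Callable, Dict


def get_exchange_rate(base: str, target: str) -> Dict[str, Any]:
    """Hypothetical tool; a real one would call an external API."""
    rates = {("USD", "EUR"): 0.9}  # made-up illustrative value
    return {"base": base, "target": target, "rate": rates.get((base, target))}


# Registry mapping function names the model may request to Python callables.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {"get_exchange_rate": get_exchange_rate}


def handle_function_call(call: Dict[str, Any]) -> str:
    """Run the tool the model asked for and return a JSON result to send back."""
    name = call["name"]
    if name not in TOOLS:
        return json.dumps({"error": f"unknown function: {name}"})
    result = TOOLS[name](**call["args"])
    return json.dumps(result)


# Simulated model output: a structured request rather than free text.
model_call = {"name": "get_exchange_rate", "args": {"base": "USD", "target": "EUR"}}
tool_response = handle_function_call(model_call)
```

Keeping the registry explicit means the model can only trigger functions the application has chosen to expose.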

In detail, you’ll explore:

Introduction to Gemini Models: Learn the differences and use cases for Gemini Nano, Pro, Flash, and Ultra. Understand how to select optimal models based on capability, latency, and cost.

Multimodal Prompting and Parameter Control: Learn techniques for structuring effective text-image-video prompts. Tune key parameters such as temperature, top_p, and top_k to trade off model creativity against determinism.

Best Practices for Multimodal Prompting: Get hands-on experience with prompt engineering for Gemini multimodal models, including role assignment, task decomposition, and formatting.

Creating Use Cases with Images: Build engaging multimodal applications like interior design assistants and receipt itemization tools.

Developing Use Cases with Videos: Implement "needle in the haystack" semantic video search powered by Gemini's large context window.

Integrating Real-Time Data with Function Calling: Extend Gemini with external knowledge and live data via function calling and API integration.
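To make the temperature, top_p, and top_k parameters from the outline above more concrete, here is a generic sketch of how such sampling controls are commonly implemented. It is not Gemini's actual decoding code: the function `sample_token`, the order in which the filters are applied, and the example scores are assumptions for illustration.

```python
import math
import random
from typing import Dict, Optional


def sample_token(logits: Dict[str, float], temperature: float = 1.0,
                 top_k: Optional[int] = None, top_p: Optional[float] = None,
                 rng: Optional[random.Random] = None) -> str:
    """Sample one token after temperature scaling, top-k, then top-p filtering."""
    rng = rng or random.Random()
    if temperature <= 0:
        # Treat zero temperature as greedy decoding.
        return max(logits, key=logits.get)

    # Temperature-scaled softmax (subtract the max for numerical stability).
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(((tok, w / total) for tok, w in weights.items()),
                    key=lambda pair: pair[1], reverse=True)

    if top_k is not None:
        # Keep only the k most likely tokens.
        ranked = ranked[:max(1, top_k)]
    if top_p is not None:
        # Keep the smallest prefix whose cumulative probability reaches top_p.
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept

    tokens = [tok for tok, _ in ranked]
    probs = [prob for _, prob in ranked]
    return rng.choices(tokens, weights=probs, k=1)[0]


logits = {"blue": 2.0, "grey": 1.0, "green": 0.1}  # made-up scores
greedy = sample_token(logits, top_k=1)
varied = sample_token(logits, temperature=1.5, top_p=0.95, rng=random.Random(0))
```

Lower temperature and smaller top_k or top_p values narrow the set of likely tokens and push output toward determinism; higher values allow more varied output.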

Start building advanced AI applications that can reason across multiple data modalities today!

Note that due to technical requirements, this course features downloadable-only notebooks on the learning platform. You are free to download, review, and run these notebooks on your own.

Learn more: https://bit.ly/3YPKAUa
