Join us as we explore cutting-edge techniques to optimize Large Language Models (LLMs) for inference! This event will dive into the tradeoffs between performance and cost in both LLMs and Small Language Models (SLMs). Learn how quantization, specifically Activation-aware Weight Quantization (AWQ), compresses models while maintaining top-notch performance. We'll break down the findings from recent research and show you how to apply these techniques using Transformers. If you're interested in maximizing output while minimizing compute, this is an event you won't want to miss!
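To preview the core intuition behind AWQ: a small fraction of input channels see much larger activations than the rest, so scaling those "salient" channels up before low-bit quantization (and folding the scale back afterward) preserves the precision where it matters most. The snippet below is a minimal NumPy sketch of that idea under simplifying assumptions (per-tensor 4-bit round-to-nearest quantization, a fixed scaling exponent, no per-group quantization or grid search), not the paper's full algorithm or the Transformers API:

```python
import numpy as np

def quantize_rtn(w, n_bits=4):
    """Per-tensor symmetric round-to-nearest quantization (dequantized)."""
    qmax = 2 ** (n_bits - 1) - 1          # e.g. 7 for 4-bit symmetric
    step = np.abs(w).max() / qmax
    return np.round(w / step) * step

def awq_quantize(w, act_magnitude, alpha=0.5, n_bits=4):
    """Scale salient input channels up by s = act^alpha, quantize, then
    fold the scale back into the weights (at runtime the activations
    would be divided by s, leaving the layer's output unchanged)."""
    s = act_magnitude ** alpha            # one scale per input channel
    return quantize_rtn(w * s, n_bits) / s

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64)) * 0.1       # (out_features, in_features)
act = np.abs(rng.normal(size=64)) + 0.1   # mean |activation| per channel
act[:4] *= 1000                           # a few channels dominate activations...
w[:, :4] *= 0.1                           # ...while their weights stay ordinary

# Activation-weighted reconstruction error: the quantity AWQ targets.
err_rtn = np.abs((w - quantize_rtn(w)) * act).mean()
err_awq = np.abs((w - awq_quantize(w, act)) * act).mean()
print(err_awq < err_rtn)                  # scaling salient channels cuts the error
```

In practice you wouldn't implement this by hand: AWQ-quantized checkpoints can be loaded directly with the Transformers library once the AutoAWQ backend is installed.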
Event page: https://bit.ly/GPUOptimization
Have a question for a speaker? Drop them here:
https://app.sli.do/event/bArr6NPFLuhy...
Speakers:
Dr. Greg, Co-Founder & CEO, AI Makerspace
/ gregloughane
The Wiz, Co-Founder & CTO, AI Makerspace
/ csalexiuk
Apply for our new AI Engineering Bootcamp on Maven today!
https://bit.ly/aie1
For team leaders, check out:
https://aimakerspace.io/gen-ai-upskil...
Join our community to start building, shipping, and sharing with us today!
/ discord
How'd we do? Share your feedback and suggestions for future events.
https://forms.gle/ZTebEuDCY1n8J8gh9