Inference & GPU Optimization: AWQ

on channel: AI Makerspace

Join us as we explore cutting-edge techniques to optimize Large Language Models (LLMs) for inference! This event dives into the tradeoffs between performance and cost in both LLMs and Small Language Models (SLMs). Learn how quantization, specifically Activation-aware Weight Quantization (AWQ), compresses models while maintaining top-notch performance. We'll break down the findings from recent research and show you how to apply these techniques using Hugging Face Transformers. If you're interested in maximizing output while minimizing compute, this is an event you won't want to miss!
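To give a flavor of the core AWQ idea before the event: a "salient" input channel often has small weights but very large activations, so naive rounding of those weights destroys a lot of output accuracy. AWQ rescales salient weight columns up before quantizing and folds the scale back out afterwards, which is a no-op in full precision but preserves resolution where it matters. The sketch below is our own minimal NumPy illustration of that trick (the `quantize_rows` helper, the scale factor `s = 4`, and the toy matrix sizes are illustrative assumptions, not the paper's or the library's actual implementation):

```python
import numpy as np

def quantize_rows(W, n_bits=4):
    """Toy symmetric per-output-channel (per-row) rounding to n_bits."""
    q_max = 2 ** (n_bits - 1) - 1
    scale = np.abs(W).max(axis=1, keepdims=True) / q_max
    return np.round(W / scale) * scale

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 8))
W[:, 0] *= 0.1          # salient channel: small weights ...
x = rng.normal(size=8)
x[0] = 100.0            # ... but very large activations

y_ref = W @ x

# Naive quantization: the salient column's small weights are rounded
# coarsely, and the huge activation amplifies that error.
err_plain = np.abs(y_ref - quantize_rows(W) @ x).mean()

# Activation-aware scaling: multiply the salient weight column by s
# before quantizing, divide it back out afterwards. Equivalent in full
# precision, but the salient weights keep far more resolution.
s = np.ones(8)
s[0] = 4.0              # illustrative scale; AWQ searches for s per channel
W_awq = quantize_rows(W * s) / s
err_awq = np.abs(y_ref - W_awq @ x).mean()

print(f"plain error: {err_plain:.3f}  activation-aware error: {err_awq:.3f}")
```

In practice you wouldn't write this by hand: Transformers can load pre-quantized AWQ checkpoints directly, which is what the event walks through.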

Event page: https://bit.ly/GPUOptimization

Have a question for a speaker? Drop them here:
https://app.sli.do/event/bArr6NPFLuhy...

Speakers:
Dr. Greg, Co-Founder & CEO AI Makerspace
  / gregloughane  

The Wiz, Co-Founder & CTO AI Makerspace
  / csalexiuk  

Apply for our new AI Engineering Bootcamp on Maven today!
https://bit.ly/aie1

For team leaders, check out our Gen AI upskilling offerings:
https://aimakerspace.io/gen-ai-upskil...

Join our community to start building, shipping, and sharing with us today!
  / discord  

How'd we do? Share your feedback and suggestions for future events.
https://forms.gle/ZTebEuDCY1n8J8gh9
