In this video, we'll run a state-of-the-art LLM on your laptop and create a webpage you can use to interact with it. All in about 5 minutes. Seriously!
We'll be using llama.cpp's Python bindings to run the LLM on our machine and Gradio to build the webpage.
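If you'd like a feel for the llama.cpp side before watching, here's a minimal sketch (not the exact code from the video) that pulls a Qwen2 0.5B Instruct GGUF from the Hugging Face Hub and asks it a question through the chat completion API. The repo id and quantization filename below are assumptions; point them at whichever GGUF file suits your machine.

# Minimal sketch: run a local chat completion with llama-cpp-python.
# Assumptions: llama-cpp-python and huggingface_hub are installed, and the
# repo id / "*q8_0.gguf" filename pattern match the GGUF repo linked below.
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="Qwen/Qwen2-0.5B-Instruct-GGUF",  # assumed Hub repo for the GGUF weights
    filename="*q8_0.gguf",                    # assumed quantization; pick any file you like
    verbose=False,
)

# OpenAI-style chat completion, computed entirely on your machine
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Tell me a fun fact about llamas."}]
)
print(response["choices"][0]["message"]["content"])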
Resources mentioned in the video:
Llama.cpp python: https://github.com/abetlen/llama-cpp-...
Gradio: https://github.com/gradio-app/gradio
Qwen-2 0.5B Instruct Model (GGUF): https://huggingface.co/Qwen/Qwen2-0.5...
Llama.cpp's chat completion API: https://github.com/abetlen/llama-cpp-...
Gradio Chatbot Guide: https://www.gradio.app/guides/creatin...
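And here's a hedged sketch of the Gradio side: a ChatInterface only needs a function that takes the latest message plus the chat history and yields the reply. It assumes the `llm` object from the sketch above and Gradio 4's list-of-pairs history format; streaming partial text with `yield` is what makes the page feel fast.

import gradio as gr

def respond(message, history):
    # Rebuild the conversation in the OpenAI-style format llama.cpp expects
    messages = []
    for user_msg, bot_msg in history:  # history arrives as [user, assistant] pairs
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": bot_msg})
    messages.append({"role": "user", "content": message})

    # Stream tokens from the local model, yielding the growing reply to the UI
    partial = ""
    for chunk in llm.create_chat_completion(messages=messages, stream=True):
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            partial += delta["content"]
            yield partial

gr.ChatInterface(respond).launch()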