A side-by-side comparison of Private LLM v1.7.6 and Ollama v0.1.25 running the same model (Mistral-7B-Instruct-v0.2). The benchmark was run on an M2 Max Mac Studio with 64 GB of RAM; Private LLM and Ollama were run separately to prevent any resource contention. Note that Private LLM serves a high-quality 4-bit OmniQuant-quantized version of Mistral-7B-Instruct-v0.2, while Ollama serves a baseline q4_0 RTN-quantized version of the same model. For a comparison of text generation quality between the two quantization methods, please refer to the OmniQuant paper: https://arxiv.org/abs/2308.13137
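To make the baseline concrete: RTN ("round-to-nearest") quantization, the method behind Ollama's q4_0 weights, simply scales each block of weights and rounds every value to the nearest signed 4-bit integer, with no calibration or learned parameters. The sketch below illustrates the general idea in plain Python; the function names are hypothetical and this is not the exact llama.cpp q4_0 storage format (which packs 32-weight blocks with a per-block fp16 scale).

```python
def rtn_quantize_4bit(weights, block_size=32):
    """Illustrative round-to-nearest (RTN) 4-bit quantization.

    Each block shares one scale; values are rounded to the nearest
    signed 4-bit integer in [-8, 7]. A sketch of the general RTN idea,
    not the exact llama.cpp q4_0 on-disk format.
    """
    quantized = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        max_abs = max(abs(w) for w in block) or 1.0
        scale = max_abs / 7  # map the largest magnitude onto the int4 range
        q = [max(-8, min(7, round(w / scale))) for w in block]
        quantized.append((scale, q))
    return quantized

def rtn_dequantize(quantized):
    """Reconstruct approximate weights from (scale, int4-list) blocks."""
    return [scale * q for scale, block in quantized for q in block]
```

OmniQuant, by contrast, learns the clipping/scaling parameters from calibration data, which is why the two 4-bit models in the comparison are not equivalent in output quality despite similar size.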
A similar side-by-side performance comparison with the Mixtral 8x7B Instruct-v0.1 model:
• Private LLM vs Ollama with Mixtral 8x...
Video uploaded by Private LLM on 20 February 2024.