
LLM Compression Benchmark

Quantization reduces the size and memory footprint of large language models (LLMs) while aiming to preserve their quality. This form of compression makes it possible to deploy advanced models on devices with limited memory and compute. This benchmark compares Picovoice picoLLM Compression with GPTQ, a state-of-the-art LLM compression method, on MMLU, ARC, and perplexity.
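
As a rough illustration of why bit depth matters, the sketch below estimates weight storage as parameter count times bits per weight. The function name `model_size_gb` and the 7-billion-parameter figure are illustrative assumptions, and the estimate ignores overhead such as quantization scales, activations, and the KV cache.

```python
def model_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes).

    Hypothetical helper for illustration only; real checkpoints also
    store metadata such as per-group scales and zero points.
    """
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Assumed model size: 7 billion parameters.
    for bits in (16, 8, 4, 2):
        print(f"{bits:>2}-bit: {model_size_gb(7e9, bits):.2f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the estimated footprint by a factor of four.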

Methodology

Algorithms

We use the following algorithms to compress LLMs:

  • GPTQ is a popular post-training quantization algorithm that quantizes weights layer by layer so that each layer's output closely matches that of the full-precision model.
  • picoLLM Compression is Picovoice's in-house LLM compression algorithm. Given a target size, picoLLM distributes the available bits optimally within and across the LLM's weights.
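
For readers new to quantization, the sketch below shows the simplest baseline: uniform round-to-nearest quantization of a small weight vector at a fixed bit depth. It is neither GPTQ nor picoLLM Compression, and the function names and sample weights are assumptions made for illustration. It only shows how fewer bits translate into larger reconstruction error, which is the trade-off both algorithms try to manage more intelligently.

```python
def quantize_rtn(weights, bits):
    """Uniform round-to-nearest quantization (naive baseline, not GPTQ or picoLLM).

    Returns integer codes in [0, 2**bits - 1] plus the scale and offset
    needed to map the codes back to floats.
    """
    levels = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Map integer codes back to approximate float weights."""
    return [lo + c * scale for c in codes]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Hypothetical weights chosen for illustration.
    weights = [-0.82, -0.31, -0.05, 0.0, 0.12, 0.47, 0.90]
    for bits in (2, 4, 8):
        restored = dequantize(*quantize_rtn(weights, bits))
        print(f"{bits}-bit reconstruction MSE: {mse(weights, restored):.6f}")
```

Every weight here gets the same number of bits. An allocation scheme like the one described for picoLLM would instead vary the bit depth across weights to meet a target size.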

Tasks

We evaluate the performance of GPTQ and picoLLM on the following tasks:

  • MMLU (Massive Multitask Language Understanding) is a multiple-choice dataset spanning many subjects that measures a model's knowledge and language understanding.
  • ARC (AI2 Reasoning Challenge) is a multiple-choice dataset that measures a model's reasoning ability.
  • Perplexity is an evaluation metric that measures how well a language model predicts text; lower is better. We compute perplexity on the C4 dataset.
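
The two kinds of metrics above can be sketched as follows. Perplexity is the exponential of the negative mean log-probability per token, and a common way to score multiple-choice tasks is to pick the answer choice the model rates most likely. The function names and input layout are assumptions for illustration; the benchmark's own scripts may compute these differently.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities.

    PPL = exp(-(1/N) * sum(log p_i)); lower means better prediction.
    """
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def multiple_choice_accuracy(choice_scores, answers):
    """Fraction of questions where the top-scoring choice is the labeled answer.

    choice_scores: one list of model scores (e.g. log-likelihoods) per question.
    answers: index of the correct choice for each question.
    """
    correct = 0
    for scores, answer in zip(choice_scores, answers):
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += predicted == answer
    return correct / len(answers)


if __name__ == "__main__":
    # Hypothetical values chosen for illustration.
    print(perplexity([-1.2, -0.4, -2.3, -0.9]))
    print(multiple_choice_accuracy(
        [[-1.0, -2.0, -3.0, -0.5], [-0.2, -1.0, -2.0, -3.0], [-3.0, -1.0, -2.0, -4.0]],
        [3, 0, 2],
    ))
```

A model that assigned uniform probability over a vocabulary of size V to every token would have a perplexity of exactly V, which gives a sense of scale for the metric.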

Models

We evaluate the performance of GPTQ and picoLLM on the following models:

  • Gemma-2b
  • Gemma-7b
  • Llama-2-7b
  • Llama-3-8b
  • Mistral-7b-v0.1
  • Phi-2

Results

The figures below compare the two compression engines across models on MMLU score, ARC score, and perplexity. For MMLU and ARC, higher is better; for perplexity, lower is better.

MMLU Score Comparison

ARC Score Comparison

Perplexity Comparison

Usage

The data and code used to create this benchmark are available on GitHub under the permissive Apache 2.0 license. Detailed instructions for benchmarking individual engines are provided in the following documents:

  • GPTQ Compression Benchmark
  • Picovoice picoLLM Compression Benchmark
