
LLM Compression Benchmark

Quantizing large language models (LLMs) is crucial for reducing their size and memory usage while preserving quality. This compression makes it possible to deploy advanced models on devices with limited compute and memory. This benchmark compares Picovoice picoLLM Compression with GPTQ, a state-of-the-art LLM compression method, on MMLU, ARC, and C4 perplexity.
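As a rough illustration of the savings, the sketch below computes the storage needed for the weights alone at several bit widths. It assumes a nominal 7 billion parameters and ignores quantization metadata (such as scales), activations, and the KV cache, so real footprints are somewhat larger. The function name `weight_storage_gib` is illustrative.

```python
def weight_storage_gib(num_params: float, bits_per_weight: float) -> float:
    """Storage for the weights alone, in GiB (2**30 bytes)."""
    return num_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    num_params = 7e9  # nominal 7B-parameter model (assumption for illustration)
    for bits in (16, 8, 4, 3, 2):
        size = weight_storage_gib(num_params, bits)
        print(f"{bits:>2}-bit weights: {size:6.2f} GiB")
```

Because storage scales linearly with bits per weight, going from 16-bit to 4-bit weights shrinks the weight storage by a factor of four.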

Methodology

Algorithms

We use the following algorithms to compress LLMs:

  • GPTQ is a popular post-training quantization algorithm. It quantizes weights one column at a time and updates the remaining weights to compensate for the error, so that each layer's output closely mimics that of the full-precision model.
  • picoLLM Compression is Picovoice's in-house LLM compression algorithm. Given a target model size, picoLLM optimally distributes the available bits within and across the LLM's weights.
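To make the GPTQ description more concrete, here is a simplified single-layer sketch in Python with NumPy of the column-by-column update from the GPTQ paper. It assumes a symmetric round-to-nearest grid with one scale per output row and inverts the damped Hessian directly. The published method instead uses a Cholesky reformulation and lazy block updates for speed. This is not the code used in this benchmark, and `gptq_quantize` and `quantize_rtn` are illustrative names. picoLLM's bit-allocation algorithm is not shown.

```python
import numpy as np


def quantize_rtn(w: np.ndarray, scale: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric round-to-nearest onto the grid scale * {-qmax, ..., qmax}."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax, qmax) * scale


def gptq_quantize(W: np.ndarray, X: np.ndarray, bits: int = 4,
                  damp: float = 0.01) -> np.ndarray:
    """Simplified GPTQ for one linear layer y = W @ x.

    W: (d_out, d_in) full-precision weights.
    X: (d_in, n) calibration inputs, one column per sample.
    Returns dequantized weights with the same shape as W.
    """
    W = W.astype(np.float64).copy()
    d_in = W.shape[1]
    # One symmetric scale per output row, fixed from the original weights.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(W).max(axis=1) / qmax
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero rows
    # Hessian of the layer-output error ||W X - Q X||^2, with damping.
    H = 2.0 * X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(d_in)
    Hinv = np.linalg.inv(H)
    Q = np.zeros_like(W)
    for j in range(d_in):
        Q[:, j] = quantize_rtn(W[:, j], scale, bits)
        # Quantization error of column j, scaled by the inverse-Hessian diagonal.
        err = (W[:, j] - Q[:, j]) / Hinv[j, j]
        # Spread the error onto the not-yet-quantized columns.
        W[:, j:] -= np.outer(err, Hinv[j, j:])
        # Drop column j from the inverse Hessian of the remaining weights.
        Hinv -= np.outer(Hinv[:, j], Hinv[j, :]) / Hinv[j, j]
    return Q
```

When the calibration inputs are uncorrelated, the Hessian is diagonal, no error is spread, and the sketch reduces to plain rounding. With correlated inputs, comparing `np.linalg.norm((W - Q) @ X)` against the same quantity for plain rounding is a simple way to inspect the layer-output error that GPTQ is designed to reduce.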

Tasks

We evaluate GPTQ and picoLLM on the following tasks and metrics:

  • MMLU (Massive Multitask Language Understanding) is a multiple-choice dataset spanning 57 subjects that measures a model's knowledge and language understanding.
  • ARC (AI2 Reasoning Challenge) is a multiple-choice dataset of grade-school science questions that measures a model's reasoning ability.
  • Perplexity measures how well a language model predicts held-out text; lower is better. We compute perplexity on the C4 (Colossal Clean Crawled Corpus) dataset.
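The two kinds of scores can be sketched as follows. Perplexity is the exponential of the average negative log-likelihood per token, so a model that assigns probability 0.25 to every token has a perplexity of 4. For the multiple-choice tasks, the sketch assumes each answer option has already been scored by the model (for example, by the log-likelihood of the option text) and picks the highest-scoring option. That scoring rule is an assumption, since the exact prompting used in this benchmark is not specified, and the function names are illustrative.

```python
import math
from collections.abc import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood (natural log) per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def multiple_choice_accuracy(option_scores: Sequence[Sequence[float]],
                             answers: Sequence[int]) -> float:
    """Fraction of questions whose highest-scoring option is the correct one."""
    if not answers or len(option_scores) != len(answers):
        raise ValueError("need one non-empty score list per answer")
    correct = 0
    for scores, answer in zip(option_scores, answers):
        predicted = max(range(len(scores)), key=lambda i: scores[i])
        correct += int(predicted == answer)
    return correct / len(answers)
```

A compressed model is better when its perplexity stays close to the full-precision model's and its multiple-choice accuracy drops as little as possible.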

Models

We evaluate the performance of GPTQ and picoLLM on the following models:

  • Gemma-2b
  • Gemma-7b
  • Llama-2-7b
  • Llama-3-8b
  • Mistral-7b-v0.1
  • Phi-2

Results

The figures below compare the two compression algorithms on each model: MMLU and ARC scores (higher is better) and C4 perplexity (lower is better).

MMLU Score Comparison

[Figure: MMLU Score Comparison]

ARC Score Comparison

[Figure: ARC Score Comparison]

Perplexity Comparison

[Figure: C4 Perplexity Comparison]

Usage

The data and code used to create this benchmark are available on GitHub under the permissive Apache 2.0 license. Detailed instructions for benchmarking individual engines are provided in the following documents:

  • GPTQ Compression Benchmark
  • Picovoice picoLLM Compression Benchmark

© 2019-2025 Picovoice Inc.