Natural Language Understanding Benchmark

Understanding voice commands is the core functionality of a voice assistant. The dominant approach breaks this task into speech-to-text (STT) followed by natural language understanding (NLU). Amazon Lex, Google Dialogflow, IBM Watson, and Microsoft LUIS use this strategy. Picovoice’s Rhino Speech-to-Intent takes a different approach: it fuses these two steps into an end-to-end model that infers meaning, including intent and entities, directly from voice commands. We claim that this end-to-end approach gives a significant accuracy boost and greatly reduces the operating cost of voicebots and voice user interfaces (VUIs).

Below is a series of benchmarks to support these claims and track their validity over time. The benchmark is open-source and reproducible.

Methodology

Speech Corpus

We consider a voice-enabled coffee maker as the test case. We crowd-sourced spoken utterances from 50 speakers. Each speaker contributed between 10 and 15 voice commands. Speakers have diverse accents, and the corpus is gender-balanced. The utterances were recorded in quiet environments.

Noise

To simulate real-world environments, we mix utterances with noise before feeding them to the NLU engines. The noise is mixed at various signal-to-noise ratios (SNRs) to study the effect of noise level on the accuracy of voice assistant APIs.
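As an illustration of what mixing at a target SNR involves, here is a minimal sketch. It is not the benchmark's own code: the function name `mix_at_snr` and the representation of audio as plain lists of float samples are assumptions made for this example. The noise is scaled so that the ratio of average speech power to average noise power matches the requested SNR in decibels.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB.

    Hypothetical helper for illustration; samples are plain floats.
    """
    if not speech:
        raise ValueError("speech must not be empty")
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]

    # Average power of each signal.
    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both be non-silent")

    # Solve snr_db = 10 * log10(speech_power / (gain**2 * noise_power)) for gain.
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


# Synthetic signals standing in for a recorded command and a noise clip.
speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [math.cos(0.37 * i) for i in range(1000)]
noisy = mix_at_snr(speech, noise, snr_db=6.0)
```

A lower `snr_db` gives a louder noise component, so sweeping this value produces the range of listening conditions the benchmark studies.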

Metrics

Command Acceptance Rate

The accuracy of a VUI can be measured in many ways, including precision, recall, F-score, and confusion matrices. We use an intuitive metric that we call "Command Acceptance Rate": the percentage of voice commands whose intent and slots are all correctly inferred by the voice assistant. Hence, an incorrect intent or even a single incorrect slot value counts as an error.
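The definition above can be sketched in a few lines. This is an illustrative implementation, not the benchmark's code: the dictionary layout (`intent` and `slots` keys), the use of `None` for a command the engine failed to understand, and the example labels are all assumptions.

```python
def command_acceptance_rate(references, predictions):
    """Percentage of commands whose intent and every slot value are correct.

    Each item is a dict with "intent" (str) and "slots" (dict); a prediction
    of None means the engine returned no result. This layout is hypothetical.
    """
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    if not references:
        raise ValueError("at least one command is required")

    accepted = 0
    for ref, pred in zip(references, predictions):
        # A command is accepted only on an exact match of intent and all slots.
        if (pred is not None
                and pred["intent"] == ref["intent"]
                and pred["slots"] == ref["slots"]):
            accepted += 1
    return 100.0 * accepted / len(references)


references = [
    {"intent": "orderDrink", "slots": {"size": "large", "drink": "latte"}},
    {"intent": "orderDrink", "slots": {"size": "small", "drink": "espresso"}},
    {"intent": "orderDrink", "slots": {"drink": "americano"}},
]
predictions = [
    {"intent": "orderDrink", "slots": {"size": "large", "drink": "latte"}},   # correct
    {"intent": "orderDrink", "slots": {"size": "medium", "drink": "espresso"}},  # one wrong slot
    None,  # not understood
]
print(f"{command_acceptance_rate(references, predictions):.1f}%")  # prints 33.3%
```

Note that the second command is rejected even though its intent and one of its slots are right, which is what makes the metric strict.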

Operational Cost

If the voice interface of a coffee maker costs $10 a month, that coffee maker won't become a household item. Hence, we compare the annual operational cost of NLU engines per user. For the cloud NLU APIs, this cost is a function of the number of user interactions per day (i.e., usage) and is essentially uncapped, as is common in Software-as-a-Service (SaaS) business models. Picovoice Rhino, by contrast, offers unlimited voice interactions and predictable pricing.
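To make the dependence on usage concrete, the following sketch computes an annual per-user cost under usage-based pricing. The function name and the per-interaction price are placeholders chosen for illustration only; they are not any vendor's actual rates, which should be taken from the vendors' current pricing pages.

```python
DAYS_PER_YEAR = 365


def annual_usage_based_cost(interactions_per_day, price_per_interaction):
    """Annual per-user cost when every interaction is billed (hypothetical model)."""
    if interactions_per_day < 0 or price_per_interaction < 0:
        raise ValueError("inputs must be non-negative")
    return interactions_per_day * DAYS_PER_YEAR * price_per_interaction


# Placeholder price, NOT a real vendor rate.
EXAMPLE_PRICE = 0.004  # dollars per interaction

for per_day in (5, 10, 20):
    cost = annual_usage_based_cost(per_day, EXAMPLE_PRICE)
    print(f"{per_day} interactions/day -> ${cost:.2f} per user per year")
```

Under this model the cost grows linearly with usage and has no upper bound, which is the property the comparison above is concerned with.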

Results

Command Acceptance Rate

The figure below shows the accuracy of each engine averaged over all utterances and all SNRs.

[Figure: NLU accuracy comparison]

For a more detailed view, the figure below shows how each engine copes with noise at different SNRs.

[Figure: NLU accuracy comparison across different SNRs]
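Both views, the overall average and the per-SNR breakdown, can be derived from the same per-command outcomes. The sketch below shows one way to do this; the function name, the data layout, and the example values are assumptions for illustration and are not results from this benchmark.

```python
def acceptance_by_snr(outcomes):
    """Return (per-SNR acceptance rates, overall rate), both in percent.

    outcomes maps an SNR in dB to a list of booleans, one per command,
    where True means the command was accepted. Hypothetical layout.
    """
    per_snr = {}
    accepted_total = 0
    count_total = 0
    for snr_db, flags in sorted(outcomes.items()):
        if not flags:
            raise ValueError(f"no outcomes recorded for SNR {snr_db} dB")
        per_snr[snr_db] = 100.0 * sum(flags) / len(flags)
        accepted_total += sum(flags)
        count_total += len(flags)
    if count_total == 0:
        raise ValueError("outcomes must not be empty")
    # Pooled over all commands and all SNRs.
    overall = 100.0 * accepted_total / count_total
    return per_snr, overall


# Made-up outcomes for one engine, purely to exercise the function.
example = {
    6: [True, False, True, True],
    24: [True, True, True, True],
}
per_snr, overall = acceptance_by_snr(example)
print(per_snr)   # {6: 75.0, 24: 100.0}
print(overall)   # 87.5
```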

Usage

The data and code used to create this benchmark are available on GitHub under the permissive Apache 2.0 license. Detailed instructions for benchmarking individual engines are provided in the following documents:

  • AWS Lex accuracy
  • GCP Dialogflow accuracy
  • IBM Watson accuracy
  • Azure LUIS accuracy
  • Picovoice Rhino accuracy

Different Use Cases

Voice assistants can do much more than brew coffee. They are already used in IVR, customer service, sales, finance, healthcare, touchless interfaces, and many other verticals that can benefit from conversational AI. If you have a different use case, you can still use this framework to benchmark vendors and make a data-driven decision. All you need to do is replace the data and labels in the GitHub repository, retrain the NLU engines for your domain of interest, and run the benchmark again.

© 2019-2025 Picovoice Inc.