Natural Language Understanding Benchmark


Understanding voice commands is the core functionality of a voice assistant. The dominant approach breaks this task into speech-to-text (STT) and natural language understanding (NLU); Amazon Lex, Google Dialogflow, IBM Watson, and Microsoft LUIS use this strategy. Picovoice's Rhino Speech-to-Intent takes a different approach: it fuses these two steps into an end-to-end model that infers meaning, including intent and entities, directly from voice commands. We claim that this end-to-end approach gives a significant accuracy boost and massively reduces the operating cost of voicebots and voice user interfaces (VUIs).
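
For illustration, here is a minimal sketch of what end-to-end inference looks like with the pvrhino Python SDK. The access key, context file, WAV file, and the example intent and slot names are placeholders, not values from this benchmark:

```python
# Minimal sketch of end-to-end intent inference with the pvrhino Python SDK.
# The access key, context path, and WAV file below are placeholders.
import struct
import wave

import pvrhino

rhino = pvrhino.create(
    access_key="${ACCESS_KEY}",       # obtained from Picovoice Console
    context_path="coffee_maker.rhn",  # context trained for the coffee-maker domain
)

# Assumes a 16 kHz, 16-bit, single-channel WAV matching rhino.sample_rate.
with wave.open("command.wav", "rb") as f:
    pcm = struct.unpack("%dh" % f.getnframes(), f.readframes(f.getnframes()))

# Feed audio frame by frame; Rhino signals when it has finalized an inference.
for i in range(len(pcm) // rhino.frame_length):
    frame = pcm[i * rhino.frame_length:(i + 1) * rhino.frame_length]
    if rhino.process(frame):
        inference = rhino.get_inference()
        if inference.is_understood:
            print(inference.intent)  # e.g. "orderBeverage"
            print(inference.slots)   # e.g. {"beverage": "espresso", "size": "small"}
        break

rhino.delete()
```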

Below is a series of benchmarks to support these claims and track their validity over time. The benchmark is open-source and reproducible.

Methodology

Speech Corpus

We consider a voice-enabled coffee maker as the test case. We crowd-sourced spoken utterances from 50 speakers, each contributing between 10 and 15 voice commands. Speakers have diverse accents, and the corpus is gender-balanced. The utterances were recorded in quiet environments.

Noise

To simulate real-world environments, we mix the utterances with noise before feeding them to the NLU engines. The noise is mixed at various signal-to-noise ratios (SNRs) to study the effect of noise level on the accuracy of voice assistant APIs.
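
As a reference for the mixing step, the noise can be scaled so that the resulting signal-to-noise ratio matches a target value. The NumPy sketch below illustrates the idea; it is not necessarily the exact code used in the benchmark repository:

```python
# Sketch of mixing a noise recording into a clean utterance at a target SNR (dB).
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    # Tile or trim the noise so it covers the whole utterance.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[:len(speech)]

    speech_power = np.mean(speech.astype(np.float64) ** 2)
    noise_power = np.mean(noise.astype(np.float64) ** 2)

    # Scale the noise so that 10 * log10(speech_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(speech_power / (noise_power * (10 ** (snr_db / 10))))
    return speech + scale * noise
```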

Metrics

Command Acceptance Rate

The accuracy of a VUI can be measured in many ways, including precision, recall, F-score, and confusion matrices. We decided to use an intuitive metric we call “Command Acceptance Rate”: the percentage of voice commands whose intent and slots are correctly inferred by the voice assistant. Hence, an incorrect intent or even a single incorrect slot value is counted as an error.
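
To make the metric concrete, the sketch below (with hypothetical intent and slot names) shows that a command is accepted only when both the intent and every slot value match the reference labels:

```python
# Hypothetical sketch of computing Command Acceptance Rate: a command is accepted
# only if both the intent and the complete set of slot values match the reference.
from dataclasses import dataclass

@dataclass
class Inference:
    intent: str
    slots: dict  # slot name -> slot value

def command_acceptance_rate(predictions: list, references: list) -> float:
    accepted = sum(
        1
        for pred, ref in zip(predictions, references)
        if pred.intent == ref.intent and pred.slots == ref.slots
    )
    return 100.0 * accepted / len(references)

# A single wrong slot value makes the whole command count as an error.
ref = [Inference("orderBeverage", {"beverage": "espresso", "size": "small"})]
pred = [Inference("orderBeverage", {"beverage": "espresso", "size": "large"})]
print(command_acceptance_rate(pred, ref))  # 0.0
```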

Operational Cost

If the voice interface of a coffee maker costs $10 a month, that coffee maker won't become a household item. Hence, we compare the annual operational cost of the NLU engines per user. For all the NLU APIs, this cost is a function of the number of user interactions per day (i.e. usage) and is essentially uncapped, as is common in Software-as-a-Service (SaaS) business models. Picovoice Rhino, in contrast, offers unlimited voice interactions and predictable pricing.
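
To make the cost model concrete, the sketch below contrasts a usage-priced API with a flat-priced engine; the per-request price and flat fee are placeholders, not vendor quotes:

```python
# Hedged sketch of the annual per-user cost model. The per-request price and
# flat annual fee are placeholders, not actual vendor pricing.

def annual_cost_usage_priced(interactions_per_day: float, price_per_request: float) -> float:
    # Per-request SaaS billing: cost grows linearly with usage and is uncapped.
    return interactions_per_day * 365 * price_per_request

def annual_cost_flat_priced(flat_annual_fee: float) -> float:
    # Flat pricing: cost is independent of how often the interface is used.
    return flat_annual_fee

for n in (10, 50, 100):
    print(n, annual_cost_usage_priced(n, price_per_request=0.004))  # placeholder price
```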

Results

Command Acceptance Rate

The figure below shows the accuracy of each engine averaged over all utterances and all SNRs.

[Figure: NLU accuracy comparison]

A more detailed view looks at how each engine copes with noise, as shown in the SNR-dependent figure below.

[Figure: NLU accuracy comparison across different SNRs]

Operational Cost

The figure below compares the operational cost of voice recognition engines as a function of the number of user interactions per day.

[Figure: NLU operating cost comparison]

Usage

The data and code used to create this benchmark are available on GitHub under the permissive Apache 2.0 license. Detailed instructions for benchmarking individual engines are provided in the following documents:

  • AWS Lex accuracy
  • GCP Dialogflow accuracy
  • IBM Watson accuracy
  • Azure LUIS accuracy
  • Picovoice Rhino accuracy

Pricing data for speech recognition and NLU APIs are available on the providers’ websites:

  • AWS Lex pricing
  • GCP Dialogflow pricing
  • IBM Watson pricing
  • Azure LUIS pricing
  • Picovoice pricing

Different Use Cases

Voice assistants can do much more than brewing java. They are already used in IVR, customer service, sales, finance, healthcare, touchless interfaces, and many other verticals that can benefit from conversational AI. If you have a different use case, you can still use this framework to benchmark vendors and make a data-driven decision: replace the data and labels in the GitHub repository, retrain the NLU engines for your domain of interest, and run the benchmark again.
