Picovoice WordmarkPicovoice Console
Introduction
Introduction
AndroidC.NETFlutterlink to GoiOSJavaNvidia JetsonLinuxmacOSNodejsPythonRaspberry PiReact NativeRustWebWindows
AndroidC.NETFlutterlink to GoiOSJavaNodejsPythonReact NativeRustWeb
SummaryPicovoice LeopardAmazon TranscribeAzure Speech-to-TextGoogle ASRGoogle ASR (Enhanced)IBM Watson Speech-to-Text
FAQ
Introduction
AndroidC.NETFlutterlink to GoiOSJavaNodejsPythonReact NativeRustWeb
AndroidC.NETFlutterlink to GoiOSJavaNodejsPythonReact NativeRustWeb
FAQ
Introduction
AndroidCiOSLinuxmacOSPythonWebWindows
AndroidCiOSPythonWeb
SummaryOctopus Speech-to-IndexGoogle Speech-to-TextMozilla DeepSpeech
FAQ
Introduction
AndroidAngularArduinoBeagleBoneCChrome.NETEdgeFirefoxFlutterlink to GoiOSJavaNvidia JetsonLinuxmacOSMicrocontrollerNodejsPythonRaspberry PiReactReact NativeRustSafariUnityVueWebWindows
AndroidAngularC.NETFlutterlink to GoiOSJavaMicrocontrollerNodejsPythonReactReact NativeRustUnityVueWeb
SummaryPorcupineSnowboyPocketSphinx
Wake Word TipsFAQ
Introduction
AndroidAngularBeagleBoneCChrome.NETEdgeFirefoxFlutterlink to GoiOSJavaNvidia JetsonlinuxmacOSNodejsPythonRaspberry PiReactReact NativeRustSafariUnityVueWebWindows
AndroidAngularC.NETFlutterlink to GoiOSJavaNodejsPythonReactReact NativeRustUnityVueWeb
SummaryPicovoice RhinoGoogle DialogflowAmazon LexIBM WatsonMicrosoft LUIS
Expression SyntaxFAQ
Introduction
AndroidBeagleboneCiOSNvidia JetsonLinuxmacOSPythonRaspberry PiRustWebWindows
AndroidCiOSPythonRustWeb
SummaryPicovoice CobraWebRTC VAD
FAQ
Introduction
AndroidCiOSPythonWeb
AndroidCiOSPythonWeb
SummaryPicovoice KoalaMozilla RNNoise
Introduction
AndroidAngularArduinoBeagleBoneC.NETFlutterlink to GoiOSJavaNvidia JetsonMicrocontrollerNodejsPythonRaspberry PiReactReact NativeRustUnityVueWeb
AndroidAngularCMicrocontroller.NETFlutterlink to GoiOSJavaNodejsPythonReactReact NativeRustUnityVueWeb
Picovoice SDK - FAQ
IntroductionSTM32F407G-DISC1 (Arm Cortex-M4)STM32F411E-DISCO (Arm Cortex-M4)STM32F769I-DISCO (Arm Cortex-M7)IMXRT1050-EVKB (Arm Cortex-M7)
FAQGlossary

Voice Activity Detection Benchmark


Voice activity detection (VAD) is the recognition of human speech within a stream of audio. Voice activity detection is one of the main building blocks of speech-enabled applications. VAD accuracy has a compounding effect on the system performance as many downstream speech processing blocks depend on it. VoIP, IVR, telemarketing, and security systems incorporate voice activity detection.

Picovoice’s Cobra VAD engine is a cross-platform, extremely efficient, and real-time VAD that achieves best-in-class accuracy. Below is a series of benchmarks to validate the accuracy claims.

Methodology

Engines

We compare the accuracy of Cobra with the voice activity detector used in WebRTC . The VAD that Google developed for the WebRTC project is reportedly one of the best.

Speech Corpus

We use LibriSpeech (test-clean portion) as the speech corpus. It provides a diverse number of speakers and is gender-balanced.

Noise

The real challenge in building a performant VAD is resilience to noise. To test out the effect of noise, we mix noise with speech data before feeding it to VAD engines. For this purpose, we use the DEMAND dataset that contains noise recordings in diverse environments.

Metric

We use the receiver operator characteristics curve. The ROC curve is a known tool for inspecting the performance of binary classifiers across different decision thresholds. It allows the designer to study the interplay of detection rate vs false positive rate.

Results

VAD accuracy comparisonVAD accuracy comparison

Usage

The data and code used to create this benchmark are available on GitHub under the permissive Apache 2.0 license. Detailed instructions for benchmarking individual engines are in the following documents:

  • Picovoice Cobra
  • WebRTC VAD

Was this doc helpful?

Issue with this doc?

Report a GitHub Issue
Voice Activity Detection Benchmark
  • Methodology
  • Engines
  • Speech Corpus
  • Noise
  • Metric
  • Results
  • Usage
Platform
  • Leopard Speech-to-Text
  • Cheetah Streaming Speech-to-Text
  • Octopus Speech-to-Index
  • Porcupine Wake Word
  • Rhino Speech-to-Intent
  • Cobra Voice Activity Detection
Resources
  • Docs
  • Console
  • Blog
  • Demos
Sales
  • Pricing
  • Starter Tier
  • Enterprise
Company
  • Careers
Follow Picovoice
  • LinkedIn
  • GitHub
  • Twitter
  • Medium
  • YouTube
  • AngelList
Subscribe to our newsletter
Terms of Use
Privacy Policy
© 2019-2022 Picovoice Inc.